Re: python math problem
On Feb 16, 6:39 am, Kene Meniru <kene.men...@illom.org> wrote:
> x = (math.sin(math.radians(angle)) * length)
> y = (math.cos(math.radians(angle)) * length)

A suggestion about coding style:

    from math import sin, cos, radians
    # etc etc
    x = sin(radians(angle)) * length
    y = cos(radians(angle)) * length

... easier to write, easier to read.
--
http://mail.python.org/mailman/listinfo/python-list
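A quick self-check that the two spellings compute the same thing (the angle and length values here are mine, purely for illustration):

```python
import math
from math import sin, cos, radians

angle, length = 30.0, 2.0  # illustrative values, not from the original post

# Long-hand spelling from the quoted post
x1 = math.sin(math.radians(angle)) * length
y1 = math.cos(math.radians(angle)) * length

# Shorter spelling suggested above
x2 = sin(radians(angle)) * length
y2 = cos(radians(angle)) * length

assert x1 == x2 and y1 == y2
```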
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
John Machin <sjmac...@lexicon.net> added the comment:

@Ezio: Comparison of the behaviour of \letter inside/outside character classes is irrelevant. The rules for inside can be expressed simply as:

1. The letters dDsSwW are special; they represent categories as documented, and do in fact have a similar meaning outside character classes.

2. Otherwise normal Python rules for backslash escapes in string literals should be followed. This means automatically that \a -> \x07, \A -> "A", \b -> backspace, \B -> "B", \z -> "z" and \Z -> "Z".

@Georg: No need to read the source, just read my initial posting: it's compiled as a zero-length matcher ("at") inside a character class ("in"), i.e. a nonsense, then at runtime the illegality is deliberately ignored.

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13899>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
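For readers on current CPython: the category escapes behave as rule 1 says, \b inside a class is backspace per the docs, and since Python 3.7 an unknown ASCII-letter escape such as \A inside a class is rejected outright rather than silently miscompiled. A minimal check:

```python
import re

# Rule 1: \d keeps its category meaning inside a character class
assert re.findall(r'[\d]', 'a1b2') == ['1', '2']

# String-literal-style escape: \b inside a class is backspace (\x08)
assert re.findall(r'[\b]', 'a\x08b') == ['\x08']

# Since Python 3.7, \A inside a class raises re.error instead of
# compiling to the nonsense described in this report
try:
    re.compile(r'[\A]')
    outcome = 'compiled'
except re.error:
    outcome = 'rejected'
assert outcome == 'rejected'
```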
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
John Machin <sjmac...@lexicon.net> added the comment:

Whoops: "normal Python rules for backslash escapes" should have had a note: "... but revert to the C behaviour of stripping the \ from unrecognised escapes", which is what re appears to do in its own \ handling.

--
Python tracker: <http://bugs.python.org/issue13899>
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
New submission from John Machin <sjmac...@lexicon.net>:

Expected behaviour illustrated using "C":

>>> import re
>>> re.findall(r'[\C]', 'CCC')
['C', 'C', 'C']
>>> re.compile(r'[\C]', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6E78>
>>> re.compile(r'C', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6F08>

Incorrect behaviour exhibited by "A" (and by "B" and "Z"):

>>> re.findall(r'[\A]', 'AAA')
[]
>>> re.compile(r'A', 128)
literal 65
<_sre.SRE_Pattern object at 0x01FC6F98>
>>> re.compile(r'[\A]', 128)
in
  at at_beginning_string    <== FAIL
<_sre.SRE_Pattern object at 0x01FDF0B0>

Also there is no self-checking at runtime; the switch default has a comment to the effect that nothing can be done, so pretend that the unknown opcode matched nothing. Zen?

--
messages: 152194
nosy: sjmachin
priority: normal
severity: normal
status: open
title: re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
type: behavior
versions: Python 2.7, Python 3.2
Python tracker: <http://bugs.python.org/issue13899>
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
John Machin <sjmac...@lexicon.net> added the comment:

@ezio: Of course the context is inside a character class. I expect r'[\b]' to act like r'\b' aka r'\x08' aka backspace because (1) that is the treatment applied to all other C-like control char escapes, and (2) the docs say so explicitly: "Inside a character range, \b represents the backspace character, for compatibility with Python's string literals."

--
Python tracker: <http://bugs.python.org/issue13899>
[issue13782] xml.etree.ElementTree: Element.append doesn't type-check its argument
New submission from John Machin <sjmac...@lexicon.net>:

    import xml.etree.ElementTree as et
    node = et.Element('x')
    node.append(not_an_Element_instance)

2.7 and 3.2 produce no complaint at all. 2.6 and 3.1 produce an AssertionError. However cElementTree in all 4 versions produces a TypeError. Please fix 2.7 and 3.2 ElementTree to produce a TypeError.

--
messages: 151210
nosy: sjmachin
priority: normal
severity: normal
status: open
title: xml.etree.ElementTree: Element.append doesn't type-check its argument
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2
Python tracker: <http://bugs.python.org/issue13782>
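For what it's worth, on current CPython 3.x both the pure-Python and the C-accelerated ElementTree raise the TypeError requested here. A quick check:

```python
import xml.etree.ElementTree as et

node = et.Element('x')
try:
    node.append('not an Element')   # a str, i.e. not an Element instance
    result = 'accepted'
except TypeError:
    result = 'TypeError'
assert result == 'TypeError'
```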
Re: unicode by default
On Thu, May 12, 2011 4:31 pm, harrismh777 wrote:
> So, the UTF-16 UTF-32 is INTERNAL only, for Python

NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8, are encodings for the EXTERNAL representation of Unicode characters in byte streams.

> I also was not aware that UTF-8 chars could be up to six (6) bytes long from left to right.

It could be, once upon a time in ISO faerieland, when it was thought that Unicode could grow to 2**32 codepoints. However ISO and the Unicode consortium have agreed that 17 planes is the utter max, and accordingly a valid UTF-8 byte sequence can be no longer than 4 bytes ... see below:

>>> chr(17 * 65536)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(0x110000)
>>> chr(17 * 65536 - 1)
'\U0010ffff'
>>> _.encode('utf8')
b'\xf4\x8f\xbf\xbf'
>>> b'\xf5\x8f\xbf\xbf'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python32\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 0: invalid start byte
--
http://mail.python.org/mailman/listinfo/python-list
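The same limits, restated as a small self-checking snippet (Python 3):

```python
# U+10FFFF (top of plane 16) is the largest code point, and its UTF-8
# form is exactly 4 bytes; one past it is rejected by chr().
max_cp = 17 * 65536 - 1
assert max_cp == 0x10FFFF

encoded = chr(max_cp).encode('utf-8')
assert encoded == b'\xf4\x8f\xbf\xbf'
assert len(encoded) == 4

try:
    chr(17 * 65536)
    overflowed = False
except ValueError:
    overflowed = True
assert overflowed
```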
Re: unicode by default
On Thu, May 12, 2011 8:51 am, harrismh777 wrote:
> Is it true that if I am working without using bytes sequences that I will not need to care about the encoding anyway, unless of course I need to specify a unicode code point?

Quite the contrary.

(1) You cannot work without using bytes sequences. Files are byte sequences. Web communication is in bytes. You need to (know / assume / be able to extract / guess) the input encoding. You need to encode your output using an encoding that is expected by the consumer (or use an output method that will do it for you).

(2) You don't need to use bytes to specify a Unicode code point. Just use an escape sequence, e.g. '\u0404' is a Cyrillic character.
--
http://mail.python.org/mailman/listinfo/python-list
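Point (2) in runnable form (Python 3 spelling, where string literals are Unicode by default):

```python
import unicodedata

ch = '\u0404'                       # a Cyrillic character; no bytes involved
assert unicodedata.name(ch) == 'CYRILLIC CAPITAL LETTER UKRAINIAN IE'

# Bytes only appear at the I/O boundary, via an explicit encoding
encoded = ch.encode('utf-8')
assert encoded == b'\xd0\x84'
assert encoded.decode('utf-8') == ch
```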
Re: urllib2 request with binary file as payload
On Thu, May 12, 2011 10:20 am, Michiel Sikma wrote:
> Hi there,
> I made a small script implementing a part of Youtube's API that allows you to upload videos. It's pretty straightforward and uses urllib2. The script was written for Python 2.6, but the server I'm going to use it on only has 2.5 (and I can't update it right now, unfortunately). It seems that one vital thing doesn't work in 2.5's urllib2:
>
> data = open(video['filename'], 'rb')
> opener = urllib2.build_opener(urllib2.HTTPHandler)
> req = urllib2.Request(settings['upload_location'], data, {
>     'Host': 'uploads.gdata.youtube.com',
>     'Content-Type': video['type'],
>     'Content-Length': '%d' % os.path.getsize(video['filename'])
> })
> req.get_method = lambda: 'PUT'
> url = opener.open(req)
>
> This works just fine on 2.6:
>
> send: <open file 'file.mp4', mode 'rb' at 0x1005db580>
> sendIng a read()able
>
> However, on 2.5 it refuses:
>
> Traceback (most recent call last):
> [snip]
> TypeError: sendall() argument 1 must be string or read-only buffer, not file

I don't use this stuff, just curious. But I can read docs. Quoting from the 2.6.6 docs:

"class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])
This class is an abstraction of a URL request. url should be a string containing a valid URL. data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format."

2.6 is expecting a string, according to the above. No mention of "file". Moreover it expects the data to be urlencoded. The 2.7.1 docs say the same thing. Are you sure you have shown the code that worked with 2.6?
--
http://mail.python.org/mailman/listinfo/python-list
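On Python 3 the whole dance is supported directly: urllib.request.Request accepts bytes (or a file-like object) as data and, since 3.3, a method argument, so no get_method monkey-patching is needed. A sketch with a placeholder URL and made-up payload (nothing here touches the network):

```python
import urllib.request

url = 'https://example.invalid/upload'   # placeholder, not a real endpoint
payload = b'fake video bytes'            # made-up data for illustration

req = urllib.request.Request(
    url,
    data=payload,
    headers={'Content-Type': 'video/mp4',
             'Content-Length': str(len(payload))},
    method='PUT',                        # supported directly since Python 3.3
)
assert req.get_method() == 'PUT'
assert req.data == payload
# urllib.request.urlopen(req) would perform the actual upload
```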
Re: unicode by default
On Thu, May 12, 2011 11:22 am, harrismh777 wrote:
> John Machin wrote:
>> (1) You cannot work without using bytes sequences. Files are byte sequences. Web communication is in bytes. You need to (know / assume / be able to extract / guess) the input encoding. You need to encode your output using an encoding that is expected by the consumer (or use an output method that will do it for you).
>> (2) You don't need to use bytes to specify a Unicode code point. Just use an escape sequence, e.g. '\u0404' is a Cyrillic character.
>
> Thanks John. In reverse order, I understand point (2). I'm less clear on point (1). If I generate a string of characters that I presume to be ascii/utf-8 (no \u0404 type characters) and write them to a file (stdout) how does default encoding affect that file ... by default..? I'm not seeing that there is anything unusual going on...

About "characters that I presume to be ascii/utf-8 (no \u0404 type characters)": All Unicode characters (including U+0404) are encodable in bytes using UTF-8.

The result of sys.stdout.write(unicode_characters) to a TERMINAL depends mostly on sys.stdout.encoding. This is likely to be UTF-8 on a linux/OSX platform. On a typical American / Western European / [former] colonies Windows box, this is likely to be cp850 in a Command Prompt window, and cp1252 in IDLE.

UTF-8: All Unicode characters are encodable in UTF-8. The only problem arises if the terminal can't render the character -- you'll get spaces or blobs or boxes with hex digits in them or nothing.

Windows (Command Prompt window): only a small subset of characters can be encoded in e.g. cp850; anything else causes an exception.

Windows (IDLE): ignores sys.stdout.encoding and renders the characters itself. Same outcome as *x/UTF-8 above.

If you write directly (or sys.stdout is redirected) to a FILE, the default encoding is obtained by sys.getdefaultencoding() and is AFAIK ascii unless the machine's site.py has been fiddled with to make it UTF-8 or something else.

> If I open the file with vi? If I open the file with gedit? emacs?

Any editor will have a default encoding; if that doesn't match the file encoding, you have a (hopefully obvious) problem if the editor doesn't detect the mismatch. Consult your editor's docs or HTFF1K.

> Another question... in mail I'm receiving many small blocks that look like sprites with four small hex codes, scattered about the mail... mostly punctuation, maybe? ... guessing, are these unicode code points,

yes

> and if so what is the best way to 'guess' the encoding?

google("chardet") or rummage through the mail headers (but 4 hex digits in a box are a symptom of inability to render, not necessarily caused by an incorrect decoding)

> ... is it coded in the stream somewhere...protocol?

Should be.
--
http://mail.python.org/mailman/listinfo/python-list
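The terminal-encoding point can be demonstrated without a Windows box: the same text encodes fine in UTF-8 but fails in a small codepage such as cp850 (chosen here just as an example):

```python
text = 'price \u0404 \u00a3'   # a Cyrillic letter and a pound sign

# UTF-8 can encode any Unicode character
assert text.encode('utf-8')

# cp850 has the pound sign but no Cyrillic: encoding raises
try:
    text.encode('cp850')
    failed = None
except UnicodeEncodeError as e:
    failed = text[e.start]
assert failed == '\u0404'
```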
Re: unicode by default
On Thu, May 12, 2011 1:44 pm, harrismh777 wrote:
> By default it looks like Python3 is writing output with UTF-8 as default... and I thought that by default Python3 was using either UTF-16 or UTF-32. So, I'm confused here... also, I used the character sequence \u00A3 which I thought was UTF-16... but Python3 changed my intent to 'c2a3' which is the normal UTF-8...

Python uses either a 16-bit or a 32-bit INTERNAL representation of Unicode code points. Those NN bits have nothing to do with the UTF-NN encodings, which can be used to encode the codepoints as byte sequences for EXTERNAL purposes. In your case, UTF-8 has been used as it is the default encoding on your platform.
--
http://mail.python.org/mailman/listinfo/python-list
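The distinction in executable form: '\u00A3' names one code point (U+00A3, POUND SIGN), and each UTF encoding turns that same code point into a different external byte sequence:

```python
ch = '\u00a3'
assert ch.encode('utf-8') == b'\xc2\xa3'                # the 'c2a3' the poster saw
assert ch.encode('utf-16-le') == b'\xa3\x00'
assert ch.encode('utf-32-le') == b'\xa3\x00\x00\x00'
```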
Re: unicode by default
On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote:
> If the file you're writing to doesn't specify an encoding, Python will default to locale.getdefaultencoding(),

No such attribute. Perhaps you mean locale.getpreferredencoding()
--
http://mail.python.org/mailman/listinfo/python-list
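Quick confirmation (the attribute really does not exist, and the preferred-encoding call returns a codec name):

```python
import locale

# locale has no getdefaultencoding; that name belongs to sys
assert not hasattr(locale, 'getdefaultencoding')

enc = locale.getpreferredencoding(False)
assert isinstance(enc, str) and len(enc) > 0   # e.g. 'UTF-8' on most Linux boxes
```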
codecs.open() doesn't handle platform-specific line terminator
According to the 3.2 docs (http://docs.python.org/py3k/library/codecs.html#codecs.open):

"Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. This means that no automatic conversion of b'\n' is done on reading and writing."

The first point is that one would NOT expect conversion of b'\n' anyway. One expects '\n' -> os.linesep.encode(the_encoding) on writing and vice versa on reading.

The second point is that there is no such restriction with the built-in open(), which appears to work as expected, doing (e.g. Windows, UTF-16LE) '\n' -> b'\r\x00\n\x00' when writing and vice versa on reading, and not striking out when thrown curve balls like '\u0a0a'.

Why is codecs.open() different? What does "encodings using 8-bit values" mean? What data loss?
--
http://mail.python.org/mailman/listinfo/python-list
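The difference can be shown portably by forcing the newline translation that Windows would apply by default (utf-16-le and the temp path are just for the demonstration):

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Built-in open(): newline translation is under the caller's control
with open(path, 'w', encoding='utf-16-le', newline='\r\n') as f:
    f.write('a\nb')
with open(path, 'rb') as f:
    assert f.read() == b'a\x00\r\x00\n\x00b\x00'   # '\n' became '\r\n', encoded

# codecs.open(): the file is binary underneath; '\n' is never translated
with codecs.open(path, 'w', encoding='utf-16-le') as f:
    f.write('a\nb')
with open(path, 'rb') as f:
    assert f.read() == b'a\x00\n\x00b\x00'
```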
Re: codec for UTF-8 with BOM
On Monday, 2 May 2011 19:47:45 UTC+10, Chris Rebert wrote:
> On Mon, May 2, 2011 at 1:34 AM, Ulrich Eckhardt <ulrich@dominolaser.com> wrote:
> The correct name, as you found below and as is corroborated by the webpage, seems to be utf_8_sig:
>
> >>> u"FO\xf8bar".encode('utf_8_sig')
> '\xef\xbb\xbfFO\xc3\xb8bar'

To complete the picture, decoding swallows the BOM:

>>> '\xef\xbb\xbfFO\xc3\xb8bar'.decode('utf_8_sig')
u'FO\xf8bar'
--
http://mail.python.org/mailman/listinfo/python-list
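The Python 3 equivalent of that 2.x session, as assertions:

```python
data = 'FO\u00f8bar'.encode('utf_8_sig')
assert data == b'\xef\xbb\xbfFO\xc3\xb8bar'          # BOM prepended on encoding

assert data.decode('utf_8_sig') == 'FO\u00f8bar'     # BOM swallowed on decoding
assert data.decode('utf-8') == '\ufeffFO\u00f8bar'   # plain utf-8 keeps it
```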
Re: Snowball to Python compiler
On Friday, April 22, 2011 8:05:37 AM UTC+10, Matt Chaput wrote:
> I'm looking for some code that will take a Snowball program and compile it into a Python script. Or, less ideally, a Snowball interpreter written in Python. (http://snowball.tartarus.org/)

If anyone has done such things they are not advertising them in the usual places. A third (more-than-) possible solution: google("python snowball"); the first page of results has at least 3 hits referring to Python wrappers for Snowball.
--
http://mail.python.org/mailman/listinfo/python-list
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@lexicon.net> added the comment:

Can somebody please review my doc patch submitted 2 months ago?

--
Python tracker: <http://bugs.python.org/issue7198>
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@lexicon.net> added the comment:

Skip, the changes that I suggested have NOT been made. Please re-read the doc page you pointed to. The writer paragraph does NOT mention that newline='' is required when writing. The writer examples do NOT include newline=''. The examples have NOT been enhanced by using a with statement and not using space as an example delimiter. PLEASE RE-OPEN THIS ISSUE.

--
Python tracker: <http://bugs.python.org/issue7198>
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

The doc patch proposed by Skip on 2011-01-24 for this bug has NOT been reviewed, let alone applied. Sibling bug #7198 has been closed in error. Somebody please help.

--
nosy: +skip.montanaro
Python tracker: <http://bugs.python.org/issue10954>
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

Terry, I have already made the point that the docs bug is #7198. This is the meaningful-exception bug. My review is: changing 'should' to 'must' is not very useful without a consistent interpretation of what those two words mean, and without any enforcement of use of newline=''. I was patient enough to wait 2 months for a review of my doc patch on #7198. My issues are that the 3.2 docs have NOT been changed (have a look at the csv.writer paragraph: do you see the word newline anywhere??), that #7198 has been closed without any action, and that BOTH of these two issues (which have in effect been lurking about since Python 3.0.0alpha) appear to have been abandoned.

--
Python tracker: <http://bugs.python.org/issue10954>
Re: getting text out of an xml string
On Mar 5, 8:57 am, JT <jeff.temp...@gmail.com> wrote:
> On Mar 4, 9:30 pm, John Machin <sjmac...@lexicon.net> wrote:
>> Your data has been FUABARred (the first A being for Almost) -- the \u3c00 and \u3e00 were once '<' and '>' respectively. You will
>
> Hi John, I realized that a few minutes after posting. I then realized that I could just extract the text between the stuff with \u3c00 xml preserve etc, which I did; it was good enough since it was a one-off affair, I had to convert a to-do list from one program to another. Thanks for replying and sorry for the noise :-)

Next time you need to extract some data from an xml file, please (for your own good) don't do whatever you did in that code -- note that the unicode equivalent of '<' is u'\u003c', NOT u'\u3c00'; I wasn't joking when I said it had been FU.
--
http://mail.python.org/mailman/listinfo/python-list
Re: getting text out of an xml string
On Mar 5, 6:53 am, JT <jeff.temp...@gmail.com> wrote:
> Yo,
> So I have almost convinced a small program to do what I want it to do. One thing remains (at least, one thing I know of at the moment): I am converting xml to some other format, and there are strings in the xml like this. The python:
>
>     elif v == "content":
>         print "content", a.childNodes[0].nodeValue
>
> what gets printed:
>
>     content \u3c00note xml:space=preserve\u3e00see forms in red inbox \u3c00/note\u3e00
>
> what this should say is "see forms in red inbox", because that is what the program whose xml file I am trying to convert properly displays, because that is what I typed in oh so long ago. So my question to you is, how can I convert this enhanced version to a normal string? Esp. since there is this xml:space=preserve thing in there ... I suspect the rest is just some unicode issue. Thanks for any help.
> J long time no post T

Your data has been FUABARred (the first A being for Almost) -- the \u3c00 and \u3e00 were once '<' and '>' respectively. You will need to show (a) a snippet of the xml file including the data that has the problem, and (b) the code that you have written, cut down to a small script that is runnable and displays the problem. Tell us what version of Python you are running, on what OS.
--
http://mail.python.org/mailman/listinfo/python-list
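For reference, had the angle brackets survived intact, the standard library would hand the text back directly (minidom was apparently in use above; ElementTree is shown here as an equally standard option):

```python
import xml.etree.ElementTree as ET

xml = '<note xml:space="preserve">see forms in red inbox </note>'
elem = ET.fromstring(xml)
assert elem.tag == 'note'
assert elem.text == 'see forms in red inbox '   # trailing space preserved
```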
Re: 2to3 chokes on bad character
On Feb 23, 7:47 pm, Frank Millman <fr...@chagford.com> wrote:
> Hi all
> I don't know if this counts as a bug in 2to3.py, but when I ran it on my program directory it crashed, with a traceback but without any indication of which file caused the problem.
> [traceback snipped]
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 5055: invalid start byte
> On investigation, I found some funny characters in docstrings that I copy/pasted from a pdf file. Here are the details if they are of any use. Oddly, I found two instances where characters 'look like' apostrophes when viewed in my text editor, but one of them was accepted by 2to3 and the other caused the crash.
> The one that was accepted consists of three bytes - 226, 128, 153 (as reported by python 2.6)

How did you incite it to report like that? Just use repr(the_3_bytes). It'll show up as '\xe2\x80\x99'.

>>> from unicodedata import name as ucname
>>> ''.join(chr(i) for i in (226, 128, 153)).decode('utf8')
u'\u2019'
>>> ucname(_)
'RIGHT SINGLE QUOTATION MARK'

What you have there is the UTF-8 representation of U+2019 RIGHT SINGLE QUOTATION MARK. That's OK.

> or 226, 8364, 8482 (as reported by python3.2).

Sorry, but you have instructed Python 3.2 to commit a nonsense:

>>> [ord(chr(i).decode('cp1252')) for i in (226, 128, 153)]
[226, 8364, 8482]

In other words, you have taken that 3-byte sequence, decoded each byte separately using cp1252 (aka "the usual suspect") into a meaningless Unicode character and printed its ordinal. In Python 3, don't use repr(); it has undergone the MHTP transformation and become ascii().

> The one that crashed consists of a single byte - 146 (python 2.6) or 8217 (python 3.2).

>>> chr(146).decode('cp1252')
u'\u2019'
>>> hex(8217)
'0x2019'

> The issue is not that 2to3 should handle this correctly, but that it should give a more informative error message to the unsuspecting user.

Your Python 2.x code should be TESTED before you poke 2to3 at it. In this case just trying to run or import the offending code file would have given an informative syntax error (you have declared the .py file to be encoded in UTF-8 but it's not).

> BTW I have always waited for 'final releases' before upgrading in the past, but this makes me realise the importance of checking out the beta versions - I will do so in future.

I'm willing to bet that the same would happen with Python 3.1, if a 3.1 to 3.2 upgrade is what you are talking about.
--
http://mail.python.org/mailman/listinfo/python-list
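The same byte forensics in Python 3 terms (the names good/bad are mine): the accepted apostrophe is valid UTF-8 for U+2019, while the lone byte 146 (0x92) is a cp1252-only spelling of the same character and an invalid UTF-8 start byte, which is exactly what 2to3's traceback complained about.

```python
import unicodedata

good = bytes([226, 128, 153])                 # b'\xe2\x80\x99'
ch = good.decode('utf-8')
assert ch == '\u2019'
assert unicodedata.name(ch) == 'RIGHT SINGLE QUOTATION MARK'

bad = bytes([146])                            # 0x92, a bare continuation byte
assert bad.decode('cp1252') == '\u2019'       # cp1252 maps it to the same mark
try:
    bad.decode('utf-8')
    decoded = True
except UnicodeDecodeError:
    decoded = False
assert not decoded                            # hence "invalid start byte"
```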
Re: 2to3 chokes on bad character
On Feb 25, 12:00 am, Peter Otten <__pete...@web.de> wrote:
> John Machin wrote:
>> Your Python 2.x code should be TESTED before you poke 2to3 at it. In this case just trying to run or import the offending code file would have given an informative syntax error (you have declared the .py file to be encoded in UTF-8 but it's not).
>
> The problem is that Python 2.x accepts arbitrary bytes in string constants.

Ummm ... isn't that a bug? According to section 2.1.4 of the Python 2.7.1 Language Reference Manual: "The encoding is used for all lexical analysis, in particular to find the end of a string, and to interpret the contents of Unicode literals. String literals are converted to Unicode for syntactical analysis, then converted back to their original encoding before interpretation starts ..."

How do you reconcile "used for all lexical analysis" and "String literals are converted to Unicode for syntactical analysis" with the actual (astonishing to me) behaviour?
--
http://mail.python.org/mailman/listinfo/python-list
Re: py3k: converting int to bytes
On Feb 25, 4:39 am, Terry Reedy wrote:
> Note: an as yet undocumented feature of bytes (at least in Py3) is that bytes(count) == bytes()*count == b'\x00'*count.

Python 3.1.3 docs for bytes() say "same constructor args as for bytearray()"; this says about the source parameter: "If it is an integer, the array will have that size and will be initialized with null bytes"
--
http://mail.python.org/mailman/listinfo/python-list
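The documented integer-source behaviour, checked directly. (Note that bytes()*count as quoted would be b''; presumably b'\x00'*count was the intended comparison.)

```python
assert bytes(3) == b'\x00\x00\x00'          # integer source: that many null bytes
assert bytes(3) == b'\x00' * 3
assert bytearray(3) == bytearray(b'\x00\x00\x00')
assert bytes() * 3 == b''                   # multiplying the EMPTY bytes gives b''
```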
[issue11204] re module: strange behaviour of space inside {m, n}
New submission from John Machin <sjmac...@lexicon.net>:

A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. There is no documentation of the behaviour of r"b{1, 3}\Z" -- it matches the LITERAL TEXT "b{1, 3}" in normal mode and "b{1,3}" in verbose mode.

# paste the following at the interactive prompt:
pat = r"b{1, 3}\Z"
bool(re.match(pat, "bb")) # False
bool(re.match(pat, "b{1, 3}")) # True
bool(re.match(pat, "bb", re.VERBOSE)) # False
bool(re.match(pat, "b{1, 3}", re.VERBOSE)) # False
bool(re.match(pat, "b{1,3}", re.VERBOSE)) # True

Suggested changes, in decreasing order of preference:
(1) Ignore leading/trailing spaces when parsing the m and n components of {m,n}
(2) Raise an exception if the exact syntax is not followed
(3) Document the existing behaviour

Note: deliberately matching the literal text would be expected to be done by escaping the left brace:

pat2 = r"b\{1, 3}\Z"
bool(re.match(pat2, "b{1, 3}")) # True

and this is not prevented by the suggested changes.

--
messages: 128472
nosy: sjmachin
priority: normal
severity: normal
status: open
title: re module: strange behaviour of space inside {m, n}
versions: Python 2.7, Python 3.1
Python tracker: <http://bugs.python.org/issue11204>
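The core of the report still reproduces on current Python 3: the space demotes the braces to literal text. A minimal check:

```python
import re

pat = r"b{1, 3}\Z"
assert re.match(pat, "bb") is None               # quantifier NOT recognised
assert re.match(pat, "b{1, 3}") is not None      # matches the literal text
assert re.match(r"b{1,3}\Z", "bbb") is not None  # no space: works as expected
```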
Re: python crash problem
On Feb 3, 8:21 am, Terry Reedy <tjre...@udel.edu> wrote:
> On 2/2/2011 2:19 PM, Yelena wrote:
> When having a problem with a 3rd party module, not part of the stdlib, you should give a source.
> http://sourceforge.net/projects/dbfpy/
> This appears to be a compiled extension. Nearly always, when Python crashes running such, it is a problem with the extension. So you probably need to direct your question to the author or a project mailing list if there is one.

It has always appeared to me to be a pure-Python package. There are no .c or .pyx files in the latest source (.tgz) distribution. The Windows installer installs only files whose extensions match py[co]?.
--
http://mail.python.org/mailman/listinfo/python-list
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

Skip, the docs bug is #7198. This is the meaningful-exception bug.

--
Python tracker: <http://bugs.python.org/issue10954>
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

I don't understand "Changing csv api is a feature request that could only happen in 3.3". This is NOT a request for an API change. Lennert's point is that an API change was made in 3.0 as compared with 2.6, but there is no fixer in 2to3. What is requested is for csv.reader/writer to give more meaningful error messages for valid 2.x code that has been put through fixer-less 2to3.

The name of the arg is newline. newlines is an attribute that stores what was actually found in universal newlines mode.

newline='' is needed on input for the same reason that binary mode is required in 2.x: \r and \n may quite validly appear in data, inside a quoted field, and must not be treated as part of a row separator.

newline='' is needed on output for the same reason that binary mode is required in 2.x: any \n in the data and any \n in the caller's chosen line terminator must be preserved from being changed to os.linesep (e.g. \r\n).

newline is not available as an attribute of the _io.TextIOWrapper object created by open('xxx.csv', 'w', newline=''); is exposing this possible?

--
versions: +Python 3.2 -Python 3.3
Python tracker: <http://bugs.python.org/issue10954>
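The input-side point, demonstrated (io.StringIO with newline='' stands in for a file opened with newline=''):

```python
import csv
import io

# One record whose quoted middle field contains a real \r\n
raw = 'a,"line1\r\nline2",c\r\n'
rows = list(csv.reader(io.StringIO(raw, newline='')))
assert rows == [['a', 'line1\r\nline2', 'c']]   # the \r\n survives as field DATA
```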
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

I believe that both csv.reader and csv.writer should fail with a meaningful message if mode is binary or newline is not ''.

--
Python tracker: <http://bugs.python.org/issue10954>
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@lexicon.net> added the comment:

docpatch for 3.x csv docs:

In the csv.writer docs, insert the sentence "If csvfile is a file object, it should be opened with newline=''." immediately after the sentence "csvfile can be any object with a write() method."

In the closely-following example, change the open call from open('eggs.csv', 'w') to open('eggs.csv', 'w', newline='').

In section 13.1.5 Examples, there are 2 reader cases and 1 writer case that likewise need ", newline=''" inserted in the open call.

--
Python tracker: <http://bugs.python.org/issue7198>
Re: Interesting bug
On Jan 2, 12:22 am, Daniel Fetchinson <fetchin...@googlemail.com> wrote:
> An AI bot is playing a trick on us.

Yes, it appears that the mystery is solved: Mark V. Shaney is alive and well and living in Bangalore :-)
--
http://mail.python.org/mailman/listinfo/python-list
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@users.sourceforge.net> added the comment:

Skip, I'm WRITING, not reading. Please read the 3.1 documentation for csv.writer. It does NOT mention newline='', and neither does the example. Please fix.

Other problems with the examples:
(1) They encourage a bad habit (open inside the call to reader/writer); good practice is to retain the reference to the file handle (preferably with a with statement) so that it can be closed properly.
(2) delimiter=' ' is very unrealistic.

The documentation for both 2.x and 3.x should be much more explicit about what is needed in open() for csv to work properly and portably:

2.x read: use mode='rb' -- otherwise fail on Windows
2.x write: use mode='wb' -- otherwise fail on Windows
3.x read: use newline='' -- otherwise fail unconditionally(?)
3.x write: use newline='' -- otherwise fail on Windows

The 2.7 documentation says "If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference" ... in my experience, people are left asking "what platforms? what difference?"; Windows should be mentioned explicitly.

--
versions: +Python 2.7, Python 3.2, Python 3.3
Python tracker: <http://bugs.python.org/issue7198>
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@users.sourceforge.net> added the comment:

Please re-open this. The binary/text mode problem still exists with Python 3.X on Windows. Quite simply, there is no option available to the caller to open the output file in binary mode, because the module is throwing str objects at the file. The module's idea of "taking control in the default case" appears to be to write \r\n, which is then processed by the Windows runtime and becomes \r\r\n.

Python 3.1.3 (r313:86834, Nov 27 2010, 18:30:53) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> f = open('terminator31.csv', 'w')
>>> row = ['foo', None, 3.14159]
>>> writer = csv.writer(f)
>>> writer.writerow(row)
14
>>> writer.writerow(row)
14
>>> f.close()
>>> open('terminator31.csv', 'rb').read()
b'foo,,3.14159\r\r\nfoo,,3.14159\r\r\n'

And it's not just a row terminator problem; newlines embedded in fields are likewise expanded to \r\n by the Windows runtime.

--
nosy: +sjmachin
versions: +Python 3.1 -Python 2.6
Python tracker: <http://bugs.python.org/issue7198>
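The same session with newline='' is the documented remedy: on any platform the file then contains exactly one \r\n per row (the temp path here is mine):

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'terminator.csv')
row = ['foo', None, 3.14159]

with open(path, 'w', newline='') as f:      # newline='' disables translation
    writer = csv.writer(f)
    writer.writerow(row)
    writer.writerow(row)

with open(path, 'rb') as f:
    assert f.read() == b'foo,,3.14159\r\nfoo,,3.14159\r\n'
```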
Re: Modifying an existing excel spreadsheet
On Dec 21, 8:56 am, Ed Keith <e_...@yahoo.com> wrote:
> I have a user supplied 'template' Excel spreadsheet. I need to create a new excel spreadsheet based on the supplied template, with data filled in. I found the tools here http://www.python-excel.org/, and http://sourceforge.net/projects/pyexcelerator/. I have been trying to use the former, since the latter seems to be devoid of documentation (not even any docstrings).

pyExcelerator is abandonware. Use xlwt instead; it's a bug-fixed/maintained/enhanced fork of pyExcelerator. Read the tutorial that you'll find mentioned on http://www.python-excel.org. Join the google group that's also mentioned there; look at past questions, ask some more, ...
--
http://mail.python.org/mailman/listinfo/python-list
Re: Ensuring symmetry in difflib.SequenceMatcher
On Nov 24, 8:43 pm, Peter Otten __pete...@web.de wrote: John Yeung wrote: I'm generally pleased with difflib.SequenceMatcher: It's probably not the best available string matcher out there, but it's in the standard library and I've seen worse in the wild. One thing that kind of bothers me is that it's sensitive to which argument you pick as seq1 and which you pick as seq2: Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32 Type help, copyright, credits or license for more information. import difflib difflib.SequenceMatcher(None, 'BYRD', 'BRADY').ratio() 0.2 difflib.SequenceMatcher(None, 'BRADY', 'BYRD').ratio() 0.3 Is this a bug? I am guessing the algorithm is implemented correctly, and that it's just an inherent property of the algorithm used. It's certainly not what I'd call a desirable property. Are there any simple adjustments that can be made without sacrificing (too much) performance? def symmetric_ratio(a, b, S=difflib.SequenceMatcher): return (S(None, a, b).ratio() + S(None, b, a).ratio())/2.0 I'm expecting 50% performance loss ;) Seriously, have you tried to calculate the ratio with realistic data? Without looking into the source I would expect the two ratios to get more similar. Peter Surnames are extremely realistic data. The OP should consider using Levenshtein distance, which is symmetric. A good (non-naive) implementation should be much faster than difflib. ratio = 1.0 - levenshtein(a, b) / float(max(len(a), len(b))) -- http://mail.python.org/mailman/listinfo/python-list
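For reference, here is a plain dynamic-programming Levenshtein distance, symmetric by construction, on which the suggested ratio formula can be built. This is a generic sketch, not a tuned implementation:

```python
def levenshtein(a, b):
    """Classic two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a                     # keep the inner loop over the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                  # deletion
                               current[j - 1] + 1,               # insertion
                               previous[j - 1] + (ca != cb)))    # substitution
        previous = current
    return previous[-1]

def ratio(a, b):
    """Symmetric similarity ratio in [0.0, 1.0], per the formula above."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / float(max(len(a), len(b)))

# Symmetric regardless of argument order, unlike SequenceMatcher:
assert levenshtein('BYRD', 'BRADY') == levenshtein('BRADY', 'BYRD') == 3
assert ratio('BYRD', 'BRADY') == ratio('BRADY', 'BYRD')
```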
Re: Raw Unicode docstring
On Nov 17, 9:34 am, Alexander Kapps alex.ka...@web.de wrote: ur"Scheißt\nder Bär\nim Wald?" Nicht ohne eine Genehmigung von der Umwelt Erhaltung Abteilung. ("Does the bear shit in the woods?" "Not without a permit from the Environmental Conservation Department.") -- http://mail.python.org/mailman/listinfo/python-list
Re: A bug for raw string literals in Py3k?
On Oct 31, 11:23 pm, Yingjie Lan lany...@yahoo.com wrote: So I suppose this is a bug? It's not, see http://docs.python.org/py3k/reference/lexical_analysis.html#literals # Specifically, a raw string cannot end in a single backslash Thanks! That looks weird to me ... doesn't this contradict "All backslashes in raw string literals are interpreted literally." (see http://docs.python.org/release/3.0.1/whatsnew/3.0.html)? It should read: "All backslashes in syntactically-correct raw string literals are interpreted literally." -- http://mail.python.org/mailman/listinfo/python-list
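A short illustration of the rule under discussion: backslashes inside a raw literal are kept literally, yet a raw literal still cannot end with a lone backslash, because at tokenization time the trailing backslash would escape the closing quote:

```python
# Backslashes in raw literals are kept literally:
assert r'\n' == '\\n'
assert len(r'\n') == 2

# But r'C:\temp\' is a SyntaxError; the usual workaround is concatenation:
path = r'C:\temp' + '\\'
assert path == 'C:\\temp\\'
```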
Re: Runtime error
On Oct 29, 3:26 am, Sebastian python-maill...@elygor.de wrote: Hi all, I am new to python and I don't know how to fix this error. I only try to execute python (or a cgi script) and I get an output like [...] 'import site' failed; traceback: Traceback (most recent call last): File /usr/lib/python2.6/site.py, line 513, in module main() File /usr/lib/python2.6/site.py, line 496, in main known_paths = addsitepackages(known_paths) File /usr/lib/python2.6/site.py, line 288, in addsitepackages addsitedir(sitedir, known_paths) File /usr/lib/python2.6/site.py, line 185, in addsitedir addpackage(sitedir, name, known_paths) File /usr/lib/python2.6/site.py, line 155, in addpackage exec line File string, line 1, in module File /usr/lib/python2.6/site.py, line 185, in addsitedir addpackage(sitedir, name, known_paths) File /usr/lib/python2.6/site.py, line 155, in addpackage exec line File string, line 1, in module File /usr/lib/python2.6/site.py, line 185, in addsitedir addpackage(sitedir, name, known_paths) File /usr/lib/python2.6/site.py, line 155, in addpackage exec line [...] File /usr/lib/python2.6/site.py, line 185, in addsitedir addpackage(sitedir, name, known_paths) File /usr/lib/python2.6/site.py, line 155, in addpackage exec line File string, line 1, in module File /usr/lib/python2.6/site.py, line 175, in addsitedir sitedir, sitedircase = makepath(sitedir) File /usr/lib/python2.6/site.py, line 76, in makepath dir = os.path.abspath(os.path.join(*paths)) RuntimeError: maximum recursion depth exceeded What is going wrong with my python install? What do I have to change? Reading the code for site.py, it looks like you may have a .pth file that is self-referential (or a chain of 2 or more .pth files!) that are sending you around in a loop. If you are having trouble determining what files are involved, you could put some print statements in your site.py at about lines 155 and 185 (which appear to be in the loop, according to the traceback) or step through it with a debugger.
-- http://mail.python.org/mailman/listinfo/python-list
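Building on the advice above, here is a small sketch (the helper name is made up) that lists each .pth file in a directory together with the paths it adds, which makes a self-referential chain easy to spot; run it over each directory Python treats as a site directory:

```python
import os

def pth_entries(directory):
    """Return (filename, added_paths) for each .pth file in `directory`.

    A loop exists when a .pth file names a directory whose own .pth file
    points back again (directly or through a chain)."""
    found = []
    for name in sorted(os.listdir(directory)):
        if name.endswith('.pth'):
            with open(os.path.join(directory, name)) as f:
                # .pth files may also contain comments and 'import' lines;
                # only plain path lines extend sys.path.
                lines = [ln.strip() for ln in f
                         if ln.strip() and not ln.startswith(('#', 'import'))]
            found.append((name, lines))
    return found
```

Typical use would be something like `for d in site.getsitepackages(): print(d, pth_entries(d))` and then checking whether any listed path leads back to a directory already seen.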
Re: Get alternative char name with unicodedata.name() if no formal one defined
On Oct 14, 7:25 pm, Dirk Wallenstein hals...@t-online.de wrote: Hi, I'd like to get control char names for the first 32 codepoints, but they apparently only have an alias and no official name. Is there a way to get the alternative character name (alias) in Python? AFAIK there is no programmatically-available list of those names. Try something like: name = unicodedata.name(x, some_default) if x > u"\x1f" else ("NULL", etc etc, "UNIT SEPARATOR")[ord(x)] or similarly with a prepared dict: C0_CONTROL_NAMES = { u"\x00": "NULL", # etc u"\x1f": "UNIT SEPARATOR", } name = unicodedata.name(x, some_default) if x > u"\x1f" else C0_CONTROL_NAMES[x] -- http://mail.python.org/mailman/listinfo/python-list
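Putting the prepared-dict suggestion into runnable (Python 3) form; the table below is deliberately incomplete, and the full set of C0 alias names lives in Unicode's NameAliases.txt (which, I believe, unicodedata.lookup() consults from Python 3.3 on, though unicodedata.name() still raises for controls):

```python
import unicodedata

# Partial table of C0 control alias names (see NameAliases.txt for the rest).
C0_CONTROL_NAMES = {
    '\x00': 'NULL',
    '\x09': 'CHARACTER TABULATION',
    '\x0a': 'LINE FEED',
    '\x0d': 'CARRIAGE RETURN',
    '\x1f': 'UNIT SEPARATOR',
}

def char_name(c, default='<unnamed>'):
    # Controls have no formal Unicode name, so fall back to the alias table.
    if c <= '\x1f':
        return C0_CONTROL_NAMES.get(c, default)
    return unicodedata.name(c, default)

assert char_name('\x00') == 'NULL'
assert char_name('A') == 'LATIN CAPITAL LETTER A'
assert char_name('\x01') == '<unnamed>'    # not in the partial table
```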
Re: Wrong default endianess in utf-16 and utf-32 !?
jmfauth wxjmfauth at gmail.com writes: When an endianess is not specified, (BE, LE, unmarked forms), the Unicode Consortium specifies, the default byte serialization should be big-endian. See http://www.unicode.org/faq//utf_bom.html Q: Which of the UTFs do I need to support? and Q: Why do some of the UTFs have a BE or LE in their label, such as UTF-16LE? Sometimes it is necessary to read right to the end of an answer: Q: Why do some of the UTFs have a BE or LE in their label, such as UTF-16LE? A: [snip] the unmarked form uses big-endian byte serialization by default, but may include a byte order mark at the beginning to indicate the actual byte serialization used. -- http://mail.python.org/mailman/listinfo/python-list
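The rule is easy to verify with Python's codecs: the BE/LE variants never write a byte order mark, while the unmarked codec writes one when encoding and honours one when decoding:

```python
euro = '\u20ac'

# Explicit byte orders: no BOM, just the serialized code unit.
assert euro.encode('utf-16-be') == b'\x20\xac'
assert euro.encode('utf-16-le') == b'\xac\x20'

# Unmarked codec: a BOM is prepended (native order in CPython).
assert euro.encode('utf-16')[:2] in (b'\xfe\xff', b'\xff\xfe')

# Decoding unmarked input follows the BOM, whichever order it declares.
assert b'\xfe\xff\x20\xac'.decode('utf-16') == euro
assert b'\xff\xfe\xac\x20'.decode('utf-16') == euro
```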
cp936 uses gbk codec, doesn't decode `\x80` as U+20AC EURO SIGN
| '\x80'.decode('cp936') Traceback (most recent call last): File stdin, line 1, in module UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: incomplete multibyte sequence However: Retrieved 2010-10-10 from http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT #Name: cp936 to Unicode table #Unicode version: 2.0 #Table version: 2.01 #Table format: Format A #Date: 1/7/2000 # #Contact: shawn.ste...@microsoft.com ... 0x7F 0x007F #DELETE 0x80 0x20AC #EURO SIGN 0x81 #DBCS LEAD BYTE Retrieved 2010-10-10 from http://msdn.microsoft.com/en-us/goglobal/cc305153.aspx Windows Codepage 936 [pictorial mapping; shows 80 mapping to 20AC] Retrieved 2010-10-10 from http://demo.icu-project.org/icu-bin/convexp?conv=windows-936-2000&s=ALL [pictorial mapping for converter windows-936-2000 with aliases including GBK, CP936, MS936; shows 80 mapping to 20AC] So Microsoft appears to think that cp936 includes the euro, and the ICU project seem to think that GBK and cp936 both include the euro. A couple of questions: Is this a bug or a shrug? Where can one find the mapping tables from which the various CJK codecs are derived? -- http://mail.python.org/mailman/listinfo/python-list
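A quick probe (the helper name is mine) shows which behaviour a given interpreter implements; whether 0x80 decodes to U+20AC or raises is exactly the open question above, so no expected output is asserted here:

```python
def euro_at_0x80(codec):
    """Return U+20AC if this interpreter's codec maps byte 0x80 to the
    euro sign, or None if it raises (the behaviour the post complains about)."""
    try:
        return b'\x80'.decode(codec)
    except UnicodeDecodeError:
        return None

for codec in ('cp936', 'gbk'):
    print(codec, '->', repr(euro_at_0x80(codec)))
```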
[issue9980] str(float) failure
Changes by John Machin sjmac...@users.sourceforge.net: -- nosy: +sjmachin ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9980 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
strange results from sys.version
I am trying to help a user of my xlrd package who says he is getting anomalous results on his work computer but not on his home computer. Attempts to reproduce his alleged problem in a verifiable manner on his work computer have failed, so far ... the only meaningful difference in script output is in sys.version User (work): sys.version: 2.7 (r27:82500, Aug 23 2010, 17:18:21) etc Me : sys.version: 2.7 (r27:82525, Jul 4 2010, 09:01:59) etc I have just now downloaded the Windows x86 msi from www.python.org and reinstalled it on another computer. It gives the same result as on my primary computer (above). User result looks whacked: lower patch number, later date. www.python.org says Python 2.7 was released on July 3rd, 2010. Is it possible that the work computer is using an unofficial release? What other possibilities are there? Thanks in advance ... -- http://mail.python.org/mailman/listinfo/python-list
Re: Detect string has non-ASCII chars without checking each char?
On Aug 22, 5:07 pm, Michel Claveau - MVPenleverlesx_xx...@xmclavxeaux.com.invalid wrote: Hi! Another way : # -*- coding: utf-8 -*- import unicodedata def test_ascii(struni): strasc=unicodedata.normalize('NFD', struni).encode('ascii','replace') if len(struni)==len(strasc): return True else: return False print test_ascii(u"abcde") print test_ascii(u"abcdê") -1 Try your code with u"abcd\xa1" ... it says it's ASCII. Suggestions: test_ascii = lambda s: len(s.decode('ascii', 'ignore')) == len(s) or test_ascii = lambda s: all(c < u'\x80' for c in s) or use try/except Also: if a == b: return True else: return False is a horribly bloated way of writing return a == b -- http://mail.python.org/mailman/listinfo/python-list
Re: Detect string has non-ASCII chars without checking each char?
On Aug 23, 1:10 am, Michel Claveau - MVPenleverlesx_xx...@xmclavxeaux.com.invalid wrote: Re ! Try your code with u"abcd\xa1" ... it says it's ASCII. Ah? in my computer, it say False Perhaps your computer has a problem. Mine does this with both Python 2.7 and Python 2.3 (which introduced the unicodedata.normalize function): import unicodedata t1 = u"abcd\xa1" t2 = unicodedata.normalize('NFD', t1) t3 = t2.encode('ascii', 'replace') [t1, t2, t3] [u'abcd\xa1', u'abcd\xa1', 'abcd?'] map(len, _) [5, 5, 5] -- http://mail.python.org/mailman/listinfo/python-list
Re: re.sub and variables
On Aug 13, 7:33 am, fuglyducky fuglydu...@gmail.com wrote: On Aug 12, 2:06 pm, fuglyducky fuglydu...@gmail.com wrote: I have a function that I am attempting to call from another file. I am attempting to replace a string using re.sub with another string. The problem is that the second string is a variable. When I get the output, it shows the variable name rather than the value. Is there any way to pass a variable into a regex? If not, is there any other way to do this? I need to be able to dump the variable value into the replacement string. For what it's worth this is an XML file so I'm not afraid to use some sort of XML library but they look fairly complicated for a newbie like me. Also, this is py3.1.2 if that makes any difference. Thanks!!! # import random import re import datetime def pop_time(some_string, start_time): global that_string rand_time = random.randint(0, 30) delta_time = datetime.timedelta(seconds=rand_time) for line in some_string: end_time = delta_time + start_time new_string = re.sub("thisstring", "thisstring\\end_time", some_string) start_time = end_time return new_string Disregard...I finally figured out how to use string.replace. That appears to work perfectly. Still...if anyone happens to know about passing a variable into a regex that would be great. Instead of new_string = re.sub( "thisstring", "thisstring\\end_time", some_string) you probably meant to use something like new_string = re.sub( "thisstring", "thisstring" + "\\" + end_time, some_string) string.replace is antique and deprecated. You should be using methods of str objects, not functions in the string module. s1 = "foobarzot" s2 = s1.replace("bar", "-") s2 'foo-zot' -- http://mail.python.org/mailman/listinfo/python-list
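A runnable version of the point being made (the names and sample XML here are illustrative, not the OP's real data): the variable goes into the replacement by ordinary string building, and a callable replacement sidesteps re.sub's backslash-escape processing entirely, so it is safe for arbitrary variable content:

```python
import re

end_time = '00:00:17'
line = '<event time="PLACEHOLDER"/>'

# Plain concatenation works when the variable contains no backslashes:
assert re.sub('PLACEHOLDER', end_time, line) == '<event time="00:00:17"/>'

# A callable replacement is taken literally, with no \-escape processing:
assert re.sub('PLACEHOLDER', lambda m: end_time, line) == '<event time="00:00:17"/>'
```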
Re: Ascii to Unicode.
On Jul 30, 4:18 am, Carey Tilden carey.til...@gmail.com wrote: In this case, you've been able to determine the correct encoding (latin-1) for those errant bytes, so the file itself is thus known to be in that encoding. The encoding most likely to be correct is, as already stated and agreed by the OP, cp1252. -- http://mail.python.org/mailman/listinfo/python-list
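The practical difference between the two is easy to demonstrate: the bytes cp1252 assigns to punctuation fall in latin-1's C1 control range, so latin-1 "succeeds" while producing control characters instead of the intended text:

```python
raw = b'caf\xe9 \x93quoted\x94'

# latin-1 maps 0x93/0x94 to invisible C1 control characters:
assert raw.decode('latin-1') == 'caf\xe9 \x93quoted\x94'

# cp1252 maps them to the curly quotes the (Windows) author meant:
assert raw.decode('cp1252') == 'caf\xe9 \u201cquoted\u201d'
```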
Re: Where is the help page for re.MatchObject?
On Jul 28, 1:26 pm, Peng Yu pengyu...@gmail.com wrote: I know the library reference webpage for re.MatchObject is at http://docs.python.org/library/re.html#re.MatchObject But I don't find such a help page in python help(). Does anybody know how to get it in help()? Yes, but it doesn't tell you very much: | import re | help(re.match('x', 'x')) | Help on SRE_Match object: | | class SRE_Match(__builtin__.object) | | -- http://mail.python.org/mailman/listinfo/python-list
Re: Ascii to Unicode.
On Jul 29, 4:32 am, Joe Goldthwaite j...@goldthwaites.com wrote: Hi, I've got an Ascii file with some latin characters. Specifically \xe1 and \xfc. I'm trying to import it into a Postgresql database that's running in Unicode mode. The Unicode converter chokes on those two characters. I could just manually replace those two characters with something valid but if any other invalid characters show up in later versions of the file, I'd like to handle them correctly. I've been playing with the Unicode stuff and I found out that I could convert both those characters correctly using the latin1 encoder like this; import unicodedata s = '\xe1\xfc' print unicode(s,'latin1') The above works. When I try to convert my file however, I still get an error; import unicodedata input = file('ascii.csv', 'r') output = file('unicode.csv','w') for line in input.xreadlines(): output.write(unicode(line,'latin1')) input.close() output.close() Traceback (most recent call last): File C:\Users\jgold\CloudmartFiles\UnicodeTest.py, line 10, in __main__ output.write(unicode(line,'latin1')) UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 295: ordinal not in range(128) I'm stuck using Python 2.4.4 which may be handling the strings differently depending on if they're in the program or coming from the file. I just haven't been able to figure out how to get the Unicode conversion working from the file data. Can anyone explain what is going on? Hello hello ... you are running on Windows; the likelihood that you actually have data encoded in latin1 is very very small. Follow MRAB's answer but replace latin1 by cp1252. -- http://mail.python.org/mailman/listinfo/python-list
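For the record, the Python 3 shape of the fix (on 2.4, codecs.open() plays the same role): decode on input, encode on output, and never let the default ASCII codec get involved. File names and sample bytes are illustrative:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, 'ascii.csv')
dst = os.path.join(folder, 'unicode.csv')

# Simulate the input: cp1252-encoded bytes including \xe1 and \xfc.
with open(src, 'wb') as f:
    f.write(b'caf\xe9,\xe1\xfc\n')

# Decode cp1252 on the way in, encode UTF-8 on the way out.
with open(src, 'r', encoding='cp1252', newline='') as inp, \
     open(dst, 'w', encoding='utf-8', newline='') as out:
    for line in inp:
        out.write(line)

with open(dst, 'rb') as f:
    assert f.read() == 'café,áü\n'.encode('utf-8')
```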
Re: newb
On Jul 27, 9:07 pm, whitey m...@here.com wrote: hi all. am totally new to python and was wondering if there are any newsgroups that are there specifically for beginners. i have bought a book for $2 called learn to program using python by alan gauld. starting to read it but it was written in 2001. presuming that the commands and info would still be valid? any websites or books that are a must for beginners? any input would be much appreciated...cheers 2001 is rather old. Most of what you'll want is on the web. See http://wiki.python.org/moin/BeginnersGuide -- http://mail.python.org/mailman/listinfo/python-list
Re: Unicode error
dirknbr dirknbr at gmail.com writes: I have kind of developed this but obviously it's not nice, any better ideas? try: text=texts[i] text=text.encode('latin-1') text=text.encode('utf-8') except: text=' ' As Steven has pointed out, if the .encode('latin-1') works, the result is thrown away. This would be very fortunate. It appears that your goal was to encode the text in latin1 if possible, otherwise in UTF-8, with no indication of which encoding was used. Your second posting confirmed that you were doing this in a loop, ending up with the possibility that your output file would have records with mixed encodings. Did you consider what a programmer writing code to READ your output file would need to do, e.g. attempt to decode each record as UTF-8 with a fall-back to latin1??? Did you consider what would be the result of sending a stream of mixed-encoding text to a display device? As already advised, the short answer to avoid all of that hassle is: just encode in UTF-8. -- http://mail.python.org/mailman/listinfo/python-list
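The READ-side hassle alluded to above is easy to demonstrate; the fall-back decode appears to work, but silently misreads any latin1 text that happens to be valid UTF-8:

```python
def read_record(raw):
    """What a reader of a mixed-encoding file is forced to do: guess."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('latin-1')

# Lucky cases: latin-1 'café' is invalid UTF-8, so the fall-back fires.
assert read_record('café'.encode('utf-8')) == 'café'
assert read_record('café'.encode('latin-1')) == 'café'

# Unlucky case: latin-1 'Ã©' is b'\xc3\xa9', which is valid UTF-8 for 'é',
# so the reader silently gets the wrong text.
assert read_record('Ã©'.encode('latin-1')) == 'é'
```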
Re: SyntaxError not honoured in list comprehension?
On Jul 5, 1:08 am, Thomas Jollans tho...@jollans.com wrote: On 07/04/2010 03:49 PM, jmfauth wrote: File psi last command, line 1 print9.0 ^ SyntaxError: invalid syntax somewhat strange, yes. There are two tokens, print9 (a name) and .0 (a float constant) -- looks like SyntaxError to me. -- http://mail.python.org/mailman/listinfo/python-list
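The tokenizer itself confirms the two-token reading:

```python
import io
import tokenize

# 'print9.0' tokenizes as the NAME 'print9' followed by the NUMBER '.0'.
tokens = [(tokenize.tok_name[t.type], t.string)
          for t in tokenize.generate_tokens(io.StringIO('print9.0\n').readline)]
assert ('NAME', 'print9') in tokens
assert ('NUMBER', '.0') in tokens
```

Two adjacent expression tokens with no operator between them is what the compiler then rejects as a SyntaxError.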
Re: Python 2.7 released
On Jul 5, 12:27 pm, Martineau ggrp2.20.martin...@dfgh.net wrote: On Jul 4, 8:34 am, Benjamin Peterson benja...@python.org wrote: On behalf of the Python development team, I'm jocund to announce the second release candidate of Python 2.7. Python 2.7 will be the last major version in the 2.x series. However, it will also have an extended period of bugfix maintenance. 2.7 includes many features that were first released in Python 3.1. The faster io module, the new nested with statement syntax, improved float repr, set literals, dictionary views, and the memoryview object have been backported from 3.1. Other features include an ordered dictionary implementation, unittests improvements, a new sysconfig module, auto-numbering of fields in the str/unicode format method, and support for ttk Tile in Tkinter. For a more extensive list of changes in 2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python distribution. To download Python 2.7 visit: http://www.python.org/download/releases/2.7/ 2.7 documentation can be found at: http://docs.python.org/2.7/ This is a production release and should be suitable for all libraries and applications. Please report any bugs you find, so they can be fixed in the next maintenance releases. The bug tracker is at: http://bugs.python.org/ Enjoy! -- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 2.7's contributors) Benjamin (or anyone else), do you know where I can get the Compiled Windows Help file -- python27.chm -- for this release? In the past I've been able to download it from the Python web site, but have been unable to locate it anywhere for this new release. I can't build it myself because I don't have the Microsoft HTML help file compiler. Thanks in advance. If you have a Windows box, download the .msi installer for Python 2.7 and install it. The chm file will be in C:\Python27\Doc (if you choose the default installation directory). 
Otherwise ask a friendly local Windows user for a copy. -- http://mail.python.org/mailman/listinfo/python-list
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: About the E0 80 81 61 problem: my interpretation is that you are correct, the 80 is not valid in the current state (start byte == E0), so no look-ahead, three FFFDs must be issued followed by 0061. I don't really care about issuing too many FFFDs so long as it doesn't munch valid sequences. However it would be very nice to get an explicit message about surrogates. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
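On a current CPython (where this issue's resolution landed), the interpretation above can be checked directly; each rejected byte becomes one U+FFFD:

```python
# E0 80 81 61: 0x80 is not a valid continuation for an E0 lead byte,
# so there is no look-ahead -- one replacement char per rejected byte,
# then the valid 'a'.
assert b'\xe0\x80\x81\x61'.decode('utf-8', 'replace') == '\ufffd\ufffd\ufffda'

# A UTF-8-encoded surrogate (ED A0 80 would be U+D800) is rejected the
# same way, byte by byte:
assert b'\xed\xa0\x80'.decode('utf-8', 'replace') == '\ufffd\ufffd\ufffd'
```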
Re: escape character / csv module
On Jul 2, 6:04 am, MRAB pyt...@mrabarnett.plus.com wrote: The csv module imports from _csv, which suggests to me that there's code written in C which thinks that the \x00 is a NUL terminator, so it's a bug, although it's very unusual to want to write characters like \x00 to a CSV file, and I wouldn't be surprised if this is the first time it's been noticed! :-) Don't be surprised, read the documentation (http://docs.python.org/ library/csv.html#module-csv): Note This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples. These restrictions will be removed in the future. The NUL/printable part of the note has been there since the module was introduced in Python 2.3.0. -- http://mail.python.org/mailman/listinfo/python-list
Re: Handling text lines from files with some (few) starnge chars
On Jun 6, 12:14 pm, MRAB pyt...@mrabarnett.plus.com wrote: Paulo da Silva wrote: Em 06-06-2010 00:41, Chris Rebert escreveu: On Sat, Jun 5, 2010 at 4:03 PM, Paulo da Silva psdasilva.nos...@netcabonospam.pt wrote: ... Specify the encoding of the text when opening the file using the `encoding` parameter. For Windows-1252 for example: your_file = open(path/to/file.ext, 'r', encoding='cp1252') OK! This fixes my current problem. I used encoding=iso-8859-15. This is how my text files are encoded. But what about a more general case where the encoding of the text file is unknown? Is there anything like autodetect? An encoding like 'cp1252' uses 1 byte/character, but so does 'cp1250'. How could you tell which was the correct encoding? Well, if the file contained words in a certain language and some of the characters were wrong, then you'd know that the encoding was wrong. This does imply, though, that you'd need to know what the language should look like! You could try different encodings, and for each one try to identify what could be words, then look them up in dictionaries for various languages to see whether they are real words... This has been automated (semi-successfully, with caveats) by the chardet package ... see http://chardet.feedparser.org/ -- http://mail.python.org/mailman/listinfo/python-list
Re: signed vs unsigned int
On Jun 2, 4:43 pm, johnty johntyw...@gmail.com wrote: i'm reading bytes from a serial port, and storing it into an array. each byte represents a signed 8-bit int. currently, the code i'm looking at converts them to an unsigned int by doing ord(array[i]). however, what i'd like is to get the _signed_ integer value. whats the easiest way to do this? signed = unsigned if unsigned <= 127 else unsigned - 256 -- http://mail.python.org/mailman/listinfo/python-list
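Wrapped up as a function, with the struct module as a cross-check (struct reinterprets the same bit pattern as a signed byte):

```python
import struct

def to_signed_byte(u):
    """Interpret an unsigned byte value 0..255 as signed -128..127."""
    return u if u <= 127 else u - 256

assert to_signed_byte(0) == 0
assert to_signed_byte(127) == 127
assert to_signed_byte(128) == -128
assert to_signed_byte(255) == -1

# struct's 'b' format does the same reinterpretation:
assert struct.unpack('b', bytes([255]))[0] == -1
```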
Re: expat parsing error
On Jun 2, 1:57 am, kak...@gmail.com kak...@gmail.com wrote: On Jun 1, 11:12 am, kak...@gmail.com kak...@gmail.com wrote: On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote: kak...@gmail.com kak...@gmail.com writes: On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote: kak...@gmail.com, 01.06.2010 16:00: how can i fix it, how to ignore the headers and parse only the XML? Consider reading the answers you got in the last thread that you opened with exactly this question. Stefan That's exactly, what i did but something seems to not working with the solutions i had, when i changed my implementation from pure Python's sockets to twisted library! That's the reason i have created a new post! Any ideas why this happened? As I already explained: if you send your headers as well to any XML parser it will choke on those, because the headers are /not/ valid / well-formed XML. The solution is to remove the headers from your data. As I explained before: headers are followed by one empty line. Just remove lines up and until including the empty line, and pass the data to any XML parser. -- John Bokma j3b Hacking & Hiking in Mexico - http://johnbokma.com/ http://castleamber.com/ - Perl & Python Development Thank you so much i'll try it! Antonis Dear John can you provide me a simple working solution? I don't seem to get it You're not wrong. Try something like this: rubbish1, rubbish2, xml = your_guff.partition('\n\n') -- http://mail.python.org/mailman/listinfo/python-list
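A self-contained sketch of that one-liner in action (the header names and XML are invented; note that data read straight off a network socket usually separates headers from body with '\r\n\r\n' rather than '\n\n'):

```python
import xml.etree.ElementTree as ET

raw = ('Content-Type: text/xml\n'
       'Content-Length: 20\n'
       '\n'
       '<root><item/></root>')

# Headers end at the first blank line; everything after it is the XML.
headers, _sep, xml_text = raw.partition('\n\n')
root = ET.fromstring(xml_text)
assert root.tag == 'root'
assert root[0].tag == 'item'
```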
Re: Help with Regexp, \b
On May 30, 1:30 am, andrew cooke and...@acooke.org wrote: That's what I thought it did... Then I read the docs and confused empty string with space(!) and convinced myself otherwise. I think I am going senile. Not necessarily. Conflating concepts like string containing whitespace, string containing space(s), empty aka 0-length string, None, (ASCII) NUL, and (SQL) NULL appears to be an age- independent problem :-) -- http://mail.python.org/mailman/listinfo/python-list
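Concretely: \b is a zero-width assertion, so it matches an empty string at a word boundary rather than consuming a space:

```python
import re

# \b matches the empty string at the boundary before 'w'.
m = re.match(r'\b', 'word')
assert m is not None and m.group() == ''

# It anchors whole-word matches without eating the surrounding characters:
assert re.search(r'\bword\b', 'a word, here') is not None
assert re.search(r'\bord\b', 'a word, here') is None   # 'ord' is mid-word
```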
Re: UnicodeDecodeError having fetch web page
Rob Williscroft rtw at rtw.me.uk writes: Barry wrote in news:83dc485a-5a20-403b-99ee-c8c627bdbab3 @m21g2000vbr.googlegroups.com in gmane.comp.python.general: UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: unexpected code byte It may not be you, en.wiktionary.org is sending gzip encoded content back, It sure is; here's where the offending 0x8b comes from: ID1 (IDentification 1) ID2 (IDentification 2) These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213), to identify the file as being in gzip format. (from http://www.faqs.org/rfcs/rfc1952.html) -- http://mail.python.org/mailman/listinfo/python-list
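The magic bytes are easy to confirm with the gzip module; the practical fix for the original error is to decompress the body (or not request gzip at all) before decoding it as UTF-8:

```python
import gzip

payload = gzip.compress(b'hello')
# ID1 = 0x1f, ID2 = 0x8b -- the bytes the UTF-8 decoder choked on.
assert payload[:2] == b'\x1f\x8b'
assert gzip.decompress(payload) == b'hello'
```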
Re: help need to write a python spell checker
On May 19, 1:37 pm, Steven D'Aprano steve-REMOVE- t...@cybersource.com.au wrote: On Wed, 19 May 2010 13:01:10 +1000, Nigel Rowe wrote: I'm happy to do you homework for you, cost is us$1000 per hour. Email to your professor automatically on receipt. I'll do it for $700 an hour! he could save the money if he oogledgay orvignay ellspay eckerchay -- http://mail.python.org/mailman/listinfo/python-list
Re: Puzzled by code pages
Adam Tauno Williams awilliam at whitemice.org writes: On Fri, 2010-05-14 at 20:27 -0400, Adam Tauno Williams wrote: I'm trying to process OpenStep plist files in Python. I have a parser which works, but only for strict ASCII. However plist files may contain accented characters - equivalent to ISO-8859-2 (I believe). For example I read in the line: 'skyp4_filelist_10201/localit\xc3\xa0 termali_sortfield = NSFileName;\n' What is the correct way to re-encode this data into UTF-8 so I can use unicode strings, and then write the output back to ISO8859-? Buried in the parser is a str(...) call. Replacing that with unicode(...) and now the OpenSTEP plist parser is working with Italian plists. Some observations: Italian text is much more likely to be encoded in ISO-8859-1 than ISO-8859-2. The latter covers eastern European languages (e.g. Polish, Czech, Hungarian) that use the Latin alphabet with many decorations not found in western alphabets. Let's look at the 'localit\xc3\xa0' example. Using ISO-8859-2, that decodes to u'localit\u0102\xa0'. The second-last character is LATIN CAPITAL LETTER A WITH BREVE (according to unicodedata.name()). The last character is NO-BREAK SPACE. Doesn't look like an Italian word to me. However, using UTF-8, that decodes to u'localit\xe0'. The last character is LATIN SMALL LETTER A WITH GRAVE. Looks like a plausible Italian word to me. Also to Wikipedia: A località (literally locality; plural località) is the name given in Italian administrative law to a type of territorial subdivision of a comune ... Conclusions: It's worth closely scrutinising accented characters - equivalent to ISO-8859-2 (I believe). Which variety of OpenStep plist files are you looking at: NeXTSTEP, GNUstep, or MAC OS X? If the latter, it's evidently an XML document, and you should be letting the XML parser decode it for you and in any case as an XML document it's most likely UTF-8, not ISO-8859-2. It's worth examining your definition of working. 
-- http://mail.python.org/mailman/listinfo/python-list
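The two decodings compared in the post, as executable checks:

```python
raw = b'localit\xc3\xa0'

# ISO-8859-2 turns the two bytes into A-with-breve + no-break space:
assert raw.decode('iso-8859-2') == 'localit\u0102\xa0'

# UTF-8 yields the plausible Italian word 'località':
assert raw.decode('utf-8') == 'localit\xe0'
```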
Re: Fastest way to calculate leading whitespace
dasacc22 dasacc22 at gmail.com writes: U presume entirely to much. I have a preprocessor that normalizes documents while performing other more complex operations. Theres nothing buggy about what im doing Are you sure? Your solution calculates (the number of leading whitespace characters) + (the number of TRAILING whitespace characters). Problem 1: including TRAILING whitespace. Example: "content" + 3 * " " + "\n" has 4 leading spaces according to your reckoning; should be 0. Fix: use lstrip() instead of strip() Problem 2: assuming all whitespace characters have *effective* width the same as " " (SPACE). Examples: TAB has width 4 or 8 or whatever you want it to be. There are quite a number of whitespace characters, even when you stick to ASCII. When you look at Unicode, there are heaps more. Here's a list of BMP characters such that character.isspace() is True, showing the Unicode codepoint, the Python repr(), and the name of the character (other than for control characters):
U+0009 u'\t' ?
U+000A u'\n' ?
U+000B u'\x0b' ?
U+000C u'\x0c' ?
U+000D u'\r' ?
U+001C u'\x1c' ?
U+001D u'\x1d' ?
U+001E u'\x1e' ?
U+001F u'\x1f' ?
U+0020 u' ' SPACE
U+0085 u'\x85' ?
U+00A0 u'\xa0' NO-BREAK SPACE
U+1680 u'\u1680' OGHAM SPACE MARK
U+2000 u'\u2000' EN QUAD
U+2001 u'\u2001' EM QUAD
U+2002 u'\u2002' EN SPACE
U+2003 u'\u2003' EM SPACE
U+2004 u'\u2004' THREE-PER-EM SPACE
U+2005 u'\u2005' FOUR-PER-EM SPACE
U+2006 u'\u2006' SIX-PER-EM SPACE
U+2007 u'\u2007' FIGURE SPACE
U+2008 u'\u2008' PUNCTUATION SPACE
U+2009 u'\u2009' THIN SPACE
U+200A u'\u200a' HAIR SPACE
U+200B u'\u200b' ZERO WIDTH SPACE
U+2028 u'\u2028' LINE SEPARATOR
U+2029 u'\u2029' PARAGRAPH SEPARATOR
U+202F u'\u202f' NARROW NO-BREAK SPACE
U+205F u'\u205f' MEDIUM MATHEMATICAL SPACE
U+3000 u'\u3000' IDEOGRAPHIC SPACE
Hmmm, looks like all kinds of widths, from zero upwards. -- http://mail.python.org/mailman/listinfo/python-list
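The lstrip() fix, with the trailing-whitespace bug shown alongside:

```python
def leading_ws(s):
    """Count leading (only) whitespace characters."""
    return len(s) - len(s.lstrip())

assert leading_ws('    code') == 4
assert leading_ws('content   \n') == 0

# The strip()-based version counts the 3 trailing spaces plus the newline:
assert len('content   \n') - len('content   \n'.strip()) == 4
```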
Re: How to get xml.etree.ElementTree not bomb on invalid characters in XML file ?
On May 5, 12:11 am, Barak, Ron ron.ba...@lsi.com wrote: -Original Message- From: Stefan Behnel [mailto:stefan...@behnel.de] Sent: Tuesday, May 04, 2010 10:24 AM To: python-l...@python.org Subject: Re: How to get xml.etree.ElementTree not bomb on invalid characters in XML file ? Barak, Ron, 04.05.2010 09:01: I'm parsing XML files using ElementTree from xml.etree (see code below (and attached xml_parse_example.py)). However, I'm coming across input XML files (attached an example: tmp.xml) which include invalid characters, that produce the following traceback: $ python xml_parse_example.py Traceback (most recent call last): xml.parsers.expat.ExpatError: not well-formed (invalid token): line 6, column 34 I hope you are aware that this means that the input you are parsing is not XML. It's best to reject the file and tell the producers that they are writing broken output files. You should always fix the source, instead of trying to make sense out of broken input in fragile ways. I read the documentation for xml.etree.ElementTree and see that it may take an optional parser parameter, but I don't know what this parser should be - to ignore the invalid characters. Could you suggest a way to call ElementTree, so it won't bomb on these invalid characters ? No. The parser in lxml.etree has a 'recover' option that lets it try to recover from input errors, but in general, XML parsers are required to reject non well-formed input. Stefan Hi Stefan, The XML file seems to be valid XML (all XML viewers I tried were able to read it). You can verify this by trying to read the XML example I attached to the original message (attached again here). Actually, when trying to view the file with an XML viewer, these offensive characters are not shown. It's just that some of the fields include characters that the parser used by ElementTree seems to choke on. Bye, Ron. [attachment: tmp_small.xml] Have a look at your file with e.g. a hex editor or just Python repr() -- see below. 
You will see that there are four cases of taggood_data\x00garbage/tag where garbage is repeated \x00 or just random line noise or uninitialised memory. m_sanApiName1MainStorage_snap\x00\x00*SNIP*\x00\x00/ m_sanApiName1 m_detailBROLB21\x00\xeequot;\x00\x00\x00\x90,\x02G\xdc\xfb\x04P\xdc \xfb\x04\x01a\xfcgt;(\xe8\xfb\x04/m_detail It's a toss-up whether the gt; in there is accidental or a deliberate attempt to sanitise the garbage !-) m_detailAlstom\x00\x00o\x00m\x00\x00*SNIP*\x00\x00/m_detail m_sanApiVersionV5R1.28.1 [R - LA]\x00\x00*SNIP*\x00\x00/ m_sanApiVersion The garbage in the 2nd case is such as to make the initial declaration encoding=UTF-8 an outright lie and I'm curious as to how the XML parser managed to get as far as it did -- it must decode a line at a time. As already advised: it's much better to reject that rubbish outright than to attempt to repair it. Repair should be contemplated only if it's a one-off exercise AND you can't get a fixed copy from the source. And while we're on the subject of rubbish: The XML file seems to be valid XML (all XML viewers I tried were able to read it). The conclusion from that is that all XML viewers that you tried are rubbish. -- http://mail.python.org/mailman/listinfo/python-list
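If a one-off repair really is unavoidable, a sketch along these lines (the sample bytes are invented, modelled on the taggood_data\x00garbage pattern above): strip the bytes XML 1.0 forbids so the parser gets through, while accepting that the garbage text itself remains in the fields:

```python
import re
import xml.etree.ElementTree as ET

def strip_forbidden_controls(raw):
    """ONE-OFF repair sketch only: delete bytes XML 1.0 forbids (NUL and
    the other C0 controls except tab, LF, CR).  This stops the parser
    choking; it does NOT remove the garbage text that follows the NULs."""
    return re.sub(rb'[\x00-\x08\x0b\x0c\x0e-\x1f]', b'', raw)

bad = b'<m_detail>BROLB21\x00\x00junk</m_detail>'
root = ET.fromstring(strip_forbidden_controls(bad))
assert root.text == 'BROLB21junk'   # parses, but the junk survives
```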
Re: How to get xml.etree.ElementTree not bomb on invalid characters in XML file ?
On May 5, 3:43 am, Terry Reedy tjre...@udel.edu wrote: On 5/4/2010 11:37 AM, Stefan Behnel wrote: Barak, Ron, 04.05.2010 16:11: The XML file seems to be valid XML (all XML viewers I tried were able to read it). From Internet Explorer: The XML page cannot be displayed Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later. An invalid character was found in text content. Error processing resource 'file:///C:/Documents and Settings... m_detailBROLB21 This is what xmllint gives me: --- $ xmllint /home/sbehnel/tmp.xml tmp.xml:6: parser error : Char 0x0 out of allowed range m_sanApiName1MainStorage_snap ^ tmp.xml:6: parser error : Premature end of data in tag m_sanApiName1 line 6 m_sanApiName1MainStorage_snap ^ tmp.xml:6: parser error : Premature end of data in tag DbHbaGroup line 5 m_sanApiName1MainStorage_snap ^ tmp.xml:6: parser error : Premature end of data in tag database line 4 m_sanApiName1MainStorage_snap ^ --- The file contains 0-bytes - clearly not XML. IE agrees. Look closer. IE *DOESN'T* agree. It has ignored the problem on line 6 and lurched on to the next problem (in line 11). If you edit that file to remove the line noise in line 11, leaving the 3 cases of multiple \x00 bytes, IE doesn't complain at all about the (invalid) \x00 bytes. -- http://mail.python.org/mailman/listinfo/python-list
Re: condition and True or False
On May 3, 9:14 am, Steven D'Aprano st...@remove-this- cybersource.com.au wrote: If it is any arbitrary object, then "x and True or False" is just an obfuscated way of writing "bool(x)". Perhaps their code predates the introduction of bools, and they have defined global constants True and False but not bool. Then they removed the True and False bindings as no longer necessary, but neglected to replace the obfuscated conversion. Or perhaps they are maintaining code that must run on any 2.X. True and False would be set up conditional on Python version. Writing "expression and True or False" avoids a function call.
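The claimed equivalence is easy to check empirically; a throwaway sketch (the sample values are arbitrary):

```python
# For any object x, `x and True or False` yields exactly bool(x):
# a falsy x short-circuits `and` and falls through `or` to False;
# a truthy x makes `and` evaluate to True, which `or` then returns.
samples = [0, 1, -3, "", "abc", [], [1], {}, None, 0.0]
results = [(x and True or False) == bool(x) for x in samples]
```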
Re: csv.py sucks for Decimal
On Apr 23, 9:23 am, Phlip phlip2...@gmail.com wrote: When I use the CSV library, with QUOTE_NONNUMERIC, and when I pass in a Decimal() object, I must convert it to a string. Why must you? What unwanted effect do you observe when you don't convert it? the search for an alternate CSV module, without this bug, will indeed begin very soon! What bug? I'm pointing out that QUOTE_NONNUMERIC would work better with an option to detect numeric-as-string, and absolve it. That would allow Decimal() to do its job, unimpeded. Decimal()'s job is to create an instance of the decimal.Decimal class; how is that being impeded by anything in the csv module? -- http://mail.python.org/mailman/listinfo/python-list
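A small experiment (a sketch under current CPython; 2010-era behaviour may have differed) shows where Decimal actually loses out with QUOTE_NONNUMERIC: the writer treats it as numeric and emits it unquoted, but the reader converts every unquoted field back to float, not Decimal:

```python
import csv
import io
from decimal import Decimal

# Write one row containing a Decimal with QUOTE_NONNUMERIC in force.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerow([Decimal("1.23"), "text", 4.5])

# Read it back: unquoted fields come back as float, so the Decimal's
# exactness is silently lost on the round trip.
buf.seek(0)
row = next(csv.reader(buf, quoting=csv.QUOTE_NONNUMERIC))
```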
[issue8308] raw_bytes.decode('cp932') -- spurious mappings
John Machin sjmac...@users.sourceforge.net added the comment: Thanks, Martin. Issue closed as far as I'm concerned. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8308 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8308] raw_bytes.decode('cp932') -- spurious mappings
New submission from John Machin sjmac...@users.sourceforge.net: According to the following references, the bytes 80, A0, FD, FE, and FF are not defined in cp932: http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003s=ALL However CPython 3.1.2 does this: print(ascii(b'\x80\xa0\xfd\xfe\xff'.decode('cp932'))) '\x80\uf8f0\uf8f1\uf8f2\uf8f3' (as do 2.5, 2.6. and 2.7 with the appropriate syntax) This maps 80 to U+0080 (not very useful) and maps the other 4 bytes into the Private Use Area (PUA)!! Each case should be treated as undefined/unexpected/error/... -- components: Unicode messages: 102308 nosy: sjmachin severity: normal status: open title: raw_bytes.decode('cp932') -- spurious mappings type: behavior versions: Python 2.7, Python 3.1 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8308 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
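One way to cross-check claims like this against any installed interpreter is to probe the codec byte by byte. A diagnostic sketch (the function name is mine; for cp932 the result also picks up lead bytes like 81 that merely lack a trail byte, and it depends on which fix level your interpreter carries):

```python
def undefined_single_bytes(encoding):
    """Return the byte values 0-255 that the codec refuses to decode alone."""
    bad = []
    for i in range(256):
        try:
            bytes([i]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(i)
    return bad

# The report above says 0x80, 0xA0, 0xFD, 0xFE and 0xFF should all be
# errors for cp932; inspect the result on your own interpreter:
cp932_bad = undefined_single_bytes("cp932")
```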
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: @ezio.melotti: Your second sentence is true, but it is not the whole truth. Bytes in the range C0-FF (whose high bit *is* set) ALSO shouldn't be considered part of the sequence because they (like 00-7F) are invalid as continuation bytes; they are either starter bytes (C2-F4) or invalid for any purpose (C0-C1 and F5-FF). Further, some bytes in the range 80-BF are NOT always valid as the first continuation byte; it depends on what starter byte they follow. The simple way of summarising the above is to say that a byte that is not a valid continuation byte in the current state (failing byte) is not a part of the current (now known to be invalid) sequence, and the decoder must try again (resync) with the failing byte. Do you agree with my example 3?
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: #ezio.melotti: I'm considering valid all the bytes that start with '10...' Sorry, WRONG. Read what I wrote: Further, some bytes in the range 80-BF are NOT always valid as the first continuation byte, it depends on what starter byte they follow. Consider these sequences: (1) E0 80 80 (2) E0 9F 80. Both are invalid sequences (over-long). Specifically the first continuation byte may not be in 80-9F. Those bytes start with '10...' but they are invalid after an E0 starter byte. Please read Table 3-7. Well-Formed UTF-8 Byte Sequences and surrounding text in Unicode 5.2.0 chapter 3 (bearing in mind that CPython (for good reasons) doesn't implement the surrogates restriction, so that the special case for starter byte ED is not used in CPython). Note the other 3 special cases for the first continuation byte. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
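Modern CPython agrees with Table 3-7; a sketch confirming that the two over-long sequences cited above are rejected while the shortest encoding of the same code point is accepted:

```python
# E0 80 80 and E0 9F 80 are over-long: after an E0 starter, the first
# continuation byte must be in A0-BF (Unicode Table 3-7).
overlong = [b"\xe0\x80\x80", b"\xe0\x9f\x80"]
rejected = []
for seq in overlong:
    try:
        seq.decode("utf-8")
    except UnicodeDecodeError:
        rejected.append(seq)

# The shortest valid 3-byte sequence starting with E0 encodes U+0800:
ok = b"\xe0\xa0\x80".decode("utf-8")
```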
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: Unicode has been frozen at 0x10FFFF. That's it. There is no such thing as a valid 5-byte or 6-byte UTF-8 string.
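Both halves of that statement are quick to check on a current interpreter:

```python
import sys

# The codespace ends at U+10FFFF ...
top = sys.maxunicode

# ... and the old RFC 2279 5- and 6-byte forms are rejected outright
# (F8 and FC are not valid UTF-8 bytes at all).
long_forms = [b"\xf8\x88\x80\x80\x80", b"\xfc\x84\x80\x80\x80\x80"]
errors = 0
for seq in long_forms:
    try:
        seq.decode("utf-8")
    except UnicodeDecodeError:
        errors += 1
```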
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: @lemburg: RFC 2279 was obsoleted by RFC 3629 over 6 years ago. The standard now says 21 bits is it. F5-FF are declared to be invalid. I don't understand what you mean by "supporting those possibilities". The code is correctly issuing an error message. The goal of supporting the new resyncing and FFFD-emitting rules might be better met however by throwing away the code in the default clause and instead merely setting the entries for F5-FF in the utf8_code_length array to zero.
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: Patch review: Preamble: pardon my ignorance of how the codebase works, but trunk unicodeobject.c is r79494 (and allows encoding of surrogate codepoints), py3k unicodeobject.c is r79506 (and bans the surrogate caper) and I can't find the r79542 that the patch mentions ... help, please! length 2 case: 1. the loop can be hand-unrolled into oblivion. It can be entered only when s[1] 0xC0 != 0x80 (previous if test). 2. the over-long check (if (ch 0x80)) hasn't been touched. It could be removed and the entries for C0 and C1 in the utf8_code_length array set to 0. length 3 case: 1. the tests involving s[0] being 0xE0 or 0xED are misplaced. 2. the test s[0] == 0xE0 s[1] 0xA0 if not misplaced would be shadowing the over-long test (ch 0x800). It seems better to use the over-long test (with endinpos set to 1). 3. The test s[0] == 0xED relates to the surrogates caper which in the py3k version is handled in the same place as the over-long test. 4. unrolling loop: needs no loop, only 1 test ... if s[1] is good, then we know s[2] must be bad without testing it, because we start the for loop only when s[1] is bad || s[2] is bad. length 4 case: as for the len 3 case generally ... misplaced tests, F1 test shadows over-long test, F4 test shadows max value test, too many loop iterations. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: Chapter 3, page 94: "As a consequence of the well-formedness conditions specified in Table 3-7, the following byte values are disallowed in UTF-8: C0-C1, F5-FF". Of course they should be handled by the simple expedient of setting their length entry to zero. Why write code when there is an existing mechanism??
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: @lemburg: "perhaps applying the same logic as for the other sequences is a better strategy" What other sequences??? F5-FF are invalid bytes; they don't start valid sequences. What same logic?? At the start of a character, they should get the same short sharp treatment as any other non-starter byte e.g. 80 or C0.
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: @lemburg: failing byte seems rather obvious: first byte that you meet that is not valid in the current state. I don't understand your explanation, especially does not have the high bit set. I think you mean is a valid starter byte. See example 3 below. Example 1: F1 80 41 42 43. F1 implies a 4-byte character. 80 is OK. 41 is not in 80-BF. It is the failing byte; high bit not set. Required action is to emit FFFD then resync on the 41, causing 0041 0042 0043 to be emitted. Total output: FFFD 0041 0042 0043. Current code emits FFFD 0043. Example 2: F1 80 FF 42 43. F1 implies a 4-byte character. 80 is OK. FF is not in 80-BF. It is the failing byte. Required action is to emit FFFD then resync on the FF. FF is not a valid starter byte, so emit FFFD, and resync on the 42, causing 0042 0043 to be emitted. Total output: FFFD FFFD 0042 0043. Current code emits FFFD 0043. Example 3: F1 80 C2 81 43. F1 implies a 4-byte character. 80 is OK. C2 is not in 80-BF. It is the failing byte. Required action is to emit FFFD then resync on the C2. C2 and 81 have the high bit set, but C2 is a valid starter byte, and remaining bytes are OK, causing 0081 0043 to be emitted. Total output: FFFD 0081 0043. Current code emits FFFD 0043. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
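Current CPython (which now follows the Unicode "maximal subpart" recommendation) produces exactly the required outputs for the three examples above; a verification sketch:

```python
# Example 1: failing byte 41 is resynced on -> one FFFD, then "ABC".
ex1 = b"\xf1\x80\x41\x42\x43".decode("utf-8", "replace")

# Example 2: FF is not a valid starter either, so a second FFFD, then "BC".
ex2 = b"\xf1\x80\xff\x42\x43".decode("utf-8", "replace")

# Example 3: C2 81 is a valid sequence for U+0081, so it survives the resync.
ex3 = b"\xf1\x80\xc2\x81\x43".decode("utf-8", "replace")
```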
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
New submission from John Machin sjmac...@users.sourceforge.net: Unicode 5.2.0 chapter 3 (Conformance) has a new section (headed Constraints on Conversion Processes) after requirement D93. Recent Pythons e.g. 3.1.2 don't comply. Using the Unicode example: print(ascii(b\xc2\x41\x42.decode('utf8', 'replace'))) '\ufffdB' # should produce u'\ufffdAB' Resynchronisation currently starts at a position derived by considering the length implied by the start byte: print(ascii(b\xf1ABCD.decode('utf8', 'replace'))) '\ufffdD' # should produce u'\ufffdABCD'; resync should start from the *failing* byte. Notes: This applies to the 'ignore' option as well as the 'replace' option. The Unicode discussion mentions security exploits. -- messages: 101972 nosy: sjmachin severity: normal status: open title: str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0 type: behavior versions: Python 2.7, Python 3.1 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
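For the record, current CPython decodes both of the report's test cases exactly as the Unicode text requires:

```python
# Resynchronisation now restarts at the *failing* byte, not at a position
# derived from the length implied by the start byte.
case1 = b"\xc2\x41\x42".decode("utf-8", "replace")
case2 = b"\xf1\x41\x42\x43\x44".decode("utf-8", "replace")
```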
Re: subtraction is giving me a syntax error
On Mar 16, 5:43 am, Baptiste Carvello baptiste...@free.fr wrote: Joel Pendery a écrit : So I am trying to write a bit of code and a simple numerical subtraction y_diff = y_diff-H is giving me the error Syntaxerror: Non-ASCII character '\x96' in file on line 70, but no encoding declared. I would say that when you press the minus key, your operating system doesn't encode the standard (ASCII) minus character, but some fancy character, which Python cannot interpret. The likelihood that any operating system however brain-damaged and in whatever locale would provide by default a keyboard or input method that generated EN DASH when the '-' key is struck is somewhere between zero and epsilon. Already advanced theories like used a word processor instead of a programmer's editor and scraped it off the web are much more plausible. More precisely, I suspect you are unsing Windows with codepage 1252 (latin 1). Codepage 1252 is not latin1 in the generally accepted meaning of latin1 i.e. ISO-8859-1. It is a superset. MS in their wisdom or otherwise chose to use most of the otherwise absolutely wasted slots assigned to C1 control characters in latin1. With this encoding, you have 2 kinds of minus signs: the standard (45th character, in hex '\x2d') and the non-standard (150th character, in hex '\x96'). cf:http://msdn.microsoft.com/en-us/library/cc195054.aspx The above link quite correctly says that '\x96` maps to U+2013 EN DASH. EN DASH is not any kind of minus sign. Aside: the syndrome causing the problem is apparent with cp125x for x in range(9) -- http://mail.python.org/mailman/listinfo/python-list
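The byte values involved are easy to verify: 0x96 is EN DASH in cp1252 but an unprintable C1 control in real latin-1 (ISO-8859-1), and has nothing to do with the ASCII hyphen-minus:

```python
import unicodedata

en_dash = b"\x96".decode("cp1252")    # Windows codepage 1252
c1_ctrl = b"\x96".decode("latin-1")   # ISO-8859-1 proper: a C1 control, U+0096
minus = "-"                           # ASCII HYPHEN-MINUS, ordinal 0x2D
```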
Re: datelib pythonification
On Feb 21, 12:37 pm, alex goretoy agore...@gmail.com wrote: hello all, since I posted this last time, I've added a new function dates_diff and [SNIP] I'm rather unsure of the context of this posting ... I'm assuming that the subject datelib pythonification refers to trying to make datelib more pythonic, with which you appear to need help. Looking just at the new function (looks like a method to me) dates_diff, problems include: 1. Mostly ignores PEP-8 about spaces after commas, around operators 2. Checks types 3. Checks types using type(x) == type(y) 4. Inconsistent type checking: checks types in case of dates_diff(date1, date2) but not in case of dates_diff([date1, date2]) 5. Doesn't check for 3 or more args. 6. The 0-arg case is for what purpose? 7. The one-arg case is overkill -- if the caller has the two values in alist, all you are saving them from is the * in dates_diff(*alist) 8. Calling type(date.today()) once per 2-arg call would be a gross extravagance; calling it twice per 2-arg call is mind-boggling. 9. start,end=(targs[0][0],targs[0][1]) ... multiple constant subscripts is a code smell; this one is pongier than usual because it could easily be replaced by start, end = targs[0] Untested fix of problems 1, 3, 4, 5, 8, 9: DATE_TYPE = type(date.today()) def dates_diff(self, *targs): nargs = len(targs) if nargs == 0: return self.enddate - self.startdate if nargs == 1: arg = targs[0] if not isinstance(arg, (list, tuple)) or len(arg) != 2: raise Exception( single arg must be list or tuple of length 2) start, end = arg elif nargs == 2: start, end = targs else: raise Exception(expected 0,1, or 2 args; found %d % nargs) if isinstance(start, DATE_TYPE) and isinstance(end, DATE_TYPE): return end - start raise Exception(both values must be of type DATE_TYPE) HTH, John -- http://mail.python.org/mailman/listinfo/python-list
Re: parsing an Excel formula with the re module
On Jan 13, 7:15 pm, Paul McGuire pt...@austin.rr.com wrote: On Jan 5, 1:49 pm, Tim Chase python.l...@tim.thechases.com wrote: vsoler wrote: Hence, I need toparseExcel formulas. Can I do it by means only of re (regular expressions)? I know that for simple formulas such as =3*A7+5 it is indeed possible. What about complex for formulas that include functions, sheet names and possibly other *.xls files? Where things start getting ugly is when you have nested function calls, such as =if(Sum(A1:A25)42,Min(B1:B25), if(Sum(C1:C25)3.14, (Min(C1:C25)+3)*18,Max(B1:B25))) Regular expressions don't do well with nested parens (especially arbitrarily-nesting-depth such as are possible), so I'd suggest going for a full-blown parsing solution like pyparsing. If you have fair control over what can be contained in the formulas and you know they won't contain nested parens/functions, you might be able to formulate some sort of kinda, sorta, maybe parses some forms of formulas regexp. -tkc This might give the OP a running start: Unfortunately this will blow up after only a few paces; see below ... from pyparsing import (CaselessKeyword, Suppress, Word, alphas, alphanums, nums, Optional, Group, oneOf, Forward, Regex, operatorPrecedence, opAssoc, dblQuotedString) test1 = =3*A7+5 test2 = =3*Sheet1!$A$7+5 test2a ==3*'Sheet 1'!$A$7+5 test2b ==3*'O''Reilly''s sheet'!$A$7+5 test3 = =if(Sum(A1:A25)42,Min(B1:B25), \ if(Sum(C1:C25)3.14, (Min(C1:C25)+3)*18,Max(B1:B25))) Many functions can take a variable number of args and they are not restricted to cell references e.g. test3a = =sum(a1:a25,10,min(b1,c2,d3)) The arg separator is comma or semicolon depending on the locale ... a parser should accept either. 
EQ,EXCL,LPAR,RPAR,COLON,COMMA,DOLLAR = map(Suppress, '=!():,$') sheetRef = Word(alphas, alphanums) colRef = Optional(DOLLAR) + Word(alphas,max=2) rowRef = Optional(DOLLAR) + Word(nums) cellRef = Group(Optional(sheetRef + EXCL)(sheet) + colRef(col) + rowRef(row)) cellRange = (Group(cellRef(start) + COLON + cellRef(end)) (range) | cellRef ) expr = Forward() COMPARISON_OP = oneOf( = = = != ) condExpr = expr + COMPARISON_OP + expr ifFunc = (CaselessKeyword(if) + LPAR + Group(condExpr)(condition) + that should be any expression; at run-time it expects a boolean (TRUE or FALSE) or a number (0 means false, non-0 means true). Text causes a #VALUE! error. Trying to subdivide expressions into conditional / numeric /text just won't work. COMMA + expr(if_true) + COMMA + expr(if_false) + RPAR) statFunc = lambda name : CaselessKeyword(name) + LPAR + cellRange + RPAR sumFunc = statFunc(sum) minFunc = statFunc(min) maxFunc = statFunc(max) aveFunc = statFunc(ave) funcCall = ifFunc | sumFunc | minFunc | maxFunc | aveFunc multOp = oneOf(* /) addOp = oneOf(+ -) needs power op ^ numericLiteral = Regex(r\-?\d+(\.\d+)?) Sorry, that - in there is a unary minus operator. What about 1e23 ? operand = numericLiteral | funcCall | cellRange | cellRef arithExpr = operatorPrecedence(operand, [ (multOp, 2, opAssoc.LEFT), (addOp, 2, opAssoc.LEFT), ]) textOperand = dblQuotedString | cellRef textExpr = operatorPrecedence(textOperand, [ ('', 2, opAssoc.LEFT), ]) Excel evaluates excessively permissively, and the punters are definitely not known for self-restraint. The above just won't work: 2.3 4.5 produces text 2.34.5, while 2.3 + 4.5 produces number 6.8. -- http://mail.python.org/mailman/listinfo/python-list
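As a middle ground between a bare regex and a full pyparsing grammar: tokenising is within re's reach even though nested expressions are not. This sketch (the token classes and names are my own, and it assumes the comma-separator locale; unrecognised characters are simply skipped) splits a formula into lexemes that a real parser could then consume:

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)   # 5, 3.14, 1e23
  | (?P<cell>\$?[A-Za-z]{1,3}\$?\d+)             # A7, $A$7, AB12
  | (?P<op><=|>=|<>|[<>=^&%*/+-])                # comparison, concat, arithmetic
  | (?P<punct>[(),:;!])                          # calls, ranges, sheet refs
  | (?P<name>[A-Za-z_][A-Za-z0-9_.]*)            # function or defined name
""", re.VERBOSE)

def tokenize(formula):
    return [m.group() for m in TOKEN_RE.finditer(formula)]
```

Note the ordering: the multi-character operators must precede the single-character class, and cell references are tried before names so that A25 lexes as a cell while SUM falls through to name.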
Re: parsing an Excel formula with the re module
On Jan 14, 2:05 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote: En Wed, 13 Jan 2010 05:15:52 -0300, Paul McGuire pt...@austin.rr.com escribió: vsoler wrote: Hence, I need to parse Excel formulas. Can I do it by means only of re (regular expressions)? This might give the OP a running start: from pyparsing import (CaselessKeyword, Suppress, ... Did you build those parsing rules just by common sense, or following some actual specification? Leave your common sense with the barkeep when you enter the Excel saloon; it is likely to be a hindrance. The specification is what Excel does.
Re: parsing an Excel formula with the re module
On Jan 15, 3:41 pm, Paul McGuire pt...@austin.rr.com wrote: I never represented that this parser would handle any and all Excel formulas! But I should hope the basic structure of a pyparsing solution might help the OP add some of the other features you cited, if necessary. It's actually pretty common to take an incremental approach in making such a parser, and so here are some of the changes that you would need to make based on the deficiencies you pointed out: functions can have a variable number of arguments, of any kind of expression - statFunc = lambda name : CaselessKeyword(name) + LPAR + delimitedList (expr) + RPAR sheet name could also be a quoted string - sheetRef = Word(alphas, alphanums) | QuotedString(',escQuote='') add boolean literal support - boolLiteral = oneOf(TRUE FALSE) - operand = numericLiteral | funcCall | boolLiteral | cellRange | cellRef or a string literal ... you seem to have ignored the significant point that the binary operators don't have narrow type requirements of their args (2.3 4.5 produces text 2.34.5, while 2.3 + 4.5 produces number 6.8); your attempt to enforce particular types for args at compile-time is erroneous OVER-engineering. -- http://mail.python.org/mailman/listinfo/python-list
Re: parsing an Excel formula with the re module
On 12/01/2010 6:26 PM, Chris Withers wrote: John Machin wrote: The xlwt package (of which I am the maintainer) has a lexer and parser for a largish subset of the syntax ... see http://pypi.python.org/pypi/xlwt xlrd, no? A facility in xlrd to decompile Excel formula bytecode into a text formula is currently *under discussion*. The OP was planning to dig the formula text out using COM then parse the formula text looking for cell references and appeared to have a rather simplistic view of the ease of parsing Excel formula text -- that's why I pointed him at those facilities (existing, released, proven in the field) in xlwt. -- http://mail.python.org/mailman/listinfo/python-list
Re: What is built-in method sub
On Jan 12, 7:30 am, Jeremy jlcon...@gmail.com wrote: On Jan 11, 1:15 pm, Diez B. Roggisch de...@nospam.web.de wrote: Jeremy schrieb: On Jan 11, 12:54 pm, Carl Banks pavlovevide...@gmail.com wrote: On Jan 11, 11:20 am, Jeremy jlcon...@gmail.com wrote: I just profiled one of my Python scripts and discovered that 99% of the time was spent in {built-in method sub} What is this function and is there a way to optimize it? I'm guessing this is re.sub (or, more likely, a method sub of an internal object that is called by re.sub). If all your script does is to make a bunch of regexp substitutions, then spending 99% of the time in this function might be reasonable. Optimize your regexps to improve performance. (We can help you if you care to share any.) If my guess is wrong, you'll have to be more specific about what your sctipt does, and maybe share the profile printout or something. Carl Banks Your guess is correct. I had forgotten that I was using that function. I am using the re.sub command to remove trailing whitespace from lines in a text file. The commands I use are copied below. If you have any suggestions on how they could be improved, I would love to know. Thanks, Jeremy lines = self._outfile.readlines() self._outfile.close() line = string.join(lines) if self.removeWS: # Remove trailing white space on each line trailingPattern = '(\S*)\ +?\n' line = re.sub(trailingPattern, '\\1\n', line) line = line.rstrip()? Diez Yep. I was trying to reinvent the wheel. I just remove the trailing whitespace before joining the lines. Actually you don't do that. Your regex has three components: (1) (\S*) zero or more occurrences of not-whitespace (2) \ +? one or more (non-greedy) occurrences of SPACE (3) \n a newline Component (2) should be \s+? In any case this is a round-about way of doing it. Try writing a regex that does it simply: replace trailing whitespace by an empty string. 
Another problem with your approach: it doesn't work if the line is not terminated by \n -- this is quite possible if the lines are being read from a file. A wise person once said: Re-inventing the wheel is often accompanied by forgetting to re-invent the axle. -- http://mail.python.org/mailman/listinfo/python-list
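Following the suggestion above — replace trailing whitespace with an empty string, and don't depend on a final \n — a minimal sketch:

```python
import re

def strip_trailing_ws(text):
    # Remove spaces/tabs at the end of every line; with re.MULTILINE,
    # $ matches before each newline AND at the very end of the string,
    # so an unterminated last line is handled too.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
```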
Re: Porblem with xlutils/xlrd/xlwt
On Jan 10, 8:51 pm, pp parul.pande...@gmail.com wrote: On Jan 9, 8:23 am, John Machin sjmac...@lexicon.net wrote: On Jan 9, 9:56 pm, pp parul.pande...@gmail.com wrote: On Jan 9, 3:52 am, Jon Clements jon...@googlemail.com wrote: On Jan 9, 10:44 am, pp parul.pande...@gmail.com wrote: On Jan 9, 3:42 am, Jon Clements jon...@googlemail.com wrote: On Jan 9, 10:24 am, pp parul.pande...@gmail.com wrote: yeah all my versions are latest fromhttp://www.python-excel.org. just checked!! How did you check? You didn't answer this question. what could be the problem? Does rb = xlrd.open_workbook('somesheet.xls', on_demand=True) work by itself? Yes it does. The problem is with line: wb = copy(rb) here I am getting the error: AttributeError: 'Book' object has no attribute 'on_demand' Please replace the first 4 lines of your script by these 6 lines: import xlrd assert xlrd.__VERSION__ == 0.7.1 from xlwt import easyxf from xlutils.copy import copy rb = xlrd.open_workbook( 'source.xls',formatting_info=True, on_demand=False) and run it again. Please copy all the output and paste it into your response. This time when I ran the code sent by you I got the following results:I am using ipython for running the code. AssertionError Traceback (most recent call last) /home/parul/CODES/copy_1.py in module() 1 2 import xlrd 3 assert xlrd.__VERSION__ == 0.7.1 4 from xlwt import easyxf 5 from xlutils.copy import copy 6 rb = xlrd.open_workbook('source.xls',formatting_info=True, on_demand=False) AssertionError: WARNING: Failure executing file: copy_1.py Your traceback appears to show an AssertionError from an import statement. We could do without an extra layer of noise in the channel; please consider giving ipython the flick (for debug purposes, at least) and use Python to run your script from the shell prompt. Change the second line to read: print xlrd.__VERSION__ I used www.python-excel.org to get xlrd and xlwt .. so they are latest versions. Let's concentrate on xlrd. 
I presume that means that you clicked on the xlrd Download link which took you to http://pypi.python.org/pypi/xlrd from which you can download the latest version of the package. That page has xlrd 0.7.1 in a relatively large font at the top. You would have been presented with options to download one of these xlrd-0.7.1.tar.gz xlrd-0.7.1.win32.exe xlrd-0.7.1.zip (each uploaded on 2009-06-01). Which one did you download, and then what did you do with it? Or perhaps you ignored those and read further down to Download link which took you to an out-of-date page but you didn't notice the 0.6.1 in large bold type at the top nor the Page last updated on 11 June 2007 at the bottom nor the 0.6.1 in the name of the file that you downloaded ... sorry about that; I've smacked the webmaster about the chops :-) Cheers, John -- http://mail.python.org/mailman/listinfo/python-list
Re: How to get many places of pi from Machin's Equation?
On Jan 9, 10:31 pm, Richard D. Moores rdmoo...@gmail.com wrote: Machin's Equation is 4 arctan (1/5) - arctan(1/239) = pi/4 Using Python 3.1 and the math module: from math import atan, pi pi 3.141592653589793 (4*atan(.2) - atan(1/239))*4 3.1415926535897936 (4*atan(.2) - atan(1/239))*4 == pi False abs((4*atan(.2) - atan(1/239))*4) - pi .01 False abs((4*atan(.2) - atan(1/239))*4) - pi .0001 False abs((4*atan(.2) - atan(1/239))*4) - pi .001 True Is there a way in Python 3.1 to calculate pi to greater accuracy using Machin's Equation? Even to an arbitrary number of places? Considering that my namesake calculated pi to 100 decimal places with the computational equipment available in 1706 (i.e. not much), I'd bet you London to a brick that Python (any version from 0.1 onwards) could be used to simulate his calculations to any reasonable number of places. So my answers to your questions are yes and yes. Suggestion: search_the_fantastic_web(machin pi python) -- http://mail.python.org/mailman/listinfo/python-list
Re: Porblem with xlutils/xlrd/xlwt
On Jan 9, 9:56 pm, pp parul.pande...@gmail.com wrote: On Jan 9, 3:52 am, Jon Clements jon...@googlemail.com wrote: On Jan 9, 10:44 am, pp parul.pande...@gmail.com wrote: On Jan 9, 3:42 am, Jon Clements jon...@googlemail.com wrote: On Jan 9, 10:24 am, pp parul.pande...@gmail.com wrote: yeah all my versions are latest fromhttp://www.python-excel.org. just checked!! How did you check? what could be the problem? Does rb = xlrd.open_workbook('somesheet.xls', on_demand=True) work by itself? Yes it does. The problem is with line: wb = copy(rb) here I am getting the error: AttributeError: 'Book' object has no attribute 'on_demand' Please replace the first 4 lines of your script by these 6 lines: import xlrd assert xlrd.__VERSION__ == 0.7.1 from xlwt import easyxf from xlutils.copy import copy rb = xlrd.open_workbook( 'source.xls',formatting_info=True, on_demand=False) and run it again. Please copy all the output and paste it into your response. -- http://mail.python.org/mailman/listinfo/python-list
Re: Astronomy--Programs to Compute Siderial Time?
On Jan 7, 2:40 pm, W. eWatson wolftra...@invalid.com wrote: John Machin wrote: What you have been reading is the Internal maintenance specification (large font, near the top of the page) for the module. The xml file is the source of the docs, not meant to be user-legible. What is it used for? The maintainer of the module processes the xml file with some script or other to create the user-legible docs. Do I need it? No.
Re: How do I access what's in this module?
On Jan 8, 12:21 pm, Fencer no.i.d...@want.mail.from.spammers.com wrote: Hello, look at this lxml documentation page:http://codespeak.net/lxml/api/index.html That's for getting details about an object once you know what object you need to use to do what. In the meantime, consider reading the tutorial and executing some of the examples: http://codespeak.net/lxml/tutorial.html How do I access the functions and variables listed? I tried from lxml.etree import ElementTree and the import itself seems to pass without complaint by the python interpreter but I can't seem to access anything in ElementTree, not the functions or variables. What is the proper way to import that module? For example: from lxml.etree import ElementTree ElementTree.dump(None) Traceback (most recent call last): File console, line 1, in module lxml.etree is a module. ElementTree is effectively a class. The error message that you omitted to show us might have given you a clue. To save keystrokes you may like to try from lxml import etree as ET and thereafter refer to the module as ET | from lxml import etree as ET | type(ET) | type 'module' | type(ET.ElementTree) | type 'builtin_function_or_method' | help(ET.ElementTree) | Help on built-in function ElementTree in module lxml.etree: | | ElementTree(...) | ElementTree(element=None, file=None, parser=None) | | ElementTree wrapper class. Also, can I access those items that begin with an underscore if I get the import sorted? Using pommy slang like sorted in an IT context has the potential to confuse your transatlantic correspondents :-) Can access? Yes. Should access? The usual Python convention is that an object whose name begins with an underscore should be accessed only via a documented interface (or, at your own risk, if you think you know what you are doing). HTH, John -- http://mail.python.org/mailman/listinfo/python-list
Re: How do I access what's in this module?
On Jan 8, 2:45 pm, Fencer no.i.d...@want.mail.from.spammers.com wrote:
> On 2010-01-08 04:40, John Machin wrote:
>> For example:
>> >>> from lxml.etree import ElementTree
>> >>> ElementTree.dump(None)
>> Traceback (most recent call last):
>>   File "<console>", line 1, in <module>
>> lxml.etree is a module. ElementTree is effectively a class. The error message that you omitted to show us might have given you a clue.
> But I did show the error message? It's just above what you just wrote. I try to include all relevant information in my posts.

excerpt:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
Also, can I access those items ...

/excerpt

The error message should appear after the line starting with "File". The above excerpt was taken from Google Groups and is identical to what shows in http://news.gmane.org/gmane.comp.python.general ... what are you looking at? With Windows XP and Python 2.5.4 I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'builtin_function_or_method' object has no attribute 'dump'

> It turns out I no longer want to access anything in there but I thank you for your information nonetheless.

You're welcome -- the advice on _methods is portable :-)
-- http://mail.python.org/mailman/listinfo/python-list
Re: TypeError
On Jan 7, 3:29 am, MRAB pyt...@mrabarnett.plus.com wrote:
> Victor Subervi wrote:
>> ValueError: unsupported format character '(' (0x28) at index 54
>> args = ("unsupported format character '(' (0x28) at index 54",)
>> Apparently that character is a file separator, which I presume is an invisible character. I tried retyping the area in question, but to no avail (threw same error). Please advise. Complete code follows.

OP is barking up the wrong tree. "File separator" has ordinal 28 DECIMAL. The correct tree contains '(' (left parenthesis, ordinal 0x28 HEX, i.e. 40 decimal) as the error message says.
-- http://mail.python.org/mailman/listinfo/python-list
Re: 3 byte network ordered int, How To ?
On Jan 7, 5:33 am, Matthew Barnett mrabarn...@mrabarnett.plus.com wrote:
> mudit tuli wrote:
>> For a single byte, struct.pack('B', int). For two bytes, struct.pack('H', int). What if I want three bytes?
> Four bytes and then discard the most-significant byte: struct.pack('I', int)[:-1]

AARRGGHH! Network ordering is BIGendian; struct.pack('<I', ...) (and native 'I' on the usual hardware) is LITTLEendian, so slicing off the *last* byte of a little-endian pack discards the most-significant byte but leaves the rest in the wrong order for the wire.
-- http://mail.python.org/mailman/listinfo/python-list
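[Editorial aside: a network-order (big-endian) 3-byte pack consistent with Machin's correction might look like this; pack four bytes with an explicit '>' and drop the first (most-significant) byte. Function names are invented for illustration.]

```python
import struct

def pack_uint24_be(n):
    # '>I' forces big-endian (network) order; byte 0 is the most
    # significant, so dropping it leaves the low three bytes on the wire.
    if not 0 <= n <= 0xFFFFFF:
        raise ValueError('value does not fit in 3 bytes')
    return struct.pack('>I', n)[1:]

def unpack_uint24_be(b):
    # Re-prepend the zero high byte that pack_uint24_be dropped.
    return struct.unpack('>I', b'\x00' + b)[0]

print(pack_uint24_be(0x123456))  # b'\x124V' i.e. bytes 12 34 56
```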
Re: parsing an Excel formula with the re module
On Jan 6, 6:54 am, vsoler vicente.so...@gmail.com wrote:
> On 5 ene, 20:21, vsoler vicente.so...@gmail.com wrote:
>> On 5 ene, 20:05, Mensanator mensana...@aol.com wrote:
>>> On Jan 5, 12:35 pm, MRAB pyt...@mrabarnett.plus.com wrote:
>>>> vsoler wrote:
>>>>> Hello, I am accessing an Excel file by means of Win 32 COM technology. For a given cell, I am able to read its formula. I want to make a map of how cells reference one another, how different sheets reference one another, how workbooks reference one another, etc. Hence, I need to parse Excel formulas. Can I do it by means only of re (regular expressions)? I know that for simple formulas such as =3*A7+5 it is indeed possible. What about complex formulas that include functions, sheet names and possibly other *.xls files? For example =Book1!A5+8 should be parsed into [=, Book1, !, A5, +, 8]. Can anybody help? Any suggestions?
>>>> Do you mean "how", or do you really mean "whether", ie, get a list of the other cells that are referred to by a certain cell? For example, =3*A7+5 should give [A7] and =Book1!A5+8 should give [Book1!A5].
>>> Ok, although Book1 would be the default name of a workbook, with default worksheets labeled Sheet1, Sheet2, etc. If I had a worksheet named Sheety that wanted to reference a cell on Sheetx OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to a completely different workbook (say Book1 with worksheets labeled Sheet1, Sheet2) then the cell might have =[Book1]Sheet1!A7. And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.
>> Yes, Mensanator, but... what re should I use? I'm looking for the re statement. No doubt you can help! Thank you.
> Let me give you an example:
>
> >>> import re
> >>> re.split("([^0-9])", "123+456*/")
> ['123', '+', '456', '*', '', '/', '']
>
> I find it excellent that one single statement is able to do a lexical analysis of an expression!

That is NOT lexical analysis.

> If the expression contains variables, such as A12 or B9, I can try another re expression. Which one should I use? And if my expression contains parentheses? And the sin() function?

You need a proper lexical analysis, followed by a parser. What you are trying to do can NOT be accomplished in any generality with a single regex. The Excel formula syntax has several tricky bits. E.g. IIRC whether TAX09 is a (macro) name or a cell reference depends on what version of Excel you are targeting, but if it appears like TAX09!A1:B2 then it's a sheet name. The xlwt package (of which I am the maintainer) has a lexer and parser for a largish subset of the syntax ... see http://pypi.python.org/pypi/xlwt
-- http://mail.python.org/mailman/listinfo/python-list
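[Editorial aside: the lexer-before-parser point can be sketched with re.finditer over a named alternation. This is NOT xlwt's grammar; the token names and the tiny subset of Excel syntax covered here are invented for illustration.]

```python
import re

# One pattern per token class, tried in order; longest/most specific first.
TOKEN_RE = re.compile(r"""
    (?P<REF>  (?:\[[^\]]+\])? (?:\w+!)? \$?[A-Z]+\$?\d+ )  # e.g. [Book1]Sheet1!$A$7
  | (?P<NUM>  \d+(?:\.\d+)? )
  | (?P<NAME> [A-Za-z_]\w* )                               # function or defined name
  | (?P<OP>   [-+*/^&%=<>():,!] )
""", re.VERBOSE)

def tokenize(formula):
    # Strip the leading '=' and emit the matched lexemes in order.
    return [m.group() for m in TOKEN_RE.finditer(formula.lstrip('='))]

print(tokenize('=Book1!A5+8'))   # ['Book1!A5', '+', '8']
print(tokenize('=3*A7+5'))       # ['3', '*', 'A7', '+', '5']
```

Even this toy version shows why a lone re.split cannot cope: cell references, sheet prefixes, and function names all overlap lexically and need ordered alternatives, and the output still has to go to a real parser to handle nesting.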
Re: TypeError
On Jan 7, 11:14 am, John Machin sjmac...@lexicon.net wrote:
> On Jan 7, 3:29 am, MRAB pyt...@mrabarnett.plus.com wrote:
>> Victor Subervi wrote:
>>> ValueError: unsupported format character '(' (0x28) at index 54 ... Please advise. Complete code follows.
> OP is barking up the wrong tree. "File separator" has ordinal 28 DECIMAL. The correct tree contains '(' (left parenthesis, ordinal 0x28 HEX) as the error message says.

It took a bit of mucking about to get an example of that error message (without reading the Python source code):

>>> anything = object()
>>> "foo%(" % anything
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: format requires a mapping
>>> "foo%(" % {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: incomplete format key
>>> "foo%2(" % anything
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unsupported format character '(' (0x28) at index 5

FWIW, the OP's message subject is TypeError but the reported message contains ValueError ... possibly indicative of code that first builds a format string (incorrectly) and then uses it, with error messages that can vary from run to run depending on exactly what was stuffed into the format string. I note that in the code shown there are examples of building an SQL query where the table name is concocted at runtime via the % operator ... key phrases: bad database design (one table per store!), SQL injection attack. A proper traceback would be very nice ... at this stage it's not certain which line of source triggers the exception.
-- http://mail.python.org/mailman/listinfo/python-list
Re: Astronomy--Programs to Compute Siderial Time?
On Jan 7, 11:40 am, W. eWatson wolftra...@invalid.com wrote:
> W. eWatson wrote:
>> Is there a smallish Python library of basic astronomical functions? There are a number of large such libraries that are crammed with excessive functions not needed for common calculations.
> It looks like I've entered a new era in my knowledge of Python.

Mild curiosity: this would be a wonderful outcome, but what makes it look so?

> I found a module somewhat like I want, siderial.py. You can see an intro to it at http://infohost.nmt.edu/tcc/help/lang/python/examples/sidereal/ims//. It appears that I can get the code for it through section 1.2, near the bottom. I scooped siderial.py up, and placed it in a corresponding file of the same name and type via NotePad. However, there is a xml file below it. I know little about it. I thought maybe I could do the same, but Notepad didn't like some characters in it. As I understand, Python doc files are useful. So how do I get this done, and where do I put the files?

The file you need is sidereal.py, not your twice-mentioned siderial.py (the existence of which on the referenced website is doubtful). What you have been reading is the "Internal maintenance specification" (large font, near the top of the page) for the module. The xml file is the source of the docs, not meant to be user-legible. A very tiny amount of googling "sidereal.py" (quotes included) leads to the user documentation at http://infohost.nmt.edu/tcc/help/lang/python/examples/sidereal/

Where do you put the files? Well, we're now down to only one file, sidereal.py, and you put it wherever you'd put any other module that you'd like to call ... if there's only going to be one caller, put it in the same directory as that caller's code. More generally, drop it in YOUR_PYTHON_INSTALL_DIR/Lib/site-packages
-- http://mail.python.org/mailman/listinfo/python-list
Re: TypeError
On Jan 7, 1:38 pm, Steve Holden st...@holdenweb.com wrote:
> John Machin wrote:
> [...]
>> I note that in the code shown there are examples of building an SQL query where the table name is concocted at runtime via the % operator ... key phrases: bad database design (one table per store!), SQL injection attack
> I'm not trying to defend the code overall, but most databases won't let you parameterize the table or column names, just the data values.

That's correct, and that's presumably why the OP is constructing whole SQL statements on the fly, e.g.

cursor.execute('select max(ID) from %sCustomerData;' % store)

What is the reason for "but" in "but most databases won't ..."? What are you rebutting? Let me try again: one table per store is bad design. The implementation of that bad design may use:

cursor.execute('select max(ID) from %sCustomerData;' % store)

or (if available)

cursor.execute('select max(ID) from ?CustomerData;', (store, ))

but the means of implementation is irrelevant.
-- http://mail.python.org/mailman/listinfo/python-list
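[Editorial aside: the values-versus-identifiers limitation Holden describes is easy to see with sqlite3; the store/table names below are made up for illustration, not taken from the OP's code.]

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE AcmeCustomerData (ID INTEGER)')
conn.execute('INSERT INTO AcmeCustomerData VALUES (?)', (7,))  # values: OK

store = 'Acme'
try:
    # An identifier cannot be supplied through a placeholder ...
    conn.execute('SELECT max(ID) FROM ?', ('%sCustomerData' % store,))
except sqlite3.OperationalError as exc:
    print('rejected:', exc)

# ... so a per-store table name has to be spliced into the SQL text,
# which is exactly the injection-prone pattern Machin objects to
# (validate 'store' against a whitelist if you are stuck with it):
row = conn.execute('SELECT max(ID) FROM %sCustomerData' % store).fetchone()
print(row)  # (7,)
```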
Re: Significant whitespace
On Jan 2, 10:29 am, Roy Smith r...@panix.com wrote:
> To address your question more directly, here's a couple of ways Fortran treated whitespace which would surprise the current crop of Java/PHP/Python/Ruby programmers: 1) Line numbers (i.e. the things you could GOTO to) were in columns 2-7 (column 1 was reserved for a comment indicator). This is not quite significant whitespace; it's more like significant indentation.

That would also surprise former FORTRAN programmers (who rarely referred to the language as "Fortran"). A comment was signified by a C in col 1. Otherwise cols 1-5 were used for statement labels (the things you could GOTO), col 6 for a statement continuation indicator, cols 7-72 for statement text, and cols 73-80 for card sequence numbers.
-- http://mail.python.org/mailman/listinfo/python-list
Re: creating ZIP files on the cheap
On Dec 24, 7:34 am, samwyse samw...@gmail.com wrote:
> I've got an app that's creating Open Office docs; if you don't know, these are actually ZIP files with a different extension. In my case, like many other people, I'm generating from boilerplate, so only one component (content.xml) of my ZIP file will ever change. Instead of creating the entire ZIP file each time, what is the cheapest way to accomplish my goal? I'd kind-of like to just write the first part of the file as a binary blob, then write my bit, then write most of the table of contents as another blob, and finally write a TOC entry for my bit. Has anyone ever done anything like this? Thanks.

Option 1: set up a file that contains everything except the content.xml. Then for each new file: copy the "empty" file, open the copy with zipfile (mode 'a') and write your content.xml. This at least is understandable and maintainable.

Option 2 (recommended): insert some timing apparatus into your script. How much time is taken by the template stuff? Is it worth chancing your arm on getting the binary blob stuff correct? Is it maintainable? I.e. pretend that the next person to maintain your code knows where you live and owns a chainsaw.
-- http://mail.python.org/mailman/listinfo/python-list
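[Editorial aside: Option 1 can be sketched as follows; the paths and member name are placeholders, and a real ODF template would of course contain more members than shown here.]

```python
import shutil
import zipfile

def build_doc(template_path, out_path, content_xml):
    # Copy the pre-built archive holding every unchanging member ...
    shutil.copyfile(template_path, out_path)
    # ... then append the one member that varies, in 'a' (append) mode,
    # so none of the boilerplate is ever re-compressed.
    with zipfile.ZipFile(out_path, 'a', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr('content.xml', content_xml)
```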
Re: dictionary with tuple keys
Ben Finney ben+python at benfinney.id.au writes:
> In this case, I'll use ‘itertools.groupby’ to make a new sequence of keys and values, and then extract the keys and values actually wanted.

Ah, yes, Zawinski revisited ... itertools.groupby is the new regex :-)

> Certainly it might be clearer if written as one or more loops, instead of iterators. But I find the above relatively clear, and using the built-in iterator objects will likely make for a less buggy implementation.

Relative clarity, like relative beauty, is in the eye of the beholder, and few parents have ugly children :-) The problem with itertools.groupby is that unlike SQL's GROUP BY it needs sorted input. The OP's requirement (however interpreted) can be met without sorting. Your interpretation can be implemented simply:

from collections import defaultdict
result = defaultdict(list)
for key, value in foo.iteritems():
    result[key[:2]].append(value)
-- http://mail.python.org/mailman/listinfo/python-list
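[Editorial aside: the sorted-input caveat is easy to demonstrate; the sample dict below is invented, and the Python 3 spelling items() stands in for the thread's iteritems().]

```python
from collections import defaultdict
from itertools import groupby

foo = {('a', 1, 'x'): 10, ('b', 2, 'y'): 20, ('a', 1, 'z'): 30}

# groupby only merges *adjacent* equal keys, so over unsorted input the
# same group key can come out more than once:
items = list(foo.items())
keys_unsorted = [k for k, _ in groupby(items, key=lambda kv: kv[0][:2])]
keys_sorted = [k for k, _ in groupby(sorted(items), key=lambda kv: kv[0][:2])]
print(keys_unsorted)  # ('a', 1) appears twice
print(keys_sorted)    # each key once, but only after an O(n log n) sort

# The defaultdict approach from the post needs no sort at all:
result = defaultdict(list)
for key, value in foo.items():
    result[key[:2]].append(value)
```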