Re: python math problem
On Feb 16, 6:39 am, Kene Meniru <kene.men...@illom.org> wrote:
> x = (math.sin(math.radians(angle)) * length)
> y = (math.cos(math.radians(angle)) * length)

A suggestion about coding style:

    from math import sin, cos, radians
    # etc etc
    x = sin(radians(angle)) * length
    y = cos(radians(angle)) * length

... easier to write, easier to read.
--
http://mail.python.org/mailman/listinfo/python-list
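A quick self-check that the two spellings compute the same thing (the angle and length values here are mine, purely for illustration):

```python
import math
from math import sin, cos, radians

angle, length = 30.0, 2.0  # illustrative values, not from the original post

# Long-hand spelling from the quoted post
x1 = math.sin(math.radians(angle)) * length
y1 = math.cos(math.radians(angle)) * length

# Shorter spelling suggested above
x2 = sin(radians(angle)) * length
y2 = cos(radians(angle)) * length

assert x1 == x2 and y1 == y2
```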
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
John Machin <sjmac...@lexicon.net> added the comment:

@Ezio: Comparison of the behaviour of \letter inside/outside character classes is irrelevant. The rules for inside can be expressed simply as:

1. The letters dDsSwW are special; they represent categories as documented, and do in fact have a similar meaning outside character classes.

2. Otherwise normal Python rules for backslash escapes in string literals should be followed. This means automatically that \a -> \x07, \A -> "A", \b -> backspace, \B -> "B", \z -> "z" and \Z -> "Z".

@Georg: No need to read the source, just read my initial posting: it's compiled as a zero-length matcher ("at") inside a character class ("in"), i.e. a nonsense, then at runtime the illegality is deliberately ignored.

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13899>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
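For readers on current CPython: the category escapes behave as rule 1 says, \b inside a class is backspace per the docs, and since Python 3.7 an unknown ASCII-letter escape such as \A inside a class is rejected outright rather than silently miscompiled. A minimal check:

```python
import re

# Rule 1: \d keeps its category meaning inside a character class
assert re.findall(r'[\d]', 'a1b2') == ['1', '2']

# String-literal-style escape: \b inside a class is backspace (\x08)
assert re.findall(r'[\b]', 'a\x08b') == ['\x08']

# Since Python 3.7, \A inside a class raises re.error instead of
# compiling to the nonsense described in this report
try:
    re.compile(r'[\A]')
    outcome = 'compiled'
except re.error:
    outcome = 'rejected'
assert outcome == 'rejected'
```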
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
John Machin <sjmac...@lexicon.net> added the comment:

Whoops: "normal Python rules for backslash escapes" should have had a note: "... but revert to the C behaviour of stripping the \ from unrecognised escapes", which is what re appears to do in its own \ handling.

--
Python tracker: <http://bugs.python.org/issue13899>
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
New submission from John Machin <sjmac...@lexicon.net>:

Expected behaviour illustrated using "C":

>>> import re
>>> re.findall(r'[\C]', 'CCC')
['C', 'C', 'C']
>>> re.compile(r'[\C]', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6E78>
>>> re.compile(r'C', 128)
literal 67
<_sre.SRE_Pattern object at 0x01FC6F08>

Incorrect behaviour exhibited by "A" (and by "B" and "Z"):

>>> re.findall(r'[\A]', 'AAA')
[]
>>> re.compile(r'A', 128)
literal 65
<_sre.SRE_Pattern object at 0x01FC6F98>
>>> re.compile(r'[\A]', 128)
in
  at at_beginning_string    <== FAIL
<_sre.SRE_Pattern object at 0x01FDF0B0>

Also there is no self-checking at runtime; the switch default has a comment to the effect that nothing can be done, so pretend that the unknown opcode matched nothing. Zen?

--
messages: 152194
nosy: sjmachin
priority: normal
severity: normal
status: open
title: re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
type: behavior
versions: Python 2.7, Python 3.2
Python tracker: <http://bugs.python.org/issue13899>
[issue13899] re pattern r"[\A]" should work like "A" but matches nothing. Ditto B and Z.
John Machin <sjmac...@lexicon.net> added the comment:

@ezio: Of course the context is inside a character class. I expect r'[\b]' to act like r'\b' aka r'\x08' aka backspace because (1) that is the treatment applied to all other C-like control char escapes, and (2) the docs say so explicitly: "Inside a character range, \b represents the backspace character, for compatibility with Python's string literals."

--
Python tracker: <http://bugs.python.org/issue13899>
[issue13782] xml.etree.ElementTree: Element.append doesn't type-check its argument
New submission from John Machin <sjmac...@lexicon.net>:

    import xml.etree.ElementTree as et
    node = et.Element('x')
    node.append(not_an_Element_instance)

2.7 and 3.2 produce no complaint at all. 2.6 and 3.1 produce an AssertionError. However cElementTree in all 4 versions produces a TypeError. Please fix 2.7 and 3.2 ElementTree to produce a TypeError.

--
messages: 151210
nosy: sjmachin
priority: normal
severity: normal
status: open
title: xml.etree.ElementTree: Element.append doesn't type-check its argument
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2
Python tracker: <http://bugs.python.org/issue13782>
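For what it's worth, on current CPython 3.x both the pure-Python and the C-accelerated ElementTree raise the TypeError requested here. A quick check:

```python
import xml.etree.ElementTree as et

node = et.Element('x')
try:
    node.append('not an Element')   # a str, i.e. not an Element instance
    result = 'accepted'
except TypeError:
    result = 'TypeError'
assert result == 'TypeError'
```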
Re: unicode by default
On Thu, May 12, 2011 4:31 pm, harrismh777 wrote:
> So, the UTF-16 UTF-32 is INTERNAL only, for Python

NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8, are encodings for the EXTERNAL representation of Unicode characters in byte streams.

> I also was not aware that UTF-8 chars could be up to six (6) bytes long from left to right.

It could be, once upon a time in ISO faerieland, when it was thought that Unicode could grow to 2**32 codepoints. However ISO and the Unicode consortium have agreed that 17 planes is the utter max, and accordingly a valid UTF-8 byte sequence can be no longer than 4 bytes ... see below:

>>> chr(17 * 65536)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(0x110000)
>>> chr(17 * 65536 - 1)
'\U0010ffff'
>>> _.encode('utf8')
b'\xf4\x8f\xbf\xbf'
>>> b'\xf5\x8f\xbf\xbf'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python32\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 0: invalid start byte
--
http://mail.python.org/mailman/listinfo/python-list
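The same limits, restated as a small self-checking snippet (Python 3):

```python
# U+10FFFF (top of plane 16) is the largest code point, and its UTF-8
# form is exactly 4 bytes; one past it is rejected by chr().
max_cp = 17 * 65536 - 1
assert max_cp == 0x10FFFF

encoded = chr(max_cp).encode('utf-8')
assert encoded == b'\xf4\x8f\xbf\xbf'
assert len(encoded) == 4

try:
    chr(17 * 65536)
    overflowed = False
except ValueError:
    overflowed = True
assert overflowed
```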
Re: unicode by default
On Thu, May 12, 2011 8:51 am, harrismh777 wrote:
> Is it true that if I am working without using bytes sequences that I will not need to care about the encoding anyway, unless of course I need to specify a unicode code point?

Quite the contrary.

(1) You cannot work without using bytes sequences. Files are byte sequences. Web communication is in bytes. You need to (know / assume / be able to extract / guess) the input encoding. You need to encode your output using an encoding that is expected by the consumer (or use an output method that will do it for you).

(2) You don't need to use bytes to specify a Unicode code point. Just use an escape sequence, e.g. '\u0404' is a Cyrillic character.
--
http://mail.python.org/mailman/listinfo/python-list
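Point (2) in runnable form (Python 3 spelling, where string literals are Unicode by default):

```python
import unicodedata

ch = '\u0404'                       # a Cyrillic character; no bytes involved
assert unicodedata.name(ch) == 'CYRILLIC CAPITAL LETTER UKRAINIAN IE'

# Bytes only appear at the I/O boundary, via an explicit encoding
encoded = ch.encode('utf-8')
assert encoded == b'\xd0\x84'
assert encoded.decode('utf-8') == ch
```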
Re: urllib2 request with binary file as payload
On Thu, May 12, 2011 10:20 am, Michiel Sikma wrote:
> Hi there,
> I made a small script implementing a part of Youtube's API that allows you to upload videos. It's pretty straightforward and uses urllib2. The script was written for Python 2.6, but the server I'm going to use it on only has 2.5 (and I can't update it right now, unfortunately). It seems that one vital thing doesn't work in 2.5's urllib2:
>
> data = open(video['filename'], 'rb')
> opener = urllib2.build_opener(urllib2.HTTPHandler)
> req = urllib2.Request(settings['upload_location'], data, {
>     'Host': 'uploads.gdata.youtube.com',
>     'Content-Type': video['type'],
>     'Content-Length': '%d' % os.path.getsize(video['filename'])
> })
> req.get_method = lambda: 'PUT'
> url = opener.open(req)
>
> This works just fine on 2.6:
>
> send: <open file 'file.mp4', mode 'rb' at 0x1005db580>
> sendIng a read()able
>
> However, on 2.5 it refuses:
>
> Traceback (most recent call last):
> [snip]
> TypeError: sendall() argument 1 must be string or read-only buffer, not file

I don't use this stuff, just curious. But I can read docs. Quoting from the 2.6.6 docs:

"class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])
This class is an abstraction of a URL request. url should be a string containing a valid URL. data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format."

2.6 is expecting a string, according to the above. No mention of "file". Moreover it expects the data to be urlencoded. The 2.7.1 docs say the same thing. Are you sure you have shown the code that worked with 2.6?
--
http://mail.python.org/mailman/listinfo/python-list
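On Python 3 the whole dance is supported directly: urllib.request.Request accepts bytes (or a file-like object) as data and, since 3.3, a method argument, so no get_method monkey-patching is needed. A sketch with a placeholder URL and made-up payload (nothing here touches the network):

```python
import urllib.request

url = 'https://example.invalid/upload'   # placeholder, not a real endpoint
payload = b'fake video bytes'            # made-up data for illustration

req = urllib.request.Request(
    url,
    data=payload,
    headers={'Content-Type': 'video/mp4',
             'Content-Length': str(len(payload))},
    method='PUT',                        # supported directly since Python 3.3
)
assert req.get_method() == 'PUT'
assert req.data == payload
# urllib.request.urlopen(req) would perform the actual upload
```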
Re: unicode by default
On Thu, May 12, 2011 11:22 am, harrismh777 wrote:
> John Machin wrote:
>> (1) You cannot work without using bytes sequences. Files are byte sequences. Web communication is in bytes. You need to (know / assume / be able to extract / guess) the input encoding. You need to encode your output using an encoding that is expected by the consumer (or use an output method that will do it for you).
>> (2) You don't need to use bytes to specify a Unicode code point. Just use an escape sequence, e.g. '\u0404' is a Cyrillic character.
>
> Thanks John. In reverse order, I understand point (2). I'm less clear on point (1). If I generate a string of characters that I presume to be ascii/utf-8 (no \u0404 type characters) and write them to a file (stdout) how does default encoding affect that file ... by default..? I'm not seeing that there is anything unusual going on...

About "characters that I presume to be ascii/utf-8 (no \u0404 type characters)": All Unicode characters (including U+0404) are encodable in bytes using UTF-8.

The result of sys.stdout.write(unicode_characters) to a TERMINAL depends mostly on sys.stdout.encoding. This is likely to be UTF-8 on a linux/OSX platform. On a typical American / Western European / [former] colonies Windows box, this is likely to be cp850 in a Command Prompt window, and cp1252 in IDLE.

UTF-8: All Unicode characters are encodable in UTF-8. The only problem arises if the terminal can't render the character -- you'll get spaces or blobs or boxes with hex digits in them or nothing.

Windows (Command Prompt window): only a small subset of characters can be encoded in e.g. cp850; anything else causes an exception.

Windows (IDLE): ignores sys.stdout.encoding and renders the characters itself. Same outcome as *x/UTF-8 above.

If you write directly (or sys.stdout is redirected) to a FILE, the default encoding is obtained by sys.getdefaultencoding() and is AFAIK ascii unless the machine's site.py has been fiddled with to make it UTF-8 or something else.

> If I open the file with vi? If I open the file with gedit? emacs?

Any editor will have a default encoding; if that doesn't match the file encoding, you have a (hopefully obvious) problem if the editor doesn't detect the mismatch. Consult your editor's docs or HTFF1K.

> Another question... in mail I'm receiving many small blocks that look like sprites with four small hex codes, scattered about the mail... mostly punctuation, maybe? ... guessing, are these unicode code points,

yes

> and if so what is the best way to 'guess' the encoding?

google("chardet") or rummage through the mail headers (but 4 hex digits in a box are a symptom of inability to render, not necessarily caused by an incorrect decoding)

> ... is it coded in the stream somewhere...protocol?

Should be.
--
http://mail.python.org/mailman/listinfo/python-list
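The terminal-encoding point can be demonstrated without a Windows box: the same text encodes fine in UTF-8 but fails in a small codepage such as cp850 (chosen here just as an example):

```python
text = 'price \u0404 \u00a3'   # a Cyrillic letter and a pound sign

# UTF-8 can encode any Unicode character
assert text.encode('utf-8')

# cp850 has the pound sign but no Cyrillic: encoding raises
try:
    text.encode('cp850')
    failed = None
except UnicodeEncodeError as e:
    failed = text[e.start]
assert failed == '\u0404'
```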
Re: unicode by default
On Thu, May 12, 2011 1:44 pm, harrismh777 wrote:
> By default it looks like Python3 is writing output with UTF-8 as default... and I thought that by default Python3 was using either UTF-16 or UTF-32. So, I'm confused here... also, I used the character sequence \u00A3 which I thought was UTF-16... but Python3 changed my intent to 'c2a3' which is the normal UTF-8...

Python uses either a 16-bit or a 32-bit INTERNAL representation of Unicode code points. Those NN bits have nothing to do with the UTF-NN encodings, which can be used to encode the codepoints as byte sequences for EXTERNAL purposes. In your case, UTF-8 has been used as it is the default encoding on your platform.
--
http://mail.python.org/mailman/listinfo/python-list
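The distinction in executable form: '\u00A3' names one code point (U+00A3, POUND SIGN), and each UTF encoding turns that same code point into a different external byte sequence:

```python
ch = '\u00a3'
assert ch.encode('utf-8') == b'\xc2\xa3'                # the 'c2a3' the poster saw
assert ch.encode('utf-16-le') == b'\xa3\x00'
assert ch.encode('utf-32-le') == b'\xa3\x00\x00\x00'
```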
Re: unicode by default
On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote:
> If the file you're writing to doesn't specify an encoding, Python will default to locale.getdefaultencoding(),

No such attribute. Perhaps you mean locale.getpreferredencoding()
--
http://mail.python.org/mailman/listinfo/python-list
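Quick confirmation (the attribute really does not exist, and the preferred-encoding call returns a codec name):

```python
import locale

# locale has no getdefaultencoding; that name belongs to sys
assert not hasattr(locale, 'getdefaultencoding')

enc = locale.getpreferredencoding(False)
assert isinstance(enc, str) and len(enc) > 0   # e.g. 'UTF-8' on most Linux boxes
```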
codecs.open() doesn't handle platform-specific line terminator
According to the 3.2 docs (http://docs.python.org/py3k/library/codecs.html#codecs.open):

"Files are always opened in binary mode, even if no binary mode was specified. This is done to avoid data loss due to encodings using 8-bit values. This means that no automatic conversion of b'\n' is done on reading and writing."

The first point is that one would NOT expect conversion of b'\n' anyway. One expects '\n' -> os.linesep.encode(the_encoding) on writing and vice versa on reading.

The second point is that there is no such restriction with the built-in open(), which appears to work as expected, doing (e.g. Windows, UTF-16LE) '\n' -> b'\r\x00\n\x00' when writing and vice versa on reading, and not striking out when thrown curve balls like '\u0a0a'.

Why is codecs.open() different? What does "encodings using 8-bit values" mean? What data loss?
--
http://mail.python.org/mailman/listinfo/python-list
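The difference can be shown portably by forcing the newline translation that Windows would apply by default (utf-16-le and the temp path are just for the demonstration):

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Built-in open(): newline translation is under the caller's control
with open(path, 'w', encoding='utf-16-le', newline='\r\n') as f:
    f.write('a\nb')
with open(path, 'rb') as f:
    assert f.read() == b'a\x00\r\x00\n\x00b\x00'   # '\n' became '\r\n', encoded

# codecs.open(): the file is binary underneath; '\n' is never translated
with codecs.open(path, 'w', encoding='utf-16-le') as f:
    f.write('a\nb')
with open(path, 'rb') as f:
    assert f.read() == b'a\x00\n\x00b\x00'
```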
Re: codec for UTF-8 with BOM
On Monday, 2 May 2011 19:47:45 UTC+10, Chris Rebert wrote:
> On Mon, May 2, 2011 at 1:34 AM, Ulrich Eckhardt <ulrich@dominolaser.com> wrote:
> The correct name, as you found below and as is corroborated by the webpage, seems to be utf_8_sig:
>
> >>> u"FO\xf8bar".encode('utf_8_sig')
> '\xef\xbb\xbfFO\xc3\xb8bar'

To complete the picture, decoding swallows the BOM:

>>> '\xef\xbb\xbfFO\xc3\xb8bar'.decode('utf_8_sig')
u'FO\xf8bar'
--
http://mail.python.org/mailman/listinfo/python-list
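The Python 3 equivalent of that 2.x session, as assertions:

```python
data = 'FO\u00f8bar'.encode('utf_8_sig')
assert data == b'\xef\xbb\xbfFO\xc3\xb8bar'          # BOM prepended on encoding

assert data.decode('utf_8_sig') == 'FO\u00f8bar'     # BOM swallowed on decoding
assert data.decode('utf-8') == '\ufeffFO\u00f8bar'   # plain utf-8 keeps it
```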
Re: Snowball to Python compiler
On Friday, April 22, 2011 8:05:37 AM UTC+10, Matt Chaput wrote:
> I'm looking for some code that will take a Snowball program and compile it into a Python script. Or, less ideally, a Snowball interpreter written in Python. (http://snowball.tartarus.org/)

If anyone has done such things they are not advertising them in the usual places. A third (more-than-) possible solution: google("python snowball"); the first page of results has at least 3 hits referring to Python wrappers for Snowball.
--
http://mail.python.org/mailman/listinfo/python-list
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@lexicon.net> added the comment:

Can somebody please review my doc patch submitted 2 months ago?

--
Python tracker: <http://bugs.python.org/issue7198>
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@lexicon.net> added the comment:

Skip, the changes that I suggested have NOT been made. Please re-read the doc page you pointed to. The writer paragraph does NOT mention that newline='' is required when writing. The writer examples do NOT include newline=''. The examples have NOT been enhanced by using a with statement and not using space as an example delimiter. PLEASE RE-OPEN THIS ISSUE.

--
Python tracker: <http://bugs.python.org/issue7198>
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

The doc patch proposed by Skip on 2011-01-24 for this bug has NOT been reviewed, let alone applied. Sibling bug #7198 has been closed in error. Somebody please help.

--
nosy: +skip.montanaro
Python tracker: <http://bugs.python.org/issue10954>
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

Terry, I have already made the point that the docs bug is #7198. This is the meaningful-exception bug. My review is: changing 'should' to 'must' is not very useful without a consistent interpretation of what those two words mean, and without any enforcement of use of newline=''. I was patient enough to wait 2 months for a review of my doc patch on #7198. My issues are that the 3.2 docs have NOT been changed (have a look at the csv.writer paragraph: do you see the word newline anywhere??), that #7198 has been closed without any action, and that BOTH of these two issues (which have in effect been lurking about since Python 3.0.0alpha) appear to have been abandoned.

--
Python tracker: <http://bugs.python.org/issue10954>
Re: getting text out of an xml string
On Mar 5, 8:57 am, JT <jeff.temp...@gmail.com> wrote:
> On Mar 4, 9:30 pm, John Machin <sjmac...@lexicon.net> wrote:
>> Your data has been FUABARred (the first A being for Almost) -- the \u3c00 and \u3e00 were once '<' and '>' respectively. You will
>
> Hi John, I realized that a few minutes after posting. I then realized that I could just extract the text between the stuff with \u3c00 xml preserve etc, which I did; it was good enough since it was a one-off affair, I had to convert a to-do list from one program to another. Thanks for replying and sorry for the noise :-)

Next time you need to extract some data from an xml file, please (for your own good) don't do whatever you did in that code -- note that the unicode equivalent of '<' is u'\u003c', NOT u'\u3c00'; I wasn't joking when I said it had been FU.
--
http://mail.python.org/mailman/listinfo/python-list
Re: getting text out of an xml string
On Mar 5, 6:53 am, JT <jeff.temp...@gmail.com> wrote:
> Yo,
> So I have almost convinced a small program to do what I want it to do. One thing remains (at least, one thing I know of at the moment): I am converting xml to some other format, and there are strings in the xml like this. The python:
>
>     elif v == "content":
>         print "content", a.childNodes[0].nodeValue
>
> what gets printed:
>
>     content \u3c00note xml:space=preserve\u3e00see forms in red inbox \u3c00/note\u3e00
>
> what this should say is "see forms in red inbox", because that is what the program whose xml file I am trying to convert properly displays, because that is what I typed in oh so long ago. So my question to you is, how can I convert this enhanced version to a normal string? Esp. since there is this xml:space=preserve thing in there ... I suspect the rest is just some unicode issue. Thanks for any help.
> J long time no post T

Your data has been FUABARred (the first A being for Almost) -- the \u3c00 and \u3e00 were once '<' and '>' respectively. You will need to show (a) a snippet of the xml file including the data that has the problem, and (b) the code that you have written, cut down to a small script that is runnable and displays the problem. Tell us what version of Python you are running, on what OS.
--
http://mail.python.org/mailman/listinfo/python-list
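For reference, had the angle brackets survived intact, the standard library would hand the text back directly (minidom was apparently in use above; ElementTree is shown here as an equally standard option):

```python
import xml.etree.ElementTree as ET

xml = '<note xml:space="preserve">see forms in red inbox </note>'
elem = ET.fromstring(xml)
assert elem.tag == 'note'
assert elem.text == 'see forms in red inbox '   # trailing space preserved
```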
Re: 2to3 chokes on bad character
On Feb 23, 7:47 pm, Frank Millman <fr...@chagford.com> wrote:
> Hi all
> I don't know if this counts as a bug in 2to3.py, but when I ran it on my program directory it crashed, with a traceback but without any indication of which file caused the problem.
> [traceback snipped]
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 5055: invalid start byte
> On investigation, I found some funny characters in docstrings that I copy/pasted from a pdf file. Here are the details if they are of any use. Oddly, I found two instances where characters 'look like' apostrophes when viewed in my text editor, but one of them was accepted by 2to3 and the other caused the crash.
> The one that was accepted consists of three bytes - 226, 128, 153 (as reported by python 2.6)

How did you incite it to report like that? Just use repr(the_3_bytes). It'll show up as '\xe2\x80\x99'.

>>> from unicodedata import name as ucname
>>> ''.join(chr(i) for i in (226, 128, 153)).decode('utf8')
u'\u2019'
>>> ucname(_)
'RIGHT SINGLE QUOTATION MARK'

What you have there is the UTF-8 representation of U+2019 RIGHT SINGLE QUOTATION MARK. That's OK.

> or 226, 8364, 8482 (as reported by python3.2).

Sorry, but you have instructed Python 3.2 to commit a nonsense:

>>> [ord(chr(i).decode('cp1252')) for i in (226, 128, 153)]
[226, 8364, 8482]

In other words, you have taken that 3-byte sequence, decoded each byte separately using cp1252 (aka "the usual suspect") into a meaningless Unicode character and printed its ordinal. In Python 3, don't use repr(); it has undergone the MHTP transformation and become ascii().

> The one that crashed consists of a single byte - 146 (python 2.6) or 8217 (python 3.2).

>>> chr(146).decode('cp1252')
u'\u2019'
>>> hex(8217)
'0x2019'

> The issue is not that 2to3 should handle this correctly, but that it should give a more informative error message to the unsuspecting user.

Your Python 2.x code should be TESTED before you poke 2to3 at it. In this case just trying to run or import the offending code file would have given an informative syntax error (you have declared the .py file to be encoded in UTF-8 but it's not).

> BTW I have always waited for 'final releases' before upgrading in the past, but this makes me realise the importance of checking out the beta versions - I will do so in future.

I'm willing to bet that the same would happen with Python 3.1, if a 3.1 to 3.2 upgrade is what you are talking about.
--
http://mail.python.org/mailman/listinfo/python-list
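The same byte forensics in Python 3 terms (the names good/bad are mine): the accepted apostrophe is valid UTF-8 for U+2019, while the lone byte 146 (0x92) is a cp1252-only spelling of the same character and an invalid UTF-8 start byte, which is exactly what 2to3's traceback complained about.

```python
import unicodedata

good = bytes([226, 128, 153])                 # b'\xe2\x80\x99'
ch = good.decode('utf-8')
assert ch == '\u2019'
assert unicodedata.name(ch) == 'RIGHT SINGLE QUOTATION MARK'

bad = bytes([146])                            # 0x92, a bare continuation byte
assert bad.decode('cp1252') == '\u2019'       # cp1252 maps it to the same mark
try:
    bad.decode('utf-8')
    decoded = True
except UnicodeDecodeError:
    decoded = False
assert not decoded                            # hence "invalid start byte"
```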
Re: 2to3 chokes on bad character
On Feb 25, 12:00 am, Peter Otten <__pete...@web.de> wrote:
> John Machin wrote:
>> Your Python 2.x code should be TESTED before you poke 2to3 at it. In this case just trying to run or import the offending code file would have given an informative syntax error (you have declared the .py file to be encoded in UTF-8 but it's not).
>
> The problem is that Python 2.x accepts arbitrary bytes in string constants.

Ummm ... isn't that a bug? According to section 2.1.4 of the Python 2.7.1 Language Reference Manual: "The encoding is used for all lexical analysis, in particular to find the end of a string, and to interpret the contents of Unicode literals. String literals are converted to Unicode for syntactical analysis, then converted back to their original encoding before interpretation starts ..."

How do you reconcile "used for all lexical analysis" and "String literals are converted to Unicode for syntactical analysis" with the actual (astonishing to me) behaviour?
--
http://mail.python.org/mailman/listinfo/python-list
Re: py3k: converting int to bytes
On Feb 25, 4:39 am, Terry Reedy wrote:
> Note: an as yet undocumented feature of bytes (at least in Py3) is that bytes(count) == bytes()*count == b'\x00'*count.

Python 3.1.3 docs for bytes() say "same constructor args as for bytearray()"; this says about the source parameter: "If it is an integer, the array will have that size and will be initialized with null bytes"
--
http://mail.python.org/mailman/listinfo/python-list
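The documented integer-source behaviour, checked directly. (Note that bytes()*count as quoted would be b''; presumably b'\x00'*count was the intended comparison.)

```python
assert bytes(3) == b'\x00\x00\x00'          # integer source: that many null bytes
assert bytes(3) == b'\x00' * 3
assert bytearray(3) == bytearray(b'\x00\x00\x00')
assert bytes() * 3 == b''                   # multiplying the EMPTY bytes gives b''
```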
[issue11204] re module: strange behaviour of space inside {m, n}
New submission from John Machin <sjmac...@lexicon.net>:

A pattern like r"b{1,3}\Z" matches "b", "bb", and "bbb", as expected. There is no documentation of the behaviour of r"b{1, 3}\Z" -- it matches the LITERAL TEXT "b{1, 3}" in normal mode and "b{1,3}" in verbose mode.

# paste the following at the interactive prompt:
pat = r"b{1, 3}\Z"
bool(re.match(pat, "bb")) # False
bool(re.match(pat, "b{1, 3}")) # True
bool(re.match(pat, "bb", re.VERBOSE)) # False
bool(re.match(pat, "b{1, 3}", re.VERBOSE)) # False
bool(re.match(pat, "b{1,3}", re.VERBOSE)) # True

Suggested changes, in decreasing order of preference:
(1) Ignore leading/trailing spaces when parsing the m and n components of {m,n}
(2) Raise an exception if the exact syntax is not followed
(3) Document the existing behaviour

Note: deliberately matching the literal text would be expected to be done by escaping the left brace:

pat2 = r"b\{1, 3}\Z"
bool(re.match(pat2, "b{1, 3}")) # True

and this is not prevented by the suggested changes.

--
messages: 128472
nosy: sjmachin
priority: normal
severity: normal
status: open
title: re module: strange behaviour of space inside {m, n}
versions: Python 2.7, Python 3.1
Python tracker: <http://bugs.python.org/issue11204>
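The core of the report still reproduces on current Python 3: the space demotes the braces to literal text. A minimal check:

```python
import re

pat = r"b{1, 3}\Z"
assert re.match(pat, "bb") is None               # quantifier NOT recognised
assert re.match(pat, "b{1, 3}") is not None      # matches the literal text
assert re.match(r"b{1,3}\Z", "bbb") is not None  # no space: works as expected
```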
Re: python crash problem
On Feb 3, 8:21 am, Terry Reedy <tjre...@udel.edu> wrote:
> On 2/2/2011 2:19 PM, Yelena wrote:
> When having a problem with a 3rd party module, not part of the stdlib, you should give a source.
> http://sourceforge.net/projects/dbfpy/
> This appears to be a compiled extension. Nearly always, when Python crashes running such, it is a problem with the extension. So you probably need to direct your question to the author or a project mailing list if there is one.

It has always appeared to me to be a pure-Python package. There are no .c or .pyx files in the latest source (.tgz) distribution. The Windows installer installs only files whose extensions match py[co]?.
--
http://mail.python.org/mailman/listinfo/python-list
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

Skip, the docs bug is #7198. This is the meaningful-exception bug.

--
Python tracker: <http://bugs.python.org/issue10954>
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

I don't understand "Changing csv api is a feature request that could only happen in 3.3". This is NOT a request for an API change. Lennert's point is that an API change was made in 3.0 as compared with 2.6, but there is no fixer in 2to3. What is requested is for csv.reader/writer to give more meaningful error messages for valid 2.x code that has been put through fixer-less 2to3.

The name of the arg is newline. newlines is an attribute that stores what was actually found in universal newlines mode.

newline='' is needed on input for the same reason that binary mode is required in 2.x: \r and \n may quite validly appear in data, inside a quoted field, and must not be treated as part of a row separator.

newline='' is needed on output for the same reason that binary mode is required in 2.x: any \n in the data and any \n in the caller's chosen line terminator must be preserved from being changed to os.linesep (e.g. \r\n).

newline is not available as an attribute of the _io.TextIOWrapper object created by open('xxx.csv', 'w', newline=''); is exposing this possible?

--
versions: +Python 3.2 -Python 3.3
Python tracker: <http://bugs.python.org/issue10954>
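The input-side point, demonstrated (io.StringIO with newline='' stands in for a file opened with newline=''):

```python
import csv
import io

# One record whose quoted middle field contains a real \r\n
raw = 'a,"line1\r\nline2",c\r\n'
rows = list(csv.reader(io.StringIO(raw, newline='')))
assert rows == [['a', 'line1\r\nline2', 'c']]   # the \r\n survives as field DATA
```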
[issue10954] No warning for csv.writer API change
John Machin <sjmac...@lexicon.net> added the comment:

I believe that both csv.reader and csv.writer should fail with a meaningful message if mode is binary or newline is not ''.

--
Python tracker: <http://bugs.python.org/issue10954>
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@lexicon.net> added the comment:

docpatch for 3.x csv docs:

In the csv.writer docs, insert the sentence "If csvfile is a file object, it should be opened with newline=''." immediately after the sentence "csvfile can be any object with a write() method."

In the closely-following example, change the open call from open('eggs.csv', 'w') to open('eggs.csv', 'w', newline='').

In section 13.1.5 Examples, there are 2 reader cases and 1 writer case that likewise need ", newline=''" inserted in the open call.

--
Python tracker: <http://bugs.python.org/issue7198>
Re: Interesting bug
On Jan 2, 12:22 am, Daniel Fetchinson <fetchin...@googlemail.com> wrote:
> An AI bot is playing a trick on us.

Yes, it appears that the mystery is solved: Mark V. Shaney is alive and well and living in Bangalore :-)
--
http://mail.python.org/mailman/listinfo/python-list
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@users.sourceforge.net> added the comment:

Skip, I'm WRITING, not reading. Please read the 3.1 documentation for csv.writer. It does NOT mention newline='', and neither does the example. Please fix.

Other problems with the examples:
(1) They encourage a bad habit (open inside the call to reader/writer); good practice is to retain the reference to the file handle (preferably with a with statement) so that it can be closed properly.
(2) delimiter=' ' is very unrealistic.

The documentation for both 2.x and 3.x should be much more explicit about what is needed in open() for csv to work properly and portably:

2.x read: use mode='rb' -- otherwise fail on Windows
2.x write: use mode='wb' -- otherwise fail on Windows
3.x read: use newline='' -- otherwise fail unconditionally(?)
3.x write: use newline='' -- otherwise fail on Windows

The 2.7 documentation says "If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference" ... in my experience, people are left asking "what platforms? what difference?"; Windows should be mentioned explicitly.

--
versions: +Python 2.7, Python 3.2, Python 3.3
Python tracker: <http://bugs.python.org/issue7198>
[issue7198] Extraneous newlines with csv.writer on Windows
John Machin <sjmac...@users.sourceforge.net> added the comment:

Please re-open this. The binary/text mode problem still exists with Python 3.X on Windows. Quite simply, there is no option available to the caller to open the output file in binary mode, because the module is throwing str objects at the file. The module's idea of "taking control in the default case" appears to be to write \r\n, which is then processed by the Windows runtime and becomes \r\r\n.

Python 3.1.3 (r313:86834, Nov 27 2010, 18:30:53) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> f = open('terminator31.csv', 'w')
>>> row = ['foo', None, 3.14159]
>>> writer = csv.writer(f)
>>> writer.writerow(row)
14
>>> writer.writerow(row)
14
>>> f.close()
>>> open('terminator31.csv', 'rb').read()
b'foo,,3.14159\r\r\nfoo,,3.14159\r\r\n'

And it's not just a row terminator problem; newlines embedded in fields are likewise expanded to \r\n by the Windows runtime.

--
nosy: +sjmachin
versions: +Python 3.1 -Python 2.6
Python tracker: <http://bugs.python.org/issue7198>
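The same session with newline='' is the documented remedy: on any platform the file then contains exactly one \r\n per row (the temp path here is mine):

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'terminator.csv')
row = ['foo', None, 3.14159]

with open(path, 'w', newline='') as f:      # newline='' disables translation
    writer = csv.writer(f)
    writer.writerow(row)
    writer.writerow(row)

with open(path, 'rb') as f:
    assert f.read() == b'foo,,3.14159\r\nfoo,,3.14159\r\n'
```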
Re: Modifying an existing excel spreadsheet
On Dec 21, 8:56 am, Ed Keith <e_...@yahoo.com> wrote:
> I have a user supplied 'template' Excel spreadsheet. I need to create a new excel spreadsheet based on the supplied template, with data filled in. I found the tools here http://www.python-excel.org/, and http://sourceforge.net/projects/pyexcelerator/. I have been trying to use the former, since the latter seems to be devoid of documentation (not even any docstrings).

pyExcelerator is abandonware. Use xlwt instead; it's a bug-fixed/maintained/enhanced fork of pyExcelerator. Read the tutorial that you'll find mentioned on http://www.python-excel.org. Join the google group that's also mentioned there; look at past questions, ask some more, ...
--
http://mail.python.org/mailman/listinfo/python-list
Re: Ensuring symmetry in difflib.SequenceMatcher
On Nov 24, 8:43 pm, Peter Otten __pete...@web.de wrote: John Yeung wrote: I'm generally pleased with difflib.SequenceMatcher: It's probably not the best available string matcher out there, but it's in the standard library and I've seen worse in the wild. One thing that kind of bothers me is that it's sensitive to which argument you pick as seq1 and which you pick as seq2: Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32 Type help, copyright, credits or license for more information. import difflib difflib.SequenceMatcher(None, 'BYRD', 'BRADY').ratio() 0.2 difflib.SequenceMatcher(None, 'BRADY', 'BYRD').ratio() 0.3 Is this a bug? I am guessing the algorithm is implemented correctly, and that it's just an inherent property of the algorithm used. It's certainly not what I'd call a desirable property. Are there any simple adjustments that can be made without sacrificing (too much) performance? def symmetric_ratio(a, b, S=difflib.SequenceMatcher): return (S(None, a, b).ratio() + S(None, b, a).ratio())/2.0 I'm expecting 50% performance loss ;) Seriously, have you tried to calculate the ratio with realistic data? Without looking into the source I would expect the two ratios to get more similar. Peter Surnames are extremely realistic data. The OP should consider using Levenshtein distance, which is symmetric. A good (non-naive) implementation should be much faster than difflib. ratio = 1.0 - levenshtein(a, b) / float(max(len(a), len(b))) -- http://mail.python.org/mailman/listinfo/python-list
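For reference, here is a plain dynamic-programming Levenshtein distance, symmetric by construction, on which the suggested ratio formula can be built. This is a generic sketch, not a tuned implementation:

```python
def levenshtein(a, b):
    """Classic two-row dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a                     # keep the inner loop over the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                  # deletion
                               current[j - 1] + 1,               # insertion
                               previous[j - 1] + (ca != cb)))    # substitution
        previous = current
    return previous[-1]

def ratio(a, b):
    """Symmetric similarity ratio in [0.0, 1.0], per the formula above."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / float(max(len(a), len(b)))

# Symmetric regardless of argument order, unlike SequenceMatcher:
assert levenshtein('BYRD', 'BRADY') == levenshtein('BRADY', 'BYRD') == 3
assert ratio('BYRD', 'BRADY') == ratio('BRADY', 'BYRD')
```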
Re: Raw Unicode docstring
On Nov 17, 9:34 am, Alexander Kapps alex.ka...@web.de wrote: ur"Scheißt\nder Bär\nim Wald?" Nicht ohne eine Genehmigung von der Umwelt Erhaltung Abteilung. ("Does the bear shit in the woods?" "Not without a permit from the Environmental Conservation Department.") -- http://mail.python.org/mailman/listinfo/python-list
Re: A bug for raw string literals in Py3k?
On Oct 31, 11:23 pm, Yingjie Lan lany...@yahoo.com wrote: So I suppose this is a bug? It's not, see http://docs.python.org/py3k/reference/lexical_analysis.html#literals # Specifically, a raw string cannot end in a single backslash Thanks! That looks weird to me ... doesn't this contradict "All backslashes in raw string literals are interpreted literally." (see http://docs.python.org/release/3.0.1/whatsnew/3.0.html)? It should read: "All backslashes in syntactically-correct raw string literals are interpreted literally." -- http://mail.python.org/mailman/listinfo/python-list
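A short illustration of the rule under discussion: backslashes inside a raw literal are kept literally, yet a raw literal still cannot end with a lone backslash, because at tokenization time the trailing backslash would escape the closing quote:

```python
# Backslashes in raw literals are kept literally:
assert r'\n' == '\\n'
assert len(r'\n') == 2

# But r'C:\temp\' is a SyntaxError; the usual workaround is concatenation:
path = r'C:\temp' + '\\'
assert path == 'C:\\temp\\'
```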
Re: Runtime error
On Oct 29, 3:26 am, Sebastian python-maill...@elygor.de wrote: Hi all, I am new to python and I don't know how to fix this error. I only try to execute python (or a cgi script) and I get an output like [...] 'import site' failed; traceback: Traceback (most recent call last): File /usr/lib/python2.6/site.py, line 513, in module main() File /usr/lib/python2.6/site.py, line 496, in main known_paths = addsitepackages(known_paths) File /usr/lib/python2.6/site.py, line 288, in addsitepackages addsitedir(sitedir, known_paths) File /usr/lib/python2.6/site.py, line 185, in addsitedir addpackage(sitedir, name, known_paths) File /usr/lib/python2.6/site.py, line 155, in addpackage exec line File string, line 1, in module File /usr/lib/python2.6/site.py, line 185, in addsitedir addpackage(sitedir, name, known_paths) File /usr/lib/python2.6/site.py, line 155, in addpackage exec line File string, line 1, in module File /usr/lib/python2.6/site.py, line 185, in addsitedir addpackage(sitedir, name, known_paths) File /usr/lib/python2.6/site.py, line 155, in addpackage exec line [...] File /usr/lib/python2.6/site.py, line 185, in addsitedir addpackage(sitedir, name, known_paths) File /usr/lib/python2.6/site.py, line 155, in addpackage exec line File string, line 1, in module File /usr/lib/python2.6/site.py, line 175, in addsitedir sitedir, sitedircase = makepath(sitedir) File /usr/lib/python2.6/site.py, line 76, in makepath dir = os.path.abspath(os.path.join(*paths)) RuntimeError: maximum recursion depth exceeded What is going wrong with my python install? What do I have to change? Reading the code for site.py, it looks like you may have a .pth file that is self-referential (or a chain of 2 or more .pth files!) that are sending you around in a loop. If you are having trouble determining what files are involved, you could put some print statements in your site.py at about lines 155 and 185 (which appear to be in the loop, according to the traceback) or step through it with a debugger.
-- http://mail.python.org/mailman/listinfo/python-list
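Building on the advice above, here is a small sketch (the helper name is made up) that lists each .pth file in a directory together with the paths it adds, which makes a self-referential chain easy to spot; run it over each directory Python treats as a site directory:

```python
import os

def pth_entries(directory):
    """Return (filename, added_paths) for each .pth file in `directory`.

    A loop exists when a .pth file names a directory whose own .pth file
    points back again (directly or through a chain)."""
    found = []
    for name in sorted(os.listdir(directory)):
        if name.endswith('.pth'):
            with open(os.path.join(directory, name)) as f:
                # .pth files may also contain comments and 'import' lines;
                # only plain path lines extend sys.path.
                lines = [ln.strip() for ln in f
                         if ln.strip() and not ln.startswith(('#', 'import'))]
            found.append((name, lines))
    return found
```

Typical use would be something like `for d in site.getsitepackages(): print(d, pth_entries(d))` and then checking whether any listed path leads back to a directory already seen.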
Re: Get alternative char name with unicodedata.name() if no formal one defined
On Oct 14, 7:25 pm, Dirk Wallenstein hals...@t-online.de wrote: Hi, I'd like to get control char names for the first 32 codepoints, but they apparently only have an alias and no official name. Is there a way to get the alternative character name (alias) in Python? AFAIK there is no programmatically-available list of those names. Try something like: name = unicodedata.name(x, some_default) if x > u"\x1f" else ("NULL", etc etc, "UNIT SEPARATOR")[ord(x)] or similarly with a prepared dict: C0_CONTROL_NAMES = { u"\x00": "NULL", # etc u"\x1f": "UNIT SEPARATOR", } name = unicodedata.name(x, some_default) if x > u"\x1f" else C0_CONTROL_NAMES[x] -- http://mail.python.org/mailman/listinfo/python-list
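Putting the prepared-dict suggestion into runnable (Python 3) form; the table below is deliberately incomplete, and the full set of C0 alias names lives in Unicode's NameAliases.txt (which, I believe, unicodedata.lookup() consults from Python 3.3 on, though unicodedata.name() still raises for controls):

```python
import unicodedata

# Partial table of C0 control alias names (see NameAliases.txt for the rest).
C0_CONTROL_NAMES = {
    '\x00': 'NULL',
    '\x09': 'CHARACTER TABULATION',
    '\x0a': 'LINE FEED',
    '\x0d': 'CARRIAGE RETURN',
    '\x1f': 'UNIT SEPARATOR',
}

def char_name(c, default='<unnamed>'):
    # Controls have no formal Unicode name, so fall back to the alias table.
    if c <= '\x1f':
        return C0_CONTROL_NAMES.get(c, default)
    return unicodedata.name(c, default)

assert char_name('\x00') == 'NULL'
assert char_name('A') == 'LATIN CAPITAL LETTER A'
assert char_name('\x01') == '<unnamed>'    # not in the partial table
```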
Re: Wrong default endianess in utf-16 and utf-32 !?
jmfauth wxjmfauth at gmail.com writes: When an endianess is not specified, (BE, LE, unmarked forms), the Unicode Consortium specifies, the default byte serialization should be big-endian. See http://www.unicode.org/faq//utf_bom.html Q: Which of the UTFs do I need to support? and Q: Why do some of the UTFs have a BE or LE in their label, such as UTF-16LE? Sometimes it is necessary to read right to the end of an answer: Q: Why do some of the UTFs have a BE or LE in their label, such as UTF-16LE? A: [snip] the unmarked form uses big-endian byte serialization by default, but may include a byte order mark at the beginning to indicate the actual byte serialization used. -- http://mail.python.org/mailman/listinfo/python-list
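The rule is easy to verify with Python's codecs: the BE/LE variants never write a byte order mark, while the unmarked codec writes one when encoding and honours one when decoding:

```python
euro = '\u20ac'

# Explicit byte orders: no BOM, just the serialized code unit.
assert euro.encode('utf-16-be') == b'\x20\xac'
assert euro.encode('utf-16-le') == b'\xac\x20'

# Unmarked codec: a BOM is prepended (native order in CPython).
assert euro.encode('utf-16')[:2] in (b'\xfe\xff', b'\xff\xfe')

# Decoding unmarked input follows the BOM, whichever order it declares.
assert b'\xfe\xff\x20\xac'.decode('utf-16') == euro
assert b'\xff\xfe\xac\x20'.decode('utf-16') == euro
```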
cp936 uses gbk codec, doesn't decode `\x80` as U+20AC EURO SIGN
| '\x80'.decode('cp936') Traceback (most recent call last): File stdin, line 1, in module UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: incomplete multibyte sequence However: Retrieved 2010-10-10 from http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT #Name: cp936 to Unicode table #Unicode version: 2.0 #Table version: 2.01 #Table format: Format A #Date: 1/7/2000 # #Contact: shawn.ste...@microsoft.com ... 0x7F 0x007F #DELETE 0x80 0x20AC #EURO SIGN 0x81 #DBCS LEAD BYTE Retrieved 2010-10-10 from http://msdn.microsoft.com/en-us/goglobal/cc305153.aspx Windows Codepage 936 [pictorial mapping; shows 80 mapping to 20AC] Retrieved 2010-10-10 from http://demo.icu-project.org/icu-bin/convexp?conv=windows-936-2000&s=ALL [pictorial mapping for converter windows-936-2000 with aliases including GBK, CP936, MS936; shows 80 mapping to 20AC] So Microsoft appears to think that cp936 includes the euro, and the ICU project seem to think that GBK and cp936 both include the euro. A couple of questions: Is this a bug or a shrug? Where can one find the mapping tables from which the various CJK codecs are derived? -- http://mail.python.org/mailman/listinfo/python-list
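A quick probe (the helper name is mine) shows which behaviour a given interpreter implements; whether 0x80 decodes to U+20AC or raises is exactly the open question above, so no expected output is asserted here:

```python
def euro_at_0x80(codec):
    """Return U+20AC if this interpreter's codec maps byte 0x80 to the
    euro sign, or None if it raises (the behaviour the post complains about)."""
    try:
        return b'\x80'.decode(codec)
    except UnicodeDecodeError:
        return None

for codec in ('cp936', 'gbk'):
    print(codec, '->', repr(euro_at_0x80(codec)))
```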
[issue9980] str(float) failure
Changes by John Machin sjmac...@users.sourceforge.net: -- nosy: +sjmachin ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9980 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
strange results from sys.version
I am trying to help a user of my xlrd package who says he is getting anomalous results on his work computer but not on his home computer. Attempts to reproduce his alleged problem in a verifiable manner on his work computer have failed, so far ... the only meaningful difference in script output is in sys.version User (work): sys.version: 2.7 (r27:82500, Aug 23 2010, 17:18:21) etc Me : sys.version: 2.7 (r27:82525, Jul 4 2010, 09:01:59) etc I have just now downloaded the Windows x86 msi from www.python.org and reinstalled it on another computer. It gives the same result as on my primary computer (above). User result looks whacked: lower patch number, later date. www.python.org says Python 2.7 was released on July 3rd, 2010. Is it possible that the work computer is using an unofficial release? What other possibilities are there? Thanks in advance ... -- http://mail.python.org/mailman/listinfo/python-list
Re: Detect string has non-ASCII chars without checking each char?
On Aug 22, 5:07 pm, Michel Claveau - MVPenleverlesx_xx...@xmclavxeaux.com.invalid wrote: Hi! Another way : # -*- coding: utf-8 -*- import unicodedata def test_ascii(struni): strasc=unicodedata.normalize('NFD', struni).encode('ascii','replace') if len(struni)==len(strasc): return True else: return False print test_ascii(u"abcde") print test_ascii(u"abcdê") -1 Try your code with u"abcd\xa1" ... it says it's ASCII. Suggestions: test_ascii = lambda s: len(s.decode('ascii', 'ignore')) == len(s) or test_ascii = lambda s: all(c < u'\x80' for c in s) or use try/except Also: if a == b: return True else: return False is a horribly bloated way of writing return a == b -- http://mail.python.org/mailman/listinfo/python-list
Re: Detect string has non-ASCII chars without checking each char?
On Aug 23, 1:10 am, Michel Claveau - MVPenleverlesx_xx...@xmclavxeaux.com.invalid wrote: Re ! Try your code with u"abcd\xa1" ... it says it's ASCII. Ah? in my computer, it say False Perhaps your computer has a problem. Mine does this with both Python 2.7 and Python 2.3 (which introduced the unicodedata.normalize function): import unicodedata t1 = u"abcd\xa1" t2 = unicodedata.normalize('NFD', t1) t3 = t2.encode('ascii', 'replace') [t1, t2, t3] [u'abcd\xa1', u'abcd\xa1', 'abcd?'] map(len, _) [5, 5, 5] -- http://mail.python.org/mailman/listinfo/python-list
Re: re.sub and variables
On Aug 13, 7:33 am, fuglyducky fuglydu...@gmail.com wrote: On Aug 12, 2:06 pm, fuglyducky fuglydu...@gmail.com wrote: I have a function that I am attempting to call from another file. I am attempting to replace a string using re.sub with another string. The problem is that the second string is a variable. When I get the output, it shows the variable name rather than the value. Is there any way to pass a variable into a regex? If not, is there any other way to do this? I need to be able to dump the variable value into the replacement string. For what it's worth this is an XML file so I'm not afraid to use some sort of XML library but they look fairly complicated for a newbie like me. Also, this is py3.1.2 if that makes any difference. Thanks!!! # import random import re import datetime def pop_time(some_string, start_time): global that_string rand_time = random.randint(0, 30) delta_time = datetime.timedelta(seconds=rand_time) for line in some_string: end_time = delta_time + start_time new_string = re.sub("thisstring", "thisstring\\end_time", some_string) start_time = end_time return new_string Disregard...I finally figured out how to use string.replace. That appears to work perfectly. Still...if anyone happens to know about passing a variable into a regex that would be great. Instead of new_string = re.sub( "thisstring", "thisstring\\end_time", some_string) you probably meant to use something like new_string = re.sub( "thisstring", "thisstring" + "\\" + end_time, some_string) string.replace is antique and deprecated. You should be using methods of str objects, not functions in the string module. s1 = "foobarzot" s2 = s1.replace("bar", "-") s2 'foo-zot' -- http://mail.python.org/mailman/listinfo/python-list
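A runnable version of the point being made (the names and sample XML here are illustrative, not the OP's real data): the variable goes into the replacement by ordinary string building, and a callable replacement sidesteps re.sub's backslash-escape processing entirely, so it is safe for arbitrary variable content:

```python
import re

end_time = '00:00:17'
line = '<event time="PLACEHOLDER"/>'

# Plain concatenation works when the variable contains no backslashes:
assert re.sub('PLACEHOLDER', end_time, line) == '<event time="00:00:17"/>'

# A callable replacement is taken literally, with no \-escape processing:
assert re.sub('PLACEHOLDER', lambda m: end_time, line) == '<event time="00:00:17"/>'
```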
Re: Ascii to Unicode.
On Jul 30, 4:18 am, Carey Tilden carey.til...@gmail.com wrote: In this case, you've been able to determine the correct encoding (latin-1) for those errant bytes, so the file itself is thus known to be in that encoding. The encoding most likely to be correct is, as already stated and agreed by the OP, cp1252. -- http://mail.python.org/mailman/listinfo/python-list
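The practical difference between the two is easy to demonstrate: the bytes cp1252 assigns to punctuation fall in latin-1's C1 control range, so latin-1 "succeeds" while producing control characters instead of the intended text:

```python
raw = b'caf\xe9 \x93quoted\x94'

# latin-1 maps 0x93/0x94 to invisible C1 control characters:
assert raw.decode('latin-1') == 'caf\xe9 \x93quoted\x94'

# cp1252 maps them to the curly quotes the (Windows) author meant:
assert raw.decode('cp1252') == 'caf\xe9 \u201cquoted\u201d'
```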
Re: Where is the help page for re.MatchObject?
On Jul 28, 1:26 pm, Peng Yu pengyu...@gmail.com wrote: I know the library reference webpage for re.MatchObject is at http://docs.python.org/library/re.html#re.MatchObject But I don't find such a help page in python help(). Does anybody know how to get it in help()? Yes, but it doesn't tell you very much: | import re | help(re.match('x', 'x')) | Help on SRE_Match object: | | class SRE_Match(__builtin__.object) | | -- http://mail.python.org/mailman/listinfo/python-list
Re: Ascii to Unicode.
On Jul 29, 4:32 am, Joe Goldthwaite j...@goldthwaites.com wrote: Hi, I've got an Ascii file with some latin characters. Specifically \xe1 and \xfc. I'm trying to import it into a Postgresql database that's running in Unicode mode. The Unicode converter chokes on those two characters. I could just manually replace those two characters with something valid but if any other invalid characters show up in later versions of the file, I'd like to handle them correctly. I've been playing with the Unicode stuff and I found out that I could convert both those characters correctly using the latin1 encoder like this; import unicodedata s = '\xe1\xfc' print unicode(s,'latin1') The above works. When I try to convert my file however, I still get an error; import unicodedata input = file('ascii.csv', 'r') output = file('unicode.csv','w') for line in input.xreadlines(): output.write(unicode(line,'latin1')) input.close() output.close() Traceback (most recent call last): File C:\Users\jgold\CloudmartFiles\UnicodeTest.py, line 10, in __main__ output.write(unicode(line,'latin1')) UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 295: ordinal not in range(128) I'm stuck using Python 2.4.4 which may be handling the strings differently depending on if they're in the program or coming from the file. I just haven't been able to figure out how to get the Unicode conversion working from the file data. Can anyone explain what is going on? Hello hello ... you are running on Windows; the likelihood that you actually have data encoded in latin1 is very very small. Follow MRAB's answer but replace latin1 by cp1252. -- http://mail.python.org/mailman/listinfo/python-list
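For the record, the Python 3 shape of the fix (on 2.4, codecs.open() plays the same role): decode on input, encode on output, and never let the default ASCII codec get involved. File names and sample bytes are illustrative:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, 'ascii.csv')
dst = os.path.join(folder, 'unicode.csv')

# Simulate the input: cp1252-encoded bytes including \xe1 and \xfc.
with open(src, 'wb') as f:
    f.write(b'caf\xe9,\xe1\xfc\n')

# Decode cp1252 on the way in, encode UTF-8 on the way out.
with open(src, 'r', encoding='cp1252', newline='') as inp, \
     open(dst, 'w', encoding='utf-8', newline='') as out:
    for line in inp:
        out.write(line)

with open(dst, 'rb') as f:
    assert f.read() == 'café,áü\n'.encode('utf-8')
```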
Re: newb
On Jul 27, 9:07 pm, whitey m...@here.com wrote: hi all. am totally new to python and was wondering if there are any newsgroups that are there specifically for beginners. i have bought a book for $2 called learn to program using python by alan gauld. starting to read it but it was written in 2001. presuming that the commands and info would still be valid? any websites or books that are a must for beginners? any input would be much appreciated...cheers 2001 is rather old. Most of what you'll want is on the web. See http://wiki.python.org/moin/BeginnersGuide -- http://mail.python.org/mailman/listinfo/python-list
Re: Unicode error
dirknbr dirknbr at gmail.com writes: I have kind of developed this but obviously it's not nice, any better ideas? try: text=texts[i] text=text.encode('latin-1') text=text.encode('utf-8') except: text=' ' As Steven has pointed out, if the .encode('latin-1') works, the result is thrown away. This would be very fortunate. It appears that your goal was to encode the text in latin1 if possible, otherwise in UTF-8, with no indication of which encoding was used. Your second posting confirmed that you were doing this in a loop, ending up with the possibility that your output file would have records with mixed encodings. Did you consider what a programmer writing code to READ your output file would need to do, e.g. attempt to decode each record as UTF-8 with a fall-back to latin1??? Did you consider what would be the result of sending a stream of mixed-encoding text to a display device? As already advised, the short answer to avoid all of that hassle is: just encode in UTF-8. -- http://mail.python.org/mailman/listinfo/python-list
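The READ-side hassle alluded to above is easy to demonstrate; the fall-back decode appears to work, but silently misreads any latin1 text that happens to be valid UTF-8:

```python
def read_record(raw):
    """What a reader of a mixed-encoding file is forced to do: guess."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('latin-1')

# Lucky cases: latin-1 'café' is invalid UTF-8, so the fall-back fires.
assert read_record('café'.encode('utf-8')) == 'café'
assert read_record('café'.encode('latin-1')) == 'café'

# Unlucky case: latin-1 'Ã©' is b'\xc3\xa9', which is valid UTF-8 for 'é',
# so the reader silently gets the wrong text.
assert read_record('Ã©'.encode('latin-1')) == 'é'
```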
Re: SyntaxError not honoured in list comprehension?
On Jul 5, 1:08 am, Thomas Jollans tho...@jollans.com wrote: On 07/04/2010 03:49 PM, jmfauth wrote: File psi last command, line 1 print9.0 ^ SyntaxError: invalid syntax somewhat strange, yes. There are two tokens, print9 (a name) and .0 (a float constant) -- looks like SyntaxError to me. -- http://mail.python.org/mailman/listinfo/python-list
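The tokenizer itself confirms the two-token reading:

```python
import io
import tokenize

# 'print9.0' tokenizes as the NAME 'print9' followed by the NUMBER '.0'.
tokens = [(tokenize.tok_name[t.type], t.string)
          for t in tokenize.generate_tokens(io.StringIO('print9.0\n').readline)]
assert ('NAME', 'print9') in tokens
assert ('NUMBER', '.0') in tokens
```

Two adjacent expression tokens with no operator between them is what the compiler then rejects as a SyntaxError.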
Re: Python 2.7 released
On Jul 5, 12:27 pm, Martineau ggrp2.20.martin...@dfgh.net wrote: On Jul 4, 8:34 am, Benjamin Peterson benja...@python.org wrote: On behalf of the Python development team, I'm jocund to announce the second release candidate of Python 2.7. Python 2.7 will be the last major version in the 2.x series. However, it will also have an extended period of bugfix maintenance. 2.7 includes many features that were first released in Python 3.1. The faster io module, the new nested with statement syntax, improved float repr, set literals, dictionary views, and the memoryview object have been backported from 3.1. Other features include an ordered dictionary implementation, unittests improvements, a new sysconfig module, auto-numbering of fields in the str/unicode format method, and support for ttk Tile in Tkinter. For a more extensive list of changes in 2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python distribution. To download Python 2.7 visit: http://www.python.org/download/releases/2.7/ 2.7 documentation can be found at: http://docs.python.org/2.7/ This is a production release and should be suitable for all libraries and applications. Please report any bugs you find, so they can be fixed in the next maintenance releases. The bug tracker is at: http://bugs.python.org/ Enjoy! -- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 2.7's contributors) Benjamin (or anyone else), do you know where I can get the Compiled Windows Help file -- python27.chm -- for this release? In the past I've been able to download it from the Python web site, but have been unable to locate it anywhere for this new release. I can't build it myself because I don't have the Microsoft HTML help file compiler. Thanks in advance. If you have a Windows box, download the .msi installer for Python 2.7 and install it. The chm file will be in C:\Python27\Doc (if you choose the default installation directory). 
Otherwise ask a friendly local Windows user for a copy. -- http://mail.python.org/mailman/listinfo/python-list
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: About the E0 80 81 61 problem: my interpretation is that you are correct, the 80 is not valid in the current state (start byte == E0), so no look-ahead, three FFFDs must be issued followed by 0061. I don't really care about issuing too many FFFDs so long as it doesn't munch valid sequences. However it would be very nice to get an explicit message about surrogates. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
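On a current CPython (where this issue's resolution landed), the interpretation above can be checked directly; each rejected byte becomes one U+FFFD:

```python
# E0 80 81 61: 0x80 is not a valid continuation for an E0 lead byte,
# so there is no look-ahead -- one replacement char per rejected byte,
# then the valid 'a'.
assert b'\xe0\x80\x81\x61'.decode('utf-8', 'replace') == '\ufffd\ufffd\ufffda'

# A UTF-8-encoded surrogate (ED A0 80 would be U+D800) is rejected the
# same way, byte by byte:
assert b'\xed\xa0\x80'.decode('utf-8', 'replace') == '\ufffd\ufffd\ufffd'
```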
Re: escape character / csv module
On Jul 2, 6:04 am, MRAB pyt...@mrabarnett.plus.com wrote: The csv module imports from _csv, which suggests to me that there's code written in C which thinks that the \x00 is a NUL terminator, so it's a bug, although it's very unusual to want to write characters like \x00 to a CSV file, and I wouldn't be surprised if this is the first time it's been noticed! :-) Don't be surprised, read the documentation (http://docs.python.org/ library/csv.html#module-csv): Note This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples. These restrictions will be removed in the future. The NUL/printable part of the note has been there since the module was introduced in Python 2.3.0. -- http://mail.python.org/mailman/listinfo/python-list
Re: Handling text lines from files with some (few) starnge chars
On Jun 6, 12:14 pm, MRAB pyt...@mrabarnett.plus.com wrote: Paulo da Silva wrote: Em 06-06-2010 00:41, Chris Rebert escreveu: On Sat, Jun 5, 2010 at 4:03 PM, Paulo da Silva psdasilva.nos...@netcabonospam.pt wrote: ... Specify the encoding of the text when opening the file using the `encoding` parameter. For Windows-1252 for example: your_file = open(path/to/file.ext, 'r', encoding='cp1252') OK! This fixes my current problem. I used encoding=iso-8859-15. This is how my text files are encoded. But what about a more general case where the encoding of the text file is unknown? Is there anything like autodetect? An encoding like 'cp1252' uses 1 byte/character, but so does 'cp1250'. How could you tell which was the correct encoding? Well, if the file contained words in a certain language and some of the characters were wrong, then you'd know that the encoding was wrong. This does imply, though, that you'd need to know what the language should look like! You could try different encodings, and for each one try to identify what could be words, then look them up in dictionaries for various languages to see whether they are real words... This has been automated (semi-successfully, with caveats) by the chardet package ... see http://chardet.feedparser.org/ -- http://mail.python.org/mailman/listinfo/python-list
Re: signed vs unsigned int
On Jun 2, 4:43 pm, johnty johntyw...@gmail.com wrote: i'm reading bytes from a serial port, and storing it into an array. each byte represents a signed 8-bit int. currently, the code i'm looking at converts them to an unsigned int by doing ord(array[i]). however, what i'd like is to get the _signed_ integer value. whats the easiest way to do this? signed = unsigned if unsigned <= 127 else unsigned - 256 -- http://mail.python.org/mailman/listinfo/python-list
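Wrapped up as a function, with the struct module as a cross-check (struct reinterprets the same bit pattern as a signed byte):

```python
import struct

def to_signed_byte(u):
    """Interpret an unsigned byte value 0..255 as signed -128..127."""
    return u if u <= 127 else u - 256

assert to_signed_byte(0) == 0
assert to_signed_byte(127) == 127
assert to_signed_byte(128) == -128
assert to_signed_byte(255) == -1

# struct's 'b' format does the same reinterpretation:
assert struct.unpack('b', bytes([255]))[0] == -1
```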
Re: expat parsing error
On Jun 2, 1:57 am, kak...@gmail.com kak...@gmail.com wrote: On Jun 1, 11:12 am, kak...@gmail.com kak...@gmail.com wrote: On Jun 1, 11:09 am, John Bokma j...@castleamber.com wrote: kak...@gmail.com kak...@gmail.com writes: On Jun 1, 10:34 am, Stefan Behnel stefan...@behnel.de wrote: kak...@gmail.com, 01.06.2010 16:00: how can i fix it, how to ignore the headers and parse only the XML? Consider reading the answers you got in the last thread that you opened with exactly this question. Stefan That's exactly, what i did but something seems to not working with the solutions i had, when i changed my implementation from pure Python's sockets to twisted library! That's the reason i have created a new post! Any ideas why this happened? As I already explained: if you send your headers as well to any XML parser it will choke on those, because the headers are /not/ valid / well-formed XML. The solution is to remove the headers from your data. As I explained before: headers are followed by one empty line. Just remove lines up and until including the empty line, and pass the data to any XML parser. -- John Bokma j3b Hacking & Hiking in Mexico - http://johnbokma.com/ http://castleamber.com/ - Perl & Python Development Thank you so much i'll try it! Antonis Dear John can you provide me a simple working solution? I don't seem to get it You're not wrong. Try something like this: rubbish1, rubbish2, xml = your_guff.partition('\n\n') -- http://mail.python.org/mailman/listinfo/python-list
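A self-contained sketch of that one-liner in action (the header names and XML are invented; note that data read straight off a network socket usually separates headers from body with '\r\n\r\n' rather than '\n\n'):

```python
import xml.etree.ElementTree as ET

raw = ('Content-Type: text/xml\n'
       'Content-Length: 20\n'
       '\n'
       '<root><item/></root>')

# Headers end at the first blank line; everything after it is the XML.
headers, _sep, xml_text = raw.partition('\n\n')
root = ET.fromstring(xml_text)
assert root.tag == 'root'
assert root[0].tag == 'item'
```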
Re: Help with Regexp, \b
On May 30, 1:30 am, andrew cooke and...@acooke.org wrote: That's what I thought it did... Then I read the docs and confused empty string with space(!) and convinced myself otherwise. I think I am going senile. Not necessarily. Conflating concepts like string containing whitespace, string containing space(s), empty aka 0-length string, None, (ASCII) NUL, and (SQL) NULL appears to be an age- independent problem :-) -- http://mail.python.org/mailman/listinfo/python-list
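Concretely: \b is a zero-width assertion, so it matches an empty string at a word boundary rather than consuming a space:

```python
import re

# \b matches the empty string at the boundary before 'w'.
m = re.match(r'\b', 'word')
assert m is not None and m.group() == ''

# It anchors whole-word matches without eating the surrounding characters:
assert re.search(r'\bword\b', 'a word, here') is not None
assert re.search(r'\bord\b', 'a word, here') is None   # 'ord' is mid-word
```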
Re: UnicodeDecodeError having fetch web page
Rob Williscroft rtw at rtw.me.uk writes: Barry wrote in news:83dc485a-5a20-403b-99ee-c8c627bdbab3 @m21g2000vbr.googlegroups.com in gmane.comp.python.general: UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: unexpected code byte It may not be you, en.wiktionary.org is sending gzip encoded content back, It sure is; here's where the offending 0x8b comes from: ID1 (IDentification 1) ID2 (IDentification 2) These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213), to identify the file as being in gzip format. (from http://www.faqs.org/rfcs/rfc1952.html) -- http://mail.python.org/mailman/listinfo/python-list
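The magic bytes are easy to confirm with the gzip module; the practical fix for the original error is to decompress the body (or not request gzip at all) before decoding it as UTF-8:

```python
import gzip

payload = gzip.compress(b'hello')
# ID1 = 0x1f, ID2 = 0x8b -- the bytes the UTF-8 decoder choked on.
assert payload[:2] == b'\x1f\x8b'
assert gzip.decompress(payload) == b'hello'
```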
Re: help need to write a python spell checker
On May 19, 1:37 pm, Steven D'Aprano steve-REMOVE- t...@cybersource.com.au wrote: On Wed, 19 May 2010 13:01:10 +1000, Nigel Rowe wrote: I'm happy to do you homework for you, cost is us$1000 per hour. Email to your professor automatically on receipt. I'll do it for $700 an hour! he could save the money if he oogledgay orvignay ellspay eckerchay -- http://mail.python.org/mailman/listinfo/python-list
Re: Puzzled by code pages
Adam Tauno Williams awilliam at whitemice.org writes: On Fri, 2010-05-14 at 20:27 -0400, Adam Tauno Williams wrote: I'm trying to process OpenStep plist files in Python. I have a parser which works, but only for strict ASCII. However plist files may contain accented characters - equivalent to ISO-8859-2 (I believe). For example I read in the line: 'skyp4_filelist_10201/localit\xc3\xa0 termali_sortfield = NSFileName;\n' What is the correct way to re-encode this data into UTF-8 so I can use unicode strings, and then write the output back to ISO8859-? Buried in the parser is a str(...) call. Replacing that with unicode(...) and now the OpenSTEP plist parser is working with Italian plists. Some observations: Italian text is much more likely to be encoded in ISO-8859-1 than ISO-8859-2. The latter covers eastern European languages (e.g. Polish, Czech, Hungarian) that use the Latin alphabet with many decorations not found in western alphabets. Let's look at the 'localit\xc3\xa0' example. Using ISO-8859-2, that decodes to u'localit\u0102\xa0'. The second-last character is LATIN CAPITAL LETTER A WITH BREVE (according to unicodedata.name()). The last character is NO-BREAK SPACE. Doesn't look like an Italian word to me. However, using UTF-8, that decodes to u'localit\xe0'. The last character is LATIN SMALL LETTER A WITH GRAVE. Looks like a plausible Italian word to me. Also to Wikipedia: A località (literally locality; plural località) is the name given in Italian administrative law to a type of territorial subdivision of a comune ... Conclusions: It's worth closely scrutinising accented characters - equivalent to ISO-8859-2 (I believe). Which variety of OpenStep plist files are you looking at: NeXTSTEP, GNUstep, or MAC OS X? If the latter, it's evidently an XML document, and you should be letting the XML parser decode it for you and in any case as an XML document it's most likely UTF-8, not ISO-8859-2. It's worth examining your definition of working. 
-- http://mail.python.org/mailman/listinfo/python-list
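The two decodings compared in the post, as executable checks:

```python
raw = b'localit\xc3\xa0'

# ISO-8859-2 turns the two bytes into A-with-breve + no-break space:
assert raw.decode('iso-8859-2') == 'localit\u0102\xa0'

# UTF-8 yields the plausible Italian word 'località':
assert raw.decode('utf-8') == 'localit\xe0'
```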
Re: Fastest way to calculate leading whitespace
dasacc22 dasacc22 at gmail.com writes: U presume entirely to much. I have a preprocessor that normalizes documents while performing other more complex operations. Theres nothing buggy about what im doing Are you sure? Your solution calculates (the number of leading whitespace characters) + (the number of TRAILING whitespace characters). Problem 1: including TRAILING whitespace. Example: "content" + 3 * " " + "\n" has 4 leading spaces according to your reckoning; should be 0. Fix: use lstrip() instead of strip() Problem 2: assuming all whitespace characters have *effective* width the same as " " (SPACE). Examples: TAB has width 4 or 8 or whatever you want it to be. There are quite a number of whitespace characters, even when you stick to ASCII. When you look at Unicode, there are heaps more. Here's a list of BMP characters such that character.isspace() is True, showing the Unicode codepoint, the Python repr(), and the name of the character (other than for control characters):
U+0009 u'\t' ?
U+000A u'\n' ?
U+000B u'\x0b' ?
U+000C u'\x0c' ?
U+000D u'\r' ?
U+001C u'\x1c' ?
U+001D u'\x1d' ?
U+001E u'\x1e' ?
U+001F u'\x1f' ?
U+0020 u' ' SPACE
U+0085 u'\x85' ?
U+00A0 u'\xa0' NO-BREAK SPACE
U+1680 u'\u1680' OGHAM SPACE MARK
U+2000 u'\u2000' EN QUAD
U+2001 u'\u2001' EM QUAD
U+2002 u'\u2002' EN SPACE
U+2003 u'\u2003' EM SPACE
U+2004 u'\u2004' THREE-PER-EM SPACE
U+2005 u'\u2005' FOUR-PER-EM SPACE
U+2006 u'\u2006' SIX-PER-EM SPACE
U+2007 u'\u2007' FIGURE SPACE
U+2008 u'\u2008' PUNCTUATION SPACE
U+2009 u'\u2009' THIN SPACE
U+200A u'\u200a' HAIR SPACE
U+200B u'\u200b' ZERO WIDTH SPACE
U+2028 u'\u2028' LINE SEPARATOR
U+2029 u'\u2029' PARAGRAPH SEPARATOR
U+202F u'\u202f' NARROW NO-BREAK SPACE
U+205F u'\u205f' MEDIUM MATHEMATICAL SPACE
U+3000 u'\u3000' IDEOGRAPHIC SPACE
Hmmm, looks like all kinds of widths, from zero upwards. -- http://mail.python.org/mailman/listinfo/python-list
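The lstrip() fix, with the trailing-whitespace bug shown alongside:

```python
def leading_ws(s):
    """Count leading (only) whitespace characters."""
    return len(s) - len(s.lstrip())

assert leading_ws('    code') == 4
assert leading_ws('content   \n') == 0

# The strip()-based version counts the 3 trailing spaces plus the newline:
assert len('content   \n') - len('content   \n'.strip()) == 4
```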
Re: How to get xml.etree.ElementTree not bomb on invalid characters in XML file ?
On May 5, 12:11 am, Barak, Ron ron.ba...@lsi.com wrote: -Original Message- From: Stefan Behnel [mailto:stefan...@behnel.de] Sent: Tuesday, May 04, 2010 10:24 AM To: python-l...@python.org Subject: Re: How to get xml.etree.ElementTree not bomb on invalid characters in XML file ? Barak, Ron, 04.05.2010 09:01: I'm parsing XML files using ElementTree from xml.etree (see code below (and attached xml_parse_example.py)). However, I'm coming across input XML files (attached an example: tmp.xml) which include invalid characters, that produce the following traceback: $ python xml_parse_example.py Traceback (most recent call last): xml.parsers.expat.ExpatError: not well-formed (invalid token): line 6, column 34 I hope you are aware that this means that the input you are parsing is not XML. It's best to reject the file and tell the producers that they are writing broken output files. You should always fix the source, instead of trying to make sense out of broken input in fragile ways. I read the documentation for xml.etree.ElementTree and see that it may take an optional parser parameter, but I don't know what this parser should be - to ignore the invalid characters. Could you suggest a way to call ElementTree, so it won't bomb on these invalid characters ? No. The parser in lxml.etree has a 'recover' option that lets it try to recover from input errors, but in general, XML parsers are required to reject non well-formed input. Stefan Hi Stefan, The XML file seems to be valid XML (all XML viewers I tried were able to read it). You can verify this by trying to read the XML example I attached to the original message (attached again here). Actually, when trying to view the file with an XML viewer, these offensive characters are not shown. It's just that some of the fields include characters that the parser used by ElementTree seems to choke on. Bye, Ron. [attachment: tmp_small.xml] Have a look at your file with e.g. a hex editor or just Python repr() -- see below. 
You will see that there are four cases of taggood_data\x00garbage/tag where garbage is repeated \x00 or just random line noise or uninitialised memory. m_sanApiName1MainStorage_snap\x00\x00*SNIP*\x00\x00/ m_sanApiName1 m_detailBROLB21\x00\xeequot;\x00\x00\x00\x90,\x02G\xdc\xfb\x04P\xdc \xfb\x04\x01a\xfcgt;(\xe8\xfb\x04/m_detail It's a toss-up whether the gt; in there is accidental or a deliberate attempt to sanitise the garbage !-) m_detailAlstom\x00\x00o\x00m\x00\x00*SNIP*\x00\x00/m_detail m_sanApiVersionV5R1.28.1 [R - LA]\x00\x00*SNIP*\x00\x00/ m_sanApiVersion The garbage in the 2nd case is such as to make the initial declaration encoding=UTF-8 an outright lie and I'm curious as to how the XML parser managed to get as far as it did -- it must decode a line at a time. As already advised: it's much better to reject that rubbish outright than to attempt to repair it. Repair should be contemplated only if it's a one-off exercise AND you can't get a fixed copy from the source. And while we're on the subject of rubbish: The XML file seems to be valid XML (all XML viewers I tried were able to read it). The conclusion from that is that all XML viewers that you tried are rubbish. -- http://mail.python.org/mailman/listinfo/python-list
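If a one-off repair really is unavoidable, a sketch along these lines (the sample bytes are invented, modelled on the taggood_data\x00garbage pattern above): strip the bytes XML 1.0 forbids so the parser gets through, while accepting that the garbage text itself remains in the fields:

```python
import re
import xml.etree.ElementTree as ET

def strip_forbidden_controls(raw):
    """ONE-OFF repair sketch only: delete bytes XML 1.0 forbids (NUL and
    the other C0 controls except tab, LF, CR).  This stops the parser
    choking; it does NOT remove the garbage text that follows the NULs."""
    return re.sub(rb'[\x00-\x08\x0b\x0c\x0e-\x1f]', b'', raw)

bad = b'<m_detail>BROLB21\x00\x00junk</m_detail>'
root = ET.fromstring(strip_forbidden_controls(bad))
assert root.text == 'BROLB21junk'   # parses, but the junk survives
```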
Re: How to get xml.etree.ElementTree not bomb on invalid characters in XML file ?
On May 5, 3:43 am, Terry Reedy tjre...@udel.edu wrote: On 5/4/2010 11:37 AM, Stefan Behnel wrote: Barak, Ron, 04.05.2010 16:11: The XML file seems to be valid XML (all XML viewers I tried were able to read it). From Internet Explorer: The XML page cannot be displayed Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later. An invalid character was found in text content. Error processing resource 'file:///C:/Documents and Settings... m_detailBROLB21 This is what xmllint gives me: --- $ xmllint /home/sbehnel/tmp.xml tmp.xml:6: parser error : Char 0x0 out of allowed range m_sanApiName1MainStorage_snap ^ tmp.xml:6: parser error : Premature end of data in tag m_sanApiName1 line 6 m_sanApiName1MainStorage_snap ^ tmp.xml:6: parser error : Premature end of data in tag DbHbaGroup line 5 m_sanApiName1MainStorage_snap ^ tmp.xml:6: parser error : Premature end of data in tag database line 4 m_sanApiName1MainStorage_snap ^ --- The file contains 0-bytes - clearly not XML. IE agrees. Look closer. IE *DOESN'T* agree. It has ignored the problem on line 6 and lurched on to the next problem (in line 11). If you edit that file to remove the line noise in line 11, leaving the 3 cases of multiple \x00 bytes, IE doesn't complain at all about the (invalid) \x00 bytes. -- http://mail.python.org/mailman/listinfo/python-list
Re: condition and True or False
On May 3, 9:14 am, Steven D'Aprano st...@remove-this- cybersource.com.au wrote: If it is any arbitrary object, then "x and True or False" is just an obfuscated way of writing "bool(x)". Perhaps their code predates the introduction of bools, and they have defined global constants True and False but not bool. Then they removed the True and False bindings as no longer necessary, but neglected to replace the obfuscated conversion. Or perhaps they are maintaining code that must run on any 2.X. True and False would be set up conditional on Python version. Writing "expression and True or False" avoids a function call.
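The claimed equivalence is easy to check empirically; a throwaway sketch (the sample values are arbitrary):

```python
# For any object x, `x and True or False` yields exactly bool(x):
# a falsy x short-circuits `and` and falls through `or` to False;
# a truthy x makes `and` evaluate to True, which `or` then returns.
samples = [0, 1, -3, "", "abc", [], [1], {}, None, 0.0]
results = [(x and True or False) == bool(x) for x in samples]
```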
Re: csv.py sucks for Decimal
On Apr 23, 9:23 am, Phlip phlip2...@gmail.com wrote: When I use the CSV library, with QUOTE_NONNUMERIC, and when I pass in a Decimal() object, I must convert it to a string. Why must you? What unwanted effect do you observe when you don't convert it? the search for an alternate CSV module, without this bug, will indeed begin very soon! What bug? I'm pointing out that QUOTE_NONNUMERIC would work better with an option to detect numeric-as-string, and absolve it. That would allow Decimal() to do its job, unimpeded. Decimal()'s job is to create an instance of the decimal.Decimal class; how is that being impeded by anything in the csv module? -- http://mail.python.org/mailman/listinfo/python-list
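A small experiment (a sketch under current CPython; 2010-era behaviour may have differed) shows where Decimal actually loses out with QUOTE_NONNUMERIC: the writer treats it as numeric and emits it unquoted, but the reader converts every unquoted field back to float, not Decimal:

```python
import csv
import io
from decimal import Decimal

# Write one row containing a Decimal with QUOTE_NONNUMERIC in force.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerow([Decimal("1.23"), "text", 4.5])

# Read it back: unquoted fields come back as float, so the Decimal's
# exactness is silently lost on the round trip.
buf.seek(0)
row = next(csv.reader(buf, quoting=csv.QUOTE_NONNUMERIC))
```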
[issue8308] raw_bytes.decode('cp932') -- spurious mappings
John Machin sjmac...@users.sourceforge.net added the comment: Thanks, Martin. Issue closed as far as I'm concerned. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8308 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8308] raw_bytes.decode('cp932') -- spurious mappings
New submission from John Machin sjmac...@users.sourceforge.net: According to the following references, the bytes 80, A0, FD, FE, and FF are not defined in cp932: http://msdn.microsoft.com/en-au/goglobal/cc305152.aspx http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003s=ALL However CPython 3.1.2 does this: print(ascii(b'\x80\xa0\xfd\xfe\xff'.decode('cp932'))) '\x80\uf8f0\uf8f1\uf8f2\uf8f3' (as do 2.5, 2.6. and 2.7 with the appropriate syntax) This maps 80 to U+0080 (not very useful) and maps the other 4 bytes into the Private Use Area (PUA)!! Each case should be treated as undefined/unexpected/error/... -- components: Unicode messages: 102308 nosy: sjmachin severity: normal status: open title: raw_bytes.decode('cp932') -- spurious mappings type: behavior versions: Python 2.7, Python 3.1 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8308 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
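One way to cross-check claims like this against any installed interpreter is to probe the codec byte by byte. A diagnostic sketch (the function name is mine; for cp932 the result also picks up lead bytes like 81 that merely lack a trail byte, and it depends on which fix level your interpreter carries):

```python
def undefined_single_bytes(encoding):
    """Return the byte values 0-255 that the codec refuses to decode alone."""
    bad = []
    for i in range(256):
        try:
            bytes([i]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(i)
    return bad

# The report above says 0x80, 0xA0, 0xFD, 0xFE and 0xFF should all be
# errors for cp932; inspect the result on your own interpreter:
cp932_bad = undefined_single_bytes("cp932")
```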
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: @ezio.melotti: Your second sentence is true, but it is not the whole truth. Bytes in the range C0-FF (whose high bit *is* set) ALSO shouldn't be considered part of the sequence because they (like 00-7F) are invalid as continuation bytes; they are either starter bytes (C2-F4) or invalid for any purpose (C0-C1 and F5-FF). Further, some bytes in the range 80-BF are NOT always valid as the first continuation byte; it depends on what starter byte they follow. The simple way of summarising the above is to say that a byte that is not a valid continuation byte in the current state (failing byte) is not a part of the current (now known to be invalid) sequence, and the decoder must try again (resync) with the failing byte. Do you agree with my example 3?
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: #ezio.melotti: I'm considering valid all the bytes that start with '10...' Sorry, WRONG. Read what I wrote: Further, some bytes in the range 80-BF are NOT always valid as the first continuation byte, it depends on what starter byte they follow. Consider these sequences: (1) E0 80 80 (2) E0 9F 80. Both are invalid sequences (over-long). Specifically the first continuation byte may not be in 80-9F. Those bytes start with '10...' but they are invalid after an E0 starter byte. Please read Table 3-7. Well-Formed UTF-8 Byte Sequences and surrounding text in Unicode 5.2.0 chapter 3 (bearing in mind that CPython (for good reasons) doesn't implement the surrogates restriction, so that the special case for starter byte ED is not used in CPython). Note the other 3 special cases for the first continuation byte. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
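Modern CPython agrees with Table 3-7; a sketch confirming that the two over-long sequences cited above are rejected while the shortest encoding of the same code point is accepted:

```python
# E0 80 80 and E0 9F 80 are over-long: after an E0 starter, the first
# continuation byte must be in A0-BF (Unicode Table 3-7).
overlong = [b"\xe0\x80\x80", b"\xe0\x9f\x80"]
rejected = []
for seq in overlong:
    try:
        seq.decode("utf-8")
    except UnicodeDecodeError:
        rejected.append(seq)

# The shortest valid 3-byte sequence starting with E0 encodes U+0800:
ok = b"\xe0\xa0\x80".decode("utf-8")
```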
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: Unicode has been frozen at 0x10FFFF. That's it. There is no such thing as a valid 5-byte or 6-byte UTF-8 string.
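Both halves of that statement are quick to check on a current interpreter:

```python
import sys

# The codespace ends at U+10FFFF ...
top = sys.maxunicode

# ... and the old RFC 2279 5- and 6-byte forms are rejected outright
# (F8 and FC are not valid UTF-8 bytes at all).
long_forms = [b"\xf8\x88\x80\x80\x80", b"\xfc\x84\x80\x80\x80\x80"]
errors = 0
for seq in long_forms:
    try:
        seq.decode("utf-8")
    except UnicodeDecodeError:
        errors += 1
```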
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: @lemburg: RFC 2279 was obsoleted by RFC 3629 over 6 years ago. The standard now says 21 bits is it. F5-FF are declared to be invalid. I don't understand what you mean by "supporting those possibilities". The code is correctly issuing an error message. The goal of supporting the new resyncing and FFFD-emitting rules might be better met however by throwing away the code in the default clause and instead merely setting the entries for F5-FF in the utf8_code_length array to zero.
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: Patch review: Preamble: pardon my ignorance of how the codebase works, but trunk unicodeobject.c is r79494 (and allows encoding of surrogate codepoints), py3k unicodeobject.c is r79506 (and bans the surrogate caper) and I can't find the r79542 that the patch mentions ... help, please! length 2 case: 1. the loop can be hand-unrolled into oblivion. It can be entered only when s[1] 0xC0 != 0x80 (previous if test). 2. the over-long check (if (ch 0x80)) hasn't been touched. It could be removed and the entries for C0 and C1 in the utf8_code_length array set to 0. length 3 case: 1. the tests involving s[0] being 0xE0 or 0xED are misplaced. 2. the test s[0] == 0xE0 s[1] 0xA0 if not misplaced would be shadowing the over-long test (ch 0x800). It seems better to use the over-long test (with endinpos set to 1). 3. The test s[0] == 0xED relates to the surrogates caper which in the py3k version is handled in the same place as the over-long test. 4. unrolling loop: needs no loop, only 1 test ... if s[1] is good, then we know s[2] must be bad without testing it, because we start the for loop only when s[1] is bad || s[2] is bad. length 4 case: as for the len 3 case generally ... misplaced tests, F1 test shadows over-long test, F4 test shadows max value test, too many loop iterations. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: Chapter 3, page 94: "As a consequence of the well-formedness conditions specified in Table 3-7, the following byte values are disallowed in UTF-8: C0-C1, F5-FF". Of course they should be handled by the simple expedient of setting their length entry to zero. Why write code when there is an existing mechanism??
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: @lemburg: "perhaps applying the same logic as for the other sequences is a better strategy" What other sequences??? F5-FF are invalid bytes; they don't start valid sequences. What same logic?? At the start of a character, they should get the same short sharp treatment as any other non-starter byte e.g. 80 or C0.
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
John Machin sjmac...@users.sourceforge.net added the comment: @lemburg: failing byte seems rather obvious: first byte that you meet that is not valid in the current state. I don't understand your explanation, especially does not have the high bit set. I think you mean is a valid starter byte. See example 3 below. Example 1: F1 80 41 42 43. F1 implies a 4-byte character. 80 is OK. 41 is not in 80-BF. It is the failing byte; high bit not set. Required action is to emit FFFD then resync on the 41, causing 0041 0042 0043 to be emitted. Total output: FFFD 0041 0042 0043. Current code emits FFFD 0043. Example 2: F1 80 FF 42 43. F1 implies a 4-byte character. 80 is OK. FF is not in 80-BF. It is the failing byte. Required action is to emit FFFD then resync on the FF. FF is not a valid starter byte, so emit FFFD, and resync on the 42, causing 0042 0043 to be emitted. Total output: FFFD FFFD 0042 0043. Current code emits FFFD 0043. Example 3: F1 80 C2 81 43. F1 implies a 4-byte character. 80 is OK. C2 is not in 80-BF. It is the failing byte. Required action is to emit FFFD then resync on the C2. C2 and 81 have the high bit set, but C2 is a valid starter byte, and remaining bytes are OK, causing 0081 0043 to be emitted. Total output: FFFD 0081 0043. Current code emits FFFD 0043. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
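Current CPython (which now follows the Unicode "maximal subpart" recommendation) produces exactly the required outputs for the three examples above; a verification sketch:

```python
# Example 1: failing byte 41 is resynced on -> one FFFD, then "ABC".
ex1 = b"\xf1\x80\x41\x42\x43".decode("utf-8", "replace")

# Example 2: FF is not a valid starter either, so a second FFFD, then "BC".
ex2 = b"\xf1\x80\xff\x42\x43".decode("utf-8", "replace")

# Example 3: C2 81 is a valid sequence for U+0081, so it survives the resync.
ex3 = b"\xf1\x80\xc2\x81\x43".decode("utf-8", "replace")
```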
[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
New submission from John Machin sjmac...@users.sourceforge.net: Unicode 5.2.0 chapter 3 (Conformance) has a new section (headed Constraints on Conversion Processes) after requirement D93. Recent Pythons e.g. 3.1.2 don't comply. Using the Unicode example: print(ascii(b\xc2\x41\x42.decode('utf8', 'replace'))) '\ufffdB' # should produce u'\ufffdAB' Resynchronisation currently starts at a position derived by considering the length implied by the start byte: print(ascii(b\xf1ABCD.decode('utf8', 'replace'))) '\ufffdD' # should produce u'\ufffdABCD'; resync should start from the *failing* byte. Notes: This applies to the 'ignore' option as well as the 'replace' option. The Unicode discussion mentions security exploits. -- messages: 101972 nosy: sjmachin severity: normal status: open title: str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0 type: behavior versions: Python 2.7, Python 3.1 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8271 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
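For the record, current CPython decodes both of the report's test cases exactly as the Unicode text requires:

```python
# Resynchronisation now restarts at the *failing* byte, not at a position
# derived from the length implied by the start byte.
case1 = b"\xc2\x41\x42".decode("utf-8", "replace")
case2 = b"\xf1\x41\x42\x43\x44".decode("utf-8", "replace")
```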
Re: subtraction is giving me a syntax error
On Mar 16, 5:43 am, Baptiste Carvello baptiste...@free.fr wrote: Joel Pendery a écrit : So I am trying to write a bit of code and a simple numerical subtraction y_diff = y_diff-H is giving me the error Syntaxerror: Non-ASCII character '\x96' in file on line 70, but no encoding declared. I would say that when you press the minus key, your operating system doesn't encode the standard (ASCII) minus character, but some fancy character, which Python cannot interpret. The likelihood that any operating system however brain-damaged and in whatever locale would provide by default a keyboard or input method that generated EN DASH when the '-' key is struck is somewhere between zero and epsilon. Already advanced theories like used a word processor instead of a programmer's editor and scraped it off the web are much more plausible. More precisely, I suspect you are unsing Windows with codepage 1252 (latin 1). Codepage 1252 is not latin1 in the generally accepted meaning of latin1 i.e. ISO-8859-1. It is a superset. MS in their wisdom or otherwise chose to use most of the otherwise absolutely wasted slots assigned to C1 control characters in latin1. With this encoding, you have 2 kinds of minus signs: the standard (45th character, in hex '\x2d') and the non-standard (150th character, in hex '\x96'). cf:http://msdn.microsoft.com/en-us/library/cc195054.aspx The above link quite correctly says that '\x96` maps to U+2013 EN DASH. EN DASH is not any kind of minus sign. Aside: the syndrome causing the problem is apparent with cp125x for x in range(9) -- http://mail.python.org/mailman/listinfo/python-list
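The byte values involved are easy to verify: 0x96 is EN DASH in cp1252 but an unprintable C1 control in real latin-1 (ISO-8859-1), and has nothing to do with the ASCII hyphen-minus:

```python
import unicodedata

en_dash = b"\x96".decode("cp1252")    # Windows codepage 1252
c1_ctrl = b"\x96".decode("latin-1")   # ISO-8859-1 proper: a C1 control, U+0096
minus = "-"                           # ASCII HYPHEN-MINUS, ordinal 0x2D
```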
Re: datelib pythonification
On Feb 21, 12:37 pm, alex goretoy agore...@gmail.com wrote: hello all, since I posted this last time, I've added a new function dates_diff and [SNIP] I'm rather unsure of the context of this posting ... I'm assuming that the subject datelib pythonification refers to trying to make datelib more pythonic, with which you appear to need help. Looking just at the new function (looks like a method to me) dates_diff, problems include: 1. Mostly ignores PEP-8 about spaces after commas, around operators 2. Checks types 3. Checks types using type(x) == type(y) 4. Inconsistent type checking: checks types in case of dates_diff(date1, date2) but not in case of dates_diff([date1, date2]) 5. Doesn't check for 3 or more args. 6. The 0-arg case is for what purpose? 7. The one-arg case is overkill -- if the caller has the two values in alist, all you are saving them from is the * in dates_diff(*alist) 8. Calling type(date.today()) once per 2-arg call would be a gross extravagance; calling it twice per 2-arg call is mind-boggling. 9. start,end=(targs[0][0],targs[0][1]) ... multiple constant subscripts is a code smell; this one is pongier than usual because it could easily be replaced by start, end = targs[0] Untested fix of problems 1, 3, 4, 5, 8, 9: DATE_TYPE = type(date.today()) def dates_diff(self, *targs): nargs = len(targs) if nargs == 0: return self.enddate - self.startdate if nargs == 1: arg = targs[0] if not isinstance(arg, (list, tuple)) or len(arg) != 2: raise Exception( single arg must be list or tuple of length 2) start, end = arg elif nargs == 2: start, end = targs else: raise Exception(expected 0,1, or 2 args; found %d % nargs) if isinstance(start, DATE_TYPE) and isinstance(end, DATE_TYPE): return end - start raise Exception(both values must be of type DATE_TYPE) HTH, John -- http://mail.python.org/mailman/listinfo/python-list
Re: parsing an Excel formula with the re module
On Jan 13, 7:15 pm, Paul McGuire pt...@austin.rr.com wrote: On Jan 5, 1:49 pm, Tim Chase python.l...@tim.thechases.com wrote: vsoler wrote: Hence, I need toparseExcel formulas. Can I do it by means only of re (regular expressions)? I know that for simple formulas such as =3*A7+5 it is indeed possible. What about complex for formulas that include functions, sheet names and possibly other *.xls files? Where things start getting ugly is when you have nested function calls, such as =if(Sum(A1:A25)42,Min(B1:B25), if(Sum(C1:C25)3.14, (Min(C1:C25)+3)*18,Max(B1:B25))) Regular expressions don't do well with nested parens (especially arbitrarily-nesting-depth such as are possible), so I'd suggest going for a full-blown parsing solution like pyparsing. If you have fair control over what can be contained in the formulas and you know they won't contain nested parens/functions, you might be able to formulate some sort of kinda, sorta, maybe parses some forms of formulas regexp. -tkc This might give the OP a running start: Unfortunately this will blow up after only a few paces; see below ... from pyparsing import (CaselessKeyword, Suppress, Word, alphas, alphanums, nums, Optional, Group, oneOf, Forward, Regex, operatorPrecedence, opAssoc, dblQuotedString) test1 = =3*A7+5 test2 = =3*Sheet1!$A$7+5 test2a ==3*'Sheet 1'!$A$7+5 test2b ==3*'O''Reilly''s sheet'!$A$7+5 test3 = =if(Sum(A1:A25)42,Min(B1:B25), \ if(Sum(C1:C25)3.14, (Min(C1:C25)+3)*18,Max(B1:B25))) Many functions can take a variable number of args and they are not restricted to cell references e.g. test3a = =sum(a1:a25,10,min(b1,c2,d3)) The arg separator is comma or semicolon depending on the locale ... a parser should accept either. 
EQ,EXCL,LPAR,RPAR,COLON,COMMA,DOLLAR = map(Suppress, '=!():,$') sheetRef = Word(alphas, alphanums) colRef = Optional(DOLLAR) + Word(alphas,max=2) rowRef = Optional(DOLLAR) + Word(nums) cellRef = Group(Optional(sheetRef + EXCL)(sheet) + colRef(col) + rowRef(row)) cellRange = (Group(cellRef(start) + COLON + cellRef(end)) (range) | cellRef ) expr = Forward() COMPARISON_OP = oneOf( = = = != ) condExpr = expr + COMPARISON_OP + expr ifFunc = (CaselessKeyword(if) + LPAR + Group(condExpr)(condition) + that should be any expression; at run-time it expects a boolean (TRUE or FALSE) or a number (0 means false, non-0 means true). Text causes a #VALUE! error. Trying to subdivide expressions into conditional / numeric /text just won't work. COMMA + expr(if_true) + COMMA + expr(if_false) + RPAR) statFunc = lambda name : CaselessKeyword(name) + LPAR + cellRange + RPAR sumFunc = statFunc(sum) minFunc = statFunc(min) maxFunc = statFunc(max) aveFunc = statFunc(ave) funcCall = ifFunc | sumFunc | minFunc | maxFunc | aveFunc multOp = oneOf(* /) addOp = oneOf(+ -) needs power op ^ numericLiteral = Regex(r\-?\d+(\.\d+)?) Sorry, that - in there is a unary minus operator. What about 1e23 ? operand = numericLiteral | funcCall | cellRange | cellRef arithExpr = operatorPrecedence(operand, [ (multOp, 2, opAssoc.LEFT), (addOp, 2, opAssoc.LEFT), ]) textOperand = dblQuotedString | cellRef textExpr = operatorPrecedence(textOperand, [ ('', 2, opAssoc.LEFT), ]) Excel evaluates excessively permissively, and the punters are definitely not known for self-restraint. The above just won't work: 2.3 4.5 produces text 2.34.5, while 2.3 + 4.5 produces number 6.8. -- http://mail.python.org/mailman/listinfo/python-list
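As a middle ground between a bare regex and a full pyparsing grammar: tokenising is within re's reach even though nested expressions are not. This sketch (the token classes and names are my own, and it assumes the comma-separator locale; unrecognised characters are simply skipped) splits a formula into lexemes that a real parser could then consume:

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)   # 5, 3.14, 1e23
  | (?P<cell>\$?[A-Za-z]{1,3}\$?\d+)             # A7, $A$7, AB12
  | (?P<op><=|>=|<>|[<>=^&%*/+-])                # comparison, concat, arithmetic
  | (?P<punct>[(),:;!])                          # calls, ranges, sheet refs
  | (?P<name>[A-Za-z_][A-Za-z0-9_.]*)            # function or defined name
""", re.VERBOSE)

def tokenize(formula):
    return [m.group() for m in TOKEN_RE.finditer(formula)]
```

Note the ordering: the multi-character operators must precede the single-character class, and cell references are tried before names so that A25 lexes as a cell while SUM falls through to name.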
Re: parsing an Excel formula with the re module
On Jan 14, 2:05 pm, Gabriel Genellina gagsl-...@yahoo.com.ar wrote: En Wed, 13 Jan 2010 05:15:52 -0300, Paul McGuire pt...@austin.rr.com escribió: vsoler wrote: Hence, I need to parse Excel formulas. Can I do it by means only of re (regular expressions)? This might give the OP a running start: from pyparsing import (CaselessKeyword, Suppress, ... Did you build those parsing rules just by common sense, or following some actual specification? Leave your common sense with the barkeep when you enter the Excel saloon; it is likely to be a hindrance. The specification is what Excel does.
Re: parsing an Excel formula with the re module
On Jan 15, 3:41 pm, Paul McGuire pt...@austin.rr.com wrote: I never represented that this parser would handle any and all Excel formulas! But I should hope the basic structure of a pyparsing solution might help the OP add some of the other features you cited, if necessary. It's actually pretty common to take an incremental approach in making such a parser, and so here are some of the changes that you would need to make based on the deficiencies you pointed out: functions can have a variable number of arguments, of any kind of expression - statFunc = lambda name : CaselessKeyword(name) + LPAR + delimitedList (expr) + RPAR sheet name could also be a quoted string - sheetRef = Word(alphas, alphanums) | QuotedString(',escQuote='') add boolean literal support - boolLiteral = oneOf(TRUE FALSE) - operand = numericLiteral | funcCall | boolLiteral | cellRange | cellRef or a string literal ... you seem to have ignored the significant point that the binary operators don't have narrow type requirements of their args (2.3 4.5 produces text 2.34.5, while 2.3 + 4.5 produces number 6.8); your attempt to enforce particular types for args at compile-time is erroneous OVER-engineering. -- http://mail.python.org/mailman/listinfo/python-list
Re: parsing an Excel formula with the re module
On 12/01/2010 6:26 PM, Chris Withers wrote: John Machin wrote: The xlwt package (of which I am the maintainer) has a lexer and parser for a largish subset of the syntax ... see http://pypi.python.org/pypi/xlwt xlrd, no? A facility in xlrd to decompile Excel formula bytecode into a text formula is currently *under discussion*. The OP was planning to dig the formula text out using COM then parse the formula text looking for cell references and appeared to have a rather simplistic view of the ease of parsing Excel formula text -- that's why I pointed him at those facilities (existing, released, proven in the field) in xlwt. -- http://mail.python.org/mailman/listinfo/python-list
Re: What is built-in method sub
On Jan 12, 7:30 am, Jeremy jlcon...@gmail.com wrote: On Jan 11, 1:15 pm, Diez B. Roggisch de...@nospam.web.de wrote: Jeremy schrieb: On Jan 11, 12:54 pm, Carl Banks pavlovevide...@gmail.com wrote: On Jan 11, 11:20 am, Jeremy jlcon...@gmail.com wrote: I just profiled one of my Python scripts and discovered that 99% of the time was spent in {built-in method sub} What is this function and is there a way to optimize it? I'm guessing this is re.sub (or, more likely, a method sub of an internal object that is called by re.sub). If all your script does is to make a bunch of regexp substitutions, then spending 99% of the time in this function might be reasonable. Optimize your regexps to improve performance. (We can help you if you care to share any.) If my guess is wrong, you'll have to be more specific about what your sctipt does, and maybe share the profile printout or something. Carl Banks Your guess is correct. I had forgotten that I was using that function. I am using the re.sub command to remove trailing whitespace from lines in a text file. The commands I use are copied below. If you have any suggestions on how they could be improved, I would love to know. Thanks, Jeremy lines = self._outfile.readlines() self._outfile.close() line = string.join(lines) if self.removeWS: # Remove trailing white space on each line trailingPattern = '(\S*)\ +?\n' line = re.sub(trailingPattern, '\\1\n', line) line = line.rstrip()? Diez Yep. I was trying to reinvent the wheel. I just remove the trailing whitespace before joining the lines. Actually you don't do that. Your regex has three components: (1) (\S*) zero or more occurrences of not-whitespace (2) \ +? one or more (non-greedy) occurrences of SPACE (3) \n a newline Component (2) should be \s+? In any case this is a round-about way of doing it. Try writing a regex that does it simply: replace trailing whitespace by an empty string. 
Another problem with your approach: it doesn't work if the line is not terminated by \n -- this is quite possible if the lines are being read from a file. A wise person once said: Re-inventing the wheel is often accompanied by forgetting to re-invent the axle. -- http://mail.python.org/mailman/listinfo/python-list
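Following the suggestion above — replace trailing whitespace with an empty string, and don't depend on a final \n — a minimal sketch:

```python
import re

def strip_trailing_ws(text):
    # Remove spaces/tabs at the end of every line; with re.MULTILINE,
    # $ matches before each newline AND at the very end of the string,
    # so an unterminated last line is handled too.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
```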
Re: Porblem with xlutils/xlrd/xlwt
On Jan 10, 8:51 pm, pp parul.pande...@gmail.com wrote: On Jan 9, 8:23 am, John Machin sjmac...@lexicon.net wrote: On Jan 9, 9:56 pm, pp parul.pande...@gmail.com wrote: On Jan 9, 3:52 am, Jon Clements jon...@googlemail.com wrote: On Jan 9, 10:44 am, pp parul.pande...@gmail.com wrote: On Jan 9, 3:42 am, Jon Clements jon...@googlemail.com wrote: On Jan 9, 10:24 am, pp parul.pande...@gmail.com wrote: yeah all my versions are latest fromhttp://www.python-excel.org. just checked!! How did you check? You didn't answer this question. what could be the problem? Does rb = xlrd.open_workbook('somesheet.xls', on_demand=True) work by itself? Yes it does. The problem is with line: wb = copy(rb) here I am getting the error: AttributeError: 'Book' object has no attribute 'on_demand' Please replace the first 4 lines of your script by these 6 lines: import xlrd assert xlrd.__VERSION__ == 0.7.1 from xlwt import easyxf from xlutils.copy import copy rb = xlrd.open_workbook( 'source.xls',formatting_info=True, on_demand=False) and run it again. Please copy all the output and paste it into your response. This time when I ran the code sent by you I got the following results:I am using ipython for running the code. AssertionError Traceback (most recent call last) /home/parul/CODES/copy_1.py in module() 1 2 import xlrd 3 assert xlrd.__VERSION__ == 0.7.1 4 from xlwt import easyxf 5 from xlutils.copy import copy 6 rb = xlrd.open_workbook('source.xls',formatting_info=True, on_demand=False) AssertionError: WARNING: Failure executing file: copy_1.py Your traceback appears to show an AssertionError from an import statement. We could do without an extra layer of noise in the channel; please consider giving ipython the flick (for debug purposes, at least) and use Python to run your script from the shell prompt. Change the second line to read: print xlrd.__VERSION__ I used www.python-excel.org to get xlrd and xlwt .. so they are latest versions. Let's concentrate on xlrd. 
I presume that means that you clicked on the xlrd Download link which took you to http://pypi.python.org/pypi/xlrd from which you can download the latest version of the package. That page has xlrd 0.7.1 in a relatively large font at the top. You would have been presented with options to download one of these xlrd-0.7.1.tar.gz xlrd-0.7.1.win32.exe xlrd-0.7.1.zip (each uploaded on 2009-06-01). Which one did you download, and then what did you do with it? Or perhaps you ignored those and read further down to Download link which took you to an out-of-date page but you didn't notice the 0.6.1 in large bold type at the top nor the Page last updated on 11 June 2007 at the bottom nor the 0.6.1 in the name of the file that you downloaded ... sorry about that; I've smacked the webmaster about the chops :-) Cheers, John -- http://mail.python.org/mailman/listinfo/python-list
Re: How to get many places of pi from Machin's Equation?
On Jan 9, 10:31 pm, Richard D. Moores rdmoo...@gmail.com wrote: Machin's Equation is 4 arctan (1/5) - arctan(1/239) = pi/4 Using Python 3.1 and the math module: from math import atan, pi pi 3.141592653589793 (4*atan(.2) - atan(1/239))*4 3.1415926535897936 (4*atan(.2) - atan(1/239))*4 == pi False abs((4*atan(.2) - atan(1/239))*4) - pi .01 False abs((4*atan(.2) - atan(1/239))*4) - pi .0001 False abs((4*atan(.2) - atan(1/239))*4) - pi .001 True Is there a way in Python 3.1 to calculate pi to greater accuracy using Machin's Equation? Even to an arbitrary number of places? Considering that my namesake calculated pi to 100 decimal places with the computational equipment available in 1706 (i.e. not much), I'd bet you London to a brick that Python (any version from 0.1 onwards) could be used to simulate his calculations to any reasonable number of places. So my answers to your questions are yes and yes. Suggestion: search_the_fantastic_web(machin pi python) -- http://mail.python.org/mailman/listinfo/python-list
Re: Porblem with xlutils/xlrd/xlwt
On Jan 9, 9:56 pm, pp parul.pande...@gmail.com wrote: On Jan 9, 3:52 am, Jon Clements jon...@googlemail.com wrote: On Jan 9, 10:44 am, pp parul.pande...@gmail.com wrote: On Jan 9, 3:42 am, Jon Clements jon...@googlemail.com wrote: On Jan 9, 10:24 am, pp parul.pande...@gmail.com wrote: yeah all my versions are latest fromhttp://www.python-excel.org. just checked!! How did you check? what could be the problem? Does rb = xlrd.open_workbook('somesheet.xls', on_demand=True) work by itself? Yes it does. The problem is with line: wb = copy(rb) here I am getting the error: AttributeError: 'Book' object has no attribute 'on_demand' Please replace the first 4 lines of your script by these 6 lines: import xlrd assert xlrd.__VERSION__ == 0.7.1 from xlwt import easyxf from xlutils.copy import copy rb = xlrd.open_workbook( 'source.xls',formatting_info=True, on_demand=False) and run it again. Please copy all the output and paste it into your response. -- http://mail.python.org/mailman/listinfo/python-list
Re: Astronomy--Programs to Compute Siderial Time?
On Jan 7, 2:40 pm, W. eWatson wolftra...@invalid.com wrote: John Machin wrote: What you have been reading is the Internal maintenance specification (large font, near the top of the page) for the module. The xml file is the source of the docs, not meant to be user-legible. What is it used for? The maintainer of the module processes the xml file with some script or other to create the user-legible docs. Do I need it? No.
Re: How do I access what's in this module?
On Jan 8, 12:21 pm, Fencer no.i.d...@want.mail.from.spammers.com wrote: Hello, look at this lxml documentation page:http://codespeak.net/lxml/api/index.html That's for getting details about an object once you know what object you need to use to do what. In the meantime, consider reading the tutorial and executing some of the examples: http://codespeak.net/lxml/tutorial.html How do I access the functions and variables listed? I tried from lxml.etree import ElementTree and the import itself seems to pass without complaint by the python interpreter but I can't seem to access anything in ElementTree, not the functions or variables. What is the proper way to import that module? For example: from lxml.etree import ElementTree ElementTree.dump(None) Traceback (most recent call last): File console, line 1, in module lxml.etree is a module. ElementTree is effectively a class. The error message that you omitted to show us might have given you a clue. To save keystrokes you may like to try from lxml import etree as ET and thereafter refer to the module as ET | from lxml import etree as ET | type(ET) | type 'module' | type(ET.ElementTree) | type 'builtin_function_or_method' | help(ET.ElementTree) | Help on built-in function ElementTree in module lxml.etree: | | ElementTree(...) | ElementTree(element=None, file=None, parser=None) | | ElementTree wrapper class. Also, can I access those items that begin with an underscore if I get the import sorted? Using pommy slang like sorted in an IT context has the potential to confuse your transatlantic correspondents :-) Can access? Yes. Should access? The usual Python convention is that an object whose name begins with an underscore should be accessed only via a documented interface (or, at your own risk, if you think you know what you are doing). HTH, John -- http://mail.python.org/mailman/listinfo/python-list
Re: How do I access what's in this module?
On Jan 8, 2:45 pm, Fencer no.i.d...@want.mail.from.spammers.com wrote:
> On 2010-01-08 04:40, John Machin wrote:
>> For example:
>> >>> from lxml.etree import ElementTree
>> >>> ElementTree.dump(None)
>> Traceback (most recent call last):
>>   File "<console>", line 1, in <module>
>> lxml.etree is a module. ElementTree is effectively a class. The error message that you omitted to show us might have given you a clue.
> But I did show the error message? It's just above what you just wrote. I try to include all relevant information in my posts.

excerpt:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
Also, can I access those items ...

/excerpt

The error message should appear after the line starting with "File". The above excerpt was taken from Google Groups and is identical to what shows in http://news.gmane.org/gmane.comp.python.general ... what are you looking at? With Windows XP and Python 2.5.4 I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'builtin_function_or_method' object has no attribute 'dump'

> It turns out I no longer want to access anything in there but I thank you for your information nonetheless.

You're welcome -- the advice on _methods is portable :-)
-- http://mail.python.org/mailman/listinfo/python-list
Re: TypeError
On Jan 7, 3:29 am, MRAB pyt...@mrabarnett.plus.com wrote:
> Victor Subervi wrote:
>> ValueError: unsupported format character '(' (0x28) at index 54
>> args = ("unsupported format character '(' (0x28) at index 54",)
>> Apparently that character is a file separator, which I presume is an invisible character. I tried retyping the area in question, but to no avail (threw same error). Please advise. Complete code follows.

OP is barking up the wrong tree. "File separator" has ordinal 28 DECIMAL. The correct tree contains '(' (left parenthesis, ordinal 0x28 HEX, i.e. 40 decimal) as the error message says.
-- http://mail.python.org/mailman/listinfo/python-list
Re: 3 byte network ordered int, How To ?
On Jan 7, 5:33 am, Matthew Barnett mrabarn...@mrabarnett.plus.com wrote:
> mudit tuli wrote:
>> For a single byte, struct.pack('B', int). For two bytes, struct.pack('H', int). What if I want three bytes?
> Four bytes and then discard the most-significant byte: struct.pack('I', int)[:-1]

AARRGGHH! Network ordering is BIGendian; struct.pack('<I', ...) (and native 'I' on the usual hardware) is LITTLEendian, so slicing off the *last* byte of a little-endian pack discards the most-significant byte but leaves the rest in the wrong order for the wire.
-- http://mail.python.org/mailman/listinfo/python-list
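[Editorial aside: a network-order (big-endian) 3-byte pack consistent with Machin's correction might look like this; pack four bytes with an explicit '>' and drop the first (most-significant) byte. Function names are invented for illustration.]

```python
import struct

def pack_uint24_be(n):
    # '>I' forces big-endian (network) order; byte 0 is the most
    # significant, so dropping it leaves the low three bytes on the wire.
    if not 0 <= n <= 0xFFFFFF:
        raise ValueError('value does not fit in 3 bytes')
    return struct.pack('>I', n)[1:]

def unpack_uint24_be(b):
    # Re-prepend the zero high byte that pack_uint24_be dropped.
    return struct.unpack('>I', b'\x00' + b)[0]

print(pack_uint24_be(0x123456))  # b'\x124V' i.e. bytes 12 34 56
```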
Re: parsing an Excel formula with the re module
On Jan 6, 6:54 am, vsoler vicente.so...@gmail.com wrote:
> On 5 ene, 20:21, vsoler vicente.so...@gmail.com wrote:
>> On 5 ene, 20:05, Mensanator mensana...@aol.com wrote:
>>> On Jan 5, 12:35 pm, MRAB pyt...@mrabarnett.plus.com wrote:
>>>> vsoler wrote:
>>>>> Hello, I am accessing an Excel file by means of Win 32 COM technology. For a given cell, I am able to read its formula. I want to make a map of how cells reference one another, how different sheets reference one another, how workbooks reference one another, etc. Hence, I need to parse Excel formulas. Can I do it by means only of re (regular expressions)? I know that for simple formulas such as =3*A7+5 it is indeed possible. What about complex formulas that include functions, sheet names and possibly other *.xls files? For example =Book1!A5+8 should be parsed into [=, Book1, !, A5, +, 8]. Can anybody help? Any suggestions?
>>>> Do you mean "how", or do you really mean "whether", ie, get a list of the other cells that are referred to by a certain cell? For example, =3*A7+5 should give [A7] and =Book1!A5+8 should give [Book1!A5].
>>> Ok, although Book1 would be the default name of a workbook, with default worksheets labeled Sheet1, Sheet2, etc. If I had a worksheet named Sheety that wanted to reference a cell on Sheetx OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to a completely different workbook (say Book1 with worksheets labeled Sheet1, Sheet2) then the cell might have =[Book1]Sheet1!A7. And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.
>> Yes, Mensanator, but... what re should I use? I'm looking for the re statement. No doubt you can help! Thank you.
> Let me give you an example:
>
> >>> import re
> >>> re.split("([^0-9])", "123+456*/")
> ['123', '+', '456', '*', '', '/', '']
>
> I find it excellent that one single statement is able to do a lexical analysis of an expression!

That is NOT lexical analysis.

> If the expression contains variables, such as A12 or B9, I can try another re expression. Which one should I use? And if my expression contains parentheses? And the sin() function?

You need a proper lexical analysis, followed by a parser. What you are trying to do can NOT be accomplished in any generality with a single regex. The Excel formula syntax has several tricky bits. E.g. IIRC whether TAX09 is a (macro) name or a cell reference depends on what version of Excel you are targeting, but if it appears like TAX09!A1:B2 then it's a sheet name. The xlwt package (of which I am the maintainer) has a lexer and parser for a largish subset of the syntax ... see http://pypi.python.org/pypi/xlwt
-- http://mail.python.org/mailman/listinfo/python-list
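[Editorial aside: the lexer-before-parser point can be sketched with re.finditer over a named alternation. This is NOT xlwt's grammar; the token names and the tiny subset of Excel syntax covered here are invented for illustration.]

```python
import re

# One pattern per token class, tried in order; longest/most specific first.
TOKEN_RE = re.compile(r"""
    (?P<REF>  (?:\[[^\]]+\])? (?:\w+!)? \$?[A-Z]+\$?\d+ )  # e.g. [Book1]Sheet1!$A$7
  | (?P<NUM>  \d+(?:\.\d+)? )
  | (?P<NAME> [A-Za-z_]\w* )                               # function or defined name
  | (?P<OP>   [-+*/^&%=<>():,!] )
""", re.VERBOSE)

def tokenize(formula):
    # Strip the leading '=' and emit the matched lexemes in order.
    return [m.group() for m in TOKEN_RE.finditer(formula.lstrip('='))]

print(tokenize('=Book1!A5+8'))   # ['Book1!A5', '+', '8']
print(tokenize('=3*A7+5'))       # ['3', '*', 'A7', '+', '5']
```

Even this toy version shows why a lone re.split cannot cope: cell references, sheet prefixes, and function names all overlap lexically and need ordered alternatives, and the output still has to go to a real parser to handle nesting.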
Re: TypeError
On Jan 7, 11:14 am, John Machin sjmac...@lexicon.net wrote:
> On Jan 7, 3:29 am, MRAB pyt...@mrabarnett.plus.com wrote:
>> Victor Subervi wrote:
>>> ValueError: unsupported format character '(' (0x28) at index 54 ... Please advise. Complete code follows.
> OP is barking up the wrong tree. "File separator" has ordinal 28 DECIMAL. The correct tree contains '(' (left parenthesis, ordinal 0x28 HEX) as the error message says.

It took a bit of mucking about to get an example of that error message (without reading the Python source code):

>>> anything = object()
>>> "foo%(" % anything
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: format requires a mapping
>>> "foo%(" % {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: incomplete format key
>>> "foo%2(" % anything
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unsupported format character '(' (0x28) at index 5

FWIW, the OP's message subject is TypeError but the reported message contains ValueError ... possibly indicative of code that first builds a format string (incorrectly) and then uses it, with error messages that can vary from run to run depending on exactly what was stuffed into the format string. I note that in the code shown there are examples of building an SQL query where the table name is concocted at runtime via the % operator ... key phrases: bad database design (one table per store!), SQL injection attack. A proper traceback would be very nice ... at this stage it's not certain which line of source triggers the exception.
-- http://mail.python.org/mailman/listinfo/python-list
Re: Astronomy--Programs to Compute Siderial Time?
On Jan 7, 11:40 am, W. eWatson wolftra...@invalid.com wrote:
> W. eWatson wrote:
>> Is there a smallish Python library of basic astronomical functions? There are a number of large such libraries that are crammed with excessive functions not needed for common calculations.
> It looks like I've entered a new era in my knowledge of Python.

Mild curiosity: this would be a wonderful outcome, but what makes it look so?

> I found a module somewhat like I want, siderial.py. You can see an intro to it at http://infohost.nmt.edu/tcc/help/lang/python/examples/sidereal/ims//. It appears that I can get the code for it through section 1.2, near the bottom. I scooped siderial.py up, and placed it in a corresponding file of the same name and type via NotePad. However, there is a xml file below it. I know little about it. I thought maybe I could do the same, but Notepad didn't like some characters in it. As I understand, Python doc files are useful. So how do I get this done, and where do I put the files?

The file you need is sidereal.py, not your twice-mentioned siderial.py (the existence of which on the referenced website is doubtful). What you have been reading is the "Internal maintenance specification" (large font, near the top of the page) for the module. The xml file is the source of the docs, not meant to be user-legible. A very tiny amount of googling "sidereal.py" (quotes included) leads to the user documentation at http://infohost.nmt.edu/tcc/help/lang/python/examples/sidereal/

Where do you put the files? Well, we're now down to only one file, sidereal.py, and you put it wherever you'd put any other module that you'd like to call ... if there's only going to be one caller, put it in the same directory as that caller's code. More generally, drop it in YOUR_PYTHON_INSTALL_DIR/Lib/site-packages
-- http://mail.python.org/mailman/listinfo/python-list
Re: TypeError
On Jan 7, 1:38 pm, Steve Holden st...@holdenweb.com wrote:
> John Machin wrote:
> [...]
>> I note that in the code shown there are examples of building an SQL query where the table name is concocted at runtime via the % operator ... key phrases: bad database design (one table per store!), SQL injection attack
> I'm not trying to defend the code overall, but most databases won't let you parameterize the table or column names, just the data values.

That's correct, and that's presumably why the OP is constructing whole SQL statements on the fly, e.g.

cursor.execute('select max(ID) from %sCustomerData;' % store)

What is the reason for "but" in "but most databases won't ..."? What are you rebutting? Let me try again: one table per store is bad design. The implementation of that bad design may use:

cursor.execute('select max(ID) from %sCustomerData;' % store)

or (if available)

cursor.execute('select max(ID) from ?CustomerData;', (store, ))

but the means of implementation is irrelevant.
-- http://mail.python.org/mailman/listinfo/python-list
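[Editorial aside: the values-versus-identifiers limitation Holden describes is easy to see with sqlite3; the store/table names below are made up for illustration, not taken from the OP's code.]

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE AcmeCustomerData (ID INTEGER)')
conn.execute('INSERT INTO AcmeCustomerData VALUES (?)', (7,))  # values: OK

store = 'Acme'
try:
    # An identifier cannot be supplied through a placeholder ...
    conn.execute('SELECT max(ID) FROM ?', ('%sCustomerData' % store,))
except sqlite3.OperationalError as exc:
    print('rejected:', exc)

# ... so a per-store table name has to be spliced into the SQL text,
# which is exactly the injection-prone pattern Machin objects to
# (validate 'store' against a whitelist if you are stuck with it):
row = conn.execute('SELECT max(ID) FROM %sCustomerData' % store).fetchone()
print(row)  # (7,)
```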
Re: Significant whitespace
On Jan 2, 10:29 am, Roy Smith r...@panix.com wrote:
> To address your question more directly, here's a couple of ways Fortran treated whitespace which would surprise the current crop of Java/PHP/Python/Ruby programmers: 1) Line numbers (i.e. the things you could GOTO to) were in columns 2-7 (column 1 was reserved for a comment indicator). This is not quite significant whitespace; it's more like significant indentation.

That would also surprise former FORTRAN programmers (who rarely referred to the language as "Fortran"). A comment was signified by a C in col 1. Otherwise cols 1-5 were used for statement labels (the things you could GOTO), col 6 for a statement continuation indicator, cols 7-72 for statement text, and cols 73-80 for card sequence numbers.
-- http://mail.python.org/mailman/listinfo/python-list
Re: creating ZIP files on the cheap
On Dec 24, 7:34 am, samwyse samw...@gmail.com wrote:
> I've got an app that's creating Open Office docs; if you don't know, these are actually ZIP files with a different extension. In my case, like many other people, I'm generating from boilerplate, so only one component (content.xml) of my ZIP file will ever change. Instead of creating the entire ZIP file each time, what is the cheapest way to accomplish my goal? I'd kind-of like to just write the first part of the file as a binary blob, then write my bit, then write most of the table of contents as another blob, and finally write a TOC entry for my bit. Has anyone ever done anything like this? Thanks.

Option 1: set up a file that contains everything except the content.xml. Then for each new file: copy the "empty" file, open the copy with zipfile (mode 'a') and write your content.xml. This at least is understandable and maintainable.

Option 2 (recommended): insert some timing apparatus into your script. How much time is taken by the template stuff? Is it worth chancing your arm on getting the binary blob stuff correct? Is it maintainable? I.e. pretend that the next person to maintain your code knows where you live and owns a chainsaw.
-- http://mail.python.org/mailman/listinfo/python-list
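[Editorial aside: Option 1 can be sketched as follows; the paths and member name are placeholders, and a real ODF template would of course contain more members than shown here.]

```python
import shutil
import zipfile

def build_doc(template_path, out_path, content_xml):
    # Copy the pre-built archive holding every unchanging member ...
    shutil.copyfile(template_path, out_path)
    # ... then append the one member that varies, in 'a' (append) mode,
    # so none of the boilerplate is ever re-compressed.
    with zipfile.ZipFile(out_path, 'a', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr('content.xml', content_xml)
```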
Re: dictionary with tuple keys
Ben Finney ben+python at benfinney.id.au writes:
> In this case, I'll use ‘itertools.groupby’ to make a new sequence of keys and values, and then extract the keys and values actually wanted.

Ah, yes, Zawinski revisited ... itertools.groupby is the new regex :-)

> Certainly it might be clearer if written as one or more loops, instead of iterators. But I find the above relatively clear, and using the built-in iterator objects will likely make for a less buggy implementation.

Relative clarity, like relative beauty, is in the eye of the beholder, and few parents have ugly children :-) The problem with itertools.groupby is that unlike SQL's GROUP BY it needs sorted input. The OP's requirement (however interpreted) can be met without sorting. Your interpretation can be implemented simply:

from collections import defaultdict
result = defaultdict(list)
for key, value in foo.iteritems():
    result[key[:2]].append(value)
-- http://mail.python.org/mailman/listinfo/python-list
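[Editorial aside: the sorted-input caveat is easy to demonstrate; the sample dict below is invented, and the Python 3 spelling items() stands in for the thread's iteritems().]

```python
from collections import defaultdict
from itertools import groupby

foo = {('a', 1, 'x'): 10, ('b', 2, 'y'): 20, ('a', 1, 'z'): 30}

# groupby only merges *adjacent* equal keys, so over unsorted input the
# same group key can come out more than once:
items = list(foo.items())
keys_unsorted = [k for k, _ in groupby(items, key=lambda kv: kv[0][:2])]
keys_sorted = [k for k, _ in groupby(sorted(items), key=lambda kv: kv[0][:2])]
print(keys_unsorted)  # ('a', 1) appears twice
print(keys_sorted)    # each key once, but only after an O(n log n) sort

# The defaultdict approach from the post needs no sort at all:
result = defaultdict(list)
for key, value in foo.items():
    result[key[:2]].append(value)
```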