Re: [Python-Dev] csv module TODO list
Andrew McNamara wrote:
> There's a bunch of jobs we (CSV module maintainers) have been putting
> off - attached is a list (in no particular order):
>
> * unicode support (this will probably uglify the code considerably).

Can you please elaborate on that? What needs to be done, and how is
that going to be done? It might be possible to avoid considerable
uglification.

Regards,
Martin

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] csv module TODO list
Martin v. Löwis wrote:
> Andrew McNamara wrote:
>> There's a bunch of jobs we (CSV module maintainers) have been putting
>> off - attached is a list (in no particular order):
>>
>> * unicode support (this will probably uglify the code considerably).
>
> Can you please elaborate on that? What needs to be done, and how is
> that going to be done? It might be possible to avoid considerable
> uglification.

Indeed. The trick is to convert to Unicode early and to use Unicode
literals instead of string literals in the code.

Note that the only real-life Unicode format in use is UTF-16 (with BOM
mark) written by Excel. Note that there's no standard for specifying
the encoding in CSV files, so this is also the only feasible format.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 05 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
Re: [Python-Dev] 2.3.5 schedule, and something I'd like to get in
On 5-jan-05, at 9:33, Martin v. Löwis wrote:
> Bob Ippolito wrote:
>> It doesn't for reasons I care not to explain in depth, again. Search
>> the pythonmac-sig archives for longer explanations. The gist is that
>> you specifically do not want to link directly to the framework at all
>> when building extensions.
>
> Because an Apple-built extension then may pick up a user-installed
> Python? Why can this problem not be solved by adding -F options, as
> Jack Jansen proposed?

It gets worse when you have a user-installed Python 2.3 and a
user-installed Python 2.4. Both will be installed as
/Library/Frameworks/Python.framework, which means you cannot use the
-F flag to select which one you want to link to: '-framework Python'
will only link to the Python that was installed last. This is an issue
on Mac OS X 10.2.

Ronald
Re: [Python-Dev] csv module TODO list
Andrew McNamara wrote:
> There's a bunch of jobs we (CSV module maintainers) have been putting
> off - attached is a list (in no particular order):
>
> * unicode support (this will probably uglify the code considerably).

Martin v. Löwis wrote:
> Can you please elaborate on that? What needs to be done, and how is
> that going to be done? It might be possible to avoid considerable
> uglification.

I'm not altogether sure there. The parsing state machine is all written
in C, and deals with signed chars - I expect we'll need two versions of
that (or one version that's compiled twice using pre-processor macros).
Quite a large job. Suggestions gratefully received.

M.-A. Lemburg wrote:
> Indeed. The trick is to convert to Unicode early and to use Unicode
> literals instead of string literals in the code.

Yes, although it would be nice to retain the 8-bit versions as well.

> Note that the only real-life Unicode format in use is UTF-16 (with BOM
> mark) written by Excel. Note that there's no standard for specifying
> the encoding in CSV files, so this is also the only feasible format.

Yes - that's part of the problem I hadn't really thought about yet: the
csv module currently interacts directly with files as iterators, but
it's clear that we'll need to decode as we go.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] csv module TODO list
Andrew McNamara wrote:
> M.-A. Lemburg wrote:
>> Indeed. The trick is to convert to Unicode early and to use Unicode
>> literals instead of string literals in the code.
>
> Yes, although it would be nice to retain the 8-bit versions as well.

You can do so by using latin-1 as the default encoding. Works great!

>> Note that the only real-life Unicode format in use is UTF-16 (with
>> BOM mark) written by Excel. Note that there's no standard for
>> specifying the encoding in CSV files, so this is also the only
>> feasible format.
>
> Yes - that's part of the problem I hadn't really thought about yet:
> the csv module currently interacts directly with files as iterators,
> but it's clear that we'll need to decode as we go.

Depends on your needs: CSV files tend to be small enough to do the
decoding in one call in memory.

-- 
Marc-Andre Lemburg
eGenix.com
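Sketched in modern terms, the decode-in-one-call approach Lemburg describes might look like the following (a minimal sketch; the sample data is invented, and `io.StringIO` stands in for the file handling of the day):

```python
import csv
import io

# Invented sample: a small CSV file encoded as UTF-16 (the codec
# prepends the BOM), the format Excel writes.
raw = "name,city\nAndrew,Melbourne\n".encode("utf-16")

# Decode the whole file in one call, then parse the result in memory.
text = raw.decode("utf-16")
rows = list(csv.reader(io.StringIO(text)))
```

For the multi-gigabyte files mentioned elsewhere in the thread, this whole-file decode is exactly the step that would have to become incremental.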
Re: [Python-Dev] csv module TODO list
>> Yes, although it would be nice to retain the 8-bit versions as well.
>
> You can do so by using latin-1 as the default encoding. Works great!

Yep, although that means we wear the cost of decoding and encoding for
all 8-bit input. What does the _sre.c code do?

> Depends on your needs: CSV files tend to be small enough to do the
> decoding in one call in memory.

We are routinely dealing with multi-gigabyte CSV files - which is why
the original 2001-vintage csv module was written as a C state machine.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
Re: [Python-Dev] csv module TODO list
Andrew McNamara wrote:
>>> Yes, although it would be nice to retain the 8-bit versions as well.
>>
>> You can do so by using latin-1 as the default encoding. Works great!
>
> Yep, although that means we wear the cost of decoding and encoding for
> all 8-bit input.

Right, but it makes the code very clean and straightforward. Again, it
depends on what you need. If performance is critical, then you probably
need a C version written using the same trick as _sre.c...

> What does the _sre.c code do?

It comes in two versions: one for 8-bit, the other for Unicode.

>> Depends on your needs: CSV files tend to be small enough to do the
>> decoding in one call in memory.
>
> We are routinely dealing with multi-gigabyte CSV files - which is why
> the original 2001-vintage csv module was written as a C state machine.

I see, but are you sure that the typical Python user will have the same
requirements to make it worth the effort (and complexity)?

I've written a few CSV parsers and writers myself over the years, and
the requirements were different every time - in terms of flexibility in
the parsing phase, the interfaces, and the performance needs. Haven't
yet found a one-size-fits-all solution and don't really expect to any
more :-)

-- 
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] csv module TODO list
>> Yep, although that means we wear the cost of decoding and encoding
>> for all 8-bit input.
>
> Right, but it makes the code very clean and straightforward.

I agree it makes for a very clean solution, and 99% of the time I'd
choose that option.

> Again, it depends on what you need. If performance is critical, then
> you probably need a C version written using the same trick as
> _sre.c...
>
>> What does the _sre.c code do?
>
> It comes in two versions: one for 8-bit, the other for Unicode.

That's what I thought. I think the motivations here are similar to
those that drove the _sre developers.

>> We are routinely dealing with multi-gigabyte CSV files - which is
>> why the original 2001-vintage csv module was written as a C state
>> machine.
>
> I see, but are you sure that the typical Python user will have the
> same requirements to make it worth the effort (and complexity)?

This is open source, so I scratch my own itch (and that of my
employers) - we need fast csv parsing more than we need unicode... 8-)

Okay, assuming we go the produce-two-versions-via-evil-macro-tricks
path, it's still not quite the same situation as _sre.c, which only has
to deal with the internal unicode representation. One way to approach
this would be to add an encoding keyword argument to the readers and
writers. If given, the parser would decode the input stream to the
internal representation before passing it through the unicode state
machine, which would yield tuples of unicode objects. That leaves us
with a bit of a problem where the source is already unicode (eg, a
list of unicode strings)... hmm.

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/
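The encoding-keyword idea described above can be approximated by decoding the byte stream incrementally before it ever reaches the parser. A sketch in modern terms; `unicode_csv_reader` is a made-up helper name, not a proposed csv API:

```python
import csv
import io

def unicode_csv_reader(byte_stream, encoding="utf-16", **fmtparams):
    # Decode incrementally as rows are consumed, so a multi-gigabyte
    # file is never held in memory as one big decoded string.
    text_stream = io.TextIOWrapper(byte_stream, encoding=encoding, newline="")
    return csv.reader(text_stream, **fmtparams)

# Invented sample data: UTF-16 with BOM, as written by Excel.
data = io.BytesIO("name,city\nAndrew,Melbourne\n".encode("utf-16"))
rows = list(unicode_csv_reader(data))
```

Because the wrapper takes a byte stream, the already-unicode case Andrew worries about would simply bypass it and feed text straight to the reader.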
Re: [Python-Dev] 2.3.5 schedule, and something I'd like to get in
Martin v. Löwis [EMAIL PROTECTED] writes:
> Bob Ippolito wrote:
>> It doesn't for reasons I care not to explain in depth, again. Search
>> the pythonmac-sig archives for longer explanations. The gist is that
>> you specifically do not want to link directly to the framework at all
>> when building extensions.
>
> Because an Apple-built extension then may pick up a user-installed
> Python? Why can this problem not be solved by adding -F options, as
> Jack Jansen proposed? This is not the wrong way to do it. I'm not
> convinced.

Martin, can you please believe that Jack, Bob, Ronald et al know what
they are talking about here?

Cheers,
mwh

-- 
Q: Isn't it okay to just read Slashdot for the links?
A: No. Reading Slashdot for the links is like having just one hit off
   the crack pipe. -- http://www.cs.washington.edu/homes/klee/misc/slashdot.html#faq
Re: [Python-Dev] 2.3.5 schedule, and something I'd like to get in
On Jan 5, 2005, at 3:33 AM, Martin v. Löwis wrote:
> Bob Ippolito wrote:
>> It doesn't for reasons I care not to explain in depth, again. Search
>> the pythonmac-sig archives for longer explanations. The gist is that
>> you specifically do not want to link directly to the framework at all
>> when building extensions.
>
> Because an Apple-built extension then may pick up a user-installed
> Python? Why can this problem not be solved by adding -F options, as
> Jack Jansen proposed? This is not the wrong way to do it. I'm not
> convinced.

Then you haven't done the appropriate research by searching
pythonmac-sig. Do you even own a Mac?

-bob
Re: [Python-Dev] ast branch pragmatics
> I think it would be easier to create a new branch from the current
> head, integrate the small number of changed files from ast-branch, and
> work with that branch instead. The idea is that it's an end-run around
> doing an automatic CVS merge and relying on someone to manually merge
> the changes.
>
> At the same time, since there is a groundswell of support for
> finishing the AST work, I'd like to propose that we stop making
> compiler / bytecode changes until it is done. Every change to
> compile.c or the bytecode ends up creating a new incompatibility that
> needs to be merged.
>
> If these two plans sound good, I'll get started on the new branch.

+1

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] 2.3.5 schedule, and something I'd like to get in
On Jan 5, 2005, at 18:46, Martin v. Löwis wrote:
> Bob Ippolito wrote:
>> I just dug up some information I had written on this particular topic
>> but never published, if you're interested:
>> http://bob.pythonmac.org/archives/2005/01/05/versioned-frameworks-considered-harmful/
>
> Interesting. I don't get the part why -undefined dynamic_lookup is a
> good idea (and this is indeed what bothered me most to begin with). As
> you say, explicitly specifying the target .dylib should work as well,
> and it also does not require 10.3.

Without -undefined dynamic_lookup, your Python extensions are bound to
a specific Python installation location (i.e. the system 2.3.0 and a
user-installed 2.3.4). This tends to be quite a problem. With
-undefined dynamic_lookup, they are not. Just search for "version
mismatch" on pythonmac-sig:
http://www.google.com/search?q=%22version+mismatch%22+pythonmac-sig+site:mail.python.org&ie=UTF-8&oe=UTF-8

-bob
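The two linking styles being contrasted here can be sketched as compiler invocations (a sketch only: the object file name and the -F path are placeholders, not commands from the thread):

```shell
# Bound to one specific framework: whichever Python.framework the -F
# search path finds at build time is baked into the extension.
cc -bundle -F/Library/Frameworks -framework Python myext.o -o myext.so

# Deferred lookup (Mac OS X 10.3 and later): undefined symbols are
# resolved at load time against whichever Python loads the extension.
cc -bundle -undefined dynamic_lookup myext.o -o myext.so
```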
RE: [Python-Dev] an idea for improving struct.unpack api
[Ilya Sandler]
> A problem:
>
> The current struct.unpack api works well for unpacking C-structures
> where everything is usually unpacked at once, but it becomes
> inconvenient when unpacking binary files where things often have to be
> unpacked field by field. Then one has to keep track of offsets, slice
> the strings, call struct.calcsize(), etc...

Yes. That bites.

> Eg. with the current api, unpacking a record which consists of a
> header followed by a variable number of items would go like this:
>
>     hdr_fmt = ...
>     item_fmt = ...
>     item_size = calcsize(item_fmt)
>     hdr_size = calcsize(hdr_fmt)
>     hdr = unpack(hdr_fmt, rec[0:hdr_size])  # rec is the record to unpack
>     offset = hdr_size
>     for i in range(hdr[0]):  # assume 1st field of header is a counter
>         item = unpack(item_fmt, rec[offset:offset+item_size])
>         offset += item_size
>
> which is quite inconvenient...
>
> A solution:
>
> We could have an optional offset argument for unpack(format, buffer,
> offset=None); the offset argument is an object which contains a single
> integer field which gets incremented inside unpack() to point to the
> next byte. So with the new API the above code could be written as:
>
>     offset = struct.Offset(0)
>     hdr = unpack(hdr_fmt, rec, offset)
>     for i in range(hdr[0]):
>         item = unpack(item_fmt, rec, offset)
>
> When an offset argument is provided, unpack() should allow some bytes
> to be left unpacked at the end of the buffer...
>
> Does this suggestion make sense? Any better ideas?

Rather than alter struct.unpack(), I suggest making a separate class
that tracks the offset and encapsulates some of the logic that
typically surrounds unpacking:

    r = StructReader(rec)
    hdr = r('')
    for item in r.getgroups('', times=rec[0]):
        . . .
It would be especially nice if it handled the more complex case where
the next offset is determined in part by the data being read (see the
example in section 11.3 of the tutorial):

    r = StructReader(open('myfile.zip', 'rb'))
    for i in range(3):  # show the first 3 file headers
        fields = r.getgroup('LLLHH', offset=14)
        crc32, comp_size, uncomp_size, filenamesize, extra_size = fields
        filename = r.getgroup('c', offset=16, times=filenamesize)
        extra = r.getgroup('c', times=extra_size)
        r.advance(comp_size)
        print filename, hex(crc32), comp_size, uncomp_size

If you come up with something, I suggest posting it as an ASPN recipe
and then announcing it on comp.lang.python. That ought to generate some
good feedback based on other people's real-world issues with
struct.unpack().

Raymond Hettinger
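Raymond's StructReader is hypothetical, but its offset-tracking core is straightforward to sketch with struct.unpack_from. The class name follows his post; every other detail below is invented for illustration:

```python
import struct

class StructReader:
    """Unpack fields sequentially, tracking the offset internally so
    callers need not slice the buffer or call calcsize() themselves."""

    def __init__(self, data):
        self.data = data
        self.offset = 0

    def getgroup(self, fmt):
        # Unpack at the current offset, then advance past those bytes.
        fields = struct.unpack_from(fmt, self.data, self.offset)
        self.offset += struct.calcsize(fmt)
        return fields

    def advance(self, nbytes):
        # Skip over bytes we don't want to unpack (e.g. compressed data).
        self.offset += nbytes

# Ilya's example shape: a 2-byte count header followed by that many
# (L, L) items, all little-endian.
rec = struct.pack("<H", 2) + struct.pack("<LL", 10, 20) + struct.pack("<LL", 30, 40)
r = StructReader(rec)
(count,) = r.getgroup("<H")
items = [r.getgroup("<LL") for _ in range(count)]
```

The offset lives in the reader rather than in a special Offset object passed to unpack(), which is the design difference Raymond is suggesting.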