> > Please don't propose a grand rewrite (even if it's only a single module).
> > Given that the API is mostly sensible, please propose gradual
> > refactoring of the implementation, perhaps some new API methods, and
> > so on. Don't throw away the work that went into making it work in the
> > first place!
>
> Well, I didn't necessarily mean it should be thrown away and started
> from scratch
Well, you *did* say "rewrite". :-)

> -- however, once you get all the ugly out of it, there's
> not much left! Obviously there's something wrong with the way it's
> written if it took years and *several passes* to correctly identify and
> fix a simple format character case bug. Most of this can be blamed on
> the struct module, which is more obscure and error-prone than writing
> the same code in C.

I think the reason is different -- it just hasn't had all that much
use beyond the one use case for which it was written (zipping up the
Python library). Also, don't underestimate the baroqueness of the zip
spec.

> One of the most useful things that could happen to the zipfile module
> would be a stream interface for both reading and writing. Right now
> it's slow and memory-hungry when dealing with large chunks. The use
> case that led me to fix this bug is a tool that archives video to zip
> files of targa sequences with a reference QuickTime movie, so I end up
> with thousands of bite-sized chunks.

Sounds like a use case nobody else has tried yet.

> This >2GB bug really caused me some grief in that I didn't test with
> such large sequences because I didn't have any. I didn't end up
> finding out about it until months later because the client *ignored*
> the exceptions raised by the GUI and came back to me with broken zip
> files. Fortunately the TOC in a zip file can be reconstructed from an
> otherwise pristine stream. Of course, I had to rewrite half of the
> zipfile module to come up with such a recovery program, because it's
> not designed well enough to let me build such a tool on top of it.

Given more typical use cases for zip files (sending around collections
of source files) I'm not surprised that a bug that only occurs for
files >2GB remained hidden for so long.
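The message doesn't say exactly which format character was wrong. As a hedged illustration, assuming the bug was a signed/unsigned mix-up between struct's lowercase `l` and uppercase `L` codes (both exactly 4 bytes under a `<` prefix), the snippet below shows why such a slip stays invisible until a size or offset passes 2 GiB. It targets current Python 3, not the Python of this thread.

```python
import struct

# Zip "size" and "offset" fields are 4-byte unsigned little-endian integers.
SIZE_3GB = 3 * 1024 ** 3  # 3221225472, above the signed limit of 2**31 - 1

# Uppercase "L": unsigned 32-bit, which matches the zip spec, so the value fits.
unsigned_bytes = struct.pack("<L", SIZE_3GB)
round_trip = struct.unpack("<L", unsigned_bytes)[0]

# Lowercase "l": signed 32-bit. Packing the same value is rejected...
try:
    struct.pack("<l", SIZE_3GB)
    signed_pack_error = None
except struct.error as exc:
    signed_pack_error = exc

# ...and reading perfectly valid unsigned bytes with the signed code goes negative.
misread = struct.unpack("<l", unsigned_bytes)[0]

print(round_trip)  # 3221225472
print(misread)     # -1073741824
print(signed_pack_error)
```

For any member or offset below 2**31 the two codes produce identical bytes, which is why tests built from small archives never notice the difference.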
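A streaming interface of the kind requested here did arrive later: `ZipFile.open()` returns a file-like object for reading (Python 2.6 and later) and, since Python 3.6, for writing as well, with a `force_zip64` flag for members whose final size isn't known in advance. The sketch below relies on that later API, so it would not have run on the Python of this thread. `add_member_streaming` and `read_member_streaming` are hypothetical helper names, and the 1 MiB chunk size is an arbitrary choice.

```python
import io
import shutil
import zipfile

CHUNK_SIZE = 1024 * 1024  # copy 1 MiB at a time instead of whole members

def add_member_streaming(zf, arcname, src):
    """Copy the binary file object `src` into `zf` as `arcname`, chunk by chunk."""
    # force_zip64=True because the final size isn't known up front and may exceed 2 GiB.
    with zf.open(arcname, mode="w", force_zip64=True) as dst:
        shutil.copyfileobj(src, dst, CHUNK_SIZE)

def read_member_streaming(zf, arcname, dst):
    """Decompress member `arcname` into the binary file object `dst`, chunk by chunk."""
    with zf.open(arcname, mode="r") as src:
        shutil.copyfileobj(src, dst, CHUNK_SIZE)

if __name__ == "__main__":
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        add_member_streaming(zf, "frame0001.tga", io.BytesIO(b"\x00" * 5000))
    out = io.BytesIO()
    with zipfile.ZipFile(archive) as zf:
        read_member_streaming(zf, "frame0001.tga", out)
    print(len(out.getvalue()))  # 5000
```

Memory use stays bounded by the chunk size rather than by the member size, which is what matters for thousands of frames or a multi-gigabyte movie.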
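The quoted message notes that the table of contents (the central directory) can be rebuilt from an otherwise intact stream. A minimal sketch of the first step, under stated assumptions: entries sit back to back from offset 0, and none uses a data descriptor (flag bit 3) or Zip64 sizes, so every local file header already carries the member's sizes and CRC-32. The field layout follows the local-file-header section of PKWARE's APPNOTE.TXT; `recover_entries` is a hypothetical name, and writing a fresh central directory from its output is left out.

```python
import struct

# Fixed 30-byte part of a zip local file header, little-endian:
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
LOCAL_SIG = b"PK\x03\x04"
FLAG_DATA_DESCRIPTOR = 0x08  # sizes/CRC stored after the data, not in the header
FLAG_UTF8_NAME = 0x800       # name is UTF-8 rather than CP437
ZIP64_MARKER = 0xFFFFFFFF    # real size lives in a Zip64 extra field

def recover_entries(data):
    """Walk consecutive local headers in `data`.

    Returns a list of (offset, name, compressed_size, uncompressed_size, crc)
    tuples and stops at the first block that isn't a local header (normally
    the central directory, or the end of a truncated file).
    """
    entries = []
    offset = 0
    while offset + LOCAL_HEADER.size <= len(data):
        (sig, _ver, flags, _method, _time, _date,
         crc, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
        if sig != LOCAL_SIG:
            break
        if flags & FLAG_DATA_DESCRIPTOR:
            raise ValueError(f"entry at {offset} uses a data descriptor; sizes unknown")
        if csize == ZIP64_MARKER:
            raise ValueError(f"entry at {offset} uses Zip64 sizes; not handled here")
        name_start = offset + LOCAL_HEADER.size
        encoding = "utf-8" if flags & FLAG_UTF8_NAME else "cp437"
        name = data[name_start:name_start + name_len].decode(encoding)
        entries.append((offset, name, csize, usize, crc))
        # Skip the name, the extra field and the compressed data to reach the next header.
        offset = name_start + name_len + extra_len + csize
    return entries
```

Archives written to a non-seekable stream normally set the data-descriptor flag, so a real recovery tool would also have to scan forward for the descriptor (or the next header signature) to find each member's end.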
I don't remember if you have Python CVS permissions, but you sound
like you really know the module as well as the zip file spec, so I'm
hoping that you'll find the time to do some reconstructive surgery on
the zipfile module for Python 2.5, without breaking the existing APIs.
I like the idea you have for a stream API; I recall that the one time
I had to use it I was surprised that the API dealt with files as string
buffers exclusively.

> Another "bug" I ran into was that it has some crazy default for the
> ZipInfo record: it assumes the platform ("create_system") is Windows
> regardless of where you are!

I vaguely recall that the initial author was a Windows-head; perhaps he
didn't realize how useful the module would be on other platforms, or
that it would make any difference at all.

> This caused some really subtle and
> annoying issues with some unzip tools (of course, on everyone's
> machines except mine). Fortunately someone was able to figure out why
> and send me a patch, but it was completely unexpected and I didn't see
> such craziness documented anywhere. If it weren't for this patch, it'd
> either still be broken, or I'd have switched to some other way of
> creating archives!
>
> The zipfile module is good enough to create input files for zipimport,
> which is well tested and generally works -- barring the fact that
> zipimport has quite a few rough edges of its own. I certainly wouldn't
> recommend it for any heavy-duty tasks in its current state.

So, please fix it!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
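Returning to the `create_system` complaint in this thread: that attribute is the "version made by" host code written for every entry, and unzip tools use it to decide how to interpret `external_attr`. Without claiming how zipfile behaves in any particular release, a defensive sketch is to set the field explicitly on each `ZipInfo` and, on Unix, to fill in permission bits as well. `make_zipinfo` is a hypothetical helper; the host codes 0 (MS-DOS/FAT) and 3 (Unix) come from the zip spec.

```python
import stat
import sys
import zipfile

# "Version made by" host codes from the zip spec: 0 = MS-DOS/FAT, 3 = Unix.
HOST_MSDOS = 0
HOST_UNIX = 3

def make_zipinfo(arcname, mode=0o644):
    """Build a ZipInfo whose create_system matches the platform we're running on."""
    info = zipfile.ZipInfo(arcname)
    if sys.platform == "win32":
        info.create_system = HOST_MSDOS
    else:
        info.create_system = HOST_UNIX
        # Unix unzip tools read st_mode from the high 16 bits of external_attr.
        info.external_attr = (stat.S_IFREG | mode) << 16
    return info
```

Pass the result to `ZipFile.writestr()` in place of a bare member name, so the host code and permissions travel with each entry instead of depending on the module's default.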