On Tue, Aug 14, 2012 at 11:20 PM, Glenn Maynard <gl...@zewt.org> wrote:
> On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini <b...@mozilla.com> wrote:
>
>> // The getFilenames handler receives a list of DOMString:
>> var handle = this.reader.getFile(this.result[i]);
>
> This interface is problematic. Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage. This API requires that
> filenames round-trip uniquely, or else files aren't accessible at all.

Indeed, in the case of zip files, file names themselves are dangerous as handles that get passed back and forth, so it seems like a good idea to be able to extract the contents of a file inside the archive without having to address the file by name.

As for the filenames, after an off-list discussion, I think the best solution is that UTF-8 is tried first but the ArchiveReader constructor takes an optional second argument that names a character encoding from the Encoding Standard. This will be known as the fallback encoding. If no fallback encoding is provided by the caller of the constructor, "Windows-1252" is set as the fallback encoding.

When ArchiveReader processes a filename from the zip archive, it first tests if the byte string is a valid UTF-8 string. If it is, the byte string is interpreted as UTF-8 when converting to UTF-16. If the filename is not a valid UTF-8 string, it is decoded into UTF-16 using the fallback encoding.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
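
P.S. To make the proposed filename handling concrete, here is a rough sketch of the decode step written against the Encoding Standard's TextDecoder. The function name decodeZipFilename and its parameter names are made up for illustration; they are not part of the ArchiveReader proposal, and a real implementation would do this internally rather than in script.

```javascript
// Hypothetical helper illustrating the "UTF-8 first, then fallback" rule.
// `bytes` is the raw filename from the zip entry as a Uint8Array.
// `fallbackEncoding` is an Encoding Standard label; "windows-1252" is the
// default suggested in this message.
function decodeZipFilename(bytes, fallbackEncoding = "windows-1252") {
  try {
    // With fatal: true, decode() throws instead of emitting U+FFFD
    // when the byte string is not valid UTF-8.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    // Not valid UTF-8: decode with the caller-supplied fallback encoding.
    return new TextDecoder(fallbackEncoding).decode(bytes);
  }
}

// Valid UTF-8 (C3 A9) is decoded as UTF-8.
const fromUtf8 = decodeZipFilename(new Uint8Array([0xc3, 0xa9]));

// A lone E9 byte is not valid UTF-8, so the fallback is used.
const fromFallback = decodeZipFilename(new Uint8Array([0xe9]));

console.log(fromUtf8 === fromFallback); // both are "é"
```

One caveat with this sketch: a byte string in a legacy encoding can happen to be valid UTF-8 as well, in which case it is read as UTF-8. That follows from trying UTF-8 first and is not something the fallback argument can fix.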