Re: [whatwg] Archive API - proposal

Henri Sivonen Wed, 15 Aug 2012 04:14:24 -0700

On Tue, Aug 14, 2012 at 11:20 PM, Glenn Maynard <gl...@zewt.org> wrote:
> On Tue, Jul 17, 2012 at 9:23 PM, Andrea Marchesini <b...@mozilla.com> wrote:
>
>> // The getFilenames handler receives a list of DOMString:
>> var handle = this.reader.getFile(this.result[i]);
>
> This interface is problematic.  Since ZIP files don't have a standard
> encoding, filenames in ZIPs are often garbage.  This API requires that
> filenames round-trip uniquely, or else files aren't accessible at all.


Indeed, in the case of zip files, file names themselves are dangerous
as handles that get passed back and forth, so it seems like a
good idea to be able to extract the contents of a file inside the
archive without having to address the file by name.
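To illustrate the round-trip problem, here is a small sketch, assuming
filenames were decoded leniently as UTF-8 (this is an assumption for the
example, not something the proposal specifies). Two different byte
strings decode to the same DOMString, so a name-based getFile() could
not tell the two entries apart. The byte values are made up.

```javascript
// Lenient (non-fatal) UTF-8 decoding: each invalid byte becomes U+FFFD.
const lenient = new TextDecoder("utf-8");

// Two distinct raw filenames, as they might appear in a zip archive.
const rawNameA = new Uint8Array([0x61, 0xff]); // "a" + invalid byte 0xFF
const rawNameB = new Uint8Array([0x61, 0xfe]); // "a" + invalid byte 0xFE

const nameA = lenient.decode(rawNameA);
const nameB = lenient.decode(rawNameB);

// The decoded names collide even though the underlying bytes differ.
console.log(nameA === nameB);
```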

As for the filenames, after an off-list discussion, I think the best
solution is that UTF-8 is tried first but the ArchiveReader
constructor takes an optional second argument that names a character
encoding from the Encoding Standard. This will be known as the
fallback encoding. If no fallback encoding is provided by the caller
of the constructor, "Windows-1252" is set as the fallback encoding.
When ArchiveReader processes a filename from the zip archive, it
first tests if the byte string is a valid UTF-8 string. If it is, the
byte string is interpreted as UTF-8 when converting to UTF-16. If the
filename is not a valid UTF-8 string, it is decoded into UTF-16 using
the fallback encoding.
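
The decoding steps above could be sketched roughly as follows, using
TextDecoder from the Encoding Standard. The function name
decodeFilename and its parameter names are hypothetical and only for
illustration; they are not part of the proposed ArchiveReader
interface. The sketch also assumes the runtime supports the requested
fallback encoding label.

```javascript
// Hypothetical helper: decodes the raw bytes of a zip entry name.
// Tries strict UTF-8 first, then the caller-supplied fallback encoding.
function decodeFilename(bytes, fallbackEncoding = "windows-1252") {
  try {
    // fatal: true makes decode() throw a TypeError on invalid UTF-8
    // instead of silently substituting U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    // Not valid UTF-8: decode with the fallback encoding instead.
    return new TextDecoder(fallbackEncoding).decode(bytes);
  }
}

// "café" encoded as UTF-8 and as Windows-1252, respectively.
const utf8Name = decodeFilename(new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]));
const legacyName = decodeFilename(new Uint8Array([0x63, 0x61, 0x66, 0xe9]));
```

Note that a name in a legacy encoding can occasionally happen to be
valid UTF-8, in which case this approach would decode it as UTF-8.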

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
