On Wed, 2009-02-04 at 13:31 +0000, Simon Marlow wrote:
> Duncan Coutts wrote:
> > On Tue, 2009-02-03 at 11:03 -0600, John Goerzen wrote:
> >
> > > Will there also be something to handle the UTF-16 BOM marker? I'm not
> > > sure what the best API for that is, since it may or may not be present,
> > > but it should be considered -- and could perhaps help autodetect encoding.
> >
> > I think someone else mentioned this already, but utf16 (as opposed to
> > utf16be/le) will use the BOM if it's present.
> >
> > I'm not quite sure what happens when you switch encoding; presumably
> > it'll accept and consider a BOM at that point.
>
> Yes; the utf16 and utf32 encodings accept a BOM (and generate a BOM in
> write mode). This caused interesting bugs when doing re-decoding after
> switching encodings, because the BOM constitutes state in the decoder,
> which means that decoding is not necessarily repeatable unless you save the
> state (which iconv doesn't provide a way to do).
>
> Are there other encodings that have this kind of state? If so, then they
> might be restricted to NoBuffering, at least when switching encodings.

Yes, I believe some Asian encodings are stateful. ISO-2022-JP, for
example, switches character sets with escape sequences.

Duncan

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
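
To make the "BOM constitutes state in the decoder" point concrete, here is a minimal sketch of a toy UTF-16 decoder. It is not how GHC's IO library or iconv implement this; the names `Endian`, `readBOM`, `codeUnit`, `decodeUnits`, `decodeUtf16` and `sample` are all hypothetical, and the big-endian default when no BOM is present is an assumption of the sketch. Surrogate pairs are deliberately left uncombined to keep it short.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Byte order selected by the BOM. This is the decoder's hidden state.
data Endian = BE | LE deriving (Eq, Show)

-- | Inspect the first two bytes. A BOM is consumed; without one we fall
-- back to big-endian (an assumption of this sketch).
readBOM :: [Word8] -> (Endian, [Word8])
readBOM (0xFE : 0xFF : rest) = (BE, rest)
readBOM (0xFF : 0xFE : rest) = (LE, rest)
readBOM bytes                = (BE, bytes)

-- | Combine two bytes into one 16-bit code unit in the given byte order.
codeUnit :: Endian -> Word8 -> Word8 -> Int
codeUnit BE hi lo = (fromIntegral hi `shiftL` 8) .|. fromIntegral lo
codeUnit LE lo hi = (fromIntegral hi `shiftL` 8) .|. fromIntegral lo

-- | Decode code units in a known byte order. Surrogate pairs are not
-- combined and a trailing odd byte is dropped.
decodeUnits :: Endian -> [Word8] -> String
decodeUnits e (a : b : rest) = chr (codeUnit e a b) : decodeUnits e rest
decodeUnits _ _              = []

-- | Decode from the start of a stream, letting the BOM pick the byte order.
decodeUtf16 :: [Word8] -> String
decodeUtf16 bytes = uncurry decodeUnits (readBOM bytes)

-- | "hi" in UTF-16, preceded by a little-endian BOM.
sample :: [Word8]
sample = [0xFF, 0xFE, 0x68, 0x00, 0x69, 0x00]
```

Here `decodeUtf16 sample` yields "hi", and so does `decodeUnits LE (drop 2 sample)`, which resumes past the BOM with the saved byte order. `decodeUtf16 (drop 2 sample)`, on the other hand, restarts from the same offset without that state, falls back to big-endian, and produces two unrelated characters. That is the sense in which re-decoding is not repeatable unless the state is saved.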