On Sat, Sep 12, 2015 at 01:40:29AM +0200, Damian Lukowski wrote:
> The Encode::Unicode documentation states the following:
>
> When BE or LE is omitted during decode(), it checks if BOM is at the
> beginning of the string; if one is found, the endianness is set to what
> the BOM says. If no BOM is found, the routine dies.
>
> To reproduce:
> ---
> use Encode qw/decode/;
> decode("utf-16be", "Hello World"); # does not die
> decode("utf-16le", "Hello World"); # does not die
> decode("utf-16", "\xFE\xFFHello World"); # does not die
> decode("utf-16", "Hello World"); # dies with "UTF-16:Unrecognised BOM"
> ---
>
> Unicode Standard version 8.0:
>
> The UTF-16 encoding scheme may or may not begin with a BOM. However,
> when there is no BOM, and in the absence of a higher-level protocol, the
> byte order of the UTF-16 encoding scheme is big-endian.
>
> RFC2781:
>
> If the first two octets of the text is not 0xFE followed by
> 0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be
> interpreted as being big-endian.
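For reference, the big-endian fallback that the Unicode Standard and RFC 2781 describe can be approximated on the caller's side. The following is only a minimal sketch and not what Encode does: decode_utf16_lenient is a hypothetical helper name (it is not part of Encode), and it assumes the input is a byte string rather than already-decoded text.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of Encode: decode UTF-16 octets and fall
# back to big-endian when no BOM is present, as RFC 2781 suggests.
sub decode_utf16_lenient {
    my ($octets) = @_;

    # A leading FE FF or FF FE is a BOM, so Encode's "UTF-16" can pick
    # the byte order itself (and strip the BOM from the result).
    if ($octets =~ /\A(?:\xFE\xFF|\xFF\xFE)/) {
        return decode('UTF-16', $octets);
    }

    # No BOM: assume big-endian instead of dying.
    return decode('UTF-16BE', $octets);
}

# "Hi" as UTF-16BE without a BOM, and as UTF-16LE with a BOM.
print decode_utf16_lenient("\x00H\x00i"), "\n";
print decode_utf16_lenient("\xFF\xFEH\x00i\x00"), "\n";
```

Both calls should yield the same two-character string; only the first one would die if passed straight to decode("utf-16", ...).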
Thanks for the bug report; I've added your patch to the upstream bug
report, and will await their comments.

Dominic.