I have a file I'm reading that has fields in various encodings: potentially UTF-16, UTF-8, etc. I am reading these fields and getting the right encoding, but when I encounter a UTF-16 encoded field it starts with a BOM (0xFFFE or 0xFEFF). This is important for UTF-16 so the decoder can figure out the byte order and recognize the data as UTF-16. However, when I read the UTF string in using:
    var title : String = stream.readMultiByte( size, textEncoding );

the BOM is included in the title string. I figured Adobe would strip that off, but it's making it into the final string. To illustrate what I mean: if the word is "Stay" and I do the following:

    assertEquals( "Stay".length, title.length );

it will fail, saying 5 is not equal to 4, even though when you look at the string in the debugger it says "Stay" for title. And when I send that title across the wire, I can see the BOM is included in it.

So my question is: am I reading UTF data correctly or not? Is there a generally prescribed way of getting rid of the BOM? I do have two text encodings I'm looking for, "utf-16" and "utf-16be", for little endian and big endian UTF-16 respectively, and it appears that the encoding field matches utf-16/utf-16be according to the BOM. I'm not sure whether I should parse the BOM myself and select either encoding based on the BOM's value, or not.

Thanks,
Charlie
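Edit: for reference, here is a sketch of the "parse the BOM myself" approach I was considering. The function name is mine, and it assumes the field length `size` includes the two BOM bytes and that the stream implements flash.utils.IDataInput; I haven't confirmed this is the prescribed way.

```actionscript
import flash.utils.IDataInput;

// Peek at the first two bytes of the field, pick the matching
// UTF-16 charset, then decode the rest of the field without the BOM.
function readUtf16Field( stream : IDataInput, size : uint ) : String
{
    var b0 : uint = stream.readUnsignedByte();
    var b1 : uint = stream.readUnsignedByte();

    var charset : String;
    if ( b0 == 0xFF && b1 == 0xFE )
        charset = "utf-16";    // little endian
    else if ( b0 == 0xFE && b1 == 0xFF )
        charset = "utf-16be";  // big endian
    else
        throw new ArgumentError( "field does not start with a UTF-16 BOM" );

    // The two BOM bytes are already consumed, so decode only size - 2 bytes.
    return stream.readMultiByte( size - 2, charset );
}
```

The alternative would be to keep the existing readMultiByte call and strip the decoded BOM off the front of the string afterwards, i.e. check `title.charCodeAt( 0 ) == 0xFEFF` and, if so, take `title.substr( 1 )`.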
