Hi Josh,
Your first line doesn't show where the BOM is located. It should be the first two bytes of the file, ahead of the XML declaration. Note that the encoding attribute in the XML declaration doesn't guarantee the file is really written in that encoding, though it is usually a strong hint, particularly if the file was written with XML tools or libraries.

I'm not sure whether MarkLogic handles the BOM well, but I thought it did: I believe I have uploaded UTF-8 files with a BOM without problems. Changing the encoding of the file on the fly to match the MarkLogic app server setting is a good workaround too, I guess.

Kind regards,
Geert

*From:* [email protected] [mailto: [email protected]] *On behalf of* Josh Warner-Burke
*Sent:* Wednesday, 8 February 2012 22:49
*To:* [email protected]
*Subject:* [MarkLogic Dev General] BOM char and UTF-16

I emailed about a week ago about a problem I was having with XCC and large files. I got some very good advice, which said I needed to use session.insertContent to get the file in. I'm done with that conversion but am now dealing with the problems resulting from the change.

What I'm looking at right now is a file that is UTF-16 and begins with two BOM characters, which I have learned are actually relevant in telling any string parser/consumer what order the bytes in each pair will be... I wrote some code that strips out the BOMs, but it seems to screw the encoding up altogether. I also put in code to set the encoding to UTF16 in the ContentCreateOptions.

Without stripping BOMs, I get this:

Invalid root text "ÿþ" at [uri] line 1

To deal with UTF-16, don't you *need* those BOMs? What am I missing here?

FYI the first line of the files looks like:

<?xml version="1.0" encoding="UTF-16" standalone="yes"?>

So it's clearly UTF-16. There is some leeway in terms of how I create the Content object to feed to insertContent (currently I'm treating it as a byte[]), but I could do string conversion etc. if that's what I need to do.

Any help is appreciated.
--
Josh Warner-Burke
42SIX Solutions
(m): 410-493-4362
(e): [email protected]
http://www.42six.com
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
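
For the BOM question in this thread, here is a minimal sketch of the "change the encoding on the fly" workaround, using only the JDK. The class and method names (BomSniffer, detectBom, decode, toUtf8) are made up for illustration and are not part of XCC. It assumes the file fits in memory as a byte[], as in the original post, and it only recognises the UTF-8 and UTF-16 BOMs. It may also explain why stripping the BOM broke things: the bytes FF FE (shown as "ÿþ") mark little-endian UTF-16, and once they are gone a generic "UTF-16" decoder has to guess the byte order (Java's decoder assumes big-endian).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of XCC or MarkLogic.
final class BomSniffer {

    /** Returns the charset implied by a leading BOM, or null if no known BOM is found. */
    public static Charset detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        // FF FE: little-endian UTF-16 (a UTF-32LE BOM also starts this way; not handled here).
        if (startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        // FE FF: big-endian UTF-16.
        if (startsWith(data, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /** Decodes the bytes using the BOM's charset and drops the BOM; uses fallback if there is none. */
    public static String decode(byte[] data, Charset fallback) {
        Charset detected = detectBom(data);
        if (detected == null) return new String(data, fallback);
        int bomLength = detected.equals(StandardCharsets.UTF_8) ? 3 : 2;
        return new String(data, bomLength, data.length - bomLength, detected);
    }

    /** Re-encodes the content as BOM-less UTF-8. */
    public static byte[] toUtf8(byte[] data, Charset fallback) {
        return decode(data, fallback).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A tiny little-endian UTF-16 document with a BOM, built in memory.
        byte[] sample = "\uFEFF<a/>".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(decode(sample, StandardCharsets.UTF_8)); // prints <a/>
    }
}
```

The decoded String or the UTF-8 bytes could then be handed to whatever builds the Content object for session.insertContent; that step is not shown here. Note that the XML declaration inside the text would still say encoding="UTF-16" after re-encoding, so it may need adjusting depending on how the receiving side treats it.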
