Not sure if this will fix your problem, but I had a similar problem a while ago and Mike's function solved it for me:
http://marklogic.markmail.org/search/?q=UTF-8#query:UTF-8%20from%3A%22Michael%20Blakeley%22+page:1+mid:dg7nu6sfjchpocm4+state:results

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Neil Bradley
Sent: Thursday, December 03, 2009 12:42 PM
To: 'General Mark Logic Developer Discussion'
Subject: RE: [MarkLogic Dev General] Upload Data via Form - Invalid UTF-8 Escape Sequence

Danny,

Thanks for the suggestion. I had never spotted the options to xdmp:quote before, but unfortunately that still did not help. I tried "ASCII" and "ISO-8859-1", which are both valid values for the output-encoding parameter, but neither had any effect on the error message I am getting.

Neil.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: 03 December 2009 19:25
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Upload Data via Form - Invalid UTF-8 Escape Sequence

I am not sure if this will work, but you can try using the <output-encoding> option to xdmp:quote. Something like:

text {
  xdmp:quote(
    xdmp:get-request-field("upload"),
    <options xmlns="xdmp:quote">
      <output-encoding>ASCII</output-encoding>
    </options>
  )
}

-Danny

From: [email protected] [mailto:[email protected]] On Behalf Of Neil Bradley
Sent: Thursday, December 03, 2009 1:30 AM
To: [email protected]
Subject: [MarkLogic Dev General] Upload Data via Form - Invalid UTF-8 Escape Sequence

Hi,

I have a requirement to import data from spreadsheets and databases, using tab-separated text format, which I convert to XML. The problem I am having occurs when the source data comes from Excel and contains a pound symbol (or, I suspect, any character with a code value above 127, that is, outside ASCII).

Initially, the problem was that the text file was not recognised by the browser as text, so it came in as "application/octet-stream" instead of "text/plain", but I solved that using the following technique:

text {
  xdmp:quote(
    xdmp:get-request-field("upload")
  )
}

That solved the problem when the pound symbol was not in the data (and it also works when the data arrives in "text/plain" format, so it covers both scenarios). But when the pound symbol was present, I got the following error:

XDMP-UTF8SEQ: xdmp:quote(binary{"46756e64204e616d650944617465094e65742041737365742056616c75650944..."}) -- Invalid UTF-8 escape sequence
in /test/UploadData.xqy, on line 61 [1.0-ml]

Now, I have opened the file I am uploading in TextPad, which tells me it is a PC format ANSI text file, so I guess that might explain the UTF-8 error. The document is NOT in UTF-8, so I think it needs converting from ANSI to UTF-8. Any idea how to do that in this form-upload scenario?

Thanks

Neil.

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
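
One possible way to do the ANSI-to-UTF-8 conversion asked about above, as a minimal sketch and not necessarily what the linked function does. It assumes that "ANSI" here means Windows-1252, where the pound sign is the single byte 0xA3 (not a valid UTF-8 sequence on its own, which would explain XDMP-UTF8SEQ), that the server in use provides xdmp:binary-decode, and that it accepts "windows-1252" as an encoding name. The field name "upload" is taken from the thread.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumptions: the form field is named "upload" as in the
   thread, and the uploaded file is Windows-1252 ("ANSI" in TextPad). :)
let $upload := xdmp:get-request-field("upload")
return
  text {
    if ($upload instance of binary())
    (: application/octet-stream upload: decode the raw bytes using the
       assumed source encoding instead of treating them as UTF-8 :)
    then xdmp:binary-decode($upload, "windows-1252")
    (: text/plain upload: the field already arrives as a string :)
    else $upload
  }
```

If the source files can come in more than one encoding, the encoding name would have to be supplied some other way, for example as an extra form field. How a "text/plain" upload that is not valid UTF-8 behaves is not covered by this sketch.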
