For the invalid UTF-8 errors: if you know the encoding of the source
document, you can specify it at load time with the <encoding> option of
xdmp:document-load.  You can also add the <repair> option if the XML
itself needs repair.  For example, if you know the content is encoded
in ISO-8859-1, you can try something like this:

xdmp:document-load("C:\sam2\78.xml", 
    <options xmlns="xdmp:document-load">
      <uri>/documents/myFile.xml</uri>
      <repair>full</repair>
      <encoding>ISO-8859-1</encoding>
    </options>)

Another thing that might help is xdmp:tidy.  It often does a good job of
cleaning up XML when you use the <input-xml>yes</input-xml> option.
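
As a rough sketch of the xdmp:tidy route (not tested against your files):
read the file as text, run it through xdmp:tidy, and insert the result.
The file path, the ISO-8859-1 encoding, and the target URI below are
assumptions for illustration only, and tidy may not be able to fix every
one of the errors you listed, so check the status node it returns.

```xquery
(: Sketch only: the path, encoding, and target URI are illustrative assumptions. :)
let $path := "C:\sam2\72.xml"

(: Read the file as text so the XML parser does not reject it up front. :)
let $raw := xdmp:document-get($path,
    <options xmlns="xdmp:document-get">
      <format>text</format>
      <encoding>ISO-8859-1</encoding>
    </options>)

(: xdmp:tidy returns a status node followed by the cleaned-up document. :)
let $result := xdmp:tidy(fn:string($raw),
    <options xmlns="xdmp:tidy">
      <input-xml>yes</input-xml>
    </options>)
let $status := $result[1]
let $clean := $result[2]

return (
  (: Return the status so any tidy warnings or errors are visible. :)
  $status,
  xdmp:document-insert("/documents/72.xml", $clean)
)
```

If the status node reports errors, it is probably safer to inspect the
tidied output before inserting it rather than loading it blindly.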

-Danny


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
Vijayasekar Palaniswamy
Sent: Thursday, September 04, 2008 12:57 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Content Repair Functionality

Hello,

XDMP-DOCUTF8SEQ: Invalid UTF-8 escape sequence at C:\sam2\78.xml line 1
-- document is not UTF-8 encoded in /load_docs.xqy

XDMP-STARTTAGCHAR: Unexpected character "&quot;" in start tag at
C:\sam2\72.xml line 1 in /load_docs.xqy

XDMP-DOCHEXCHARREF: Invalid hex character reference "0019" at
C:\sam2\21.xml line 1 in /load_docs.xqy

These are the errors MarkLogic threw when I tried to load XML files.
Is it possible to repair these errors using the Content Repair
functionality?

-- 
Regards,

Vijayasekar. P,