Hi Folks,

I am using xdmp:document-load to insert content into MarkLogic.  Until
recently I had only been loading UTF-8 XML into the database, but I have now
started encountering some ISO-8859-1 encoded content.  I adjusted the
xdmp:document-load options to accommodate ISO-8859-1, and for the most part
this works.  However, the ISO-8859-1 content occasionally includes HTML
character entities such as &sim;, which appear to cause the load to fail.
When the error is not trapped with a try-catch block, the failure is reported
as XDMP-DOCUNEOF; when it is trapped with a try-catch block, it is reported
as XDMP-DOCENTITYREF.

Is there a simple way to add a list of character entity mappings to get this
to work?  For example, I've read that &sim; maps to the Unicode character
U+223C <http://www.fileformat.info/info/unicode/char/223c/index.htm>
(http://code.google.com/p/doctype/wiki/SimCharacterEntity).
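
To illustrate the kind of mapping I have in mind, here is a rough sketch of a
pre-processing step run outside MarkLogic, before the load.  It is written in
Python and is not a MarkLogic feature; the function name
html_entities_to_numeric is hypothetical, and it assumes that rewriting named
HTML entities as numeric character references is acceptable for the content.
It relies on the standard library's html.entities.name2codepoint table and
leaves the entities that XML itself predefines untouched.

```python
import re
from html.entities import name2codepoint

# Entities that XML predefines; these must be left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entity references such as &sim; or &nbsp;
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text):
    """Rewrite named HTML entities as numeric character references.

    Unknown names and the XML-predefined entities are returned unchanged.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#x{:X};".format(name2codepoint[name])

    return _ENTITY.sub(replace, text)
```

With this, a string such as "a &sim; b" would become "a &#x223C; b", which an
XML parser can resolve without an HTML entity declaration.  The text would
have to be read from the file as ISO-8859-1 first.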

 

Thanks in advance for any help with this!

Tim Meagher
