[MarkLogic Dev General] How to handle named HTML character entities when loading an ISO-8859-1 encoded document into MarkLogic?

Tim Meagher Mon, 05 Jul 2010 03:21:40 -0700

Hi Folks,


I am using xdmp:document-load to insert content into MarkLogic.  Until
recently I had only been loading UTF-8 XML into the database, but I have
now started encountering some ISO-8859-1 encoded content.  I was able to
adjust the xdmp:document-load options to accommodate ISO-8859-1, and for the
most part it has been working okay.  However, the ISO-8859-1 content
occasionally includes named HTML character entities such as &sim;, which
appear to be causing the load to fail.  When the error is not trapped with a
try-catch block, the failure is reported as an XDMP-DOCUNEOF error; when it
is trapped with a try-catch block, it is reported as an XDMP-DOCENTITYREF
error.

 

Is there a simple way to add a list of character entity mappings to get this
to work?  For example, I've read that &sim; maps to the Unicode character
U+223C <http://www.fileformat.info/info/unicode/char/223c/index.htm>
(http://code.google.com/p/doctype/wiki/SimCharacterEntity).
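
To illustrate the kind of mapping I mean, here is a minimal sketch of a
preprocessing step in Python, run outside MarkLogic before the load.  It is
not a MarkLogic feature, and the function names (replace_named_entities,
convert_file) are hypothetical.  It relies on the HTML 4 entity table in
Python's standard html.entities module and rewrites each named entity as a
numeric character reference, which any XML parser accepts without a DTD.
The five entities predefined by XML are left untouched.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity name -> code point

# Entities predefined by XML itself; these must not be rewritten.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches references of the form &name;
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_named_entities(text):
    """Rewrite known HTML named entities as numeric character references."""
    def _sub(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names exactly as they are.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#x{:X};".format(name2codepoint[name])
    return ENTITY_RE.sub(_sub, text)


def convert_file(src_path, dst_path):
    """Read an ISO-8859-1 file, rewrite its entities, and write it back out.

    Numeric references are plain ASCII, so the output can stay ISO-8859-1.
    """
    with open(src_path, encoding="iso-8859-1") as src:
        text = src.read()
    with open(dst_path, "w", encoding="iso-8859-1") as dst:
        dst.write(replace_named_entities(text))
```

With this sketch, "a &sim; b" would become "a &#x223C; b".  Note that it
works on raw text, so it would also rewrite entity-like strings inside CDATA
sections or comments; that may or may not matter for a given data set.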

 

Thanks ahead of time for any help with this!

 

Tim Meagher

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
