Hi Danny,

Is there a way to pre-read the document as a string or binary (from XQuery), 
get the encoding from the declaration using straightforward functions, and 
pass that as the value of the encoding option in a call to xdmp:document-get to 
read the document with the correct encoding?

I could pre-parse the files outside MarkLogic Server, or rely on things like 
MLJAM, but I would prefer not to.
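
For reference, the pre-parsing step outside MarkLogic could be as small as the 
sketch below. It is only an illustration in Python, not something MarkLogic 
provides: the function name sniff_xml_encoding and its default argument are 
made up for this example. It assumes the first bytes of the file are either a 
byte-order mark or an ASCII-compatible XML declaration, so UTF-32, UTF-16 
without a byte-order mark, and EBCDIC files are not handled.

```python
import codecs
import re

# Byte-order marks that are checked before looking at the declaration.
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

# Captures the encoding pseudo-attribute of an XML declaration that sits
# at the very start of the data.
_DECL = re.compile(
    rb"""^<\?xml\s[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_xml_encoding(head: bytes, default: str = "UTF-8") -> str:
    """Guess the encoding of an XML file from its first bytes.

    `head` is the start of the file, read in binary mode. `default` is
    returned when neither a byte-order mark nor a declaration is found.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    match = _DECL.match(head)
    if match:
        return match.group(1).decode("ascii")
    return default
```

The first few hundred bytes of each file, read in binary mode, should be 
enough input. The returned name would then be supplied as the value of the 
encoding option when calling xdmp:document-get or xdmp:document-load.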

Has supporting the XML declaration for this purpose been considered, for 
instance when xdmp:document-get is called without an explicit encoding 
option? If not, would you be willing to consider such an addition? I really 
think it would add value.

Kind regards,
Geert

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Danny Sokolsky
> Sent: Wednesday, March 25, 2009 16:43
> To: General Mark Logic Developer Discussion
> Subject: RE: [MarkLogic Dev General] Importing xml with 
> unpredictable encoding
> 
> Hi Geert,
> 
> You can specify the encoding with the <encoding> option to 
> xdmp:document-get or xdmp:document-load.  You do have to know 
> the encoding though--it will not use an encoding in a header 
> of the document on its own, and will default to UTF-8.  
> 
> -Danny
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of 
> Geert Josten
> Sent: Wednesday, March 25, 2009 6:07 AM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Importing xml with 
> unpredictable encoding
> 
> Hi,
> 
> Is it correct that the MarkLogic built-in functions 
> xdmp:document-load and xdmp:document-get do not respect the 
> encoding specification in the XML declaration? They expect 
> UTF-8 by default and otherwise try to consume the file with 
> the encoding specified in the options. Is there a way to 
> take the encoding in the XML declaration into account?
> 
> I tried using something like xdmp:filesystem-file and (rather 
> ugly) parsing the resulting string with string functions, but it 
> chokes with the message that the string contains a bad 
> codepoint (SVC-BAD: ... -- Bad CodepointIterator::_next).
> 
> Any ideas?
> 
> Kind regards,
> Geert
> 
> 
> Drs. G.P.H. Josten
> Consultant
> 
> 
> http://www.daidalos.nl/
> Daidalos BV
> Source of Innovation
> Hoekeindsehof 1-4
> 2665 JZ Bleiswijk
> Tel.: +31 (0) 10 850 1200
> Fax: +31 (0) 10 850 1199
> http://www.daidalos.nl/
> KvK 27164984
> 
> 
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general