Hi Danny,

Is there a way to pre-read the document as a string or binary (from XQuery), get the encoding from the XML declaration using straightforward functions, and pass that as the value of the encoding option in a call to xdmp:document-get, so that the document is read with the correct encoding?
I could pre-parse the files outside MarkLogic Server, or rely on things like MLJAM, but I would prefer not to need either. Has supporting the XML declaration for this purpose been considered, for instance when xdmp:document-get is called without an explicit encoding option? If not, would you be willing to consider such an addition? I really think it would improve the value.

Kind regards,
Geert

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Danny Sokolsky
> Sent: Wednesday, March 25, 2009 16:43
> To: General Mark Logic Developer Discussion
> Subject: RE: [MarkLogic Dev General] Importing xml with
> unpredictable encoding
>
> Hi Geert,
>
> You can specify the encoding with the <encoding> option to
> xdmp:document-get or xdmp:document-load. You do have to know
> the encoding though--it will not use an encoding in a header
> of the document on its own, and will default to UTF-8.
>
> -Danny
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Geert Josten
> Sent: Wednesday, March 25, 2009 6:07 AM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Importing xml with
> unpredictable encoding
>
> Hi,
>
> Is it correct that the MarkLogic built-in functions
> xdmp:document-load and xdmp:document-get do not respect the
> encoding specification in the XML declaration? They expect
> UTF-8 by default and otherwise try to consume the file with
> the encoding specified in the options. Is there a way to
> pick up the encoding from the XML declaration?
>
> I tried using something like xdmp:filesystem-file and (rather
> ugly) parsing the resulting string with string functions, but
> it chokes with the message that the string contains a bad
> codepoint (SVC-BAD: ... -- Bad CodepointIterator::_next).
>
> Any ideas?
>
> Kind regards,
> Geert
>
> Drs. G.P.H. Josten
> Consultant
>
> Daidalos BV
> Source of Innovation
> Hoekeindsehof 1-4
> 2665 JZ Bleiswijk
> Tel.: +31 (0) 10 850 1200
> Fax: +31 (0) 10 850 1199
> http://www.daidalos.nl/
> KvK 27164984

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
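
For the option of pre-parsing the files outside MarkLogic Server mentioned above, here is a minimal sketch in Python (the thread names no language for this step, so that choice is an assumption). It reads the first bytes of a file and pulls the encoding name out of the XML declaration, falling back to UTF-8 when none is declared. The function names `sniff_xml_encoding` and `sniff_file` are hypothetical, and the sketch only handles ASCII-compatible encodings such as ISO-8859-1 or windows-1252; UTF-16 input would need byte-order-mark handling that is not shown here.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration, working on
# raw bytes so that no decoding is attempted before the encoding is known.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(head: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the XML declaration, or `default`.

    `head` is the first chunk of the file. Assumes an ASCII-compatible
    encoding, so the declaration itself is readable as plain bytes.
    """
    # Skip a UTF-8 byte order mark if one precedes the declaration.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    match = _XML_DECL.match(head)
    return match.group(1).decode("ascii") if match else default


def sniff_file(path: str) -> str:
    """Hypothetical helper: sniff the declared encoding of a file on disk."""
    with open(path, "rb") as handle:
        return sniff_xml_encoding(handle.read(200))
```

The returned name could then be supplied as the value of the <encoding> option to xdmp:document-get or xdmp:document-load, as Danny describes. Whether MarkLogic accepts every encoding name a declaration may contain is not covered by the thread, so the value may need checking before use.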
