Hi Tim,

You can pass a format option to xdmp:zip-get to extract that specific file 
as text instead of XML; details are in the API description: 
http://developer.marklogic.com/pubs/4.2/apidocs/Document-Conversion.html#xdmp:zip-get
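For example, something along these lines should return the entry as a text 
node rather than parsing it as XML. This is only a minimal sketch: the zip 
URI and the entry name are made up, so substitute your own, and check the 
option names against the API description for your version.

```xquery
xquery version "1.0-ml";

(: Hypothetical values: the URI of the zip stored in the database
   and the path of the problem entry inside that zip. :)
let $zip  := fn:doc("/zips/batch.zip")
let $name := "docs/broken.xml"
return
  (: Ask for the entry as plain text so it is not parsed as XML. :)
  xdmp:zip-get($zip, $name,
    <options xmlns="xdmp:zip-get">
      <format>text</format>
    </options>)
```

If you are unsure of the exact entry names, xdmp:zip-manifest on the same 
zip node lists them.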

Fixing the file with string manipulation is not recommended, though. The 
repair option may be a better choice. Best of course would be to fix the 
problem at the source, but that may not be an option in your case.
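
A rough sketch of the repair route, again with a made-up zip URI and entry 
name. I have not verified that repair can cope with trailing text after the 
root element, which is why the sketch falls back to plain text if parsing 
still fails:

```xquery
xquery version "1.0-ml";

(: Hypothetical values; replace with your own zip URI and entry name. :)
let $zip  := fn:doc("/zips/batch.zip")
let $name := "docs/broken.xml"
return
  try {
    (: First attempt: parse as XML and let the server repair what it can. :)
    xdmp:zip-get($zip, $name,
      <options xmlns="xdmp:zip-get">
        <format>xml</format>
        <repair>full</repair>
      </options>)
  } catch ($e) {
    (: Repair was not enough: return the raw text for manual inspection. :)
    xdmp:zip-get($zip, $name,
      <options xmlns="xdmp:zip-get">
        <format>text</format>
      </options>)
  }
```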

Kind regards,
Geert

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Tim Meagher
> Sent: Tuesday, 26 October 2010 15:52
> To: 'General Mark Logic Developer Discussion'
> Cc: 'Asheesh Mangla'
> Subject: Re: [MarkLogic Dev General] Can validation of XML 
> docs in a zipfile extraction be disabled?
> 
> Hi Geert,
> 
> Hmm ... you're right - there is some bad text at the end of 
> this file that is contributing to the problem, and this 
> particular document is not a well-formed XML document.
> 
> Any suggestions for extracting it as a non-XML document (e.g. 
> UTF-8 text) so that it can be corrected and subsequently 
> saved as an XML document?
> 
> Thanks!
> 
> Tim
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Geert Josten
> Sent: Tuesday, October 26, 2010 9:33 AM
> To: General Mark Logic Developer Discussion
> Cc: 'Asheesh Mangla'
> Subject: Re: [MarkLogic Dev General] Can validation of XML 
> docs in a zipfile extraction be disabled?
> 
>  
> 
> Hi Tim,
> 
> Are you sure this is a validation error message? Could it be 
> that the zip file contains a mixture of xml and non-xml, and 
> that you are trying to extract a file from the zip as xml 
> while it is actually non-xml?
> 
> Kind regards,
> 
> Geert
> 
> drs. G.P.H. (Geert) Josten
> Consultant
> 
> Daidalos BV
> Hoekeindsehof 1-4
> 2665 JZ Bleiswijk
> 
> T +31 (0)10 850 1200
> F +31 (0)10 850 1199
> 
> mailto:[email protected]
> http://www.daidalos.nl/
> 
> KvK 27164984
> 
> The information sent in or with this e-mail message originates 
> from Daidalos BV and is intended solely for the addressee. If you 
> have received this message unintentionally, we ask that you 
> delete it. No rights can be derived from this message.
> 
> > From: [email protected]
> > [mailto:[email protected]] On Behalf Of
> > Tim Meagher
> > Sent: Tuesday, 26 October 2010 15:15
> > To: 'General Mark Logic Developer Discussion'
> > Cc: 'Asheesh Mangla'
> > Subject: [MarkLogic Dev General] Can validation of XML docs in
> > a zipfile extraction be disabled?
> > 
> > I'm loading a zipfile that contains multiple XML documents
> > into MarkLogic, but it appears that MarkLogic is validating
> > the embedded content against its corresponding schema in the
> > Schemas database and coming up with an invalid root text
> > error message when extracting the xml document:
> > 
> > <error:message>Invalid root text</error:message>
> >   <error:format-string>XDMP-DOCROOTTEXT:
> > xdmp:zip-get(fn:doc($doc-uri)).
> > 
> > This prevents me from being able to store a well-formed XML
> > document and correct it in MarkLogic, which
> > means that the content must be extracted either manually or
> > via a non-MarkLogic application and then corrected before
> > reinserting into MarkLogic.
> > 
> > Thanks for the help!
> > 
> > Tim Meagher
> > 
> _______________________________________________
> 
> General mailing list
> 
> [email protected]
> 
> http://developer.marklogic.com/mailman/listinfo/general
> 
> 