Hi, Abhishek:

xdmp:binary-decode() should do the conversion once you know the source
encoding.

xdmp:encoding-language-detect() will give you some guesses at that encoding.

One popular post on encodings advises that it's best to ask
the publisher of the text instead of trying to guess:

    http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html
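
A minimal sketch of how those two functions might be combined, assuming the
file is fetched as binary so that no UTF-8 validation happens on download.
"file-location.txt" is the placeholder from the original question. The element
names read from the detection result (encoding, score) and the way the binary
node is pulled out of the response are assumptions worth checking against the
documentation, and trusting the top-scoring guess is itself an assumption that
can be wrong for short or mixed-content files:

```xquery
xquery version "1.0-ml";

declare namespace eld = "xdmp:encoding-language-detect";

(: Fetch the body as binary; [2] skips the response metadata in item [1]. :)
let $body := xdmp:http-get("file-location.txt",
  <options xmlns="xdmp:document-get">
    <format>binary</format>
  </options>)[2]

(: Assumption: the body is a binary node, or a document node wrapping one. :)
let $binary := $body/descendant-or-self::binary()

(: Pick the highest-scoring guess; this is a guess, not a fact. :)
let $best :=
  (for $guess in xdmp:encoding-language-detect($binary)
   order by xs:double($guess/eld:score) descending
   return $guess)[1]

(: Decode the raw bytes from the guessed encoding into a string. :)
return xdmp:binary-decode($binary, string($best/eld:encoding))
```

If the publisher can tell you the encoding, pass that name to
xdmp:binary-decode() directly and skip the detection step.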


Hoping that's useful,


Erik Hennum

________________________________________
From: [email protected] [[email protected]] On Behalf Of Abhishek53 S [[email protected]]
Sent: Monday, June 25, 2012 7:00 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] UTF -8 Encoding Exception

Hi Geert,

Thanks for prompt reply! Is there any way to convert Non UTF 8 encoded file to 
UTF -8 encoded through some different API? The downloaded text file has invalid 
XML characters like  which needs to be pre-processed before updating this 
to a XML file.

Thanks
Abhishek Srivastav
Systems Engineer
Tata Consultancy Services
Cell:- +91-9883389968
Mailto: [email protected]
Website: http://www.tcs.com
____________________________________________
Experience certainty.        IT Services
                       Business Solutions
                       Outsourcing
____________________________________________


From:   Geert Josten <[email protected]>
To:     MarkLogic Developer Discussion <[email protected]>
Date:   06/25/2012 06:41 PM
Subject:        Re: [MarkLogic Dev General] UTF -8 Encoding Exception
Sent by:        [email protected]

________________________________



Hi Abhishek,

The encoding option does not specify a target encoding for conversion; it 
specifies the encoding of the file you are trying to download. So you should 
figure out which encoding file-location.txt itself has, and specify that.
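
As a rough sketch of that advice, assuming the file turns out to be ISO-8859-1
(a placeholder guess, not something established in this thread), the call
would name the source encoding and let the server transcode to UTF-8:

```xquery
xquery version "1.0-ml";

(: Assumption: the source file is ISO-8859-1; substitute its real encoding. :)
(: [2] selects the response body; item [1] is the response metadata. :)
xdmp:http-get("file-location.txt",
  <options xmlns="xdmp:document-get">
    <encoding>iso-8859-1</encoding>
  </options>)[2]
```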

Kind regards,
Geert

From: [email protected] [mailto:[email protected]] On Behalf Of Abhishek53 S
Sent: Monday, June 25, 2012 14:51
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] UTF -8 Encoding Exception


Hi Folks,

I am having an issue downloading a non-UTF-8 encoded text file from a file 
server. I am using the http-get method to download text files and then 
updating the text inside XML documents.

How can I convert non-UTF-8 text to UTF-8?

Sample code:

xdmp:http-get("file-location.txt",
  <options xmlns="xdmp:document-get">
    <encoding>utf-8</encoding>
  </options>)

Exception: XDMP-DOCUTF8SEQ: -- document is not UTF-8 encoded
Please let me know your suggestions.

Thanks
Abhishek Srivastav

_______________________________________________
General mailing list
[email protected]
http://community.marklogic.com/mailman/listinfo/general
