Thanks Mike,
Setting the format to DocumentFormat.BINARY worked here. I can now see the 
XHTML and XML files being generated. Is there a similar hack for WebDAV as 
well? There I just drag files and drop them into the WebDAV browser.


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Tuesday, January 13, 2009 11:44 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] MarkLogic PDF content handling

Sundeep,

The error code XDMP-DOCUTF8SEQ suggests that MarkLogic Server sees the 
PDF document as text or XML, rather than binary. There are several ways 
to fix this, but in XCC I would specify that the content is binary.
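
To illustrate why a PDF trips this kind of check, here is a minimal sketch in 
plain Java. It does not use XCC and is not how MarkLogic Server implements its 
validation; the class name Utf8Check, the method isValidUtf8, and the sample 
bytes are made up for illustration. It shows that bytes typical of a PDF header 
are not well-formed UTF-8, while ISO-8859-1 accepts any byte, which may be why 
setting that encoding only moves the failure to a later stage instead of 
fixing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of XCC or MarkLogic.
class Utf8Check {

    /** Returns true only if the whole byte array is well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed sample: a PDF version line followed by a comment line of
        // high-bit bytes, similar to what many PDF writers emit.
        byte[] pdfLike = {
            '%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
            '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'
        };

        // 0xE2 starts a three-byte UTF-8 sequence, but 0xE3 is not a valid
        // continuation byte, so strict UTF-8 decoding rejects the data.
        System.out.println("valid UTF-8: " + isValidUtf8(pdfLike));

        // ISO-8859-1 maps every byte to exactly one character, so decoding
        // cannot fail; the result is just not meaningful text or XML.
        String latin1 = new String(pdfLike, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 chars: " + latin1.length());
    }
}
```

Loading the file as binary sidesteps both problems, because no character 
decoding or XML parsing is attempted on the bytes.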

The XCC "overview" section at 
http://developer.marklogic.com/pubs/4.0/javadoc/index.html includes 
sample code to insert content. In this API, the preferred way to build a 
ContentCreateOptions object representing a binary load is:

   ContentCreateOptions options =
     ContentCreateOptions.newBinaryInstance();

While the above is the preferred technique, you could also use the 
ContentCreateOptions() constructor, then call cco.setFormatBinary() or 
cco.setFormat(DocumentFormat.BINARY) on the resulting object.

I hope that helps. I believe it's best to discuss one question at a 
time, so I'm only going to comment on your PDF ingestion issue in this 
email.

-- Mike

On 2009-01-13 01:38, Sundeep_Raikhelkar wrote:
> Hi,
> I am evaluating MarkLogic for its content processing capabilities. I have 
> chosen a simple use case for the evaluation: PDF upload, PDF search, and PDF 
> generation.
>
>   1.  PDF load: This works fine when the file is loaded in binary format, but 
> with content processing turned on, I am not able to upload any PDF. The error 
> I get is "XDMP-DOCUTF8SEQ: Invalid UTF-8 escape sequence at /cpf/pdf/xcc.pdf". 
> I tried to upload using the XCC API, XDMP load, and WebDAV. All three modes 
> give the same error. When I set the encoding to ISO-8859-1 for the XCC API and 
> XDMP load, I get the error "XDMP-STARTTAGCHAR: Unexpected character "<" in 
> start tag at /cpf/pdf/xcc.pdf line 2" instead. I have also tried setting the 
> repair level.
>
>     File file = new File("E:\\marklogicTech\\xcc.pdf");
>     ContentCreateOptions cco = new ContentCreateOptions();
>     cco.setEncoding("ISO-8859-1");
>     cco.setRepairLevel(DocumentRepairLevel.FULL);
>     String uriUpload = "/cpf/pdf/xcc.pdf";
>     Content content = ContentFactory.newContent(uriUpload, file, cco);
>     session.insertContent(content);
>
> I have tried uploading MS-Word and MS-Excel documents. They upload fine, and 
> the corresponding XHTML and XML files are generated. Can you please tell me 
> whether this has anything to do with the encoding of xcc.pdf (the file I am 
> uploading) or with my MarkLogic database server settings?
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
