The mimetype entry for pdf was set to XML, and I strongly believe that was the 
default! I changed it to binary, restarted the server, and it worked. Thanks again.

Regards,
Sundeep

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, January 15, 2009 11:29 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] MarkLogic PDF content handling

I believe that webdav behavior is governed entirely by the "Mimetypes" 
section in the admin server configuration. The mimetype entry for .pdf 
should be binary, but perhaps it's been changed at some point on your 
instance of MarkLogic Server?

-- Mike

On 2009-01-14 20:19, Sundeep_Raikhelkar wrote:
> Thanks Mike,
> Setting the format to DocumentFormat.BINARY worked here. I can now see the 
> XHTML and XML files being generated. Is there a similar fix for WebDAV 
> as well? There I just drag files and drop them into the WebDAV browser.
>
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Tuesday, January 13, 2009 11:44 PM
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] MarkLogic PDF content handling
>
> Sundeep,
>
> The error code XDMP-DOCUTF8SEQ suggests that MarkLogic Server sees the
> pdf document as text or XML, rather than binary. There are several ways
> to fix this, but in XCC I would specify that the content is binary.
>
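To illustrate why a PDF trips XDMP-DOCUTF8SEQ when it is treated as text or XML, here is a minimal sketch using only the Java standard library; it does not touch MarkLogic or XCC. The class name Utf8Check, the helper isValidUtf8, and the sample bytes are assumptions made up for this illustration. The sample imitates the start of a typical PDF: an ASCII header followed by a comment line of high-bit bytes, which is not well-formed UTF-8. Decoding the same bytes as ISO-8859-1 cannot fail, because every byte maps to a character, which would be consistent with the error moving from the decoding stage to the XML parser (XDMP-STARTTAGCHAR) once that encoding is specified.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of XCC or MarkLogic.
class Utf8Check {

    // Returns true only if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up sample resembling the first two lines of a PDF file.
        byte[] pdfLike = {
            '%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
            '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'
        };

        System.out.println("valid UTF-8: " + isValidUtf8(pdfLike));

        // ISO-8859-1 maps each byte to exactly one char, so this never throws;
        // the result is still not XML, only decodable text.
        String asLatin1 = new String(pdfLike, StandardCharsets.ISO_8859_1);
        System.out.println("chars when read as ISO-8859-1: " + asLatin1.length());
    }
}
```

Neither encoding turns the PDF into a parseable document, which is why declaring the content as binary is the fix rather than changing the encoding or repair level.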
> The XCC "overview" section at
> http://developer.marklogic.com/pubs/4.0/javadoc/index.html includes
> sample code to insert content. In this API, the preferred way to build a
> ContentCreateOptions object representing a binary load is:
>
>     ContentCreateOptions options =
>       ContentCreateOptions.newBinaryInstance();
>
> While the above is the preferred technique, you could also use the
> ContentCreateOptions() constructor, then call cco.setFormatBinary() or
> cco.setFormat(DocumentFormat.BINARY).
>
> I hope that helps. I believe it's best to discuss one question at a
> time, so I'm only going to comment on your pdf ingestion issue in this
> email.
>
> -- Mike
>
> On 2009-01-13 01:38, Sundeep_Raikhelkar wrote:
>> Hi,
>> I am evaluating MarkLogic for its content processing capabilities. I have 
>> chosen a simple use case for evaluation: PDF upload, PDF search, and PDF 
>> generation.
>>
>>    1.  PDF load: This works fine when the file is loaded in binary format, 
>> but with content processing turned on, I am not able to upload any PDF. The 
>> error I get is "XDMP-DOCUTF8SEQ: Invalid UTF-8 escape sequence at 
>> /cpf/pdf/xcc.pdf". I tried to upload using the XCC API, XDMP load, and 
>> WebDAV. All three modes give the same error. When I set the encoding for the 
>> XCC API and XDMP load to ISO-8859-1, I get the error "XDMP-STARTTAGCHAR: 
>> Unexpected character "<" in start tag at /cpf/pdf/xcc.pdf line 2". I have 
>> also tried setting the repair level.
>>
>>     File file = new File("E:\\marklogicTech\\xcc.pdf");
>>     ContentCreateOptions cco = new ContentCreateOptions();
>>     cco.setEncoding("ISO-8859-1");
>>     cco.setRepairLevel(DocumentRepairLevel.FULL);
>>     String uriUpload = "/cpf/pdf/xcc.pdf";
>>     Content content = ContentFactory.newContent(uriUpload, file, cco);
>>     session.insertContent(content);
>>
>> I have tried uploading MS Word and MS Excel documents; they upload fine, 
>> and the corresponding XHTML and XML files are generated. Can you 
>> please tell me whether this has anything to do with the encoding of xcc.pdf 
>> (the file I am uploading) or with my MarkLogic database server settings?
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
>
