On Jan 13, 2009, at 10:35 AM, "Michael Blakeley" <[email protected]> wrote:
Sundeep,
The error code XDMP-DOCUTF8SEQ suggests that MarkLogic Server is
treating the PDF document as text or XML, rather than binary. There
are several ways to fix this, but in XCC I would specify that the
content is binary.
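As a rough illustration of why raw PDF bytes would fail a UTF-8 check, here is a minimal sketch that uses only the Java standard library. It is not MarkLogic's actual validation code; the class name, the helper method, and the sample bytes are made up for illustration. It also shows that the same bytes decode without complaint as ISO-8859-1, which may explain why changing the encoding produces a different parse error rather than a successful load.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if the whole byte array decodes cleanly in the given charset.
    public static boolean decodesAs(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: a PDF-style header line followed by a comment
        // line of high-bit bytes, which is not a valid UTF-8 sequence.
        byte[] pdfLike = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
                          '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};

        // 0xE2 starts a multi-byte UTF-8 sequence, but 0xE3 is not a continuation byte.
        System.out.println("Valid UTF-8:      " + decodesAs(pdfLike, StandardCharsets.UTF_8));
        // Every byte value maps to a character in ISO-8859-1, so decoding never fails.
        System.out.println("Valid ISO-8859-1: " + decodesAs(pdfLike, StandardCharsets.ISO_8859_1));
    }
}
```

Decoding as ISO-8859-1 only hides the problem: the bytes become characters, but they are still not well-formed text or XML, so loading the document as binary is the cleaner fix.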
The XCC "overview" section at http://developer.marklogic.com/pubs/4.0/javadoc/index.html
includes sample code to insert content. In this API, the preferred
way to build a ContentCreateOptions object representing a binary
load is:
ContentCreateOptions options = ContentCreateOptions.newBinaryInstance();
While the above is the preferred technique, you could also use the
ContentCreateOptions() constructor, then call setFormatBinary() or
setFormat(DocumentFormat.BINARY) on the resulting object.
I hope that helps. I believe it's best to discuss one question at a
time, so I'm only going to comment on your PDF ingestion issue in
this email.
-- Mike
On 2009-01-13 01:38, Sundeep_Raikhelkar wrote:
Hi,
I am evaluating MarkLogic for its content processing capabilities. I
have chosen a simple use case for evaluation: PDF upload, PDF
search, and PDF generation.
1. PDF load: This works fine when the file is loaded in binary format,
but with content processing turned on, I am not able to upload any PDF.
The error I get is "XDMP-DOCUTF8SEQ: Invalid UTF-8 escape sequence
at /cpf/pdf/xcc.pdf". I tried to upload using the XCC API, XDMP load,
and WebDAV. All three modes give the same error. When I set the
encoding for the XCC API and XDMP load to ISO-8859-1, I get the
error "XDMP-STARTTAGCHAR: Unexpected character "<" in start tag at
/cpf/pdf/xcc.pdf line 2" instead. I have also tried setting the repair
level:
File file = new File("E:\\marklogicTech\\xcc.pdf");
ContentCreateOptions cco = new ContentCreateOptions();
cco.setEncoding("ISO-8859-1");
cco.setRepairLevel(DocumentRepairLevel.FULL);
String uriUpload = "/cpf/pdf/xcc.pdf";
Content content = ContentFactory.newContent(uriUpload, file, cco);
session.insertContent(content);
I have also tried uploading MS-Word and MS-Excel documents. They
upload fine, and the corresponding XHTML and XML files are generated.
Can you please tell me whether the problem has anything to do with the
encoding of xcc.pdf (the file I am uploading) or with my MarkLogic
database server settings?
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general