Hi,
I am evaluating MarkLogic for its content processing capabilities. I have 
chosen a simple use case for the evaluation: PDF upload, PDF search, and PDF 
generation.

 1.  PDF load: This works fine when the PDF is loaded in binary format, but 
with content processing turned on I am not able to upload any PDF. The error I 
get is "XDMP-DOCUTF8SEQ: Invalid UTF-8 escape sequence at /cpf/pdf/xcc.pdf". I 
tried uploading through the XCC API, xdmp load, and WebDAV. All three give the 
same error. When I set the encoding to ISO-8859-1 for the XCC API and xdmp 
load, I get a different error: "XDMP-STARTTAGCHAR: Unexpected character "<" in 
start tag at /cpf/pdf/xcc.pdf line 2". I have also tried setting the repair 
level. This is the XCC code:

    File file = new File("E:\\marklogicTech\\xcc.pdf");
    ContentCreateOptions cco = new ContentCreateOptions();
    cco.setEncoding("ISO-8859-1");
    cco.setRepairLevel(DocumentRepairLevel.FULL);
    String uriUpload = "/cpf/pdf/xcc.pdf";
    Content content = ContentFactory.newContent(uriUpload, file, cco);
    session.insertContent(content);

I have tried uploading MS Word and MS Excel documents; they upload fine and 
the corresponding XHTML and XML files are generated. Can you please tell me 
whether this has anything to do with the encoding of xcc.pdf (the file I am 
uploading) or with my MarkLogic database server settings?
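
To check my own understanding of the first error, here is a small stand-alone 
Java sketch (plain JDK, no MarkLogic or XCC involved; the class name Utf8Check 
is made up for illustration). It only shows that bytes like the binary marker 
many PDF writers put near the top of the file are not valid UTF-8. My 
assumption is that this is what XDMP-DOCUTF8SEQ complains about when the file 
is treated as text or XML instead of binary. ISO-8859-1 accepts every byte 
value, which would fit with the error turning into a parse error instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of XCC or MarkLogic.
final class Utf8Check {

    // Returns true only if the bytes decode as strict UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A PDF header line followed by a comment line of high bytes,
        // similar to the binary marker many PDF writers emit.
        byte[] pdfLike = {'%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
                '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);

        System.out.println("pdf-like bytes valid UTF-8? " + isValidUtf8(pdfLike));
        System.out.println("plain text valid UTF-8?     " + isValidUtf8(plain));
    }
}
```

Running the same check on the bytes of xcc.pdf itself (for example via 
java.nio.file.Files.readAllBytes) should tell whether the file is simply 
binary data rather than mis-encoded text.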


 2.  Document search: This works fine at content level for the documents that 
were uploaded (MS Word and MS Excel). However, I need a way to keep the XML 
and XHTML files that CPF generates out of the search results. For example, I 
uploaded a Word document, which created a few XML files and an XHTML file; 
now when I search in /cpf/ms-word I get two results. I am looking for some 
mechanism to tell the MarkLogic MS Word pipeline to place its internal files 
(the XML and XHTML) in another location, so that they are not picked up by 
searches.

/cpf/ms-word/Mule-Threading & DB-Connection-Pooling Tuning _doc_parts/css.xml - element css:styles
/cpf/ms-word/Mule-Threading & DB-Connection-Pooling Tuning _doc.xhtml - element html
/cpf/ms-word/Mule-Threading & DB-Connection-Pooling Tuning _doc.xml - element section
/cpf/ms-word/Mule-Threading & DB-Connection-Pooling Tuning .doc - binary
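
As a stopgap, I could filter the result URIs on the client side. The sketch 
below is plain Java and not a MarkLogic or CPF feature; the class name 
DerivedUriFilter is made up, and the pattern is based only on the "_doc" 
naming visible in the listing above, so other source formats would presumably 
need their own suffixes. It would also wrongly drop any original document 
whose own name happens to end in "_doc.xml" or "_doc.xhtml".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical client-side filter; not a MarkLogic or CPF API.
final class DerivedUriFilter {

    // Matches the generated URIs seen in the listing:
    //   <name>_doc.xhtml, <name>_doc.xml, <name>_doc_parts/...
    // Only the "_doc" suffix is covered; anything else is an assumption.
    private static final Pattern DERIVED =
            Pattern.compile(".*_doc(\\.xhtml|\\.xml|_parts/.*)");

    static boolean isDerived(String uri) {
        return DERIVED.matcher(uri).matches();
    }

    // Keeps only the URIs that do not look like conversion output.
    static List<String> originalsOnly(List<String> uris) {
        List<String> kept = new ArrayList<>();
        for (String uri : uris) {
            if (!isDerived(uri)) {
                kept.add(uri);
            }
        }
        return kept;
    }
}
```

I would still prefer a server-side way to do this, since filtering after the 
search also throws off result counts and paging.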


 3.  PDF document generation: We have a use case to generate a PDF from search 
results (selected ones only, of course!) and save it to the desktop or send it 
by email. I don't see any MarkLogic feature for PDF (or any other) document 
generation (I did see an API to generate ZIP files, though!). Please correct 
me if I am wrong. I am planning to use a third-party open source library for 
this (FOP or iText).


Thanks,
Sundeep
