Hi,
I am evaluating MarkLogic for its content processing capabilities. I have chosen a
simple use case for the evaluation: PDF upload, PDF search, and PDF generation.
1. PDF load: Loading a PDF works fine when it is loaded in binary format, but with
content processing turned on, I am not able to upload any PDF. The error I get is
"XDMP-DOCUTF8SEQ: Invalid UTF-8 escape sequence at /cpf/pdf/xcc.pdf". I tried
uploading with the XCC API, xdmp:load, and WebDAV, and all three give the same
error. When I set the encoding to ISO-8859-1 for the XCC API and xdmp:load, I
get the error "XDMP-STARTTAGCHAR: Unexpected character "<" in start tag at
/cpf/pdf/xcc.pdf line 2" instead. We have also tried setting the repair level,
as in this XCC code:
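    import java.io.File;
    import com.marklogic.xcc.Content;
    import com.marklogic.xcc.ContentCreateOptions;
    import com.marklogic.xcc.ContentFactory;
    import com.marklogic.xcc.DocumentRepairLevel;

    // session: an already-open com.marklogic.xcc.Session (connection setup omitted)
    File file = new File("E:\\marklogicTech\\xcc.pdf");
    ContentCreateOptions cco = new ContentCreateOptions();
    cco.setEncoding("ISO-8859-1");
    cco.setRepairLevel(DocumentRepairLevel.FULL);
    String uriUpload = "/cpf/pdf/xcc.pdf";
    Content content = ContentFactory.newContent(uriUpload, file, cco);
    session.insertContent(content);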
I have also tried uploading MS-Word and MS-Excel documents. They upload fine, and
the corresponding XHTML and XML files are generated. Can you please tell me
whether the problem has to do with the encoding of xcc.pdf (the file I am
uploading) or with my MarkLogic database/server settings?
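Both errors are consistent with the PDF being parsed as XML/text instead of being stored as binary. A PDF starts with an ASCII header (`%PDF-1.x`), but the PDF specification recommends that a file containing binary data follow it with a comment line of bytes of 128 or above, and the rest of the file (compressed streams, fonts) is generally arbitrary bytes. Those bytes are usually not valid UTF-8, which matches XDMP-DOCUTF8SEQ. Declaring ISO-8859-1 makes every byte decodable, but the decoded text is still not XML (PDF dictionaries are delimited by `<<` and `>>`), which matches XDMP-STARTTAGCHAR. The stdlib-only Java sketch below checks a file header this way; `looksLikePdf`, `isValidUtf8` and `suggestedFormat` are hypothetical helpers, not part of XCC. If it reports "binary", the likely fix is to request binary format on load (in XCC, presumably `cco.setFormatBinary()` instead of `setEncoding`), which should be verified against the XCC version in use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfFormatCheck {

    /** True if the bytes start with the PDF magic header "%PDF-". */
    static boolean looksLikePdf(byte[] bytes) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return bytes.length >= magic.length
                && Arrays.equals(Arrays.copyOf(bytes, magic.length), magic);
    }

    /** True if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Hypothetical helper: which load format these bytes call for. */
    static String suggestedFormat(byte[] bytes) {
        // A PDF is a binary container even though its first line is ASCII,
        // so it should never be handed to the XML parser.
        if (looksLikePdf(bytes)) {
            return "binary";
        }
        // Valid UTF-8 does not prove the content is XML, only that it is not binary.
        return isValidUtf8(bytes) ? "text-or-xml" : "binary";
    }

    public static void main(String[] args) {
        // Typical PDF header: an ASCII version line, then a comment line of
        // high bytes (0xE2 0xE3 0xCF 0xD3) marking the file as binary.
        byte[] pdfHeader = {
            '%', 'P', 'D', 'F', '-', '1', '.', '4', '\n',
            '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'
        };
        System.out.println("looks like PDF: " + looksLikePdf(pdfHeader));
        System.out.println("valid UTF-8:    " + isValidUtf8(pdfHeader));
        System.out.println("format:         " + suggestedFormat(pdfHeader));
    }
}
```

Running the check on the first few kilobytes of xcc.pdf, rather than on the sample header, would show whether the file itself is the usual binary PDF.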
2. Document search: Content-level search works fine for the uploaded documents
(MS-Word and MS-Excel). However, I need a way to keep the XML and XHTML files
generated by the CPF framework from being searched. For example, when I
uploaded a Word document, CPF created a few XML files and an XHTML file, so a
search in /cpf/ms-word now returns two results. I was looking for some way to
tell the MarkLogic MS-Word pipeline to place these internal XML and XHTML files
in another location so that they are not picked up by searches. These are the
documents created for one upload:
/cpf/ms-word/Mule-Threading & DB-Connection-Pooling Tuning _doc_parts/css.xml - element css:styles
/cpf/ms-word/Mule-Threading & DB-Connection-Pooling Tuning _doc.xhtml - element html
/cpf/ms-word/Mule-Threading & DB-Connection-Pooling Tuning _doc.xml - element section
/cpf/ms-word/Mule-Threading & DB-Connection-Pooling Tuning .doc - binary
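Independently of any pipeline configuration, one client-side workaround is to collapse search hits back to the document that was actually uploaded, so that the .xml and .xhtml renditions of one Word file count as a single result. A minimal Java sketch, assuming the naming convention visible in the listing above (`<name>.doc` becomes `<name>_doc.xml`, `<name>_doc.xhtml` and `<name>_doc_parts/...`); the regular expression, the extension list (only `doc` is confirmed by the listing), and the helper names `sourceUri` and `distinctSources` are hypothetical and would need checking against the real pipeline output:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CpfResultFilter {

    // Assumed naming convention, inferred only from the URI listing:
    //   "<name>.doc" -> "<name>_doc.xml", "<name>_doc.xhtml", "<name>_doc_parts/..."
    // The extensions other than "doc" are a guess.
    private static final Pattern DERIVED = Pattern.compile(
            "^(.*)_(doc|docx|xls|xlsx|pdf)(?:_parts/.*|\\.xhtml|\\.xml)$");

    /** Maps a CPF-derived URI to its presumed source URI; other URIs pass through. */
    static String sourceUri(String uri) {
        Matcher m = DERIVED.matcher(uri);
        return m.matches() ? m.group(1) + "." + m.group(2) : uri;
    }

    /** Collapses hit URIs so each source document appears once, in first-hit order. */
    static List<String> distinctSources(List<String> hitUris) {
        Set<String> sources = new LinkedHashSet<>();
        for (String uri : hitUris) {
            sources.add(sourceUri(uri));
        }
        return new ArrayList<>(sources);
    }

    public static void main(String[] args) {
        String prefix = "/cpf/ms-word/Mule-Threading & DB-Connection-Pooling Tuning ";
        List<String> hits = Arrays.asList(prefix + "_doc.xhtml", prefix + "_doc.xml");
        System.out.println(distinctSources(hits));
    }
}
```

Restricting the query itself (for example with a directory or collection constraint in cts:search) would keep derived documents out of the results on the server side; the sketch only illustrates the post-filtering idea.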
3. PDF document generation: We have a use case to generate a PDF from search
results (selected ones only, of course!) and either save it to the desktop or
send it as email. I don't see any MarkLogic feature for PDF (or any other)
document generation (I did see an API to generate ZIP files, though). Please
correct me if I'm wrong. I am planning to use a third-party open-source library
for this (FOP or iText).
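If FOP is the choice, one route is to turn the selected results into XSL-FO, the input format FOP renders to PDF. A minimal sketch of that first step in plain Java with no FOP dependency, assuming each selected result has already been reduced to a plain-text string; `toFo` and `escapeXml` are hypothetical helpers, and the A4 page size and styling are arbitrary choices. The FOP rendering call itself is left out and should be taken from the FOP embedding documentation for the version in use.

```java
import java.util.Arrays;
import java.util.List;

public class ResultsToFo {

    /** Escapes the five XML special characters for use in element content. */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a single-flow A4 XSL-FO document: a title, then one block per result. */
    static String toFo(String title, List<String> selectedResults) {
        StringBuilder fo = new StringBuilder();
        fo.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
          .append("<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n")
          .append("  <fo:layout-master-set>\n")
          .append("    <fo:simple-page-master master-name=\"A4\" page-height=\"29.7cm\"")
          .append(" page-width=\"21cm\" margin=\"2cm\">\n")
          .append("      <fo:region-body/>\n")
          .append("    </fo:simple-page-master>\n")
          .append("  </fo:layout-master-set>\n")
          .append("  <fo:page-sequence master-reference=\"A4\">\n")
          .append("    <fo:flow flow-name=\"xsl-region-body\">\n")
          .append("      <fo:block font-size=\"16pt\" font-weight=\"bold\" space-after=\"12pt\">")
          .append(escapeXml(title)).append("</fo:block>\n");
        // One paragraph-like block per selected search result.
        for (String result : selectedResults) {
            fo.append("      <fo:block space-after=\"6pt\">")
              .append(escapeXml(result)).append("</fo:block>\n");
        }
        fo.append("    </fo:flow>\n")
          .append("  </fo:page-sequence>\n")
          .append("</fo:root>\n");
        return fo.toString();
    }

    public static void main(String[] args) {
        List<String> selected = Arrays.asList(
                "Mule-Threading & DB-Connection-Pooling Tuning",
                "Another selected search result");
        System.out.print(toFo("Selected search results", selected));
    }
}
```

Building the FO as a string keeps the example dependency-free; in practice, producing it with an XSLT stylesheet over the search-result XML would often be the more natural fit, since FOP can take XML plus XSLT as input.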
Thanks,
Sundeep
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general