There are two issues intermixed here.

1) Can you write your own code?
Sure. You can write any code you want, once the data is in XML form.
That is, I don't think (though I'm not sure) that you can install your
own converters for the first 'leg', for example extracting XML from
PDF, to run WITHIN the ML server. But you are welcome to write your own
code OUTSIDE ML and then import that data into ML.
Once the data is in ML, you can write the pipelines with your own
XQuery code.
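As a rough illustration of the kind of cleanup you could do OUTSIDE ML
before importing, here is a minimal Python sketch that regroups list
items a converter has coded as plain paragraphs. The input shape (a flat
<body> of <p> elements) and the <list>/<item> output names are
hypothetical, not MarkLogic's actual converter schema, and the "starts
with a bullet or a number" rule is only a heuristic; adapt both to what
the converter really emits.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic (assumption): a list item is a <p> whose text starts with a
# bullet character, '-', '*', or a number followed by '.' or ')'.
BULLET = re.compile(r"^\s*(?:[\u2022\-*]|\d+[.)])\s+")

def regroup_lists(body: ET.Element) -> ET.Element:
    """Return a new element where runs of bullet/number-prefixed <p>
    children are wrapped in <list><item>...</item></list>.

    The element names are hypothetical. Only each <p>'s leading text is
    kept for list items; inline child markup inside such a <p> is dropped.
    """
    out = ET.Element(body.tag)
    current_list = None
    for child in body:
        text = child.text or ""
        match = BULLET.match(text) if child.tag == "p" else None
        if match:
            # Start a new <list> at the first item of a run.
            if current_list is None:
                current_list = ET.SubElement(out, "list")
            item = ET.SubElement(current_list, "item")
            item.text = text[match.end():]
        else:
            # Any non-item element ends the current run of items.
            current_list = None
            out.append(child)
    return out

src = ET.fromstring(
    "<body><p>Intro</p><p>1. First</p><p>2. Second</p><p>Outro</p></body>"
)
print(ET.tostring(regroup_lists(src), encoding="unicode"))
# -> <body><p>Intro</p><list><item>First</item><item>Second</item></list><p>Outro</p></body>
```

The same regrouping could equally be written in XQuery once the
documents are in ML; doing it outside just keeps the converter output
untouched.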

2) Can you do a better job?
That's tough. I know, for example, that PDF doesn't preserve the kind
of structure you are aiming for, like tables. If you examine the PDF
structure itself, it's not semantic: it's layout-oriented data, not
semantically oriented data. I don't think you'll get very far doing a
better job with PDFs.
But with Word ... I haven't looked at how Word documents are output in
ML, but I have had experience extracting table data from Word files
that were saved with "Save As XML" (Word 2003 format).
These definitely do have all the tabular structure you can stomach, and
1000x more.
It is excruciatingly painful to work at that level, but it's possible.
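For the Word route, here is a rough Python sketch of pulling table data
out of a "Save As XML" (Word 2003, WordprocessingML) file. It assumes
the standard Word 2003 namespace and the usual w:tbl / w:tr / w:tc /
w:p / w:t nesting. Real documents also carry property elements
(w:tblPr, w:tcPr, ...), merged cells and nested tables; this sketch
simply skips the properties, ignores merging, and reports nested tables
as separate entries, so treat it as a starting point rather than a
complete extractor.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by Word 2003 "Save As XML" files.
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"

def extract_tables(xml_text: str) -> list[list[list[str]]]:
    """Return every w:tbl as a list of rows, each row a list of cell strings.

    Nested tables are found by iter() and returned as their own entries;
    their text is not included in the enclosing cell.
    """
    root = ET.fromstring(xml_text)
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # A cell holds paragraphs (w:p) made of runs (w:r) of text
                # (w:t): join each paragraph's text, then paragraphs by "\n".
                paras = ["".join(t.text or "" for t in p.iter(W + "t"))
                         for p in tc.findall(W + "p")]
                cells.append("\n".join(paras))
            rows.append(cells)
        tables.append(rows)
    return tables

sample = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:body><w:tbl>
<w:tr><w:tc><w:p><w:r><w:t>Name</w:t></w:r></w:p></w:tc>
<w:tc><w:p><w:r><w:t>Qty</w:t></w:r></w:p></w:tc></w:tr>
<w:tr><w:tc><w:p><w:r><w:t>Bolt</w:t></w:r></w:p></w:tc>
<w:tc><w:p><w:r><w:t>1</w:t></w:r><w:r><w:t>2</w:t></w:r></w:p></w:tc></w:tr>
</w:tbl></w:body></w:wordDocument>"""

print(extract_tables(sample))  # -> [[['Name', 'Qty'], ['Bolt', '12']]]
```

Note how a single visible value ("12") can be split across several runs;
that is a small taste of why working at this level is painful.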



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Byomokesh
Sahoo
Sent: Wednesday, December 02, 2009 12:39 AM
To: [email protected]
Subject: [MarkLogic Dev General] Poor coding Default Convert

Hi,

I have some doubts about the MarkLogic default converters (PDF to XML,
Word to XML).

1. Are the converted files 100% accurate?

2. Can I write a program to transform the MarkLogic-converted files
into another XML format with complex tables and list items?

3. Can we achieve different output files (eBook, ePDF) from the default
conversion?


I found missing text, tables coded as paragraphs, and list items coded
as paragraphs. My question is: without any manual work, how can we
achieve good output from the MarkLogic default conversion?


Can anyone advise me?

Thanks
Byomokesh
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general