[MarkLogic Dev General] 2 column Pdf to Xml conversion

aniruddha biswas Wed, 10 Feb 2010 07:36:39 -0800

Hi All,

I am new to MarkLogic development and need your help with the following:


I have a 2-column PDF that I have already ingested into MarkLogic. I need 
to generate DocBook XML from this PDF, and I am using the following query for 
the conversion:

    xquery version '0.9-ml'
    import module namespace dbk = 'http://marklogic.com/cpf/docbook'
      at '/MarkLogic/conversion/docbook.xqy'

    let $results := xdmp:pdf-convert(
      doc('10747_2007_article_bf02760200.pdf'),
      '10747_2007_article_bf02760200.pdf')
    let $xhtml := $results[2]
    let $options :=
      <options xmlns='dbk:convert'>
        <wrap-text>true</wrap-text>
        <preserve-styles>true</preserve-styles>
      </options>
    return dbk:convert($xhtml, $options)[2]
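
For reference, a minimal sketch for checking which item of the sequence 
returned by xdmp:pdf-convert is actually the XHTML part, since the query above 
assumes it is $results[2]. The first returned node is the parts manifest and 
the remaining nodes are the generated parts; the variable names $name and 
$part are only illustrative, and the sketch assumes the PDF is stored under 
the same URI as in the query above:

    xquery version '0.9-ml'

    (: Assumption: the PDF lives at this URI, as in the query above. :)
    let $name := '10747_2007_article_bf02760200.pdf'
    let $results := xdmp:pdf-convert(doc($name), $name)
    (: List each returned node with its position, so the manifest and
       the XHTML part can be identified before calling dbk:convert. :)
    for $part at $i in $results
    return concat(string($i), ': ', xdmp:describe($part))

If the XHTML document does not show up at position 2, the index used for 
$xhtml would need to be adjusted accordingly.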


I do get XML output, but it does not preserve the column positions of the 
text. Do you have any idea how to fix this? Please find attached the 
PDFtoXHTML.cfg file that is used by this query.

The next problem I am facing is that the PDF contains many special characters 
(scientific notation such as gamma, kappa, and alpha) as well as table data. 
How do I convert the PDF so that all of these characters and the table data 
are included?

Please help.

Thanks in advance.

Aniruddha


PDFtoXHTML.cfg
Description: Binary data

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
