RE: NonSequentialPDFParser

Hong-Thai Nguyen Mon, 02 Dec 2013 07:16:40 -0800

Maruan's latest comment clarifies how this new parser works:
https://issues.apache.org/jira/browse/PDFBOX-1787?focusedCommentId=13836591&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13836591



Hong-Thai


-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]] 
Sent: Monday, December 2, 2013 15:59
To: [email protected]
Subject: RE: NonSequentialPDFParser

Does the speedup only help if you are trying to parse an individual page rather than the 
entire document?  If so, is partial parsing a use case for Tika?  If it has 
the same performance as the regular parser on the full document, does it have 
lower memory overhead?

-----Original Message-----
From: Hong-Thai Nguyen [mailto:[email protected]] 
Sent: Monday, December 02, 2013 9:18 AM
To: [email protected]
Subject: NonSequentialPDFParser

Hi all,
NonSequentialPDFParser may improve parsing performance for PDF extraction by 45%. 
Should we integrate it into Tika?
https://issues.apache.org/jira/browse/PDFBOX-1104

Thanks,

Hong-Thai
