Maruan's latest comment clarifies how this new parser behaves: https://issues.apache.org/jira/browse/PDFBOX-1787?focusedCommentId=13836591&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13836591
Hong-Thai

-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]]
Sent: Monday, December 2, 2013 15:59
To: [email protected]
Subject: RE: NonSequentialPDFParser

Does the speedup only help if you are trying to parse an individual page vs the entire document? If so, is partial parsing a use case for Tika? If this has the same performance on the full document as the regular parser, does it have lower memory overhead?

-----Original Message-----
From: Hong-Thai Nguyen [mailto:[email protected]]
Sent: Monday, December 02, 2013 9:18 AM
To: [email protected]
Subject: NonSequentialPDFParser

Hi all,

NonSequentialPDFParser may increase parsing performance on PDF extraction by 45%. Should we integrate it into Tika?

https://issues.apache.org/jira/browse/PDFBOX-1104

Thanks,
Hong-Thai
