Hi,
I've added a comment to TIKA-1201 explaining why one should use the
NonSequentialPDFParser: not because of its speed, but because its
parsing is much more standards-conformant.
Best,
Timo
On 02.12.2013 16:15, Hong-Thai Nguyen wrote:
Maruan's latest comment clarifies this new parser:
https://issues.apache.org/jira/browse/PDFBOX-1787?focusedCommentId=13836591&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13836591
Hong-Thai
-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]]
Sent: Monday, December 2, 2013 15:59
To: [email protected]
Subject: RE: NonSequentialPDFParser
Does the speedup only help if you are trying to parse an individual page rather
than the entire document? If so, is partial parsing a use case for Tika? If it
has the same performance on the full document as the regular parser, does it
have lower memory overhead?
-----Original Message-----
From: Hong-Thai Nguyen [mailto:[email protected]]
Sent: Monday, December 02, 2013 9:18 AM
To: [email protected]
Subject: NonSequentialPDFParser
Hi all,
NonSequentialPDFParser may improve parsing performance for PDF extraction by 45%.
Should we integrate it into Tika?
https://issues.apache.org/jira/browse/PDFBOX-1104
Thanks,
Hong-Thai
--
Timo Boehme
OntoChem GmbH
H.-Damerow-Str. 4
06120 Halle/Saale
T: +49 345 4780474
F: +49 345 4780471
[email protected]
_____________________________________________________________________
OntoChem GmbH
Managing Director: Dr. Lutz Weber
Registered office: Halle / Saale
Register court: Stendal
Registration number: HRB 215461
_____________________________________________________________________