Re: NonSequentialPDFParser

Timo Boehme Tue, 03 Dec 2013 01:27:37 -0800

Hi,

I've added a comment to TIKA-1201 explaining why one should use the NonSequentialPDFParser - not because of the speed, but because of the much more conformant parsing.



Best,
Timo


On 02.12.2013 16:15, Hong-Thai Nguyen wrote:

Maruan's latest comment clarifies this new parser:
https://issues.apache.org/jira/browse/PDFBOX-1787?focusedCommentId=13836591&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13836591


Hong-Thai


-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]]
Sent: Monday, December 2, 2013 15:59
To: [email protected]
Subject: RE: NonSequentialPDFParser

Does the speedup only help if you are trying to parse an individual page vs the 
entire document?  If so, is partial parsing a use case for Tika?  If this has 
the same performance on the full document as the regular parser, does it have 
lower memory overhead?

-----Original Message-----
From: Hong-Thai Nguyen [mailto:[email protected]]
Sent: Monday, December 02, 2013 9:18 AM
To: [email protected]
Subject: NonSequentialPDFParser

Hi all,
NonSequentialPDFParser may improve parsing performance for PDF extraction by 45%.
Should we integrate it into Tika?
https://issues.apache.org/jira/browse/PDFBOX-1104

Thanks,

Hong-Thai



--

 Timo Boehme
 OntoChem GmbH
 H.-Damerow-Str. 4
 06120 Halle/Saale
 T: +49 345 4780474
 F: +49 345 4780471
 [email protected]

_____________________________________________________________________

 OntoChem GmbH
 Geschäftsführer: Dr. Lutz Weber
 Sitz: Halle / Saale
 Registergericht: Stendal
 Registernummer: HRB 215461
_____________________________________________________________________
