Does the speedup only help when you are parsing an individual page rather than the entire document? If so, is partial parsing a use case for Tika? If it performs the same as the regular parser on a full document, does it have lower memory overhead?

-----Original Message-----
From: Hong-Thai Nguyen [mailto:[email protected]]
Sent: Monday, December 02, 2013 9:18 AM
To: [email protected]
Subject: NonSequentialPDFParser

Hi all,

NonSequentialPDFParser may increase parsing performance on PDF extraction by 45%. Should we integrate it into Tika?

https://issues.apache.org/jira/browse/PDFBOX-1104

Thanks,
Hong-Thai
