[ https://issues.apache.org/jira/browse/TIKA-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081226#comment-15081226 ]
Tim Allison commented on TIKA-1436:
-----------------------------------
I'm finally getting around to looking at this.
I think this would be a good thing to address in Tika 2.0, because it would be a
fairly large departure from the current "bit awkward and generally shouldn't be
recommended" code flow.
Chris noted that the patch doesn't apply cleanly. Judging from the new import
statement in the PDFParser, it looks like you refactored
org.apache.tika.sax.WriteLimitReachedException into a standalone class, but I
don't see that change in the patch (I could very well be missing it).
I'm looking at the raw patch now (not applied), and I'm a bit concerned that
there is special handling for catching and swallowing a
WriteLimitReachedException within the PDFParser. I may be misunderstanding your
proposal, but the nice thing about the exception was that it put the
burden/opportunity on the client to handle it, and we didn't have to add catch
blocks to every parser (Jukka already made this point).
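The client-side pattern can be sketched in plain Java. This is a hypothetical model, not Tika's API: `LimitReachedException`, `LimitedSink`, `SimpleParser`, `PAGE_PARSER` and `extract` are invented names standing in for WriteLimitReachedException, a write-limited ContentHandler, a Parser, and application code. (In Tika 1.x itself, clients typically checked the caught SAXException with `WriteOutContentHandler.isWriteLimitReached(Throwable)`.) What the sketch shows: the parser has no limit-specific catch block, the exception propagates, and only the caller decides whether truncated text is acceptable.

```java
import java.util.List;

/** Hypothetical model of "the limit exception is the client's problem", not Tika code. */
public class WriteLimitSketch {

    /** Hypothetical standalone exception, analogous to WriteLimitReachedException. */
    static class LimitReachedException extends Exception {
        LimitReachedException(int limit) {
            super("Write limit of " + limit + " characters reached");
        }
    }

    /** Hypothetical sink that keeps at most `limit` characters, then throws. */
    static class LimitedSink {
        private final StringBuilder text = new StringBuilder();
        private final int limit;

        LimitedSink(int limit) {
            this.limit = limit;
        }

        void characters(String chunk) throws LimitReachedException {
            int room = limit - text.length(); // never negative: text never exceeds limit
            if (chunk.length() > room) {
                text.append(chunk, 0, room); // keep the part that still fits
                throw new LimitReachedException(limit);
            }
            text.append(chunk);
        }

        String text() {
            return text.toString();
        }
    }

    /** Hypothetical parser contract: the limit exception is simply declared, not handled. */
    interface SimpleParser {
        void parse(List<String> pages, LimitedSink sink) throws LimitReachedException;
    }

    /** A "PDF-like" parser: forwards each page, with no catch block for the limit. */
    static final SimpleParser PAGE_PARSER = (pages, sink) -> {
        for (String page : pages) {
            sink.characters(page);
        }
    };

    /** Client-side policy: catch once, in one place, for every parser. */
    static String extract(SimpleParser parser, List<String> pages, int limit) {
        LimitedSink sink = new LimitedSink(limit);
        try {
            parser.parse(pages, sink);
        } catch (LimitReachedException e) {
            // This client accepts truncated text; another client could rethrow or log instead.
        }
        return sink.text();
    }

    public static void main(String[] args) {
        List<String> pages = List.of("Hello ", "PDF ", "world");
        System.out.println(extract(PAGE_PARSER, pages, 10));  // "Hello PDF " (truncated at 10 chars)
        System.out.println(extract(PAGE_PARSER, pages, 100)); // "Hello PDF world"
    }
}
```

In this model, adding another parser requires no limit handling at all. If each parser caught and swallowed the exception itself, the same block would have to be repeated everywhere, and the caller could no longer tell from the exception alone that the output was truncated.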
> improvement to PDFParser
> ------------------------
>
> Key: TIKA-1436
> URL: https://issues.apache.org/jira/browse/TIKA-1436
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.6
> Reporter: Stefano Fornari
> Labels: parser, pdf
> Fix For: 1.12
>
> Attachments: ste-20140927.patch
>
>
> With regard to the thread "[PDFParser] - read limited number of characters"
> on Mar 29, I would like to propose the attached patch. I noticed that in Tika
> 1.6 there has been some work on better handling of the
> WriteLimitReachedException condition, but I believe it could be improved
> further.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)