[ 
https://issues.apache.org/jira/browse/TIKA-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081226#comment-15081226
 ] 

Tim Allison commented on TIKA-1436:
-----------------------------------

I'm finally getting around to looking at this.

I think this would be a good thing to address in Tika 2.0 because it would be a 
fairly large departure from the current "bit awkward and generally shouldn't be 
recommended" code flow that we have now.

Chris noted that the patch doesn't apply cleanly. From the new import 
statement in the PDFParser, it looks as though you refactored 
org.apache.tika.sax.WriteLimitReachedException into a standalone class, but I 
don't see that class in the patch (I could very well be missing it).

I'm looking at the raw patch now (not applied), and I'm a bit concerned that 
there is special handling for catching and swallowing a 
WriteLimitReachedException within the PDFParser.  I may be misunderstanding 
your proposal, but the nice thing about the exception was that it put the 
burden/opportunity on the client to handle it, and we didn't have to add catch 
blocks to every parser (this point was already made by Jukka).
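
To make that last point concrete, here is a rough sketch of the pattern I have 
in mind. It deliberately uses no Tika classes: LimitReachedException, 
LimitedSink, parse and extractUpTo are all hypothetical stand-ins invented for 
this example, not anything in the patch or in the current code base.

```java
import java.io.IOException;

// Hypothetical stand-ins only; none of these are Tika classes.
class WriteLimitSketch {

    /** Hypothetical signal raised when the character limit is hit. */
    static class LimitReachedException extends IOException {
        LimitReachedException(int limit) {
            super("write limit of " + limit + " characters reached");
        }
    }

    /** Collects text and throws once a chunk would exceed the limit. */
    static class LimitedSink {
        private final StringBuilder text = new StringBuilder();
        private final int limit;

        LimitedSink(int limit) {
            this.limit = limit;
        }

        void write(String chunk) throws LimitReachedException {
            int room = limit - text.length();
            if (chunk.length() > room) {
                // Keep what still fits, then signal the caller.
                text.append(chunk, 0, room);
                throw new LimitReachedException(limit);
            }
            text.append(chunk);
        }

        String text() {
            return text.toString();
        }
    }

    /** Stand-in "parser": no catch block, the exception just propagates. */
    static void parse(String[] chunks, LimitedSink sink)
            throws LimitReachedException {
        for (String chunk : chunks) {
            sink.write(chunk);
        }
    }

    /** Client code: the one place that decides what hitting the limit means. */
    static String extractUpTo(String[] chunks, int limit) {
        LimitedSink sink = new LimitedSink(limit);
        try {
            parse(chunks, sink);
        } catch (LimitReachedException e) {
            // This client treats truncated text as acceptable; another
            // client could log, rethrow, or discard the partial text.
        }
        return sink.text();
    }

    public static void main(String[] args) {
        String[] chunks = {"Hello, ", "world", "!"};
        System.out.println(extractUpTo(chunks, 9));
        System.out.println(extractUpTo(chunks, 100));
    }
}
```

The parse method stays free of limit handling, so every parser written this 
way behaves the same, and each client chooses its own policy in one place.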

> improvement to PDFParser
> ------------------------
>
>                 Key: TIKA-1436
>                 URL: https://issues.apache.org/jira/browse/TIKA-1436
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Stefano Fornari
>              Labels: parser, pdf
>             Fix For: 1.12
>
>         Attachments: ste-20140927.patch
>
>
> With regard to the thread "[PDFParser] - read limited number of characters" 
> on Mar 29, I would like to propose the attached patch. I noticed that in Tika 
> 1.6 there has been some work on better handling of the 
> WriteLimitReachedException condition, but I believe it could be improved 
> even further. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
