> On 29 Mar 2016, at 04:11, Andreas Lehmkühler <[email protected]> wrote:
> 
>> "Allison, Timothy B." <[email protected]> wrote on 28 March 2016 at 21:02:
>> 
>> 
>> Oh, wow, so it really might be possible without too much work?  I'm more than
>> happy to supply examples. :) 
> Oops, it isn't as simple as it sounds. If we simply swallow the exception,
> PDFBox will most likely run into an NPE. IMHO we have to implement some sort
> of on-demand parser that can handle null values for specific parts of a PDF
> without throwing an exception.

One thought: instead of null it might be possible to return an empty string,
empty dictionary, empty array, empty stream, etc. That way we don’t have to
look for null everywhere.
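
To illustrate the idea, here is a minimal sketch of "empty instead of null".
The Dict class and its getters are hypothetical stand-ins invented for this
example, not PDFBox's actual classes or API; the only point is that a missing
or malformed entry yields an empty value, so chained lookups need no null
checks.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class EmptyDefaults {

    /** Hypothetical stand-in for a parsed PDF dictionary (not a PDFBox class). */
    static final class Dict {
        /** Shared empty dictionary returned in place of null. */
        static final Dict EMPTY = new Dict(Collections.emptyMap());

        private final Map<String, Object> entries;

        Dict(Map<String, Object> entries) {
            this.entries = entries;
        }

        /** Returns the string stored under key, or "" if missing or of another type. */
        String getString(String key) {
            Object v = entries.get(key);
            return v instanceof String ? (String) v : "";
        }

        /** Returns the nested dictionary under key, or EMPTY if missing or of another type. */
        Dict getDict(String key) {
            Object v = entries.get(key);
            return v instanceof Dict ? (Dict) v : EMPTY;
        }

        /** Returns the array under key, or an empty list if missing or of another type. */
        List<?> getArray(String key) {
            Object v = entries.get(key);
            return v instanceof List ? (List<?>) v : Collections.emptyList();
        }

        /** Returns the stream bytes under key, or a zero-length array if missing. */
        byte[] getStream(String key) {
            Object v = entries.get(key);
            return v instanceof byte[] ? (byte[]) v : new byte[0];
        }
    }

    public static void main(String[] args) {
        Dict info = new Dict(Map.of("Title", "Report"));
        Dict root = new Dict(Map.of("Info", info));

        // Entry is present: normal lookup.
        System.out.println(root.getDict("Info").getString("Title"));

        // "Pages" was lost (e.g. truncated file): the chain still works.
        System.out.println(root.getDict("Pages").getArray("Kids").size());
    }
}
```

Running main prints "Report" and then 0. The trade-off is that callers can no
longer tell "missing" from "really empty" unless they compare against
Dict.EMPTY, so a real implementation would probably want to log or flag the
repair somewhere.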

— John

> 
>> Should I open an issue?
> Thanks, but I'm going to do that soon, as some other things should be done as
> well.
> 
> BR
> Andreas
>> 
>> 
>> -----Original Message-----
>> From: Andreas Lehmkuehler [mailto:[email protected]] 
>> Sent: Monday, March 28, 2016 10:58 AM
>> To: [email protected]
>> Subject: Re: shading/relocating 1.8.x?
>> 
>> On 25.03.2016 at 17:39, John Hewson wrote:
>>> 
>>>> On 23 Mar 2016, at 06:20, Allison, Timothy B. <[email protected]> wrote:
>>>> 
>>>> All,
>>>>  We've upgraded to 2.0.0 on Tika.  Many thanks again!
>>>>  One of our users is interested in continuing to use the
>>>> classic/SequentialParser, or at least having it available as a back-off
>>>> parser for corrupt pdfs [0].
>>> 
>>> Using the old parser really isn’t a good idea, it’s known to be pretty
>>> broken. I think that we would be much better off making sure the new parser
>>> can handle truncated files. We already do a lot of repair in the new parser,
>>> so this doesn’t seem like too much work? Maybe Andreas can comment further?
>> The biggest issue here is truncated streams or dictionaries. The current
>> version simply throws an exception when it runs into such cases. We have to
>> implement some algorithm to ignore such incomplete parts of a PDF where
>> possible.
>> 
>> BR
>> Andreas
>> 
>>> 
>>> Do we have some JIRA issues which identify some of these cases?
>>> 
>>> — John
>>> 
>>>>  Would you be willing to distribute a shaded/relocated 1.8.x app so that
>>>> we could load both 1.8.x and 2.0.0 in the same jvm without collisions?  Or,
>>>> is there a better solution?
>>> 
>>> I wouldn’t recommend doing that, because you’re going to be stuck with using
>>> 1.8 for everything, not just parsing, at least as far as corrupt/truncated
>>> files are concerned.
>>> 
>>> — John
>>> 
>>>>  Thank you!
>>>> 
>>>>              Cheers,
>>>> 
>>>>                         Tim
>>>> 
>>>> [0]
>>>> https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208360#comment-15208360
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
> 
