On 25.03.2016 at 17:39, John Hewson wrote:

On 23 Mar 2016, at 06:20, Allison, Timothy B. <[email protected]> wrote:

All,
  We've upgraded to 2.0.0 on Tika.  Many thanks again!
  One of our users is interested in continuing to use the 
classic/SequentialParser, or at least having it available as a back-off parser 
for corrupt pdfs [0].

Using the old parser really isn’t a good idea; it’s known to be pretty broken. 
I think that we would be much better off making sure the new parser can handle 
truncated files. We already do a lot of repair in the new parser, so this 
doesn’t seem like too much work. Maybe Andreas can comment further?
The biggest issue here is truncated streams and dictionaries. The current 
version simply throws an exception when it runs into such cases. We have to 
implement some algorithm that ignores such incomplete parts of a PDF where 
possible.

BR
Andreas
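
To illustrate the idea of skipping incomplete parts instead of throwing, here is a minimal, self-contained Java sketch. It is not PDFBox code and says nothing about how the PDFBox parser actually works; the class name TruncatedObjectFilter and its methods are hypothetical. It assumes the raw bytes can be scanned for indirect object headers ("N G obj") and treats an object as incomplete when it has no "endobj" before the next header, an opened "stream" without "endstream", or unbalanced "<<" / ">>" in its dictionary part.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of PDFBox).
class TruncatedObjectFilter {

    // Header of an indirect object, e.g. "12 0 obj".
    private static final Pattern OBJ_HEADER =
            Pattern.compile("\\b(\\d{1,9})\\s+(\\d{1,5})\\s+obj\\b");

    /** Returns the object numbers of all indirect objects that look complete. */
    static List<Integer> completeObjects(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so offsets are preserved.
        String data = new String(pdfBytes, StandardCharsets.ISO_8859_1);

        // Collect {headerStart, headerEnd, objectNumber} for every header found.
        List<int[]> headers = new ArrayList<>();
        Matcher m = OBJ_HEADER.matcher(data);
        while (m.find()) {
            headers.add(new int[] {m.start(), m.end(), Integer.parseInt(m.group(1))});
        }

        List<Integer> complete = new ArrayList<>();
        for (int i = 0; i < headers.size(); i++) {
            int bodyStart = headers.get(i)[1];
            // An object's region ends where the next header starts, or at EOF.
            int regionEnd = (i + 1 < headers.size()) ? headers.get(i + 1)[0] : data.length();
            if (isComplete(data.substring(bodyStart, regionEnd))) {
                complete.add(headers.get(i)[2]);
            }
            // Incomplete objects are skipped rather than aborting the whole scan.
        }
        return complete;
    }

    private static boolean isComplete(String region) {
        int endObj = region.indexOf("endobj");
        if (endObj < 0) {
            return false; // cut off before "endobj"
        }
        String body = region.substring(0, endObj);
        int stream = body.indexOf("stream");
        if (stream >= 0 && !body.contains("endstream")) {
            return false; // stream opened but never closed
        }
        // Only check dictionary delimiters before the stream data begins.
        String dictPart = (stream >= 0) ? body.substring(0, stream) : body;
        return count(dictPart, "<<") == count(dictPart, ">>");
    }

    private static int count(String s, String token) {
        int n = 0;
        for (int i = s.indexOf(token); i >= 0; i = s.indexOf(token, i + token.length())) {
            n++;
        }
        return n;
    }
}
```

For a file that is cut off in the middle of the stream of object 2, completeObjects reports only object 1. A real implementation would need proper tokenizing, because binary stream data and string literals can contain any of these keywords.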


Do we have some JIRA issues which identify some of these cases?

— John

  Would you be willing to distribute a shaded/relocated 1.8.x app so that we 
could load both 1.8.x and 2.0.0 in the same jvm without collisions?  Or, is 
there a better solution?

I wouldn’t recommend doing that, because you’re going to be stuck with using 
1.8 for everything, not just parsing, at least as far as corrupt/truncated 
files are concerned.

— John

  Thank you!

              Cheers,

                         Tim

[0] 
https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208360#comment-15208360

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



