Oh, wow, so it really might be possible without too much work? I'm more than happy to supply examples. :)
Should I open an issue?

-----Original Message-----
From: Andreas Lehmkuehler [mailto:[email protected]]
Sent: Monday, March 28, 2016 10:58 AM
To: [email protected]
Subject: Re: shading/relocating 1.8.x?

On 25.03.2016 at 17:39, John Hewson wrote:
>
>> On 23 Mar 2016, at 06:20, Allison, Timothy B. <[email protected]> wrote:
>>
>> All,
>> We've upgraded to 2.0.0 on Tika. Many thanks again!
>> One of our users is interested in continuing to use the
>> classic/SequentialParser, or at least having it available as a back-off
>> parser for corrupt pdfs [0].
>
> Using the old parser really isn’t a good idea, it’s known to be pretty
> broken. I think that we would be much better off making sure the new parser
> can handle truncated files. We already do a lot of repair in the new parser,
> so this doesn’t seem like too much work? Maybe Andreas can comment further?

The biggest issue here is the truncated stream or dictionary. The current
version simply throws an exception when running into such constellations. We
have to implement some algorithm to ignore such incomplete parts of a pdf if
possible.

BR
Andreas

>
> Do we have some JIRA issues which identify some of these cases?
>
> -- John
>
>> Would you be willing to distribute a shaded/relocated 1.8.x app so that we
>> could load both 1.8.x and 2.0.0 in the same jvm without collisions? Or, is
>> there a better solution?
>
> I wouldn’t recommend doing that, because you’re going to be stuck with using
> 1.8 for everything, not just parsing, at least as far as corrupt/truncated
> files are concerned.
>
> -- John
>
>> Thank you!
>>
>> Cheers,
>>
>> Tim
>>
>> [0]
>> https://issues.apache.org/jira/browse/TIKA-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208360#comment-15208360
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
