tika-user  

Re: PDF content extraction takes lot of time

Daniel Knapp
Mon, 23 Nov 2009 07:24:49 -0800

Hello Jukka,

sorry for the late reply.

It seems the problem no longer appears with the released 0.5 version.

Thanks again for your help!

Regards,
Daniel

On 16.11.2009 at 14:48, Jukka Zitting wrote:

> Hi,
> 
> On Mon, Nov 16, 2009 at 2:43 PM, Daniel Knapp
> <daniel.kn...@mni.fh-giessen.de> wrote:
>> Actually, I use Tika 0.5 from SVN. Most of my PDF files are parsed in a
>> second, but a few files take very long. What a pity! :(
> 
> Can you file a Tika improvement issue about PDF parsing speed and
> attach an example file that illustrates the problem (if you don't want
> to share the files in public, you can also send one to me in private)?
> I can profile the parsing process and report the issue back to the
> PDFBox project.
> 
> BR,
> 
> Jukka Zitting
