Hi Manish,
  Lots of things can go wrong when parsing PDFs. Can you share links to
files that show the specific problems?
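In the meantime, one common workaround for table and chart residue is to post-process the extracted text instead of changing how Tika parses it. Below is a minimal sketch of that idea. It assumes the junk shows up as short or mostly numeric lines. The helper `drop_table_like_lines` and its thresholds are hypothetical and are not part of Tika or python-tika. Expect to tune them per document set, because a heuristic like this will also drop some legitimate short lines.

```python
import re


def drop_table_like_lines(text: str, min_words: int = 3, min_alpha_ratio: float = 0.6) -> str:
    """Hypothetical heuristic filter, not a Tika feature.

    A line is kept only if it has at least `min_words` words and at least
    `min_alpha_ratio` of its non-space characters are letters. This tends to
    drop numeric table rows and short chart labels. Blank lines are kept so
    paragraph breaks survive.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            kept.append("")
            continue
        chars = [c for c in stripped if not c.isspace()]
        alpha_ratio = sum(c.isalpha() for c in chars) / len(chars)
        if len(stripped.split()) >= min_words and alpha_ratio >= min_alpha_ratio:
            kept.append(stripped)
    # Collapse runs of blank lines left behind where rows were dropped.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(kept)).strip()


# Usage with python-tika (requires Java and a reachable Tika server):
# from tika import parser
# parsed = parser.from_file("report.pdf")
# clean = drop_table_like_lines(parsed.get("content") or "")
```

Because this runs on plain text after extraction, it cannot tell a table from prose the way a layout-aware parser could. It is a cheap first pass, not a substitute for looking at the specific files.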

On Mon, Mar 15, 2021 at 11:50 AM Chris Mattmann <[email protected]> wrote:
>
> Hi Manish, I think you should ask this one upstream on the Tika Dev lists. 
> I’ve cc’ed them for you.
>
> From: manish mathur <[email protected]>
> Date: Monday, March 15, 2021 at 4:41 AM
> To: <[email protected]>
> Subject: Re: Python-tika: issues related to memory consumption
>
> Hi Chris,
>
>     I am using the python-tika library to extract content from PDFs, but a lot
> of junk text comes through from tables, graphs, etc. Is there any way to
> ignore these while parsing a PDF so that I get only the content?
>
> Thanks in advance
>
> Thanks
>
> Manish Mathur
>
> On Mon, Feb 1, 2021 at 4:18 PM manish mathur <[email protected]> wrote:
>
> Hi Chris,
>
>     I am using the python-tika library to read PDF URLs, but memory
> consumption keeps increasing over time. Is there any way to release the
> memory after reading each PDF URL? Please let me know.
>
> Thanks in advance
>
> Thanks
>
> Manish Mathur
