Hi Manish,

Lots of things can go wrong in parsing PDFs. Can you share links to files showing specific problems?

On Mon, Mar 15, 2021 at 11:50 AM Chris Mattmann <[email protected]> wrote:
>
> Hi Manish, I think you should ask this one upstream on the Tika Dev lists.
> I’ve cc’ed them for you.
>
> From: manish mathur <[email protected]>
> Date: Monday, March 15, 2021 at 4:41 AM
> To: <[email protected]>
> Subject: Re: Python-tika: issues related to memory consumption
>
> Hi Chris,
>
> I am using python-tika library to extract the content from pdf. but lot
> of junks are coming due to tables or graphs etc. so is there have any way to
> ignore while parsing pdf to get the content.
>
> Thanks in advance
>
> Thanks
> Manish Mathur
>
> On Mon, Feb 1, 2021 at 4:18 PM manish mathur <[email protected]>
> wrote:
>
> Hi Chris,
>
> I am using python-tika library for reading pdf urls, but gradually memory
> consumption is increasing so much. is there have any way to release the
> memory after reading one pdf url. Please let me know.
>
> Thanks in advance
>
> Thanks
> Manish Mathur
