Hi Christian,

Thanks for bringing this up! Would you be able to share the PDF that causes this, or one with a similar structure?
Thanks,
Tyler

On Mon, Jun 16, 2014 at 6:46 AM, Christian Reuschling <[email protected]> wrote:

> I am currently migrating to Tika 1.5 and ran into this behaviour, which
> leads to double entries in my database for one PDF file, as I work
> directly with the handler.
>
> Here are the two calls:
>
> The first call is in PDF2XHTML, line 197: handler.endDocument();
> This is part of the PDF2XHTML.process(pdfDocument, handler, context,
> metadata, localConfig); invocation from PDFParser, line 143.
>
> The second call is then directly in PDFParser, line 151:
> handler.endDocument();
>
> I will stay at Tika 1.4 for now. Still, thanks for the good work!
>
> Christian
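
Until the duplicate call is tracked down, one possible client-side stopgap is to wrap the handler so that only the first endDocument() per document reaches it. The following is a minimal sketch, not a fix in Tika itself: EndDocumentGuard, CountingHandler and EndDocumentGuardDemo are hypothetical names made up for illustration, and the sketch uses only the SAX classes shipped with the JDK.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical wrapper: forwards all SAX events to the delegate,
// but drops any endDocument() after the first one for a document.
class EndDocumentGuard extends XMLFilterImpl {
    private boolean ended = false;

    EndDocumentGuard(ContentHandler delegate) {
        // XMLFilterImpl forwards content events to this handler.
        setContentHandler(delegate);
    }

    @Override
    public void startDocument() throws SAXException {
        ended = false; // a new document resets the guard
        super.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        if (ended) {
            return; // swallow the duplicate call
        }
        ended = true;
        super.endDocument();
    }
}

// Hypothetical stand-in for a handler that writes to a database
// when the document ends; here it only counts the calls.
class CountingHandler extends DefaultHandler {
    int endDocumentCalls = 0;

    @Override
    public void endDocument() {
        endDocumentCalls++;
    }
}

class EndDocumentGuardDemo {
    // Simulates the reported sequence: one start, two ends.
    static int run() {
        CountingHandler counter = new CountingHandler();
        ContentHandler guarded = new EndDocumentGuard(counter);
        try {
            guarded.startDocument();
            guarded.endDocument(); // like the call inside PDF2XHTML
            guarded.endDocument(); // like the call inside PDFParser
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return counter.endDocumentCalls;
    }

    public static void main(String[] args) {
        System.out.println("endDocument forwarded " + run() + " time(s)");
    }
}
```

In real use, the guarded handler would be passed to the parser in place of the original one. The guard resets on startDocument(), so this assumes every document begins with that call.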
