Re: Interesting PDF on stackoverflow

Tilman Hausherr Wed, 21 Jul 2021 10:59:07 -0700

Maybe this could be done with the ExtractTextByArea example. However, IIRC the coordinates are AWT-like (y = 0 at the top), so the PDF coordinates would have to be mapped to this somehow.
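A rough sketch of the mapping mentioned above, in plain Java with no PDFBox dependency. The class and method names (PageAreaMapper, toTopLeftOrigin) are made up for illustration, and the sketch assumes the page box has its lower-left corner at (0, 0) and that the page is not rotated; a real page may need its box offset and rotation taken into account.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox or Tika.
final class PageAreaMapper {

    private PageAreaMapper() {
    }

    /**
     * Converts a rectangle given in PDF coordinates (origin at the
     * bottom-left, y growing upwards) into AWT-like coordinates
     * (origin at the top-left, y growing downwards).
     *
     * Assumption: the page box starts at (0, 0) and the page is unrotated.
     *
     * @param x          left edge of the rectangle in PDF coordinates
     * @param y          bottom edge of the rectangle in PDF coordinates
     * @param width      rectangle width
     * @param height     rectangle height
     * @param pageHeight height of the page box
     */
    static Rectangle2D.Float toTopLeftOrigin(float x, float y,
                                             float width, float height,
                                             float pageHeight) {
        // The top edge in PDF coordinates is y + height; its distance
        // from the top of the page is the y value in AWT-like coordinates.
        float top = pageHeight - (y + height);
        return new Rectangle2D.Float(x, top, width, height);
    }
}
```

For an "only text that is on the page" region, the rectangle would presumably be the page box itself, e.g. toTopLeftOrigin(0, 0, 612, 792, 792) for a US Letter page, which maps to x = 0, y = 0, width = 612, height = 792. The result could then be registered with PDFTextStripperByArea.addRegion(String, Rectangle2D), as the ExtractTextByArea example does with its own rectangle.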


Tilman


On 21.07.2021 at 18:21, Tim Allison wrote:

https://stackoverflow.com/questions/68402058/tika-isnt-reading-pdf-properly

Not sure there's much we should do on the Tika side.

How hard would it be to add an "extract only text that is on the page" feature?

Best,

        Tim
