SchwingSK opened a new pull request #89: URL: https://github.com/apache/pdfbox/pull/89
The problem lay in the fact that maxHeightForLine is kept even when the text font changes (which is intentional, so as not to trigger a new line when there is sub/superscript). In this case, that leads PDFTextStripper to merge two lines that should be separate. The patch assumes that when the current character is separated from the previous one, maxHeightForLine has to be reset. This breaks only one test, eu-001.pdf, and it should: the new code correctly detects two lines where only one was detected before. (The patch has been tested with mvn clean test on the 2.0.21 branch with commit bdf2ae77e693cc73d4cdeb9a95c6ac2845d11ead applied, as the current 2.0 branch does not pass tests.)
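To illustrate the idea, here is a minimal, self-contained sketch of line grouping with and without the reset. It is not the PDFBox code and not the patch itself: the class LineGroupingSketch, the Glyph type, the separated() heuristic (the glyph starts left of the previous one, or after a gap wider than twice the previous glyph's width) and the choice to reset the height to the previous glyph's height are all assumptions made for this example. Only the names maxHeightForLine and maxYForLine come from the description above, and the overlap test is a simplified stand-in.

```java
import java.util.ArrayList;
import java.util.List;

public class LineGroupingSketch {
    /** Hypothetical glyph: x = left edge, y = baseline (grows downward). */
    static final class Glyph {
        final String text; final float x, y, width, height;
        Glyph(String text, float x, float y, float width, float height) {
            this.text = text; this.x = x; this.y = y;
            this.width = width; this.height = height;
        }
    }

    /** True if the vertical extents [y - h, y] of the two items overlap (simplified). */
    static boolean overlap(float y1, float h1, float y2, float h2) {
        return Math.abs(y1 - y2) < 0.1f
                || (y2 <= y1 && y2 >= y1 - h1)
                || (y1 <= y2 && y1 >= y2 - h2);
    }

    /** Assumed heuristic: the glyph jumps back to the left or follows a wide gap. */
    static boolean separated(Glyph prev, Glyph cur) {
        float gap = cur.x - (prev.x + prev.width);
        return cur.x < prev.x || gap > 2 * prev.width;
    }

    static List<String> groupLines(List<Glyph> glyphs, boolean resetOnSeparation) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float maxYForLine = -Float.MAX_VALUE;
        float maxHeightForLine = -1f;
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                if (resetOnSeparation && separated(prev, g)) {
                    // Assumed reset: forget taller glyphs seen earlier on this line.
                    maxHeightForLine = prev.height;
                }
                if (!overlap(g.y, g.height, maxYForLine, maxHeightForLine)) {
                    // No vertical overlap with the running line: start a new one.
                    lines.add(current.toString());
                    current.setLength(0);
                    maxYForLine = -Float.MAX_VALUE;
                    maxHeightForLine = -1f;
                }
            }
            current.append(g.text);
            maxYForLine = Math.max(maxYForLine, g.y);
            maxHeightForLine = Math.max(maxHeightForLine, g.height);
            prev = g;
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    /** A tall "T" followed by small text, then a separate small glyph drawn above. */
    static List<Glyph> sample() {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("T", 0f, 100f, 20f, 30f));
        glyphs.add(new Glyph("a", 20f, 100f, 6f, 10f));
        glyphs.add(new Glyph("b", 26f, 100f, 6f, 10f));
        glyphs.add(new Glyph("c", 0f, 80f, 6f, 10f));
        return glyphs;
    }

    public static void main(String[] args) {
        System.out.println(groupLines(sample(), false)); // height kept for the whole line
        System.out.println(groupLines(sample(), true));  // height reset on separation
    }
}
```

In this toy input the tall "T" inflates the kept height, so without the reset the glyph "c" is merged into the first line, while with the reset it starts a line of its own. A superscript that directly follows its base character is not separated under the assumed heuristic, so it still stays on the same line.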
