[ https://issues.apache.org/jira/browse/PDFBOX-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041109#comment-14041109 ]
Tim Allison commented on PDFBOX-2160:
-------------------------------------
Test file is not Apache licensable, as far as I can tell.
[581654.pdf|http://digitalcorpora.org/corp/nps/files/govdocs1/581/581654.pdf]
Running
java -jar pdfbox-app-1.8.6.jar ExtractText -html 581654.pdf
yields a closing </p> with no matching opening <p> here: "<b>P25DSR</b></p>"
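To make the symptom easier to check on other files, here is a minimal sketch of a checker that scans the HTML output for a </p> that has no open <p> before it. It is not part of PDFBox and is not the attached patch; the class name ParagraphBalance and the method firstUnmatchedClose are hypothetical names chosen for illustration, and the regex only handles plain <p ...> and </p> tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox class: finds the first unmatched </p>.
public class ParagraphBalance {

    // Matches an opening <p> (optionally with attributes) or a closing </p>.
    // Tags such as <pre> do not match, because "p" must be followed by
    // whitespace or ">".
    private static final Pattern P_TAG =
            Pattern.compile("<(/?)p(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the character offset of the first </p> that has no open <p>
     * before it, or -1 if every </p> is matched.
     */
    public static int firstUnmatchedClose(String html) {
        int open = 0; // number of <p> tags currently open
        Matcher m = P_TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                open++;
            } else if (open == 0) {
                return m.start(); // paragraph end without a paragraph start
            } else {
                open--;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The fragment quoted in this comment.
        String fragment = "<b>P25DSR</b></p>";
        System.out.println("first unmatched </p> at offset: "
                + firstUnmatchedClose(fragment));
    }
}
```

The same method could be applied to the full output of ExtractText -html after saving it to a file and reading it into a String; a result other than -1 would point at the first paragraph end written without a paragraph start.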
> PDFTextStripper doesn't always write paragraph start
> ----------------------------------------------------
>
> Key: PDFBOX-2160
> URL: https://issues.apache.org/jira/browse/PDFBOX-2160
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.8.6
> Reporter: Tim Allison
> Priority: Trivial
> Attachments: PDFBOX-2160.patch
>
>
> In some cases PDFTextStripper writes more paragraph ends than paragraph
> starts.