[ https://issues.apache.org/jira/browse/PDFBOX-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Bonniot de Ruisselet updated PDFBOX-1351:
------------------------------------------------
Attachment: superscript.pdf
This file was generated with PDFedit by deleting most of the content from the
original document, both to create a minimal test case and to remove potentially
confidential information. The generated file triggers a parser warning, but
displays fine in Acrobat Reader. The bug behaves identically on the original
document and on this simplified version. I could not recreate this case from
scratch, but maybe someone else will know how.
> False paragraph caused by superscript (1.7 regression)
> ------------------------------------------------------
>
> Key: PDFBOX-1351
> URL: https://issues.apache.org/jira/browse/PDFBOX-1351
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.7.0
> Reporter: Daniel Bonniot de Ruisselet
> Attachments: superscript.pdf
>
>
> On the attached minimal example document, text extraction seems to be
> confused by the superscript, and generates three paragraphs where there is
> only one.
> Note that 1.6 handles this case correctly:
> {noformat}
> $ java -jar /dev/shm/pdfbox-app-1.6.0.jar ExtractText /tmp/superscript.pdf
> Jun 29, 2012 4:52:24 PM org.apache.pdfbox.pdfparser.PDFParser parseObject
> WARNING: expected='%%EOF' actual='5 0 obj '
> $ cat /tmp/superscript.txt
>
> Multiple synthetic routes have been described by R. Filler et al.11 regarding
> 1,3-
> Bis(perfluorophenyl)propane-1,3-dione. The synthesis and
>
>
> $ java -jar /dev/shm/pdfbox-app-1.7.0.jar ExtractText /tmp/superscript.pdf
> Jun 29, 2012 4:52:39 PM org.apache.pdfbox.pdfparser.PDFParser parseObject
> WARNING: expected='%%EOF' actual='5 0 obj '
> $ cat /tmp/superscript.txt
>
> Multiple synthetic routes have been described by R. Filler et al.
> 11
> regarding 1,3-
> Bis(perfluorophenyl)propane-1,3-dione. The synthesis and
>
>
> {noformat}
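
The two outputs above appear to contain the same characters and differ only in where line breaks fall. A minimal sketch of how that could be checked, for example in a regression test, is shown below. The strings are transcribed from the report as displayed (the mail client may have wrapped a line), and the helper names are hypothetical, not part of PDFBox.

```python
# Outputs transcribed from the report above, line breaks as displayed there.
OUT_16 = (
    "Multiple synthetic routes have been described by R. Filler et al.11 regarding\n"
    "1,3-\n"
    "Bis(perfluorophenyl)propane-1,3-dione. The synthesis and\n"
)
OUT_17 = (
    "Multiple synthetic routes have been described by R. Filler et al.\n"
    "11\n"
    "regarding 1,3-\n"
    "Bis(perfluorophenyl)propane-1,3-dione. The synthesis and\n"
)


def without_whitespace(text):
    """Drop every whitespace character, including line breaks."""
    return "".join(text.split())


def differs_only_in_whitespace(a, b):
    """True when a and b contain the same non-whitespace characters in order."""
    return without_whitespace(a) == without_whitespace(b)


def nonempty_lines(text):
    """Return the lines of text that contain something other than whitespace."""
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print("same characters:", differs_only_in_whitespace(OUT_16, OUT_17))
    print("non-empty lines in 1.6 output:", len(nonempty_lines(OUT_16)))
    print("non-empty lines in 1.7 output:", len(nonempty_lines(OUT_17)))
```

If the check holds, the regression is limited to line and paragraph separation around the superscript rather than to lost or reordered characters.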