[ https://issues.apache.org/jira/browse/PDFBOX-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446752#comment-17446752 ]
Tilman Hausherr edited comment on PDFBOX-5327 at 11/20/21, 4:27 AM:
--------------------------------------------------------------------
You'd have to replace the text stripper with your own implementation that
handles line separation differently. Look for the code around the
"handleLineSeparation" call.
If you always extract the same kind of PDF (i.e. always from this creator),
then consider using PDFTextStripperByArea instead.
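
To illustrate why region-based extraction avoids the merged line, here is a minimal, library-free sketch of the idea behind PDFTextStripperByArea: text fragments are grouped by the rectangle that contains them, not by their shared baseline. It does not use PDFBox. The class RegionGroupingSketch, the Fragment type, the region names and all coordinates are hypothetical and chosen only for this example.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: groups positioned text fragments by named rectangle.
final class RegionGroupingSketch {

    /** A piece of text and the page position of its origin (hypothetical type). */
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * For each named region, joins the fragments whose origin lies inside it.
     * Fragments are assumed to arrive in reading order.
     */
    static Map<String, String> textByRegion(Map<String, Rectangle2D> regions,
                                            List<Fragment> fragments) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Rectangle2D> region : regions.entrySet()) {
            StringBuilder text = new StringBuilder();
            for (Fragment f : fragments) {
                if (region.getValue().contains(f.x, f.y)) {
                    if (text.length() > 0) {
                        text.append(' ');
                    }
                    text.append(f.text);
                }
            }
            result.put(region.getKey(), text.toString());
        }
        return result;
    }

    /** Two boxes side by side whose text shares one baseline (made-up layout). */
    static Map<String, String> demo() {
        Map<String, Rectangle2D> regions = new LinkedHashMap<>();
        regions.put("consignTo", new Rectangle2D.Double(50, 100, 200, 60));
        regions.put("packingList", new Rectangle2D.Double(300, 100, 200, 60));

        // All four words sit at y = 120, so a purely line-based grouping
        // would emit them as one line.
        List<Fragment> fragments = new ArrayList<>();
        fragments.add(new Fragment("CONSIGN", 60, 120));
        fragments.add(new Fragment("TO", 120, 120));
        fragments.add(new Fragment("PACKING", 310, 120));
        fragments.add(new Fragment("LIST", 370, 120));

        return textByRegion(regions, fragments);
    }

    public static void main(String[] args) {
        demo().forEach((name, text) -> System.out.println(name + ": " + text));
    }
}
```

With PDFBox itself, the corresponding calls on PDFTextStripperByArea would be addRegion, extractRegions and getTextForRegion. The region rectangles have to be measured for the specific layout, which is why this approach only suits PDFs that always come from the same creator.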
> Parse text from two rectangles to one rectangle
> -----------------------------------------------
>
> Key: PDFBOX-5327
> URL: https://issues.apache.org/jira/browse/PDFBOX-5327
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.24
> Reporter: zzz
> Priority: Major
> Attachments: 072B006805-P32939I-(2)(1).pdf, 1-1.png, 2-1.png, 3-1.png
>
>
> The text of Rectangle(CONSIGN TO) and Rectangle(PACKING LIST) is extracted as
> one Rectangle(CONSIGN TO PACKING LIST) instead of as two separate rectangles.