[ 
https://issues.apache.org/jira/browse/PDFBOX-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446752#comment-17446752
 ] 

Tilman Hausherr commented on PDFBOX-5327:
-----------------------------------------

You'd have to replace the text stripper with your own version that does line 
separation differently. Look for the code around the "handleLineSeparation" 
call in PDFTextStripper.

If you always extract from PDFs with the same layout, then consider using 
PDFTextStripperByArea instead.
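
A minimal sketch of the PDFTextStripperByArea approach, assuming PDFBox 2.0.x 
(the affected version is 2.0.24). The class name RegionExtractor, the region 
names, and the rectangle coordinates are placeholders made up for illustration; 
the real values have to be measured on the actual form. The rectangles are 
compared against the text positions reported by the stripper, which count y 
downward from the top of the page.

```java
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

// Hypothetical helper class, not part of PDFBox.
public class RegionExtractor {

    /** Extracts the text of each named region from a single page. */
    static Map<String, String> extract(PDDocument doc, int pageIndex,
                                       Map<String, Rectangle2D> regions) throws IOException {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);

        // Register every rectangle under its own name so that the text of
        // neighbouring boxes is collected separately.
        for (Map.Entry<String, Rectangle2D> e : regions.entrySet()) {
            stripper.addRegion(e.getKey(), e.getValue());
        }

        PDPage page = doc.getPage(pageIndex);
        stripper.extractRegions(page);

        Map<String, String> result = new LinkedHashMap<>();
        for (String name : regions.keySet()) {
            result.put(name, stripper.getTextForRegion(name).trim());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder coordinates (x, y, width, height); adjust to the real form.
        Map<String, Rectangle2D> regions = new LinkedHashMap<>();
        regions.put("consignTo", new Rectangle2D.Float(30, 100, 250, 80));
        regions.put("packingList", new Rectangle2D.Float(300, 100, 250, 80));

        // PDDocument.load(File) is the 2.0.x way to open a file.
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            extract(doc, 0, regions)
                    .forEach((name, text) -> System.out.println(name + ": " + text));
        }
    }
}
```

Because each box is extracted on its own, the line separation logic of the 
regular text stripper never gets the chance to merge "CONSIGN TO" and 
"PACKING LIST" into one line. This only works as long as the boxes stay at 
the same position in every file.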

> Parse text from two rectangles to one rectangle
> -----------------------------------------------
>
>                 Key: PDFBOX-5327
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5327
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.24
>            Reporter: zzz
>            Priority: Major
>         Attachments: 072B006805-P32939I-(2)(1).pdf, 1-1.png, 2-1.png, 3-1.png
>
>
> Rectangle(CONSIGN TO) and Rectangle(PACKING LIST) are treated as one 
> Rectangle(CONSIGN TO PACKING LIST)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
