[ 
https://issues.apache.org/jira/browse/PDFBOX-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446752#comment-17446752
 ] 

Tilman Hausherr edited comment on PDFBOX-5327 at 11/20/21, 4:27 AM:
--------------------------------------------------------------------

You'd have to replace the text stripper with your own, one that does line 
separation differently. Look at the code around the "handleLineSeparation" 
call.
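
To illustrate what "different line separation" could mean here, below is a minimal, library-independent sketch. It is not PDFBox's own logic: the names GapSplitSketch, Fragment, splitAtGaps and maxGap are hypothetical, and the coordinates are assumed to be already extracted and sorted by x. The idea is that fragments on the same baseline are joined only while the horizontal gap between them stays small, so text from two neighbouring boxes ends up in separate cells.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not PDFBox code: split one visual line into cells at wide gaps. */
final class GapSplitSketch {

    /** A piece of text with its left edge and width, in page units (assumed inputs). */
    static final class Fragment {
        final String text;
        final float x;
        final float width;

        Fragment(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Joins fragments that share a baseline (assumed sorted by x) and starts
     * a new cell whenever the gap to the previous fragment exceeds maxGap.
     */
    static List<String> splitAtGaps(List<Fragment> line, float maxGap) {
        List<String> cells = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float previousEnd = Float.NaN; // NaN marks "no fragment seen yet"
        for (Fragment f : line) {
            boolean first = Float.isNaN(previousEnd);
            if (!first && f.x - previousEnd > maxGap) {
                // Wide gap: close the current cell and begin a new one.
                cells.add(current.toString());
                current.setLength(0);
            } else if (!first) {
                // Narrow gap: same cell, separate words with a space.
                current.append(' ');
            }
            current.append(f.text);
            previousEnd = f.x + f.width;
        }
        if (current.length() > 0) {
            cells.add(current.toString());
        }
        return cells;
    }
}
```

With made-up positions for "CONSIGN", "TO", "PACKING" and "LIST" on one baseline and a threshold of 20 units, this yields two cells, "CONSIGN TO" and "PACKING LIST", instead of one merged line. A suitable threshold depends on the document and would have to be found by experiment.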

If you always extract the same kind of PDF (i.e. always from this creator), then 
consider using PDFTextStripperByArea instead.
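
The region-based approach can be sketched without the library as follows. In PDFBox itself the class is PDFTextStripperByArea, used via addRegion, extractRegions and getTextForRegion; the code below is only a simplified stand-in for that idea, and the class RegionSketch, its place method, and all coordinates are hypothetical. Each word is assigned to the named rectangle that contains its origin, so text in two adjacent boxes is never merged, whatever the baseline.

```java
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for region-based extraction; not the PDFBox implementation. */
final class RegionSketch {
    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();
    private final Map<String, StringBuilder> text = new LinkedHashMap<>();

    /** Registers a named rectangle whose text should be collected separately. */
    void addRegion(String name, Rectangle2D rect) {
        regions.put(name, rect);
        text.put(name, new StringBuilder());
    }

    /** Appends a word to every region that contains its (x, y) origin. */
    void place(String word, double x, double y) {
        for (Map.Entry<String, Rectangle2D> e : regions.entrySet()) {
            if (e.getValue().contains(x, y)) {
                StringBuilder sb = text.get(e.getKey());
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(word);
            }
        }
    }

    /** Returns the text collected for the given region name. */
    String getTextForRegion(String name) {
        return text.get(name).toString();
    }
}
```

With one rectangle drawn around the "CONSIGN TO" box and another around the "PACKING LIST" box, each region returns only its own text. This works only when the layout is stable, because the rectangles have to be measured once for that kind of PDF.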


was (Author: tilman):
You'd have to replace the text stripper with your own, that does line 
separation differently. Look for the code around the "handleLineSeparation" 
call.

If you extract always the same PDFs, then consider using PDFTextStripperByArea 
instead.

> Parse text from two rectangles to one rectangle
> -----------------------------------------------
>
>                 Key: PDFBOX-5327
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5327
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.24
>            Reporter: zzz
>            Priority: Major
>         Attachments: 072B006805-P32939I-(2)(1).pdf, 1-1.png, 2-1.png, 3-1.png
>
>
> Rectangle(CONSIGN TO) and Rectangle(PACKING LIST) are regarded as one 
> Rectangle(CONSIGN TO PACKING LIST)



