[jira] [Commented] (PDFBOX-4764) When a PDF has table with blank entries in the column the stripper just ignores the column and moves to next field in the column

karthik guns (Jira) Mon, 10 Feb 2020 16:43:41 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034050#comment-17034050
 ]


karthik guns commented on PDFBOX-4764:
--------------------------------------

I tested with nearly 10 different PDFs. For most of them, the PDF stripper 
extracts each value from a table like the one below onto its own line:

Order#   PO Number           
11111     TL12

Extracted output

Line1:

Order#

Line2:

PO Number

Line3:

11111

Line4:

TL12

 

But for one particular PDF, the extracted text comes out as below for the 
same table structure:

Line1:

Order#   PO Number

Line2:

11111     TL12

In this case, even if we split each line on spaces, the string "PO Number" 
gets split in two, although it is a single column. (Any thoughts on this?)
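
One possible workaround, sketched here only as an illustration: split each 
extracted line on runs of two or more whitespace characters instead of on 
every single space. This assumes the extracted text keeps a gap of at least 
two spaces between columns, as in the sample above, which may not hold for 
every PDF. The class and method names (ColumnSplit, splitColumns) are made up 
for this example and are not part of PDFBox.

```java
import java.util.Arrays;

class ColumnSplit {
    // Hypothetical helper: treats two or more consecutive whitespace
    // characters as a column gap, so a single space inside a cell
    // (e.g. "PO Number") does not split the cell.
    static String[] splitColumns(String line) {
        return line.trim().split("\\s{2,}");
    }

    public static void main(String[] args) {
        // Sample lines taken from the table in this comment.
        System.out.println(Arrays.toString(splitColumns("Order#   PO Number")));
        System.out.println(Arrays.toString(splitColumns("11111     TL12")));
    }
}
```

This does not help with the original problem of blank cells: an empty column 
leaves no text behind, so it simply disappears from the split result.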

 

 

> When a PDF has table with blank entries in the column the stripper just 
> ignores the column and moves to next field in the column
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4764
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4764
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.8
>            Reporter: karthik guns
>            Priority: Major
>
> When a PDF has tables whose columns contain empty values, the stripper skips 
> the empty field and moves on to the next column that has a value (if a field 
> is blank, the stripper should still capture it).
>  
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
>  stripper.setSortByPosition(true);
> PDFTextStripper tStripper = new PDFTextStripper();
> String pdfFileInText = tStripper.getText(document);



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
