[jira] [Commented] (PDFBOX-4764) When a PDF has table with blank entries in the column the stripper just ignores the column and moves to next field in the column

Michael Klink (Jira) Wed, 05 Feb 2020 06:20:29 -0800


    [ https://issues.apache.org/jira/browse/PDFBOX-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030692#comment-17030692 ]


Michael Klink commented on PDFBOX-4764:
---------------------------------------

You might be interested in the {{LayoutTextStripper}} (which extends the PDFBox 
{{PDFTextStripper}} to return text lines that attempt to reflect the layout of 
the PDF page) from [this Stack Overflow answer|https://stackoverflow.com/a/45842515/1729265].
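The core idea behind such layout-preserving strippers can be illustrated without PDFBox: place each text fragment at a character column derived from its x coordinate, so that the gap left by an empty table cell survives as a run of spaces instead of collapsing into a single separator. The following is a minimal sketch under stated assumptions, not the code of either class mentioned here; the {{Fragment}} record, the helper names and the fixed average character width are hypothetical and chosen for illustration only.

```java
import java.util.List;

public class LayoutLineSketch {
    // Hypothetical text fragment: its text and the x coordinate (in points) where it starts.
    record Fragment(String text, float x) {}

    // Places each fragment at a character column derived from its x position,
    // assuming a fixed average character width (an assumption of this sketch).
    // Gaps left by empty table cells remain as runs of spaces.
    static String layoutLine(List<Fragment> fragments, float charWidth) {
        StringBuilder line = new StringBuilder();
        for (Fragment f : fragments) {
            int column = Math.round(f.x() / charWidth);
            // If the target column is already occupied, keep at least one space between fragments.
            if (line.length() > 0 && line.length() >= column) {
                line.append(' ');
            }
            // Pad with spaces up to the target column.
            while (line.length() < column) {
                line.append(' ');
            }
            line.append(f.text());
        }
        return line.toString();
    }

    // For comparison: joining fragments in reading order with a single space,
    // which loses the information that a column in between was blank.
    static String joinNaively(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(f.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical row "Alice | (blank) | 42" with columns starting at x = 0, 100 and 200.
        List<Fragment> row = List.of(new Fragment("Alice", 0f), new Fragment("42", 200f));
        System.out.println(joinNaively(row));     // Alice 42
        System.out.println(layoutLine(row, 10f)); // "Alice", 15 spaces, then "42"
    }
}
```

With the sample row, {{joinNaively}} yields "Alice 42", which no longer shows that the middle column was blank, while {{layoutLine}} keeps "42" at character column 20. A PDFBox-based variant could collect the fragments and their positions from the {{TextPosition}} objects passed to an overridden {{writeString(String, List<TextPosition>)}} instead of using hard-coded values.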

Beware, though, that this class is based on PDFBox 1.8.x, not yet on 2.0.x.

Alternatively, have a look at Jonathan Link's 
[PDFLayoutTextStripper|https://jonathanlink.ch/PDFLayoutTextStripper.html], 
which is a similar class.


> When a PDF has table with blank entries in the column the stripper just 
> ignores the column and moves to next field in the column
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4764
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4764
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.8
>            Reporter: karthik guns
>            Priority: Major
>
> When a PDF has tables with columns containing empty values, the stripper 
> ignores the empty field and moves on to the next column that has a value 
> (a blank field should be captured as well).
>  
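> import org.apache.pdfbox.text.PDFTextStripper;
> import org.apache.pdfbox.text.PDFTextStripperByArea;
> 
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> stripper.setSortByPosition(true); // note: this stripper is never used below
> PDFTextStripper tStripper = new PDFTextStripper();
> String pdfFileInText = tStripper.getText(document);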
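The snippet above creates a {{PDFTextStripperByArea}} but extracts the text with a plain {{PDFTextStripper}}. If the column positions of the table are known in advance, another approach is to assign every text fragment to a column by its x coordinate, so that each row always yields one cell per column and blank cells stay as empty strings. (With {{PDFTextStripperByArea}}, a comparable effect could be obtained by adding one region per column or cell via {{addRegion}}.) This is a minimal sketch on hypothetical fragments rather than PDFBox objects; {{Fragment}}, {{toCells}} and the column boundaries are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnCellsSketch {
    // Hypothetical text fragment: its text and the x coordinate (in points) where it starts.
    record Fragment(String text, float x) {}

    // columnStarts holds the left edge of each column in ascending order.
    // Assumption: these positions are known in advance (e.g. from the form layout).
    // Every column gets a cell; columns without any fragment stay empty strings.
    static List<String> toCells(List<Fragment> fragments, float[] columnStarts) {
        String[] cells = new String[columnStarts.length];
        Arrays.fill(cells, "");
        for (Fragment f : fragments) {
            // Pick the last column whose left edge is at or before the fragment start;
            // fragments left of the first column fall into column 0.
            int col = 0;
            for (int i = 0; i < columnStarts.length; i++) {
                if (f.x() >= columnStarts[i]) {
                    col = i;
                }
            }
            // Multiple fragments in the same column are joined with a space.
            cells[col] = cells[col].isEmpty() ? f.text() : cells[col] + " " + f.text();
        }
        return List.of(cells);
    }

    public static void main(String[] args) {
        float[] columnStarts = {0f, 100f, 200f};
        List<Fragment> row = List.of(new Fragment("Alice", 0f), new Fragment("42", 200f));
        System.out.println(toCells(row, columnStarts)); // [Alice, , 42]
    }
}
```

Because the number of cells is fixed by the column boundaries rather than by the fragments found, the blank middle column shows up as an empty cell instead of being skipped.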



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

