[jira] [Commented] (PDFBOX-4764) When a PDF has table with blank entries in the column the stripper just ignores the column and moves to next field in the coulmn

Tilman Hausherr (Jira) Tue, 04 Feb 2020 20:35:27 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030350#comment-17030350
 ]


Tilman Hausherr commented on PDFBOX-4764:
-----------------------------------------

This isn't a bug. This is a text extraction tool and if there is no text, not 
even blanks, then there is nothing to extract.

PDF isn't like HTML where there is a TABLE syntax. What you, as a human, see as 
a "table" is just vector graphics.

If you want to extract tables, use a tool built for that, e.g. Tabula. Or use 
PDFTextStripperByArea (see the ExtractTextByArea example) with the coordinates 
of your table cells.
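
As a rough illustration of the "coordinates of your table cells" idea, here is a 
minimal sketch that turns known column and row borders into one named rectangle 
per cell. CellGrid and cellRegions are hypothetical names, not part of PDFBox, 
and the sketch assumes you already know the border positions (e.g. measured by 
hand) and that y is measured from the top of the page.

```java
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: builds one named rectangle per
// table cell from the positions of the column and row borders.
final class CellGrid {
    private CellGrid() {}

    // colEdges: ascending x coordinates of the column borders (n columns -> n + 1 edges).
    // rowEdges: ascending y coordinates of the row borders.
    // Assumption: y grows downwards from the top of the page.
    static Map<String, Rectangle2D> cellRegions(double[] colEdges, double[] rowEdges) {
        Map<String, Rectangle2D> cells = new LinkedHashMap<>();
        for (int r = 0; r + 1 < rowEdges.length; r++) {
            for (int c = 0; c + 1 < colEdges.length; c++) {
                // Region name encodes the row and column index, e.g. "r0c1".
                cells.put("r" + r + "c" + c, new Rectangle2D.Double(
                        colEdges[c], rowEdges[r],
                        colEdges[c + 1] - colEdges[c],
                        rowEdges[r + 1] - rowEdges[r]));
            }
        }
        return cells;
    }
}
```

With PDFBox 2.0.x, each entry could then be registered through 
PDFTextStripperByArea.addRegion(name, rect), followed by extractRegions(page) 
and getTextForRegion(name) per cell. Because every cell is queried by name, a 
cell without text should come back as an empty or whitespace-only string 
instead of being skipped.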

> When a PDF has table with blank entries in the column the stripper just 
> ignores the column and moves to next field in the coulmn
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4764
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4764
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.8
>            Reporter: karthik guns
>            Priority: Major
>
> When a PDF has tables whose columns contain empty values, the stripper ignores 
> the empty field and moves to the next column that has records (if a field is 
> blank, it should still be captured).
>  
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> stripper.setSortByPosition(true);
> PDFTextStripper tStripper = new PDFTextStripper();
> String pdfFileInText = tStripper.getText(document);



