[jira] [Commented] (PDFBOX-4764) When a PDF has table with blank entries in the column the stripper just ignores the column and moves to next field in the column

Michael Klink (Jira) Tue, 11 Feb 2020 10:13:27 -0800


    [ https://issues.apache.org/jira/browse/PDFBOX-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034680#comment-17034680 ]


Michael Klink commented on PDFBOX-4764:
---------------------------------------

*A)* To get useful answers to questions like this, please don't just paste 
extracted text; also share the source PDF and describe the text extraction 
code you used.
*B)* Extracting table data from PDFs is a complicated topic and the focus of 
large projects; see tabula, for example. Expecting a decent table data 
extraction routine based solely on extracted text is overly optimistic.
*C)* This should not be discussed in a _Bug_ issue because it is not a 
_bug_ to begin with; it is simply not a feature of PDFBox. There are other 
places to discuss ways to implement new features based on PDFBox.
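
To illustrate point B: plain extracted text keeps no trace of an empty cell, 
so any column assignment has to come from text positions, not from the text 
itself. The following is only a minimal sketch under stated assumptions, not 
a PDFBox feature: the Fragment class, the ColumnAssigner class, and the column 
edges are hypothetical and must be supplied by the caller. With PDFBox, the x 
coordinates could for instance be collected from TextPosition.getXDirAdj() in 
an overridden PDFTextStripper.writeString(String, List<TextPosition>).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnAssigner {

    /** Hypothetical holder: a piece of text and the x coordinate where it starts. */
    public static final class Fragment {
        final double x;
        final String text;

        public Fragment(double x, String text) {
            this.x = x;
            this.text = text;
        }
    }

    /**
     * Places each fragment of one table row into a column by its x coordinate.
     * columnStarts holds the left edge of every column in ascending order
     * (an assumption: the caller must know or estimate these edges).
     * Columns that receive no fragment stay as empty strings.
     */
    public static List<String> toRow(List<Fragment> fragments, double[] columnStarts) {
        String[] cells = new String[columnStarts.length];
        Arrays.fill(cells, "");
        for (Fragment f : fragments) {
            int col = 0; // fragments left of the first edge fall into column 0
            // Pick the last column whose left edge is at or before the fragment.
            for (int i = 0; i < columnStarts.length; i++) {
                if (f.x >= columnStarts[i]) {
                    col = i;
                }
            }
            // Join several fragments that land in the same cell with a space.
            cells[col] = cells[col].isEmpty() ? f.text : cells[col] + " " + f.text;
        }
        return new ArrayList<>(Arrays.asList(cells));
    }
}
```

For a row with text only in the first and third of three columns, toRow 
returns an empty string for the middle cell instead of silently skipping it. 
Real tables also need row grouping and handling of cells that span columns, 
which is part of why this is the focus of dedicated projects.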

> When a PDF has table with blank entries in the column the stripper just 
> ignores the column and moves to next field in the column
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4764
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4764
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.8
>            Reporter: karthik guns
>            Priority: Major
>
> When a PDF has tables with columns that contain empty values, the stripper 
> ignores the empty field and moves on to the next column that has a value (a 
> blank field should be captured as blank).
>  
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
>  stripper.setSortByPosition(true);
> PDFTextStripper tStripper = new PDFTextStripper();
> String pdfFileInText = tStripper.getText(document);



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

