[ https://issues.apache.org/jira/browse/PDFBOX-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034680#comment-17034680 ]
Michael Klink commented on PDFBOX-4764:
---------------------------------------

*A)* To get sensible answers to questions like this, please don't merely paste extracted text; also share the source PDF and describe the text extraction code you used.

*B)* Table data extraction from PDFs is a complicated topic and the focus of large projects; see tabula, for example. Hoping for a decent table data extraction routine based solely on extracted text is overly optimistic.

*C)* This should not be discussed in a _Bug_ issue, because it is not a _bug_ to begin with; it is simply not a feature of PDFBox. There are other places to discuss ways to implement new features based on PDFBox.

> When a PDF has table with blank entries in the column the stripper just
> ignores the column and moves to next field in the column
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-4764
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4764
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.8
>            Reporter: karthik guns
>            Priority: Major
>
> When a PDF has tables with columns that contain empty values, the stripper
> ignores the empty field and moves on to the next column that has records
> (if the field is blank, it should still be captured).
>
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> stripper.setSortByPosition(true);
> PDFTextStripper tStripper = new PDFTextStripper();
> String pdfFileInText = tStripper.getText(document);

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
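
To illustrate point *B)*: plain extracted text carries no information about which column a word belongs to, so a blank cell simply vanishes. The following is a minimal sketch, not PDFBox code, of the kind of position-based logic a table extractor needs. It assumes each text fragment comes with the x coordinate of its left edge (in PDFBox such positions could, for example, be collected from the `TextPosition` objects passed to an overridden `PDFTextStripper.writeString`) and that the column start positions are already known. The names `ColumnBinner`, `Fragment`, and `binRow` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnBinner {

    /** Hypothetical holder for one piece of text and the x coordinate of its left edge. */
    static final class Fragment {
        final double x;
        final String text;

        Fragment(double x, String text) {
            this.x = x;
            this.text = text;
        }
    }

    /**
     * Assigns each fragment of one table row to a column.
     * columnStarts must be sorted ascending; a fragment belongs to the last
     * column whose start is less than or equal to its x coordinate.
     * Columns that receive no fragment stay as empty strings, so blank
     * cells are preserved instead of being skipped.
     */
    static List<String> binRow(List<Fragment> fragments, double[] columnStarts) {
        String[] cells = new String[columnStarts.length];
        Arrays.fill(cells, "");
        for (Fragment f : fragments) {
            int col = 0; // fragments left of the first column start fall into column 0
            for (int i = 0; i < columnStarts.length; i++) {
                if (f.x >= columnStarts[i]) {
                    col = i;
                }
            }
            cells[col] = cells[col].isEmpty() ? f.text : cells[col] + " " + f.text;
        }
        return Arrays.asList(cells);
    }

    public static void main(String[] args) {
        // Assumed column start positions, in page units.
        double[] columnStarts = {50, 200, 350};
        // A row whose middle cell is blank.
        List<Fragment> row = List.of(new Fragment(52, "Alice"), new Fragment(355, "Paris"));
        System.out.println(binRow(row, columnStarts)); // prints [Alice, , Paris]
    }
}
```

The hard part in practice is finding the column boundaries in the first place (from ruling lines, whitespace gaps, or header positions), which is what dedicated projects such as tabula deal with.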