[
https://issues.apache.org/jira/browse/PDFBOX-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034551#comment-17034551
]
karthik guns commented on PDFBOX-4764:
--------------------------------------
Thanks for the comments.
In this case most of the table headers, and the values under them, contain
spaces. The extracted text looks like this:
Order# description id PO Number notes value Invoice Number order amount
tax amount
1111 test info PSTR test this 345543
300 15
So the first row (the header) is extracted as
Order# description id PO Number notes value Invoice Number order amount
tax amount
and the second row (the values) as
1111 test info PSTR test this 345543
300 15
If we split on spaces, "test info" comes back as two separate tokens, "test"
and "info". But if the same PDF arrives again with just "test" as the
description, the token count changes, so we cannot write fixed logic to extract
the values. I am stuck on the logic for this template, because the same table
values can arrive with or without spaces.
Is there any delimiter logic, or another approach, that would work for this
template alone? I thought of using the token index, but even the index differs
depending on whether the field values contain spaces.
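One direction that might avoid the delimiter problem is to assign each piece of
text to a column by its x-coordinate instead of by its position in a
space-split list. The following is only a minimal sketch under assumptions: the
class ColumnAssigner, the holder type Word, the TOLERANCE value and the
column start positions are all hypothetical, and the code is plain Java with no
PDFBox calls. In practice the x-coordinates would have to come from the PDF
itself, for example from TextPosition.getXDirAdj() collected in an overridden
PDFTextStripper.writeString(String, List<TextPosition>), and the column starts
would be taken from the header row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
class ColumnAssigner {
    // Slack for small horizontal offsets between header and values (assumed value).
    static final float TOLERANCE = 1.0f;

    // A text fragment plus the x-coordinate where it starts (hypothetical holder type).
    static final class Word {
        final String text;
        final float x;
        Word(String text, float x) { this.text = text; this.x = x; }
    }

    // Puts each word into the right-most column whose left edge is at or before
    // the word's x. columnStarts must be sorted ascending. Columns that receive
    // no words come back as empty strings, so blank cells are preserved.
    static List<String> toCells(float[] columnStarts, List<Word> row) {
        StringBuilder[] cells = new StringBuilder[columnStarts.length];
        for (int i = 0; i < cells.length; i++) {
            cells[i] = new StringBuilder();
        }
        for (Word w : row) {
            int col = 0; // words left of the first column fall into column 0
            for (int i = 0; i < columnStarts.length; i++) {
                if (w.x + TOLERANCE >= columnStarts[i]) {
                    col = i;
                }
            }
            if (cells[col].length() > 0) {
                cells[col].append(' '); // rejoin multi-word values such as "test info"
            }
            cells[col].append(w.text);
        }
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : cells) {
            out.add(sb.toString());
        }
        return out;
    }
}
```

With made-up column starts of 50, 100, 200 and 260, the words "1111" at x=50,
"test" at x=100, "info" at x=125 and "300" at x=260 would give the cells
"1111", "test info", "" and "300". The multi-word description stays in one cell
and the empty third column is reported as blank rather than skipped.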
> When a PDF has a table with blank entries in a column, the stripper just
> ignores the blank entry and moves to the next field in the column
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-4764
> URL: https://issues.apache.org/jira/browse/PDFBOX-4764
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.8
> Reporter: karthik guns
> Priority: Major
>
> When a PDF has tables with columns containing empty values, the stripper
> ignores the empty field and moves to the next column that has a value (if a
> field is blank, it should still be captured as blank)
>
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> stripper.setSortByPosition(true);
> PDFTextStripper tStripper = new PDFTextStripper();
> String pdfFileInText = tStripper.getText(document);