[ https://issues.apache.org/jira/browse/PDFBOX-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034551#comment-17034551 ]

karthik guns commented on PDFBOX-4764:
--------------------------------------

Thanks for the comments.

In this case most of the table headers, and many of the values, contain spaces.
The extracted text looks like this:

Order#  description id  PO Number  notes value  Invoice Number  order amount  tax amount
1111    test info       PSTR       test this    345543          300           15

Here the header is extracted as the first line and the values as the second
line. If we split the value line on spaces, "test info" is broken into two
fields, "test" and "info". If the same PDF arrives again with just "test" as the
description, the number of fields changes, so we cannot write fixed logic to
extract the values. I'm stuck on the logic for this template, because the same
table values can come in with or without spaces.
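
A minimal sketch of the difference, in Java since that is what PDFBox uses. Splitting on every space breaks "test info" apart. Splitting on runs of two or more spaces keeps it together, but only under the assumption that the extracted line keeps wider gaps between columns than between words. Plain PDFTextStripper output may emit a single space between words regardless of the gap width, so that assumption may not hold. Even when it does, a blank cell leaves no token behind, so the later values still shift left. GapSplitter and its methods are hypothetical names, not PDFBox API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of PDFBox.
final class GapSplitter {

    // Naive approach: every run of whitespace is treated as a column separator,
    // so a multi-word value such as "test info" becomes two tokens.
    static List<String> splitOnEverySpace(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    // Assumes columns are separated by two or more spaces and words inside a
    // value by exactly one. A blank cell leaves no token at all, so later
    // values still shift left; this cannot detect empty columns.
    static List<String> splitOnWideGaps(String line) {
        return Arrays.asList(line.trim().split("\\s{2,}"));
    }
}
```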

 

Is there any delimiter logic, or another approach, that would work for this
template alone? I thought of using the index of each value, but even the index
can differ depending on whether the field values contain spaces.
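
Instead of character indexes in the extracted string, one option is to use the horizontal (x) position of each word on the page, which does not change when a value gains or loses a space. In PDFBox 2.x those positions are carried by the TextPosition objects passed to PDFTextStripper.writeString(String, List<TextPosition>) (for example via getXDirAdj()). Alternatively, PDFTextStripperByArea.addRegion can be given one rectangle per column. The sketch below leaves PDFBox out and only shows the column-assignment step, under the assumption that the column start positions are known for this template (the coordinates in it are made up). Each word goes into the last column that starts at or before it, and a column with no words stays empty instead of being skipped. ColumnAssigner and Word are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; in PDFBox the x values would have to be
// collected from TextPosition objects (e.g. getXDirAdj()) by your own code.
final class ColumnAssigner {

    /** One extracted word and the x position (in PDF points) where it starts. */
    static final class Word {
        final String text;
        final float x;

        Word(String text, float x) {
            this.text = text;
            this.x = x;
        }
    }

    /**
     * Assigns each word of a table row to a column.
     *
     * @param columnStarts x position where each column begins, in ascending order
     *                     (e.g. taken from the header titles of this template)
     * @param row          the words of one row, in any order
     * @return one cell per column; columns without words are returned as ""
     */
    static List<String> assign(float[] columnStarts, List<Word> row) {
        List<StringBuilder> cells = new ArrayList<>();
        for (int i = 0; i < columnStarts.length; i++) {
            cells.add(new StringBuilder());
        }
        List<Word> sorted = new ArrayList<>(row);
        sorted.sort((a, b) -> Float.compare(a.x, b.x)); // keep words left to right
        for (Word w : sorted) {
            // Last column starting at or before the word; words left of the
            // first column fall into column 0.
            int col = 0;
            for (int i = 0; i < columnStarts.length; i++) {
                if (columnStarts[i] <= w.x) {
                    col = i;
                }
            }
            StringBuilder cell = cells.get(col);
            if (cell.length() > 0) {
                cell.append(' ');
            }
            cell.append(w.text);
        }
        List<String> result = new ArrayList<>();
        for (StringBuilder cell : cells) {
            result.add(cell.toString());
        }
        return result;
    }
}
```

In practice a value may start slightly left of its header (for example right-aligned or centred numbers), so it may be safer to place each boundary midway between adjacent header titles or to allow a small tolerance.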


> When a PDF has a table with blank entries in the column the stripper just 
> ignores the column and moves to next field in the column
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4764
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4764
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.8
>            Reporter: karthik guns
>            Priority: Major
>
> When a PDF has tables with columns containing empty values, the stripper 
> ignores the empty field and moves to the next column that has a value (a 
> blank field should still be captured).
>  
> PDFTextStripperByArea stripper = new PDFTextStripperByArea();
> stripper.setSortByPosition(true); // note: this instance is never used below
> PDFTextStripper tStripper = new PDFTextStripper(); // sortByPosition defaults to false
> String pdfFileInText = tStripper.getText(document);



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
