On 7/05/2013 8:08, shailendra3009 wrote:
> I used iTextSharp to extract text from a PDF. I used the code below to extract
> the text line by line. The extraction works perfectly, except that it does not
> read the white spaces in the PDF. I specifically need to read those white spaces.
Your question sounds like "I have no money in my wallet; how can I fetch 
the zero dollar notes from my wallet?"

In a PDF, all text is added at absolute positions.
For instance: one word is added at position x = 36, y = 806; another 
word is added at position x = 300, y = 806. Some other text is added at 
positions x = 36, y = 790; x = 36, y = 774; x = 36, y = 742; ...

Where are the spaces? There are none!

But by doing the math, you can see that there's a gap between the text 
that starts at position x = 36 and the one that starts at position x = 300.
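As a rough illustration of "doing the math", here is a minimal Python sketch 
that rebuilds one line from positioned chunks. It is not iText code: the 
function name join_chunks, the chunk end coordinates, and the space width of 
3 units are all assumptions made up for this example. A real extractor would 
have to supply the end position of each chunk and a font-dependent space width.

```python
from typing import List, Tuple

# Each chunk is (x_start, x_end, text). The end coordinates are hypothetical;
# a real extractor would have to report them alongside the start position.
Chunk = Tuple[float, float, str]


def join_chunks(chunks: List[Chunk], space_width: float = 3.0) -> str:
    """Rebuild one line of text, inserting spaces where the x gap is wide."""
    ordered = sorted(chunks, key=lambda c: c[0])  # left to right
    out = []
    prev_end = None
    for x_start, x_end, text in ordered:
        if prev_end is not None:
            gap = x_start - prev_end
            if gap > space_width / 2:
                # Approximate the gap by a whole number of space characters.
                out.append(" " * max(1, round(gap / space_width)))
        out.append(text)
        prev_end = x_end
    return "".join(out)


# Two words on the same baseline, starting at x = 36 and x = 300.
line = join_chunks([(36.0, 66.0, "Hello"), (300.0, 330.0, "World")])
```

The spaces in the result are computed from the coordinates; they were never 
stored in the PDF.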

Also, you see a pattern in the y positions: 806 - 16 = 790; 790 - 16 = 
774; 774 - 16 = 758; 758 - 16 = 742; ...
No text was added at y = 758, so it looks as if a line was skipped there.
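The same arithmetic can be sketched in Python. This is only an illustration 
under stated assumptions: the function name count_blank_lines is hypothetical, 
and the fixed leading of 16 units is taken from the example above. Real 
documents can mix several leadings, so treat the result as a guess.

```python
from typing import List


def count_blank_lines(ys: List[float], leading: float = 16.0) -> List[int]:
    """For each pair of consecutive baselines, estimate the lines skipped."""
    ordered = sorted(set(ys), reverse=True)  # top of the page first
    skipped = []
    for upper, lower in zip(ordered, ordered[1:]):
        steps = round((upper - lower) / leading)
        # One step is the normal distance to the next line; more means a gap.
        skipped.append(max(0, steps - 1))
    return skipped


# Baselines from the example: nothing was drawn at y = 758.
gaps = count_blank_lines([806, 790, 774, 742])
```

A nonzero entry marks a pair of baselines that are further apart than one 
leading, which is the only trace a "blank line" leaves behind.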

However, as explained multiple times, the concept of a line doesn't 
exist in PDF.

See for instance: 
http://stackoverflow.com/questions/16392886/need-to-extract-text-line-by-line-from-pdf-using-itextsharp-and-put-enter-at-eve
