I don't have time to actually test this right now (sorry - big deadline going on).
 
But you might want to check the current transformation matrix (CTM) in the graphics state:

PdfContentStreamProcessor.gs().ctm

and see if it is different from one piece of text to the next.

 

The text matrix is multiplied by the CTM to determine the actual location on the page.  The simple parser doesn't do this additional transformation because it isn't necessary for the really simple stuff.

 

But if there are adjustments being made to the CTM between text operations, you could see the behavior you are seeing.

 

 

If the CTM is indeed changing, then the fix is pretty easy:  just take the text matrices and multiply them by the CTM, then use the resulting matrix (not the raw text matrix) for your spatial analysis:

 

Matrix paperSpaceTextMatrix = textMatrix.multiply(ctm);
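
To make the effect concrete, here is a rough, self-contained sketch (untested against the actual file) of why two pieces of text can report the same text matrix position and still land in different places.  It does not use iText: the PaperSpaceDemo class and its helpers are made up for illustration, and both CTM values are assumptions, not values read from 2008rate.pdf.  The matrices follow the PDF row-vector convention, so the text matrix goes on the left.

```java
public class PaperSpaceDemo {
    // PDF-style affine matrix [a b 0; c d 0; e f 1], stored row-major.
    static double[] matrix(double a, double b, double c, double d, double e, double f) {
        return new double[] {a, b, 0, c, d, 0, e, f, 1};
    }

    // left x right under the row-vector convention: left is applied first.
    static double[] multiply(double[] left, double[] right) {
        double[] out = new double[9];
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                double sum = 0;
                for (int k = 0; k < 3; k++) {
                    sum += left[row * 3 + k] * right[k * 3 + col];
                }
                out[row * 3 + col] = sum;
            }
        }
        return out;
    }

    // Where the text-space origin lands: the translation row (e, f).
    static double[] origin(double[] m) {
        return new double[] {m[6], m[7]};
    }

    public static void main(String[] args) {
        // Same text matrix for both pieces of text (the reported coordinates).
        double[] textMatrix = matrix(1, 0, 0, 1, 101.16, 198.00116);

        // Hypothetical CTMs: identity for the first piece of text, and a
        // 12 unit downward shift made before the second one.
        double[] ctmFirst = matrix(1, 0, 0, 1, 0, 0);
        double[] ctmSecond = matrix(1, 0, 0, 1, 0, -12);

        double[] first = origin(multiply(textMatrix, ctmFirst));
        double[] second = origin(multiply(textMatrix, ctmSecond));

        System.out.println("first:  x=" + first[0] + " y=" + first[1]);
        System.out.println("second: x=" + second[0] + " y=" + second[1]);
    }
}
```

With these made-up CTMs the x values stay equal while the y values differ by the amount of the CTM shift, which is the kind of thing that would explain text that looks stacked in the parser but not in the viewer.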

 

 

Unlike the text matrix, the CTM can (and often does) involve rotational components.  In your specific file, I doubt that is the case - just keep it in mind...  The implication here is that you want to multiply out to paper space *after* you've done all of the text processing that you possibly can (determining where spaces occur, etc...).  Otherwise you have to do some pretty funky handling to determine rotated inter-character spacing and line breaks.
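
A small sketch of why the ordering matters, again with made-up names (RotatedSpacingDemo, transform, textSpaceGap) and a hypothetical 90 degree CTM rather than anything taken from your file.  In text space the gap between two characters is a plain difference of x values; after the rotation the same gap lies along the y axis, so a check that only compares x values would miss it.

```java
public class RotatedSpacingDemo {
    // Apply a PDF-style matrix {a, b, c, d, e, f} to the point (x, y).
    static double[] transform(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    // Gap measured along the baseline in text space, before any CTM is applied.
    static double textSpaceGap(double endOfFirstX, double startOfSecondX) {
        return startOfSecondX - endOfFirstX;
    }

    public static void main(String[] args) {
        // Hypothetical CTM: a 90 degree counterclockwise rotation, written
        // with exact 0 and 1 entries instead of Math.cos / Math.sin.
        double[] rotate90 = {0, 1, -1, 0, 0, 0};

        // Two characters on the same baseline (y = 50) in text space.
        double endOfFirst = 100;
        double startOfSecond = 110;
        double baseline = 50;

        // Step 1: do the spacing analysis in text space, where it is easy.
        double gap = textSpaceGap(endOfFirst, startOfSecond);

        // Step 2: only then multiply out to paper space.
        double[] p1 = transform(rotate90, endOfFirst, baseline);
        double[] p2 = transform(rotate90, startOfSecond, baseline);

        System.out.println("text-space gap: " + gap);
        System.out.println("paper-space dx: " + (p2[0] - p1[0]));
        System.out.println("paper-space dy: " + (p2[1] - p1[1]));
    }
}
```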

 

Hope that helps.  If the CTM isn't changing, let me know and I'll try to take a look at the file and see what's up.  It would help if you sent me the actual content stream of the second page.  PdfContentReaderTool will get that for you (along with a ton of other stuff).

 

- K


 
----------------------- Original Message -----------------------
  
From: "Neil Aggarwal" <[EMAIL PROTECTED]>
To: "'Post all your questions about iText here'" <[email protected]>
Cc: 
Date: Wed, 19 Nov 2008 23:06:35 -0600
Subject: [iText-questions] Got the same x and y locations for different pieces of text
  
Kevin:

This is strange.  I am trying to process this
file through my text parser:
http://www.dallascad.org/forms/2008rate.pdf

I am getting the same x and y coordinates for
these two pieces of text on page 2 (They are in the notes on the
bottom of the page):

Taxes for this entity are collected by the Dallas County Tax Office.
If the optional homestead exemption is offered, it must be a minimum of
$5,000.

Both of them have x=101.16 and y=198.00116 for coordinates.  If they have
the same coordinates, shouldn't they overlap?

Looking at the page rendered in Adobe reader, the two pieces of text do
not overlap each other. They look like they should have different y values.

Did I goof something up?

I am attaching a piece of test code that illustrates what I am seeing.

Any input would be helpful.

Thanks,
    Neil
    
--
Neil Aggarwal, (832)245-7314, www.JAMMConsulting.com
Eliminate junk email and reclaim your inbox.
Visit http://www.spammilter.com for details.


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php
