The "null" is a bug in the sense that the PDF has an empty ToUnicode map,
whereas iText expected a valid one. It's fixed in SVN.

Paulo
  ----- Original Message ----- 
  From: 1T3XT BVBA 
  To: [email protected] 
  Sent: Saturday, February 18, 2012 3:19 PM
  Subject: Re: [iText-questions] iText 5.1.4 text extraction issue


  On 18/02/2012 15:25, sselvia wrote:
  > I sent a request to get pricing for commercial support.  I brought up the PDF
  > in Preview and saved the page in question to a separate PDF.  When I process
  > the file after the "Save As" the word "null" is returned by the getText()
  > method.
  > http://itext-general.2136553.n4.nabble.com/file/n4399827/iTextTest.pdf
  > iTextTest.pdf
  >
  > Thanks for all of the help to this point.

  I didn't see the question on the internal support list yet.
  Am I correct that you posted the question twice, once on the free list,
  once on the customer's list?
  Because I can't see it on the customer's support list.

  Anyway, this is the syntax inside your PDF:

  q Q q 0 0.03 612 791.97 re W n /Gs1 gs /Cs1 cs 1 1 1 sc /Gs2 gs 0
  0.029999 612 791.97
  re f Q q 0 594.42 612 197.58 re W n /Gs2 gs q 619.2599 0 0 198.48
  -3.600012 594.42
  cm /Im1 Do Q Q q 0 395.94 612 198.48 re W n /Gs2 gs q 619.2599 0 0
  198.48 -3.600012 395.94
  cm /Im2 Do Q Q q 0 197.46 612 198.48 re W n /Gs2 gs q 619.2599 0 0
  198.48 -3.600012 197.46
  cm /Im3 Do Q Q q 0 0.03 612 197.43 re W n /Gs2 gs q 619.2599 0 0 198.42
  -3.600012 -0.959936
  cm /Im4 Do Q Q q 0 0 612 792 re W n 0.754 sc /Gs1 gs /Gs2 gs BT -0.0004 Tc
  28.02 0 0 28.02 69.12 742.98 Tm /TT1.1 1 Tf (!"#$%&'\(%\)*+) Tj 0 Tc ET BT
  28.02 0 0 28.02 211.5008 742.98 Tm /G2 1 Tf <0001> Tj ET BT -0.0001 Tc
  28.02 0 0 28.02 217.8614 742.98
  Tm /TT1.1 1 Tf (\),) Tj 0 Tc ET BT 28.02 0 0 28.02 241.74 742.98 Tm /G2 1
  Tf <0001> Tj ET BT 0.0005 Tc 28.02 0 0 28.02 248.1006 742.98 Tm /TT1.1 1 Tf
  (-.&.*\() Tj 0 Tc ET BT 28.02 0 0 28.02 328.6216 742.98 Tm /G2 1 Tf <0001>
  Tj ET BT -0.0011 Tc 28.02 0 0 28.02 334.9822 742.98 Tm /TT1.1 1 Tf
  (/012\)$\)3%&)
  Tj 0 Tc ET BT 28.02 0 0 28.02 459.5423 742.98 Tm /G2 1 Tf <0001> Tj ET BT
  -0.0001 Tc 28.02 0 0 28.02 465.9028 742.98 Tm /TT1.1 1 Tf (42.*1+) Tj 0 Tc
  ET 0 sc BT -0.0004 Tc 28.02 0 0 28.02 67.98239 744.1204 Tm /TT1.1 1 Tf
  (!"#$%&'\(%\)*+)
  Tj 0 Tc ET BT 28.02 0 0 28.02 210.3632 744.1204 Tm /G2 1 Tf <0001> Tj ET BT
  -0.0001 Tc 28.02 0 0 28.02 216.7238 744.1204 Tm /TT1.1 1 Tf (\),) Tj 0 Tc
  ET BT 28.02 0 0 28.02 240.6024 744.1204 Tm /G2 1 Tf <0001> Tj ET BT 0.0005
  Tc 28.02 0 0 28.02 246.9629 744.1204 Tm /TT1.1 1 Tf (-.&.*\() Tj 0 Tc ET BT
  28.02 0 0 28.02 327.484 744.1204 Tm /G2 1 Tf <0001> Tj ET BT -0.0011 Tc
  28.02 0 0 28.02 333.8445 744.1204
  Tm /TT1.1 1 Tf (/012\)$\)3%&) Tj 0 Tc ET BT 28.02 0 0 28.02 458.4047
  744.1204
  Tm /G2 1 Tf <0001> Tj ET BT -0.0001 Tc 28.02 0 0 28.02 464.7652 744.1204 Tm
  /TT1.1 1 Tf (42.*1+) Tj 0 Tc ET 0.754 sc BT -0.0023 Tc 28.02 0 0 28.02
  112.5062 709.3196
  Tm /TT1.1 1 Tf (\)*) Tj 0 Tc ET BT 28.02 0 0 28.02 142.566 709.3196 Tm /G2
  1 Tf <0001> Tj ET BT -0.0006 Tc 28.02 0 0 28.02 148.9266 709.3196 Tm /TT1.1
  1 Tf (\(5.) Tj 0 Tc ET BT 28.02 0 0 28.02 187.7455 709.3196 Tm /G2 1 Tf
  <0001>
  Tj ET BT 0.0018 Tc 28.02 0 0 28.02 194.106 709.3196 Tm /TT1.1 1 Tf (6',.)
  Tj 0 Tc ET BT 28.02 0 0 28.02 244.2646 709.3196 Tm /G2 1 Tf <0001> Tj ET BT
  -0.0001 Tc 28.02 0 0 28.02 250.6252 709.3196 Tm /TT1.1 1 Tf (7%.$1) Tj 0 Tc
  ET BT 28.02 0 0 28.02 308.1054 709.3196 Tm /G2 1 Tf <0001> Tj ET BT -0.0003
  Tc 28.02 0 0 28.02 314.4043 709.3196 Tm /TT1.1 1 Tf (8+\(%"'\(.) Tj 0 Tc ET
  BT 28.02 0 0 28.02 416.2234 709.3196 Tm /G2 1 Tf <0001> Tj ET BT 0.0002 Tc
  28.02 0 0 28.02 422.5839 709.3196 Tm /TT1.1 1 Tf (,\)2) Tj 0 Tc ET BT
  28.02 0 0 28.02 456.4853 709.3196
  Tm /G2 1 Tf <0001> Tj ET BT 0.0006 Tc 28.02 0 0 28.02 462.8458 709.3196 Tm
  /TT1.1 1 Tf (\(5.) Tj 0 Tc ET BT 28.02 0 0 28.02 501.7852 709.3196 Tm /G2
  1 Tf <0001> Tj ET 0 sc BT -0.0023 Tc 28.02 0 0 28.02 111.3658 710.46 Tm
  /TT1.1
  1 Tf (\)*) Tj 0 Tc ET BT 28.02 0 0 28.02 141.4256 710.46 Tm /G2 1 Tf <0001>
  Tj ET BT -0.0006 Tc 28.02 0 0 28.02 147.7861 710.46 Tm /TT1.1 1 Tf (\(5.)
  Tj 0 Tc ET BT 28.02 0 0 28.02 186.6051 710.46 Tm /G2 1 Tf <0001> Tj ET BT
  0.0018 Tc 28.02 0 0 28.02 192.9656 710.46 Tm /TT1.1 1 Tf (6',.) Tj 0 Tc ET
  BT 28.02 0 0 28.02 243.127 710.46 Tm /G2 1 Tf <0001> Tj ET BT -0.0001 Tc
  28.02 0 0 28.02 249.4875 710.46
  Tm /TT1.1 1 Tf (7%.$1) Tj 0 Tc ET BT 28.02 0 0 28.02 306.9678 710.46 Tm /G2
  1 Tf <0001> Tj ET BT -0.0003 Tc 28.02 0 0 28.02 313.2667 710.46 Tm /TT1.1
  1 Tf (8+\(%"'\(.) Tj 0 Tc ET BT 28.02 0 0 28.02 415.0858 710.46 Tm /G2 1 Tf
  <0001> Tj ET BT 0.0002 Tc 28.02 0 0 28.02 421.4463 710.46 Tm /TT1.1 1 Tf
  (,\)2)
  Tj 0 Tc ET BT 28.02 0 0 28.02 455.3477 710.46 Tm /G2 1 Tf <0001> Tj ET BT
  0.0006 Tc 28.02 0 0 28.02 461.7082 710.46 Tm /TT1.1 1 Tf (\(5.) Tj 0 Tc ET
  BT 28.02 0 0 28.02 500.6476 710.46 Tm /G2 1 Tf <0001> Tj ET 0.754 sc BT
  0.0004
  Tc 28.02 0 0 28.02 209.7076 675.7208 Tm /TT1.1 1 Tf (9&:$';'5') Tj 0 Tc ET
  BT 28.02 0 0 28.02 338.2269 675.7208 Tm /G2 1 Tf <0001> Tj ET BT -0.0004 Tc
  28.02 0 0 28.02 344.5874 675.7208 Tm /TT1.1 1 Tf (-%<.2) Tj 0 Tc ET 0 sc BT
  0.0004 Tc 28.02 0 0 28.02 208.5671 676.8612 Tm /TT1.1 1 Tf (9&:$';'5') Tj
  0 Tc ET BT 28.02 0 0 28.02 337.0865 676.8612 Tm /G2 1 Tf <0001> Tj ET BT
  -0.0004
  Tc 28.02 0 0 28.02 343.447 676.8612 Tm /TT1.1 1 Tf (-%<.2) Tj 0 Tc ET 1 sc
  BT 0.0008 Tc 13.98 0 0 13.98 393.12 29.1 Tm /TT3.0 1 Tf [ (SDI ) -1
  (Environmental )
  -1 (Services, ) -1 (Inc. ) ] TJ 0 Tc ET q 97.44208 0 0 65.99999 44.99999
  51.48006
  cm /Im5 Do Q BT 0.0001 Tc 13.98 0 0 13.98 45.18 28.86 Tm /TT3.0 1 Tf [
  (Putnam )
  1 (County ) 1 (Environmental ) 1 (Council, ) 1 (Inc.) -3 ( ) ] TJ 0 Tc ET
  BT -0.0007 Tc 16.02 0 0 16.02 274.56 119.58 Tm /TT3.0 1 Tf [ (June ) 2
  (2010)
  ] TJ 0 Tc ET q 103.7244 0 0 64.49999 488.16 52.98005 cm /Im6 Do Q Q


  This is the result when iText parses this syntax for text:

  Implications nullof nullRecent nullHydrologic nullTrends
  Implications nullof nullRecent nullHydrologic nullTrends
  on nullthe nullSafe nullYield nullEstimate nullfor nullthe null
  on nullthe nullSafe nullYield nullEstimate nullfor nullthe null
  Ocklawaha nullRiver
  Ocklawaha nullRiver
  June 2010
  SDI Environmental Services, Inc.
  Putnam County Environmental Council, Inc.


  Why is some of the text duplicated?
  Because the text occurs twice in the PDF syntax.
  For instance: (!"#$%&'\(%\)*+) stands for "Implications".
  It occurs in two places:
  coordinate 69.12, 742.98; and
  coordinate 67.98239, 744.1204.

  Because the two separate instances of the word are so close to each
  other, you see it only once in the PDF, but that doesn't mean it isn't
  there twice.
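A minimal sketch of how a caller could filter such overlapping copies after extraction, assuming each text run is available together with the origin set by its Tm operator. The TextRun and ShadowTextFilter classes and the tolerance of 2 units are hypothetical illustrations, not part of iText's API:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one extracted text run and the origin set by its Tm operator. */
final class TextRun {
    final String text;
    final double x;
    final double y;

    TextRun(String text, double x, double y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class ShadowTextFilter {
    /** Keeps a run unless an already-kept run has the same text within the tolerance on both axes. */
    static List<TextRun> dropNearDuplicates(List<TextRun> runs, double tolerance) {
        List<TextRun> kept = new ArrayList<>();
        for (TextRun candidate : runs) {
            boolean duplicate = false;
            for (TextRun earlier : kept) {
                if (earlier.text.equals(candidate.text)
                        && Math.abs(earlier.x - candidate.x) <= tolerance
                        && Math.abs(earlier.y - candidate.y) <= tolerance) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<TextRun> runs = new ArrayList<>();
        // Coordinates taken from the content stream quoted above.
        runs.add(new TextRun("Implications", 69.12, 742.98));
        runs.add(new TextRun("Implications", 67.98239, 744.1204));
        runs.add(new TextRun("June 2010", 274.56, 119.58));
        for (TextRun run : dropNearDuplicates(runs, 2.0)) {
            System.out.println(run.text);
        }
    }
}
```

With the two "Implications" origins from this page (about 1.14 units apart on each axis), the second copy is dropped and "June 2010" is kept. A tolerance that is too large would also remove text that is legitimately repeated nearby, so the value has to be chosen per document.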

  As for the 'null', you have an odd string <0001>, drawn with font /G2,
  that separates the words in those first lines.
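One plausible way for the literal word "null" to end up in extracted text is sketched below. This is an assumption for illustration only, not iText's actual code: it supposes that a character code is looked up in a code-to-Unicode map that has no entry for it (as with an empty ToUnicode map), and that the missing result is appended unchecked. The NullGlyphDemo class and the space fallback are hypothetical; the thread does not describe what the real fix does.

```java
import java.util.HashMap;
import java.util.Map;

final class NullGlyphDemo {
    /** Decodes character codes through a code-to-Unicode map without checking for gaps. */
    static String decodeNaively(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder out = new StringBuilder();
        for (int code : codes) {
            // Map.get returns null for a missing code, and StringBuilder.append
            // renders a null String as the four characters "null".
            out.append(toUnicode.get(code));
        }
        return out.toString();
    }

    /** Same lookup, but a missing mapping falls back to a caller-chosen replacement. */
    static String decodeSafely(int[] codes, Map<Integer, String> toUnicode, String fallback) {
        StringBuilder out = new StringBuilder();
        for (int code : codes) {
            out.append(toUnicode.getOrDefault(code, fallback));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stands in for an empty ToUnicode map (assumed here to belong to font /G2).
        Map<Integer, String> emptyToUnicode = new HashMap<>();
        // The <0001> string from the content stream, as a single two-byte code.
        int[] separator = {0x0001};
        System.out.println("[" + decodeNaively(separator, emptyToUnicode) + "]");
        System.out.println("[" + decodeSafely(separator, emptyToUnicode, " ") + "]");
    }
}
```

The first call prints [null] and the second prints [ ], which matches the pattern of "null" appearing exactly where the <0001> separator sits between words.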

  Hope this helps.

  ------------------------------------------------------------------------------
  Virtualization & Cloud Management Using Capacity Planning
  Cloud computing makes use of virtualization - but cloud computing
  also focuses on allowing computing to be delivered as a service.
  http://www.accelacomm.com/jaw/sfnl/114/51521223/
  _______________________________________________
  iText-questions mailing list
  [email protected]
  https://lists.sourceforge.net/lists/listinfo/itext-questions

  iText(R) is a registered trademark of 1T3XT BVBA.
  Many questions posted to this list can (and will) be answered with a 
reference to the iText book: http://www.itextpdf.com/book/
  Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php