I'm working on the HTML Parser and I've come across an odd bug in which iText randomly squashes together text that's being parsed, for example:
 
"This is my text line"
 
Would appear as:
 
"This ismy text line"
 
in the PDF. I've tracked the problem down to the char[] ch variable in the characters method in SAXiTextHandler. For some reason the characters method is called several times and breaks the text up at seemingly random points (I assume wherever its buffer fills up), and at random it puts a newline in front of a piece of text, which is what causes the problem. Where is the content data (ch) gathered from? Is it pulled in directly from SAX?
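For what it's worth, the SAX contract (the Javadoc for org.xml.sax.ContentHandler.characters) says the ch array comes from the SAX parser itself, that a parser may deliver one contiguous run of text as a single chunk or split it across several calls, and that only ch[start] to ch[start + length - 1] is valid. A handler therefore has to buffer the chunks itself. Below is a minimal sketch of that pattern, assuming text only needs to be emitted at tag boundaries; BufferingHandler and its parse helper are hypothetical names for illustration, not how SAXiTextHandler is actually written.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates every characters() chunk and only emits
// the text when a tag boundary is reached, so where the parser splits its
// chunks no longer matters.
public class BufferingHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only this slice of ch is valid; the rest of the array may hold
        // stale data from the parser's internal buffer.
        buffer.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Emits the buffered text (if any) as one piece and clears the buffer.
    private void flush() {
        if (buffer.length() > 0) {
            texts.add(buffer.toString());
            buffer.setLength(0);
        }
    }

    public List<String> getTexts() {
        return texts;
    }

    // Parses a well-formed XML string and returns the text runs between tags.
    public static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            BufferingHandler handler = new BufferingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.getTexts();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<p>Text Before Bold. <b>BOLD!</b> Test Now Normal</p>"));
    }
}
```

Each text run between tags comes out whole, however many characters() calls the parser used to deliver it.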
 
Here's an example of the HTML code I had and the debug output I got back from iText:
HTML:
<html>
<head>
</head>
<body>
Text Before Bold. <b>Badda Bing Badda Boom Badda BOLD!</b> Test Now Normal, no line breaks.
This is the next line, but there should still not be a line break.
Once again the next line, it seems that there is a problem with text ramming up against each other.
One Two Three Four Five Six Seven Eight Nine Ten Eleven Tweleve Thirteen Fourteen Fifteen Sixteen
 
Seventeen Eighteen Nineteen Twenty TwentyOne TwentyTwo TwentyThree TwentyFour TwentyFive<p>
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
</body>
</html>
 
DEBUG:
Starting tag: itext
Starting tag: paragraph
Content: 'Text Before Bold. '
Starting tag: phrase
* Attribute [fontstyle]: bold
Content: 'Badda Bing Badda Boom Badda BOLD!'
Stop: phrase
Content: ' Test'
Content: '
Now Normal, no line breaks. This is the next line, but there should'
Content: '
still not be a line break. Once again the next line, it seems that'
Content: '
there is a problem with text ramming up against each other. One Two'
Content: '
Three Four Five Six Seven Eight Nine Ten Eleven Tweleve Thirteen'
Content: '
Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen Twenty'
Content: '
TwentyOne TwentyTwo TwentyThree TwentyFour TwentyFive'
Stop: paragraph
Starting tag: paragraph
Content: '1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24'
Content: '
25 26 27 28 29 30 31 32 33 34 35 36'
Stop: paragraph
Stop: itext
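Note that the line breaks in the Content chunks don't line up with the line breaks in the HTML source (for example, "Test Now Normal" sits on one source line but arrives as ' Test' followed by '\nNow Normal'), so each leading '\n' appears to stand where a space used to be. Assuming, and this is not confirmed, that the squashing happens because that newline is deleted rather than treated as whitespace, the sketch below contrasts the two behaviours. WhitespaceDemo and its methods are hypothetical names; the second method applies the usual HTML rule of collapsing any run of whitespace, newlines included, to a single space.

```java
// Hypothetical demo of two ways to handle a newline at the start of a chunk.
public final class WhitespaceDemo {
    // Assumed buggy behaviour: the newline is removed outright, so the last
    // word of one chunk runs into the first word of the next.
    public static String stripNewlines(String... chunks) {
        StringBuilder sb = new StringBuilder();
        for (String c : chunks) {
            sb.append(c.replace("\n", ""));
        }
        return sb.toString();
    }

    // HTML-style behaviour: concatenate the chunks first, then collapse every
    // run of whitespace (space, tab, CR, LF) into a single space.
    public static String collapseWhitespace(String... chunks) {
        return String.join("", chunks).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String[] chunks = {"This is", "\nmy text line"};
        System.out.println(stripNewlines(chunks));      // prints: This ismy text line
        System.out.println(collapseWhitespace(chunks)); // prints: This is my text line
    }
}
```

Collapsing after joining also avoids a double space when one chunk ends with a space and the next begins with a newline.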
 
 
 
