I'm working on the HTML parser and I've come across an odd bug in which iText
randomly squashes together text that's being parsed. For example:
"This is my text
line"
would appear as:
"This ismy text
line"
in the PDF. I've tracked the problem down to the char[] ch variable in the
characters method of SAXiTextHandler. For some reason the characters method is
called several times and breaks the text up at seemingly random points (I
assume where its buffer fills up). However, it sometimes puts a newline at the
front of a chunk, and that is what causes the problem.
Where is the content data (ch) being gathered from? Is this something that's
pulled in directly from SAX?
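For reference, the standard SAX ContentHandler.characters contract allows the parser to deliver one contiguous run of text in a single call or split across several calls, so a handler that treats each call as a complete string can mangle whitespace at the chunk boundaries. Below is a minimal sketch of the usual workaround: buffer the chunks and only process the text at element boundaries. The names ChunkedTextDemo, BufferingHandler and collectText are made up for illustration. This is not iText's SAXiTextHandler code, and it uses well-formed XML rather than real HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkedTextDemo {

    /** Hypothetical handler: buffers text instead of acting on each characters() call. */
    static class BufferingHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        final List<String> runs = new ArrayList<>();

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may split one run of text across several calls,
            // so only append here; never treat a chunk as a complete string.
            buffer.append(ch, start, length);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flush();
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flush();
        }

        private void flush() {
            // Collapse every whitespace run (including newlines) to a single space,
            // so a newline at a chunk boundary cannot glue two words together.
            String text = buffer.toString().replaceAll("\\s+", " ");
            buffer.setLength(0);
            if (!text.trim().isEmpty()) {
                runs.add(text);
            }
        }
    }

    /** Parses the given XML string and returns the buffered text runs in document order. */
    static List<String> collectText(String xml) throws Exception {
        BufferingHandler handler = new BufferingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.runs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p>This is\nmy text <b>bold</b>\nline</p>";
        for (String run : collectText(xml)) {
            System.out.println("'" + run + "'");
        }
    }
}
```

Because whitespace is normalized only after a whole run has been collected, it should not matter where the parser happens to split the text.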
Here's an example of the HTML code I used and the debug output I got back from
iText:
HTML:
<html>
<head>
</head>
<body>
Text Before Bold. <b>Badda Bing Badda Boom Badda BOLD!</b> Test Now Normal, no line breaks.
This is the next line, but there should still not be a line break.
Once again the next line, it seems that there is a problem with text ramming up against each other.
One Two Three Four Five Six Seven Eight Nine Ten Eleven Tweleve Thirteen Fourteen Fifteen Sixteen
Seventeen Eighteen Nineteen Twenty TwentyOne
TwentyTwo TwentyThree TwentyFour TwentyFive<p>
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
</body>
</html>
DEBUG:
Starting tag:
itext
Starting tag:
paragraph
Content: 'Text Before Bold. '
Starting tag:
phrase
* Attribute
[fontstyle]: bold
Content: 'Badda Bing
Badda Boom Badda BOLD!'
Stop:
phrase
Content: ' Test'
Content: '
Now Normal, no line breaks. This is the next line, but there should'
Content: '
still not be a line break. Once again the next line, it seems that'
Content: '
there is a problem with text ramming up against each other. One Two'
Content: '
Three Four Five Six Seven Eight Nine Ten Eleven Tweleve Thirteen'
Content: '
Fourteen Fifteen Sixteen Seventeen Eighteen Nineteen Twenty'
Content: '
TwentyOne TwentyTwo TwentyThree TwentyFour TwentyFive'
Stop:
paragraph
Starting tag:
paragraph
Content: '1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24'
Content: '
25 26 27 28 29 30 31 32 33 34 35 36'
Stop:
paragraph
Stop:
itext
