Hi,
I am working on a project that requires producing some html pages and some pdf pages.
My workmate chose your itext library.
We write in html and then pass the html pages through itext.
I thought that the following might interest you.
1. On the whole we have had a positive experience.
2. My workmate used tidy as a pre-filter to itext (to convert the html to xhtml). This model is mentioned on your site.
As we are writing our own pages, I decided that it would be better to write xhtml directly, rather than use tidy.
There were a few reasons for this.
a. While testing I had a problem with my internet connection, and then the dtd referenced by the doctype that tidy adds could not be fetched.
b. Tidy writes things to the console. This is great when there is an error, but not when running a program.
c. Tidy is an extra layer, which costs cpu time and adds a potential for bugs (this does not mean that I found any).
For development, tidy can be useful to check that there are no errors in the xhtml, so we may leave it as an option, "&tidy=run;" in the url.
(Remember that writing <br/> in xhtml causes problems in some browsers, who knows why, but <br /> is ok.)
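One way to keep a development-time check without tidy, and without the network fetch from reason (a), might be to run the page through the JDK's own SAX parser with an entity resolver that answers every dtd request with an empty stream. The sketch below only illustrates that idea; the names XhtmlWellFormedCheck and isWellFormed are made up for this example and are not part of itext or tidy. Because the dtd is skipped, the named entities it would define are not available, so this checks well-formedness only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of itext or tidy.
class XhtmlWellFormedCheck {

    /** Returns true if the text parses as well-formed XML, without fetching any DTD. */
    static boolean isWellFormed(String xhtml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Answer every external entity request (the DTD included) with an
            // empty stream, so the check also works with no internet connection.
            reader.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler rethrows fatal errors and ignores the rest,
            // so the parser does not print its own messages to the console.
            reader.setErrorHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xhtml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doctype = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">";
        System.out.println(isWellFormed(doctype + "<html><body>hello<br /></body></html>"));
        System.out.println(isWellFormed(doctype + "<html><body><p>hello</body></html>"));
    }
}
```

The error handler is set explicitly so that a failed check stays quiet, which also avoids the console output mentioned in reason (b).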
A reason to
use tidy, or rather the dtd it adds, is to allow
etc.
3. I found a few negative features in itext. (As recommended on the itext site, I used the HtmlWriter for debugging.)
For most I found a "solution", but maybe the cases can help in debugging (if you do solve them, I would like the results).
3.1 problem: html collapses runs of whitespace into a single space, itext does not
solution: I extended "SAXmyHtmlHandler" and added the following override
public void characters(char[] ch, int start, int length)
{
    // Collapse each run of whitespace into a single space, as a browser would
    String content = new String(ch, start, length);
    content = content.replaceAll("\\s+", " ");
    super.characters(content.toCharArray(), 0, content.length());
}
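One thing to keep in mind with the override above: a SAX parser is allowed to deliver the text of one element in several characters() calls, so collapsing per call can still leave two spaces where one chunk ends in whitespace and the next begins with it. The sketch below shows one possible way around that by remembering, across calls, whether the last character emitted was a space. It uses a plain DefaultHandler rather than SAXmyHtmlHandler so that it runs without itext, and the class name WhitespaceCollapsingHandler is made up for this example. It also treats whitespace via Character.isWhitespace, which is close to, but not identical with, the regex class \s.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a handler that extends SAXmyHtmlHandler.
class WhitespaceCollapsingHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    // Remembered between characters() calls, so a run of whitespace that is
    // split over two chunks still becomes a single space.
    private boolean lastWasSpace = false;

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (Character.isWhitespace(c)) {
                if (!lastWasSpace) {
                    out.append(' ');
                }
                lastWasSpace = true;
            } else {
                out.append(c);
                lastWasSpace = false;
            }
        }
    }

    String text() {
        return out.toString();
    }

    /** Parses the XML text and returns its character data with whitespace collapsed. */
    static String collapse(String xml) {
        try {
            WhitespaceCollapsingHandler handler = new WhitespaceCollapsingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.text();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("<p>hello   \n\t  world</p>") + "]");
    }
}
```

In a real handler the collapsed characters would be passed on to super.characters(...) instead of being collected in a StringBuilder.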
3.2 problem: free text before a list can become part of a list item
e.g.
<html>
<body>
hello
<ol>
<li>there</li>
</ol>
</body>
</html>
is translated to
<html>
<head>
<!-- Producer: iTextXML by lowagie.com -->
<!-- CreationDate: Thu Aug 12 22:24:42 GMT+02:00 2004 -->
</head>
<body leftmargin="36.0" rightmargin="36.0" topmargin="36.0" bottommargin="36.0">
<ol>
<li style="font-family: unknown; ">hellothere
</li>
</ol>
</body>
</html>
solution (partial - I hope that you can give a better one, I have not yet had time to debug)
<html>
<body>
<span>hello</span>
<ol>
<li>there</li>
</ol>
</body>
</html>
gives
<html>
<head>
<!-- Producer: iTextXML by lowagie.com -->
<!-- CreationDate: Thu Aug 12 22:26:05 GMT+02:00 2004 -->
</head>
<body leftmargin="36.0" rightmargin="36.0" topmargin="36.0" bottommargin="36.0">
<span style="font-family: unknown; ">hello
</span>
<ol>
<li style="font-family: unknown; ">there
</li>
</ol>
</body>
</html>
The html in the browser and the pdf are both ok.
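If the pages are generated, the <span> workaround could perhaps be applied automatically before the page is handed to itext. The following is a minimal sketch under that assumption, using only the JDK's DOM and transformer classes; the class name LooseTextWrapper and its method are made up for this example. It assumes the input is well-formed xhtml without a doctype (with one, the parser would try to fetch the dtd), and it trims the wrapped text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical pre-processing step, not part of itext.
class LooseTextWrapper {

    /** Wraps every non-blank text node that sits directly under <body> in a <span>. */
    static String wrapLooseBodyText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Node body = doc.getElementsByTagName("body").item(0);
            Node n = (body == null) ? null : body.getFirstChild();
            while (n != null) {
                if (n.getNodeType() == Node.TEXT_NODE
                        && n.getNodeValue().trim().length() > 0) {
                    Element span = doc.createElement("span");
                    span.setTextContent(n.getNodeValue().trim());
                    body.replaceChild(span, n);
                    n = span; // continue the walk from the replacement node
                }
                n = n.getNextSibling();
            }
            // Serialize back to text as XML, without an <?xml ...?> declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapLooseBodyText(
                "<html><body>hello<ol><li>there</li></ol></body></html>"));
    }
}
```

Only direct children of <body> are touched, so text that is already inside an element such as <li> or <span> is left alone.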
3.3 problem: the following list comes out correctly in html
<html>
<body>
<ol>
<li>line 1</li>
<li><ul>
<li>line 2.0</li>
<li>line 2.1</li>
</ul>
</li>
<li>3</li>
</ol>
</body>
</html>
in pdf - the "2." is missing from the second line (if using bullets instead of numbers, this may be good :-))
solution: In the pdf I could solve this by doing (not a nice solution)
<html>
<body>
<ol>
<li>line 1</li>
<li><font color="#ffffff">a</font><ul>
<li>line 2.0</li>
<li>line 2.1</li>
</ul>
</li>
<li>3</li>
</ol>
</body>
</html>
HOWEVER, in the html version that was produced, I got <span>a</span> instead of the <font> tag
(using <span style="color:#ffffff">a</span>, gave the same result !)
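For generated pages, the placeholder from 3.3 could in principle be added by a small pre-processing step rather than by hand. The sketch below is one possible way to do that with the JDK's DOM classes, assuming well-formed xhtml without a doctype; the class name ListItemPadder and its method are made up for this example. It inserts the same white "a" in front of any nested list that is the first real content of an <li>, and it inherits the drawbacks described above (it is not a nice solution, and it relies on a white page background).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical pre-processing step, not part of itext.
class ListItemPadder {

    /** Puts a white placeholder before a nested list that opens an <li>. */
    static String padItemsStartingWithList(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList items = doc.getElementsByTagName("li");
            for (int i = 0; i < items.getLength(); i++) {
                Node li = items.item(i);
                // Skip whitespace-only text to find the first real child.
                Node first = li.getFirstChild();
                while (first != null && first.getNodeType() == Node.TEXT_NODE
                        && first.getNodeValue().trim().isEmpty()) {
                    first = first.getNextSibling();
                }
                if (first != null && first.getNodeType() == Node.ELEMENT_NODE
                        && ("ol".equals(first.getNodeName())
                            || "ul".equals(first.getNodeName()))) {
                    Element font = doc.createElement("font");
                    font.setAttribute("color", "#ffffff");
                    font.setTextContent("a");
                    li.insertBefore(font, first);
                }
            }
            // Serialize back to text as XML, without an <?xml ...?> declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(padItemsStartingWithList(
                "<html><body><ol><li>line 1</li><li><ul><li>line 2.0</li>"
                + "<li>line 2.1</li></ul></li><li>3</li></ol></body></html>"));
    }
}
```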
3.4 problem: embedding a multi-level list in a table removes the bullets/numbers from all but the first level in pdf
<html>
<body>
<table>
<tr>
<td>
<ol>
<li>line 1</li>
<li><span>embedded list</span>
<ol>
<li>line 2.0</li>
<li>line 2.1</li>
</ol>
</li>
<li>3</li>
</ol>
</td>
</tr>
</table>
</body>
</html>
note the SMALL change in the html output
<html>
<head>
<!-- Producer: iTextXML by lowagie.com -->
<!-- CreationDate: Thu Aug 12 22:46:06 GMT+02:00 2004 -->
</head>
<body leftmargin="36.0" rightmargin="36.0" topmargin="36.0" bottommargin="36.0">
<table width="80.0%" align="Center" cellpadding="0.0" cellspacing="0.0" border="1.0">
<tr>
<td border="0.5">
<ol>
<li style="font-family: unknown; ">line 1
</li>
<li style="font-family: unknown; ">embedded list <!--- NO SPAN TAGS, no visual change in browser !!!!!!!!!-->
<ol>
<li>line 2.0
</li>
<li>line 2.1
</li>
</ol>
</li>
<li style="font-family: unknown; ">3
</li>
</ol>
</td>
</tr>
</table>
</body>
</html>
solution 1: (did not work)
I placed a <span> around the lists and, in pdf, I got
line 1embedded listline 2.0line 2.13
(yes, one line with missing numbers)
html was still fine.
solution 2: add <span> / <div> etc. - in html they were just removed, in pdf they either gave a big empty space or were removed.
solution 3: place the inner list in a table (I think this proves that I have misunderstood something about pdf)
<html>
<body>
<table>
<tr>
<td>
<ol>
<li>line 1</li>
<li><span>embedded list</span>
<table><tr><td>
<ol>
<li>line 2.0</li>
<li>line 2.1</li>
</ol>
</td></tr></table>
</li>
<li>3</li>
</ol>
</td>
</tr>
</table>
</body>
</html>
result in pdf ---- the whole inner table was removed !!!! (in html a border was defined, otherwise ok, apart from extra space.)
solution 4 (desperate - but ALMOST works in pdf):
<html>
<body>
<table>
<tr>
<td>
1. line 1<br />
2.
<table cellspacing="0" cellpadding="0"><tr><td>
<ol>
<li>line 2.0</li>
<li>line 2.1</li>
</ol>
</td></tr></table>
3. 3<br />
</td>
</tr>
</table>
</body>
</html>
in html you get (notice the extras)
<html>
<head>
<!-- Producer: iTextXML by lowagie.com -->
<!-- CreationDate: Thu Aug 12 23:07:16 GMT+02:00 2004 -->
</head>
<body leftmargin="36.0" rightmargin="36.0" topmargin="36.0" bottommargin="36.0">
<table width="80.0%" align="Center" cellpadding="0.0" cellspacing="0.0" border="1.0">
<tr>
<td border="0.5" colspan="3">1. line 1<br />2.
</td>
</tr>
<tr>
<td border="0.5">
</td>
<td border="0.5">
<ol>
<li style="font-family: unknown; ">line 2.0
</li>
<li style="font-family: unknown; ">line 2.1
</li>
</ol>
</td>
<td border="0.5">
</td>
</tr>
<tr>
<td border="0.5" colspan="3">
</td>
</tr>
<tr>
<td border="0.5" colspan="3">3. 3
</td>
</tr>
<tr>
<td border="0.5" colspan="3"><br />
</td>
</tr>
</table>
</body>
</html>
solution 5 (the other way round):
<html>
<body>
<table>
<tr>
<td>
<ol>
<li>line 1</li>
<li>embedded list<br />
* line 2.0<br />
* line 2.1
</li>
<li>3</li>
</ol>
</td>
</tr>
</table>
</body>
</html>
in pdf - I got big spaces where the <br /> were.
In any case, we are using your library, and it is helping us a lot.
Thanks for your good work.
Be Well and Happy.
Shalom Deitch.
