Title: Message
Hi,
 
I am doing a project that requires producing some html pages and some pdf pages.
My workmate chose your itext library.
We write in html and then pass the html pages through itext.
I thought that the following may interest you.
 
1. On the whole we have had a positive experience.
 
2. My workmate used tidy as a pre-filter to itext (to convert the html to xhtml). This model is mentioned on your site.
   As we are writing our own pages, I decided that it would be better to write xhtml, rather than use tidy.
   There were a couple of reasons for this.
     a. While testing I had a problem with my internet connection, and then there was a problem getting the dtd that tidy adds.
     b. Tidy writes things to the console. This is great when there is an error, but not when running a program.
     c. Tidy is an extra layer. This costs cpu time and adds a potential for bugs (this does not mean that I found any).
   For development, tidy can be useful to check that there are no errors in the xhtml - so we may leave it as an option, "&tidy=run;" in the url.
   (Remember that writing xhtml <br/> causes problems in some browsers - who knows why - but <br /> is ok.)
 
  A reason to use tidy, or rather the dtd it adds, is to allow &nbsp; etc.
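
  One way to keep &nbsp; without tidy and without a network connection is to answer the parser's DTD request from memory.
  The following is only a minimal sketch using the standard Java SAX API, not anything taken from itext: the class name
  OfflineEntities and the one-line local "dtd" are assumptions made for illustration, and a real page would need every
  entity it uses to be declared.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class OfflineEntities {

    // Hypothetical stand-in for the real XHTML DTD: it declares only &nbsp;.
    private static final String LOCAL_DTD = "<!ENTITY nbsp \"&#160;\">";

    /** Parses the xhtml string and returns its text content without fetching anything remotely. */
    public static String textOf(String xhtml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId)
                    throws IOException, SAXException {
                // Answer every external DTD/entity request from memory instead of the network.
                return new InputSource(new StringReader(LOCAL_DTD));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so accumulate it.
                text.append(ch, start, length);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return text.toString();
    }
}
```

  The same resolveEntity override could presumably be placed in a handler that extends the itext one, but I have not checked that.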
 
3. I found a few negative features in itext. (As recommended on the itext site, I used the HtmlWriter for debugging.)
   For most I found a "solution", but maybe the cases can help in debugging (if you do solve them, I would like to see the results).
 
   3.1 problem: html collapses runs of whitespace into a single space, itext does not
       solution: I extended "SAXmyHtmlHandler" and added the following override
 
        // Collapse each run of whitespace to one space, as a browser would, before itext sees the text.
        public void characters(char[] ch, int start, int length)
        {
            String content = new String(ch, start, length);
            content = content.replaceAll("\\s+"," ");
            super.characters(content.toCharArray(), 0, content.length());
        }
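
       One caveat with collapsing per call: a SAX parser is allowed to deliver one run of text in several characters() calls,
       so a run of whitespace that straddles two calls could still leave two spaces. The sketch below remembers whether the
       previous chunk ended in whitespace. WhitespaceCollapser is a hypothetical helper written for this note, not an itext class.

```java
public class WhitespaceCollapser {

    // True when the last character emitted so far was a (collapsed) space.
    private boolean lastWasSpace = false;

    /** Returns the chunk with each whitespace run reduced to one space, carrying state across chunks. */
    public String collapse(char[] ch, int start, int length) {
        StringBuilder out = new StringBuilder(length);
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (Character.isWhitespace(c)) {
                // Emit a space only for the first whitespace character of a run.
                if (!lastWasSpace) {
                    out.append(' ');
                }
                lastWasSpace = true;
            } else {
                out.append(c);
                lastWasSpace = false;
            }
        }
        return out.toString();
    }

    /** Forgets the carried state, e.g. at the start of a new block element. */
    public void reset() {
        lastWasSpace = false;
    }
}
```

       In the override above, the collapsed string would then be passed to super.characters(...) in place of the replaceAll result.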
 
   3.2 problem: free text before a list can become part of a list item
 
                e.g.
                    <html>   
                    <body>
                       hello
                       <ol>
                         <li>there</li>
                       </ol>
                    </body>
                    </html>
 
 
               is translated to
 
                <html>
                    <head>
                        <!-- Producer: iTextXML by lowagie.com -->
                        <!-- CreationDate: Thu Aug 12 22:24:42 GMT+02:00 2004 -->
                    </head>
                    <body leftmargin="36.0" rightmargin="36.0" topmargin="36.0" bottommargin="36.0">
                        <ol>
                            <li style="font-family: unknown; ">hellothere
                            </li>
                        </ol>
                    </body>
                </html>
 
       solution (partial - I hope that you can give a better one, I have not yet had time to debug)
            <html>   
            <body>
               <span>hello</span>
               <ol>
                 <li>there</li>
               </ol>
            </body>
            </html>
       gives
 
        <html>
            <head>
                <!-- Producer: iTextXML by lowagie.com -->
                <!-- CreationDate: Thu Aug 12 22:26:05 GMT+02:00 2004 -->
            </head>
            <body leftmargin="36.0" rightmargin="36.0" topmargin="36.0" bottommargin="36.0">
                <span style="font-family: unknown; ">hello
                </span>
                <ol>
                    <li style="font-family: unknown; ">there
                    </li>
                </ol>
            </body>
        </html>
 
       Both the html in the browser and the pdf are ok.
 
   3.3 problem: the following list comes out correctly in html
        <html>   
        <body>
          <ol>
            <li>line 1</li>
            <li><ul>
                 <li>line 2.0</li>
                 <li>line 2.1</li>
                </ul>
            </li>
            <li>3</li>
          </ol>
        </body>
        </html>
 
      in pdf - the "2." is missing from the second line (if using bullets instead of numbers, this may be good :-))
  
      solution: In the pdf I could solve this by doing (not a nice solution)
      <html>   
        <body>
          <ol>
            <li>line 1</li>
              <li><font color="#ffffff">a</font><ul>
                 <li>line 2.0</li>
                 <li>line 2.1</li>
                </ul>
            </li>
            <li>3</li>
          </ol>
        </body>
      </html>
 
      HOWEVER in the html version that was produced, I got <span>a</span>, instead of the <font> tag
      (using <span style="color:#ffffff">a</span>, gave the same result !)
 
  3.4 problem: embedding a multi-level list in a table removes the bullets/numbers from all but the first level in pdf
 
        <html>   
        <body>
         <table>
          <tr>
           <td>
              <ol>
                <li>line 1</li>
                  <li><span>embedded list</span>
                    <ol>
                     <li>line 2.0</li>
                     <li>line 2.1</li>
                    </ol>
                </li>
                <li>3</li>
              </ol>
            </td>
           </tr>
          </table>
        </body>
        </html>
 
      note the SMALL change in the html output
        <html>
            <head>
                <!-- Producer: iTextXML by lowagie.com -->
                <!-- CreationDate: Thu Aug 12 22:46:06 GMT+02:00 2004 -->
            </head>
            <body leftmargin="36.0" rightmargin="36.0" topmargin="36.0" bottommargin="36.0">
                <table width="80.0%" align="Center" cellpadding="0.0" cellspacing="0.0" border="1.0">
                    <tr>
                        <td border="0.5">
                            <ol>
                                <li style="font-family: unknown; ">line 1
                                </li>
                                <li style="font-family: unknown; ">embedded list      <!--- NO SPAN TAGS, no visual change in browser !!!!!!!!!-->
                                    <ol>
                                        <li>line 2.0
                                        </li>
                                        <li>line 2.1
                                        </li>
                                    </ol>
                                </li>
                                <li style="font-family: unknown; ">3
                                </li>
                            </ol>
                        </td>
                    </tr>
                </table>
            </body>
        </html>
 

      solution 1: (did not work)
        I placed a <span> around the lists - and, in pdf,  I got
          line 1embedded listline 2.0line 2.13
        (yes one line with missing numbers)
        html was, still, fine.
 
       solution 2: add <span> / <div> etc. - in html they were just removed; in pdf they either gave a big empty space or were removed.
 
       solution 3: place the inner list in a table (I think this proves that I have misunderstood something about pdf)
            <html>   
            <body>
             <table>
              <tr>
               <td>
                  <ol>
                    <li>line 1</li>
                      <li><span>embedded list</span>
                        <table><tr><td>
                        <ol>
                         <li>line 2.0</li>
                         <li>line 2.1</li>
                        </ol>
                        </td></tr></table>
                    </li>
                    <li>3</li>
                  </ol>
                </td>
               </tr>
              </table>
            </body>
            </html>
 
       result in pdf ---- the whole inner table was removed !!!! (in html a border was defined; otherwise it was ok, apart from some extra space.)
 
      solution 4 (desperate - but ALMOST works in pdf):
        <html>   
        <body>
         <table>
          <tr>
           <td>
                1. line 1<br />
                2.
                 <table cellspacing="0" cellpadding="0"><tr><td>
                    <ol>
                     <li>line 2.0</li>
                     <li>line 2.1</li>
                    </ol>
                 </td></tr></table>
                3. 3<br />
            </td>
           </tr>
          </table>
        </body>
        </html>
 
      in html you get (notice the extras)
        <html>
            <head>
                <!-- Producer: iTextXML by lowagie.com -->
                <!-- CreationDate: Thu Aug 12 23:07:16 GMT+02:00 2004 -->
            </head>
            <body leftmargin="36.0" rightmargin="36.0" topmargin="36.0" bottommargin="36.0">
                <table width="80.0%" align="Center" cellpadding="0.0" cellspacing="0.0" border="1.0">
                    <tr>
                        <td border="0.5" colspan="3">1. line 1<br />2.
                        </td>
                    </tr>
                    <tr>
                        <td border="0.5">&nbsp;
                        </td>
                        <td border="0.5">
                            <ol>
                                <li style="font-family: unknown; ">line 2.0
                                </li>
                                <li style="font-family: unknown; ">line 2.1
                                </li>
                            </ol>
                        </td>
                        <td border="0.5">&nbsp;
                        </td>
                    </tr>
                    <tr>
                        <td border="0.5" colspan="3">&nbsp;
                        </td>
                    </tr>
                    <tr>
                        <td border="0.5" colspan="3">3. 3
                        </td>
                    </tr>
                    <tr>
                        <td border="0.5" colspan="3"><br />
                        </td>
                    </tr>
                </table>
            </body>
        </html>
 
        solution 5 (the other way round):
            <html>   
            <body>
             <table>
              <tr>
               <td>
                  <ol>
                    <li>line 1</li>
                      <li>embedded list<br />
                         * line 2.0<br />
                         * line 2.1
                    </li>
                    <li>3</li>
                  </ol>
                </td>
               </tr>
              </table>
            </body>
            </html>
 
        in pdf - I got big spaces where the <br /> were.
 
In any case, we are using your library, and it is helping us a lot.
Thanks for your good work.
 
Be Well and Happy.
Shalom Deitch.
