Title: Message
Has anyone succeeded in converting an HTML file to a PDF file? I have tried using HTML Tidy (JTidy), but it doesn't work. It produces correct XHTML, but iText then returns an error:
 
ExceptionConverter: org.xml.sax.SAXException: String index out of range: -30
     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:878)
     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:900)
     at com.lowagie.text.html.HtmlParser.go(Unknown Source)
     at com.lowagie.text.html.HtmlParser.parse(Unknown Source)
     at Start.main(Start.java:90)
String index out of range: -30
This error occurs with the HTML code below and seems to come from the second line, because if I delete that line (<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">), JBuilder reports a different error:
 
ExceptionConverter: org.xml.sax.SAXParseException: The entity "Ccedil" was referenced, but not declared.
     at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:969)
     at org.apache.xerces.readers.DefaultEntityHandler.startReadingFromEntity(DefaultEntityHandler.java:596)
     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1315)
     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:380)
     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:861)
     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:900)
     at com.lowagie.text.html.HtmlParser.go(Unknown Source)
     at com.lowagie.text.html.HtmlParser.parse(Unknown Source)
     at Start.main(Start.java:90)
The entity "Ccedil" was referenced, but not declared.
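This second error is standard XML behaviour: without a DTD, an XML parser only knows the five predefined entities (&amp; &lt; &gt; &quot; &apos;), so HTML names such as &Ccedil; count as undeclared. Rather than deleting the entities, one option is to rewrite them as numeric character references (&Ccedil; becomes &#199;), which any XML parser accepts. The sketch below is only an illustration: the class and method names are hypothetical, the entity table covers just the names used in this letter (a real converter would need the full XHTML 1.0 set), and the result is checked with the JDK's own DOM parser, not with iText.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: rewrites HTML named entities as numeric character references.
class EntityToNumeric {
    // Assumption: only the entities that appear in the sample letter (Unicode code points).
    private static final Map<String, Integer> CODE_POINTS = new HashMap<>();
    static {
        CODE_POINTS.put("Ccedil", 199); CODE_POINTS.put("ccedil", 231);
        CODE_POINTS.put("Atilde", 195); CODE_POINTS.put("atilde", 227);
        CODE_POINTS.put("Aacute", 193); CODE_POINTS.put("aacute", 225);
        CODE_POINTS.put("Iacute", 205); CODE_POINTS.put("oacute", 243);
        CODE_POINTS.put("eacute", 233); CODE_POINTS.put("Ecirc", 202);
        CODE_POINTS.put("deg", 176);
    }

    // Matches named references like &Ccedil; (numeric ones such as &#199; are skipped).
    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Known names become &#NNN;; the XML built-ins (e.g. &amp;) and unknown names are left as-is.
    static String toNumericReferences(String html) {
        Matcher m = NAMED.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer code = CODE_POINTS.get(m.group(1));
            String replacement = (code != null) ? "&#" + code + ";" : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Parses with the JDK's XML parser and returns the root element's text content.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String fixed = toNumericReferences("<p>PROJETO DE MODERNIZA&Ccedil;&Atilde;O FAZEND&Aacute;RIA</p>");
        System.out.println(fixed); // <p>PROJETO DE MODERNIZA&#199;&#195;O FAZEND&#193;RIA</p>
        System.out.println(parseText(fixed));
    }
}
```

Numeric references keep the accented characters intact, whereas simply removing the & leaves text such as "MODERNIZACcedil;" behind.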
 
And if I replace all the & entities, yet another error occurs:
 
ExceptionConverter: org.xml.sax.SAXParseException: The element type "p" must be terminated by the matching end-tag "</p>".   
     at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:969)
     at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:634)
     at org.apache.xerces.framework.XMLDocumentScanner.abortMarkup(XMLDocumentScanner.java:683)
     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1187)
     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:380)
     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:861)
     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:900)
     at com.lowagie.text.html.HtmlParser.go(Unknown Source)
     at com.lowagie.text.html.HtmlParser.parse(Unknown Source)
     at Start.main(Start.java:90)
 The element type "p" must be terminated by the matching end-tag "</p>".
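This third message names the element but not its position. As a debugging aid, a plain SAX pass with the JDK's parser reports the line and column of the first well-formedness error, which may help locate the paragraph that stopped balancing after the entities were edited. A minimal sketch; the class and method names are hypothetical and it runs outside iText:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical debugging helper: reports where the first well-formedness error occurs.
class WellFormedCheck {
    // Returns null if the input is well-formed, otherwise "line:column: message".
    static String firstError(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // DefaultHandler.fatalError rethrows, so the first fatal error ends up here
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A <strong> opened inside a <p> but never closed before </p>
        String broken = "<body>\n<p><strong>OF&#205;CIO\n</p>\n</body>";
        System.out.println(firstError(broken));
        System.out.println(firstError("<p>ok<br /></p>")); // null
    }
}
```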

Thanks a lot for any help!
 
Glauco
 
 
The HTML code I am trying to parse:
 
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html
 xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title></title>
  </head>
 
  <body>
    <p
     style="text-align: center"><strong><img
     border="0"
     height="61"
     src="http://localhost:8080/protocolo/docs_upload/modelos/brasao_alagoas.jpg"
     width="49" /></strong></p>
 
    <p
     style="text-align: center"><strong>SECRETARIA DA FAZENDA DO
    ESTADO DE ALAGOAS<br />
    PROJETO DE MODERNIZA&Ccedil;&Atilde;O FAZEND&Aacute;RIA -
    PROMOFAZ.<br />
    UNIDADE DE CONTROLE ESTADUAL - UCE<br />
    COMPONENTE ORGANIZA&Ccedil;&Atilde;O E
    GEST&Atilde;O</strong></p>
 
    <p
     style="text-align: center"> </p>
 
    <p
     style="text-align: center"><strong> OF&Iacute;CIO  teste2
    35/2002                                                       
    Macei&oacute;, 25 de Junho de 2002</strong></p>
 
    <p
     style="text-align: center"> </p>
 
    <p
     style="text-align: left">Senhor Coordenador,</p>
 
    <p
     style="text-align: left"> </p>
 
    <p
     style="text-align: left">                      Servimo-nos do
    presente para submeter a superior considera&ccedil;&atilde;o de
    V.Sa. a solicita&ccedil;&atilde;o de di&aacute;rias que nos
    encaminha o coordenador Regional da 3&deg; CRAF  atrav&eacute;s
    do of&iacute;cio &ccedil;lkj lkj lk de klj lkj lk
    pr&oacute;ximo passado, em anexo.<br />
                           Sendo s&oacute; para o momento, 
    aproveitamos para reiterar nossos protestos da<br />
    Mais alta considera&ccedil;&atilde;o e
    apre&ccedil;o.                                                       
    .</p>
 
    <p
     style="text-align: center"><strong>Administrador do
    Protocolo<br />
    </strong>L&Iacute;DER DO COMPONENTE O&amp;G<br />
     </p>
 
    <p
     style="text-align: left">      <br />
    Ao<br />
    ILMO. SR.<br />
    MARCOS ANTONIO GARCIA<br />
    DD.Coordenador Geral da Uni&atilde;o de
    Coordena&ccedil;&atilde;o Estadual - UCE/AL.<br />
         </p>
 
    <p
     style="text-align: left">
                                                                             </p>
 
    <table
     border="1"
     cellpadding="0"
     cellspacing="0"
     style="HEIGHT: 81px; WIDTH: 638px">
      <tbody>
        <tr>
          <td>
            <p
             style="text-align: center"><strong>SEFAZ/
            AL</strong></p>
 
            <p
             style="text-align: center"><strong>" EXCEL&Ecirc;NCIA
            NA GEST&Atilde;O FAZEND&Aacute;RIA, PROPICIANDO
            MELHOR           QUALIDADE DE VIDA EM 
            ALAGOAS"</strong></p>
          </td>
        </tr>
      </tbody>
    </table>
    <br />
                                                              
    <br />
    <br />
  </body>
</html>
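Since deleting the DOCTYPE line at the top of this file made the first error go away, the external XHTML DTD it references (http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd) is a plausible suspect, although the trace does not confirm it. A common workaround, sketched here with the JDK's DOM parser rather than the parser inside iText, is an EntityResolver that answers the DTD request with an empty document so nothing is fetched over the network. The empty DTD declares no entities, so named references such as &Ccedil; still need to be converted to numeric ones (&#199;) beforehand. The class and method names are hypothetical, and whether iText's HtmlParser accepts such a resolver is not covered by this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: parses XHTML without ever fetching the external DTD.
class OfflineDtd {
    // Answers every external entity request (here, the XHTML DTD) with an empty document.
    // Assumption: the input already uses numeric references, since nothing is declared.
    private static final EntityResolver EMPTY_DTD =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    // Returns the root element's text content, parsed with the offline resolver.
    static String rootText(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(EMPTY_DTD);
            return builder.parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("parse failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Macei&#243;</p></body></html>";
        System.out.println(rootText(doc));
    }
}
```

If this parses cleanly while the original file fails inside iText, that points toward DTD handling (or the entity references) rather than the markup itself.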
 
