(Mailing list maintainer/Jonas: I sent a similar message from a non-subscribed email address. It can be discarded, sorry.)

I needed an HTML parser and am not in a hurry, so I decided to try FPC's own first, in the hope that I can at least produce some documentation or examples for the wiki along the way.

The first project is simple (see the program below). I ran it on FPC's HTML documentation and noticed that it failed like this:

An unhandled exception occurred at $004284EC :
EDOMError : EDOMError in DOMDocument.CreateElement
hr/0
  $004284EC
  $00411A86  THTMLTODOMCONVERTER__READERSTARTELEMENT,  line 500 of src/sax_html.pp
  $0042648A  TSAXREADER__DOSTARTELEMENT,  line 738 of src/sax.pp
  $004110DC  THTMLREADER__ENTERNEWSCANNERCONTEXT,  line 391 of src/sax_html.pp
  $00410C80  THTMLREADER__PARSE,  line 358 of src/sax_html.pp
  $0042612C  TSAXREADER__PARSESTREAM,  line 647 of src/sax.pp
  $00411F3D  READHTMLFILE,  line 609 of src/sax_html.pp
  $00411E91  READHTMLFILE,  line 593 of src/sax_html.pp
  $004015DE  main,  line 21 of saxattempt.dpr

Some debugging suggests that it fails on <hr/>. The doctype of the document in question is

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

Some questions for the more xmlable:

1. Is this correct? I think <hr/> is XML notation rather than HTML notation?
2. Can I somehow convince (override) DOM to accept it? (Modifying the generator (tex4ht) might prove to be difficult.) It could be genera
3. Is there a way to get line numbers in the exceptions? Modifying the source with writeln's to find out exactly which tag goes wrong is a bit ugly.

Note that I'm already happy with pointers on where to start. If anybody is willing to share private examples or documentation, that would be great too.

program saxattempt;
{$mode delphi}

uses
  Sax_HTML, SysUtils, Classes, dom_html;

var
  d     : TSearchRec;
  sx    : THTMLDocument;
  htmls : TStringList;

begin
  htmls := TStringList.Create;
  // Parse every .html file in the current directory and keep the documents.
  if FindFirst('*.html', faAnyFile, d) = 0 then
  begin
    repeat
      WriteLn(d.Name);
      sx := THTMLDocument.Create;
      ReadHTMLFile(sx, d.Name);
      htmls.AddObject(d.Name, sx);
    until FindNext(d) <> 0;
    FindClose(d);
  end;
end.
