Sam,

On 6 Dec 2006, at 23:13, Sam Ruby wrote:
> My original interest was to write a replacement for Python's SGMLLIB,
> i.e., one that was not based on the theoretical ideal of how SGML
> vocabularies work, but one based on the practical notion of how HTML
> actually is parsed.

I'm not sure sgmllib would be the best target, especially if it is used in many other products. But maybe you are talking about a new library altogether.


    http://docs.python.org/lib/module-sgmllib.html
    8.2 sgmllib -- Simple SGML parser

    This module defines a class SGMLParser which serves as the basis for
    parsing text files formatted in SGML (Standard Generalized Mark-up
    Language). In fact, it does not provide a full SGML parser -- it only
    parses SGML insofar as it is used by HTML, and the module only exists
    as a base for the htmllib module. Another HTML parser which supports
    XHTML and offers a somewhat different interface is available in the
    HTMLParser module.

HTMLParser seems a better candidate.

    http://docs.python.org/lib/module-HTMLParser.html
    8.1 HTMLParser -- Simple HTML and XHTML parser

    New in version 2.2.

    This module defines a class HTMLParser which serves as the basis for
    parsing text files formatted in HTML (HyperText Mark-up Language) and
    XHTML. Unlike the parser in htmllib, this parser is not based on the
    SGML parser in sgmllib.
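
For what it's worth, here is a minimal sketch of the "serves as the basis" interface described in that excerpt: you subclass HTMLParser and override the handler methods. It assumes a current Python 3, where the module is named html.parser rather than HTMLParser. The TagCollector class and the collect_tags helper are hypothetical names made up for this illustration; they are not part of the library.

```python
from html.parser import HTMLParser  # Python 3 name of the old HTMLParser module


class TagCollector(HTMLParser):
    """Hypothetical subclass that records start tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # The parser passes the tag name already converted to lower case.
        self.tags.append(tag)


def collect_tags(markup):
    """Feed a markup string to a fresh parser and return the start tags seen."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


# The <b> element is never closed; the parser simply reports the events it sees.
print(collect_tags("<p>Hello <b>world</p>"))
```

Note that this only reports tag events. It does not build a tree or repair the markup the way a browser does, so it is a starting point for the "how HTML actually is parsed" goal rather than a solution to it.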


I'm adding them to the list of HTML parsers.
http://esw.w3.org/topic/HTMLAsSheAreSpoke




--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
  QA Weblog - http://www.w3.org/QA/
     *** Be Strict To Be Cool ***


