On 7/7/06, Jim Fulton <[EMAIL PROTECTED]> wrote:
> > +1 on static pages.  I don't, however, see a reason to require
> > valid XML.  Or rather, I don't expect to implement XML parsing in
> > easy_install; if the spec is too complex to implement with regular
> > expression matching, it's probably too complex for people to throw
> > together an index with what's at hand.  In particular, I'd like it
> > to be practical to put together a simple index just using Apache's
> > built-in directory indexes, as long as they use the right URL
> > hierarchy.  That means that class or rel attributes should only be
> > required for links that are requesting non-index pages to be spidered.
> >
> I would find parsing much easier with an XML parser than with
> regular expressions.
> I think it would be much more robust too.

XHTML would be best, though I agree we shouldn't care about validity
so much as just well-formedness (which is required).  I think it
should be possible to do it with valid XHTML, though, since whether
that's desired or not is a python.org policy concern.  (Not that I
suspect we'll ever really care about that.)

Of course, it should be possible to parse with htmllib and HTMLParser
as well.


  -Fred

-- 
Fred L. Drake, Jr.    <fdrake at gmail.com>
"Every sin is the result of a collaboration." --Lucius Annaeus Seneca
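
To make the trade-off in this thread concrete, here is a minimal sketch
of the two parsing styles under discussion: a tolerant, HTMLParser-based
link collector and a strict one that requires well-formed XML.  It is
only an illustration, not easy_install's implementation.  It assumes
Python 3, where htmllib no longer exists and HTMLParser lives in
html.parser.  The sample pages, the function names, and the
rel="homepage" value are hypothetical and are not taken from any
proposed spec.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from every <a> tag, tolerating sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr_map = dict(attrs)
            if "href" in attr_map:
                self.links.append((attr_map["href"], attr_map.get("rel")))


def links_tolerant(page):
    """Works on anything HTMLParser can tokenize, e.g. a server-generated listing."""
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links


def links_strict(page):
    """Requires a well-formed document; raises ET.ParseError otherwise."""
    root = ET.fromstring(page)
    return [
        (el.get("href"), el.get("rel"))
        for el in root.iter()
        if el.tag in ("a", XHTML_NS + "a") and "href" in el.attrib
    ]


# Hypothetical index page written as well-formed XHTML.
WELL_FORMED = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><ul>'
    '<li><a href="foo-1.0.tar.gz">foo-1.0.tar.gz</a></li>'
    '<li><a href="docs.html" rel="homepage">docs</a></li>'
    "</ul></body></html>"
)

# Hypothetical sloppy listing: unclosed <hr> and <li>, unquoted attribute.
SLOPPY_HTML = (
    "<html><body><h1>Index of /foo</h1><hr><ul>"
    '<li><a href="foo-1.0.tar.gz">foo-1.0.tar.gz</a>'
    '<li><a href=docs.html rel="homepage">docs</a>'
    "</ul></body></html>"
)

if __name__ == "__main__":
    print(links_tolerant(SLOPPY_HTML))
    print(links_strict(WELL_FORMED))
    try:
        links_strict(SLOPPY_HTML)
    except ET.ParseError as exc:
        print("strict parser rejected the sloppy page:", exc)
```

Both functions return the same links for the well-formed page.  Only
the tolerant one copes with the sloppy listing, which is the practical
cost of requiring well-formedness from index pages.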