On Jul 7, 2006, at 12:18 PM, Phillip J. Eby wrote:

> At 06:55 AM 7/7/2006 -0400, Jim Fulton wrote:
>> From a design perspective:
>>
>> a. screen scraping is bad
>
> As long as you define "screen scraping" as "dependency on visible
> characteristics of HTML", then I agree.  easy_install shouldn't be
> relying on the visible bits of HTML that it currently uses to scope
> out PyPI.

Yup

> Relying on a particular URL layout is not screen-scraping, though,
> and using the URL layout as part of the API has some good
> properties for ease of implementation in static form.  So does
> using href's to obtain link information.

Yes.

> What we should be doing is adding non-visible markup (e.g. class=""
> or rel="") information to the links to allow index creators to
> direct easy_install without affecting visible page characteristics.

Yes

>> b. the web API should be simple and well defined.
>>
>> I suggest, as others have suggested, that we create an *alternate*
>> web API for reading an index focussed on cleanliness and on making
>> the API as easy as possible to implement for both index and client
>> developers. If we agree with all of the goals stated above, I think
>> this should be static HTTP interface using XHTML or some other XML
>> dialect.  Perhaps we could even use specific HTML class attrs to
>> make it possible to combine the pypi and user interfaces if an index
>> implementor desires.
>>
>> Thoughts?
>
> +1 on static pages.  I don't, however, see a reason to require
> valid XML.  Or rather, I don't expect to implement XML parsing in
> easy_install; if the spec is too complex to implement with regular
> expression matching, it's probably too complex for people to throw
> together an index with what's at hand.  In particular, I'd like it
> to be practical to put together a simple index just using Apache's
> built-in directory indexes, as long as they use the right URL
> hierarchy.  That means that class or rel attributes should only be
> required for links that are requesting non-index pages to be spidered.

I would find parsing much easier with an XML parser than with regular
expressions.  I think it would be much more robust too.

I do want to see something that is well documented and pretty easy
to implement.

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]    Python Powered!
CTO                  (540) 361-1714              http://www.python.org
Zope Corporation     http://www.zope.com         http://www.zope.org
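
For illustration only, here is a minimal sketch of what reading an index
page with an XML parser might look like, assuming the page is well-formed
XHTML.  Nothing in it is an agreed format: the sample page, the function
name read_links, and the rel value "spider" are all made up for this
example.  It collects every href and sets aside the links whose rel
attribute asks for the target to be spidered.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree puts on XHTML element names.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical index page; the rel value "spider" is invented for this sketch.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <a href="example-1.0.tar.gz">example-1.0.tar.gz</a>
    <a href="http://example.org/downloads/" rel="spider">more releases</a>
  </body>
</html>"""


def read_links(page_text):
    """Return (plain_links, spider_links) from a well-formed XHTML page."""
    root = ET.fromstring(page_text)
    plain, spider = [], []
    for anchor in root.iter(XHTML_NS + "a"):
        href = anchor.get("href")
        if href is None:
            continue  # anchors without an href carry no link information
        # rel may hold several space-separated tokens.
        rels = (anchor.get("rel") or "").split()
        if "spider" in rels:
            spider.append(href)
        else:
            plain.append(href)
    return plain, spider


plain_links, spider_links = read_links(SAMPLE_PAGE)
print(plain_links)
print(spider_links)
```

The cost of this approach is the one raised in the quoted text: a page
that is not well-formed XML, such as a typical Apache directory listing,
makes ET.fromstring raise a ParseError, so index authors would have to
produce valid markup.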