At 01:03 AM 7/7/2006 +1000, [EMAIL PROTECTED] wrote:
>
> Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> > Why not? ;)
>
>That was actually what I was afraid the reasoning was ;)
>
>I guess I just go all wobbly in the knees at the thought of having to
>maintain a "screen scraping" interface.
You don't need to -- at least not in the long term. Once setuptools 0.7 supports the XML-RPC interface, it won't need the other scraping tricks to read PyPI. Those tricks would be left in for people who run their own package indexes; they would not constrain further development of PyPI itself.

Please keep in mind that easy_install makes *extremely* minimal assumptions about PyPI's interface:

1. It assumes that baseURL/projectname will lead to the current version of projectname, or to a page with a list of projectname's active versions.

2. It assumes that links within PyPI of the form baseURL/something1/something2 are links to version 'something2' of a project named 'something1'.

3. It assumes that going to baseURL directly will result in a page with links to all available packages, in the form described in #2.

4. It assumes that if baseURL/projectname returns a page containing the text "Index of Packages</title>", that page is a list of links of the form described in #2.

5. It looks for and follows the first links after the strings "<th>Home Page" and "<th>Download URL" in a project page.

6. It makes assumptions about how to find MD5 data on a PyPI page, but if it fails to find that data, it simply won't check the MD5 of downloads.

Also note that even with an XML-RPC interface, easy_install will *still* need to read an HTML page to gather links, because it's valid for people to provide links in their long_description using reStructuredText. It's just that assumptions 1, 3, and 4 (and maybe 5) would no longer be necessary.

Also note that in a pinch, you can put the strings easy_install is looking for inside HTML comments. Easy_install really isn't that bright. ;) However, if you can provide *all* of this data via the API (including an HTML-formatted long description), then the screen scraping can go away as far as PyPI is concerned.

>Funnily enough, Johannes Gijsbers, Andrew Dalke and I were talking about
>this very issue last night.
>I proposed that we detect the user-agent of
>the setuptools client, and in response send back really minimalist HTML
>(no surrounding page template). Probably overkill, but this may have been
>after we'd had beer :)

There's a simpler solution that could be implemented: adding a 'rel="easy-install"' attribute to links that easy_install should follow. Currently, those links are the project's home page URL, its download URL, and the links to specific versions that show up when you go to a project that has multiple active versions. Adding the attribute to these, and *only* these, links would give easy_install enough information to do the right thing. However, support would have to wait for setuptools 0.7 anyhow, so there's little reason to do this.

Hm. I just tried to make multiple versions of PEAK active, and it seems you can't get to the page that lists multiple versions any more. No wonder some people have been having problems downloading older versions of certain packages. :(

How are people supposed to get to older package versions now? That is, what's the point of being able to have multiple active versions if you can't find them? Is this an intended change, or a bug?

>Could you provide a clear list of all the specific changes you wish for us
>to make at the Sprint?

I've provided a list above of the changes I want you *not* to make. How's that? ;)

> > Nonetheless, there are various aspects of easy_install's behavior and
> > performance that could be significantly improved by using XML-RPC, so I
> > definitely want it to do that in 0.7. I'm just wary of removing the
> > existing behavior until it's clear that it's unnecessary for it to.
>
>Oh - another thing that occurred to me -- does setuptools auto update itself?

What do you mean? You can run "easy_install -U setuptools" to upgrade to the latest release at any time. But it doesn't go out looking for updates on its own.
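To make the rel="easy-install" idea concrete, here is a minimal Python sketch of how a client might pick out only the marked links from an index page and then apply assumption #2 to split a link into project and version. It is an illustration under stated assumptions, not setuptools code: the marker is only a proposal in this message, and BASE_URL, EasyInstallLinkParser, and split_project_link are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical index location, for illustration only (not PyPI's real URL).
BASE_URL = "https://index.example.org/pypi/"


class EasyInstallLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose rel attribute contains "easy-install".

    Models the *proposed* marker; links without it are ignored.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, or be valueless (None).
        rel_tokens = (attr_map.get("rel") or "").split()
        href = attr_map.get("href")
        if href and "easy-install" in rel_tokens:
            # Resolve relative hrefs against the (hypothetical) index URL.
            self.links.append(urljoin(self.base_url, href))


def split_project_link(url, base_url=BASE_URL):
    """Assumption #2: baseURL/something1/something2 means project
    'something1' at version 'something2'. Returns a tuple, or None."""
    if not url.startswith(base_url):
        return None
    parts = url[len(base_url):].strip("/").split("/")
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    return None


if __name__ == "__main__":
    # Hypothetical project page fragment.
    page = """
    <a href="/pypi/ExampleProject/1.0" rel="easy-install">1.0</a>
    <a href="http://project.example.org/" rel="easy-install">Home Page</a>
    <a href="/pypi/ExampleProject/1.0?:action=edit">edit</a>
    """
    parser = EasyInstallLinkParser(BASE_URL)
    parser.feed(page)
    for link in parser.links:
        print(link, split_project_link(link))
```

Splitting rel on whitespace lets the marker coexist with other values such as "nofollow"; the edit link is skipped because it carries no marker, and the home page link is followed but not treated as a project/version link because it lies outside the index URL.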
_______________________________________________
Catalog-sig mailing list
Catalog-sig@python.org
http://mail.python.org/mailman/listinfo/catalog-sig