At 07:33 AM 7/10/2007 +0200, Martin v. Löwis wrote:
> > Yes, in order to find the correct spelling for a package's name.  If a
> > user types, say "pylons" when the package is listed on PyPI as "Pylons",
> > setuptools looks at the root after the lookup of /pypi/pylons fails.
>
>I don't understand. How does it help to look at /pypi in this case?

It doesn't.  It looks at /pypi/ (note the trailing /) -- which lists all
packages.

>The right spelling of Pylons is not listed there, unless there was
>a release of Pylons recently.
>
>If you want to correct the spelling, you need to look at
>
>http://cheeseshop.python.org/pypi?%3Aaction=index

Which is also spelled /pypi/ - the advantage of this is that a purely
static index consisting of Apache directory indexes produces an equally
useful result for setuptools.

> > A case-insensitive match by safe_name would be ideal, and could also be
> > used to prevent users from registering packages whose names differ only
> > by case or punctuation.
>
>Would it be acceptable to do an HTTP redirect in that case, ie.
>redirect /pypi/pylons/0.9.5 to /pypi/Pylons/0.9.5?

Yes, although setuptools at the moment looks at /pypi/pylons/ (again, with
a trailing /) and does not go to individual version pages unless the base
page contains only links to individual version pages.  It will handle a
redirect correctly, as far as interpreting relative links on result pages.

> I would not
>want to have multiple URLs to render the same page, in general
>(I know it already does that in some cases).
>
>I can see how lower-casing helps; I'm doubtful about replacing
>spaces. I.e. why is it better to look for
>
>python-ftp-server-library--pyftpdlib-

That '--' would actually just be one '-'

>than
>
>Python FTP server library (pyftpdlib)

It's not much better; however, there are a lot of packages with shorter
names for which it does help.  Mainly, though, setuptools just uses this
for purposes of determining distribution filenames.

>IOW, if you have a mis-spelling of the latter, what are the
>chances that it is so misspelled that the safe_name is still
>the former? Shouldn't the package owner just correct the
>package name, to pyftpdlib, and put the other string into
>the summary?
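
To make the matching rule concrete, here is a minimal Python sketch of a
case-insensitive match by safe_name.  The safe_name() below is written from
the rule as described in this thread (each run of characters other than
letters, digits and '.' becomes a single '-'); find_spelling() and the sample
name lists are hypothetical helpers for illustration, not existing PyPI or
setuptools code.

```python
import re


def safe_name(name):
    """Replace each run of characters other than letters, digits and '.'
    with a single '-'."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def find_spelling(requested, registered_names):
    """Return the registered name whose lower-cased safe_name equals that
    of `requested`, or None if there is no such name.

    Hypothetical helper: `registered_names` stands in for whatever list
    of names the index can provide.
    """
    key = safe_name(requested).lower()
    for name in registered_names:
        if safe_name(name).lower() == key:
            return name
    return None


if __name__ == "__main__":
    print(find_spelling("pylons", ["Paste", "Pylons"]))
    print(safe_name("Python FTP server library (pyftpdlib)"))
```

The same comparison could in principle be run at registration time to reject
a new name that collides with an existing one by case or punctuation only.
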
>
>In any case, if it were postgres 8.1 or later, I could simply do
>
>select name from packages where
>regexp_replace(lower(name),'[^a-z0-9.]','-')='gnosis-utilities';
>
>to do the lookup; with 7.4, I would have to download all names
>and do the safe matching myself.

I think this will work instead:

    select name from packages where name ~* 'gnosis[^a-z0-9.]+utilities'

i.e., replace all '-' in the safe_name() with the appropriate regex.  '~*'
is the case-insensitive regular expression match operator, according to:

    http://www.postgresql.org/docs/7.4/interactive/functions-matching.html

Of course, it may also suffice to do:

    select name from packages where lower(name) like 'gnosis_%utilities'

i.e. replace all '-' in the safe_name with '_%', which is sort of like '.+'
in a regex.  You would still have to postprocess the result to catch the
difference between say, "gnosis-utilities" and "gnosis3utilities" or some
such, but there should be very few such matches.

The "like" query may be easier for postgres to use an index on - an
expression index on lower(name) would do the trick.  Of course, I'm used to
trying to optimize much larger databases than PyPI - with only a few
thousand entries, a non-index query here may be just fine.

In any case, this query should also be used to check for uniqueness when
adding packages.

_______________________________________________
Catalog-SIG mailing list
[email protected]
http://mail.python.org/mailman/listinfo/catalog-sig
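
As an illustration of the two query patterns discussed in the message above,
the following Python sketch derives both from a requested name and shows the
postprocessing step.  All three helper names are hypothetical, and the '^'
and '$' anchors on the regex are an addition that is not in the message; they
are assumed here so that only whole names match.

```python
import re


def safe_name(name):
    """Replace each run of characters other than letters, digits and '.'
    with a single '-' (the rule as described in the thread)."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def regex_pattern(requested):
    """Pattern for a case-insensitive regex match: every '-' in the
    safe_name becomes '[^a-z0-9.]+'.  Anchors are an assumption."""
    parts = safe_name(requested).lower().split("-")
    return "^" + "[^a-z0-9.]+".join(re.escape(p) for p in parts) + "$"


def like_pattern(requested):
    """Pattern for LIKE against lower(name): every '-' becomes '_%'."""
    return safe_name(requested).lower().replace("-", "_%")


def filter_exact(requested, candidates):
    """Postprocess query results: keep only names whose lower-cased
    safe_name equals that of the requested name."""
    key = safe_name(requested).lower()
    return [c for c in candidates if safe_name(c).lower() == key]


if __name__ == "__main__":
    print(regex_pattern("gnosis-utilities"))
    print(like_pattern("Gnosis Utilities"))
    # "gnosis3utilities" would satisfy the LIKE pattern but is filtered out.
    print(filter_exact("gnosis-utilities",
                       ["Gnosis_Utilities", "gnosis3utilities"]))
```

The patterns would be passed to the database as bound query parameters rather
than pasted into the SQL text; how that is done depends on the database
driver in use.
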
