At 05:17 AM 4/9/2007 +0200, Martin v. Löwis wrote:
> > It looks like the __init__ builds a data structure and then
> > get_matches() and list_choices() run off this structure. It would
> > certainly be worth fixing this!
>
>I have now committed a fix in PyPI (r441) which performs the computation
>of selected entries in SQL, considerably improving browsing if
>categories are selected.
>
>If no categories are selected, browsing is still slow. This could be
>improved by caching a tally table, as the standard browse page does
>not need to report any package names (just a tally).
>
>Is there any efficient way to compute a tally in PostgreSQL on the
>fly?
Perhaps this?
select rc.trove_id, count(*)
from releases r, release_classifiers rc
where r.name=rc.name and r.version=rc.version
and r._pypi_hidden=FALSE
group by rc.trove_id
I'm basing this strictly off the other query you posted and with no real
knowledge of the schema, so I could be way off here. But it seems like
this should be very efficient if there is an index on (trove_id, name,
version) in the release classifiers table and one on (name, version,
_pypi_hidden) in the releases table. That would allow the query to be
executed entirely on the indexes without needing any table contents. Of
course, given the relatively small space of trove identifiers compared to
releases, some other scan pattern might be equally efficient I suppose.
_______________________________________________
Catalog-sig mailing list
[email protected]
http://mail.python.org/mailman/listinfo/catalog-sig