Re: [Catalog-sig] Flamenco queries

Phillip J. Eby Sun, 08 Apr 2007 20:27:55 -0700

At 05:17 AM 4/9/2007 +0200, Martin v. Löwis wrote:
> > It looks like the __init__ builds a data structure and then
> > get_matches() and list_choices() run off this structure.  It would
> > certainly be worth fixing this!
>
>I have now committed a fix in PyPI (r441) which performs the computation
>of selected entries in SQL, considerably improving browsing if
>categories are selected.
>
>If no categories are selected, browsing is still slow. This could be
>improved by caching a tally table, as the standard browse page does
>not need to report any package names (just a tally).
>
>Is there any efficient way to compute a tally in PostgreSQL on the
>fly?


Perhaps this?

select rc.trove_id, count(*)
   from releases r, release_classifiers rc
  where r.name=rc.name and r.version=rc.version
    and r._pypi_hidden=FALSE
group by rc.trove_id

I'm basing this strictly off the other query you posted, with no real
knowledge of the schema, so I could be way off here.  But it seems like
this should be very efficient if there is an index on (trove_id, name,
version) in the release_classifiers table and one on (name, version,
_pypi_hidden) in the releases table.  That would allow the query to be
executed entirely on the indexes without needing any table contents,
assuming the planner can answer a query from indexes alone.  Of course,
given the relatively small space of trove identifiers compared to
releases, some other scan pattern might be equally efficient, I suppose.
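
A rough way to try the idea without touching the real database is sketched
here.  Everything in it beyond the query itself is an assumption: the two
tables are cut down to the columns the query mentions, the index names
(rc_trove_idx, rel_hidden_idx) are made up, the sample rows are invented, and
an in-memory SQLite database stands in for PostgreSQL (so _pypi_hidden is
compared against 0 rather than FALSE).  It only shows that the query produces
a per-classifier tally of visible releases; it says nothing about how
PostgreSQL would actually plan it.

```python
import sqlite3

# Hypothetical miniature schema: only the columns the tally query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE releases (
        name TEXT, version TEXT, _pypi_hidden BOOLEAN
    );
    CREATE TABLE release_classifiers (
        name TEXT, version TEXT, trove_id INTEGER
    );
    -- The two suggested indexes (index names are invented for this sketch).
    CREATE INDEX rc_trove_idx
        ON release_classifiers (trove_id, name, version);
    CREATE INDEX rel_hidden_idx
        ON releases (name, version, _pypi_hidden);
""")

# Invented sample data: one hidden release, two visible ones.
conn.executemany(
    "INSERT INTO releases VALUES (?, ?, ?)",
    [("foo", "1.0", 0), ("foo", "0.9", 1), ("bar", "2.0", 0)],
)
conn.executemany(
    "INSERT INTO release_classifiers VALUES (?, ?, ?)",
    [("foo", "1.0", 10), ("foo", "1.0", 20),
     ("foo", "0.9", 10), ("bar", "2.0", 10)],
)

# Same query as in the message, with 0 in place of FALSE for SQLite
# and an ORDER BY added so the result order is stable.
TALLY_SQL = """
    select rc.trove_id, count(*)
      from releases r, release_classifiers rc
     where r.name = rc.name and r.version = rc.version
       and r._pypi_hidden = 0
     group by rc.trove_id
     order by rc.trove_id
"""


def tally(connection):
    """Return {trove_id: number of non-hidden releases carrying it}."""
    return dict(connection.execute(TALLY_SQL).fetchall())


print(tally(conn))
```

On the real server, prefixing the query with EXPLAIN would show whether the
planner actually uses the two indexes or picks some other scan.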

_______________________________________________
Catalog-sig mailing list
[email protected]
http://mail.python.org/mailman/listinfo/catalog-sig
