Hiya Emma,
Good luck on your project. A few things to be wary of are disk
I/O, metadata cache backends and overlays.
Disk I/O can be a significant bottleneck. Loading a lot of files
from disk (be it the metadata cache or whatever) can take a long time
the first time round, but once they are cached in RAM they are much
faster to access in the future.
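To see the cold versus warm effect for yourself, here is a minimal
sketch in Python that reads every file under a directory a couple of
times and reports how long each pass took. The function names
(read_all_files, time_passes) are made up for illustration and have
nothing to do with portage's own code. The first pass is only truly
"cold" if the files aren't already sitting in the OS page cache.

```python
import os
import time


def read_all_files(root):
    """Read every regular file under root; return the total bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    total += len(handle.read())
            except OSError:
                # Skip unreadable files and broken symlinks.
                continue
    return total


def time_passes(root, passes=2):
    """Return the wall-clock seconds taken by each full read of root."""
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        read_all_files(root)
        timings.append(time.perf_counter() - start)
    return timings
```

Point time_passes() at wherever your tree or metadata cache lives and
compare the first number with the later ones.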
Portage allows for its internal metadata cache to be stored in a
variety of formats, as long as there's a backend to support it. This
means simple speedups can be achieved using cdb or sqlite (if you search
Google for these and portage you'll get gentoo-wiki tips, which
unfortunately you'll have to read from Google's cache at the moment). It also means
that if you want to make use of this metadata from within portage,
you'll have to rely on the API to tell the backend to get you all the
data (and it may be difficult to speed up without writing your own backend).
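To give a feel for what an sqlite-backed store buys you, here is a
rough sketch using Python's standard sqlite3 module. Please note this
is NOT portage's actual cache backend API: the class name
(SqliteMetadataCache), the table layout and the search() method are all
hypothetical, and I'm assuming the metadata for a package is a simple
dict with a DESCRIPTION key.

```python
import json
import sqlite3


class SqliteMetadataCache:
    """Hypothetical dict-like metadata store; not portage's real backend."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS metadata ("
            "cpv TEXT PRIMARY KEY, description TEXT NOT NULL, data TEXT NOT NULL)"
        )

    def __setitem__(self, cpv, values):
        # Keep the description in its own column so it can be searched
        # without decoding every row; the full dict is stored as JSON.
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO metadata (cpv, description, data) "
                "VALUES (?, ?, ?)",
                (cpv, values.get("DESCRIPTION", ""), json.dumps(values)),
            )

    def __getitem__(self, cpv):
        row = self._db.execute(
            "SELECT data FROM metadata WHERE cpv = ?", (cpv,)
        ).fetchone()
        if row is None:
            raise KeyError(cpv)
        return json.loads(row[0])

    def search(self, text):
        """Return sorted package keys whose description contains text."""
        rows = self._db.execute(
            "SELECT cpv FROM metadata WHERE description LIKE ? ORDER BY cpv",
            ("%" + text + "%",),
        )
        return [cpv for (cpv,) in rows]
```

The point is that a description search becomes one query against one
file, rather than opening thousands of small cache files.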
Finally there are overlays, and since these can change outside of an
"emerge --sync" (as indeed can the main tree), you'll have to reindex
these before each search request, or give the user stale data until they
manually reindex.
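One cheap way to decide whether a reindex is needed is to fingerprint
the tree from file counts and modification times, and compare that with
the fingerprint saved alongside the index. This is only a sketch with
made-up names (tree_fingerprint, needs_reindex). It still has to stat
every file, so it is cheaper than re-reading everything but not free,
and it will miss an in-place edit that deliberately preserves the mtime.

```python
import os


def tree_fingerprint(root):
    """Return (file_count, newest_mtime) for all files under root."""
    newest = 0.0
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            except OSError:
                # File vanished or is unreadable; ignore it.
                continue
            newest = max(newest, mtime)
            count += 1
    return (count, newest)


def needs_reindex(root, saved_fingerprint):
    """True if the tree no longer matches the fingerprint saved at index time."""
    return tree_fingerprint(root) != saved_fingerprint
```

You would store the fingerprint when you build the index and call
needs_reindex() for each overlay before answering a search.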
If you're interested in implementing this in Python, you may want to
look at pkgcore, another package manager that can handle the main tree
and is also implemented in Python. From what I understand, it has a
similar code base to portage, but its internal architecture may have
changed a lot.
I hope some of that helps, and isn't off-putting. I look forward to
seeing the results! 5:)
Mike 5:)