Re: [gentoo-portage-dev] search functionality in emerge

tvali Sun, 23 Nov 2008 16:53:08 -0800

A daemon that is notified of filesystem changes would solve this;
http://pyinotify.sourceforge.net/ would be a good choice.
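
To make the "has anything changed?" question concrete, here is a minimal
sketch using only the Python standard library. Note that it swaps in plain
polling (comparing two snapshots of the tree) for inotify; a pyinotify-based
daemon would receive events from the kernel instead of rescanning. The
function names snapshot and diff_snapshots are made up for this example.

```python
import os


def snapshot(root):
    """Map every file path under root to its (mtime_ns, size) pair."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old, new):
    """Return (added, removed, modified) sets of paths between two snapshots."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, removed, modified
```

An index would only need rebuilding for the paths in the three returned
sets, which is the part an event-based watcher makes cheap.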


In case many different applications use the portage tree directly without
going through any portage API (which is a bad choice, I think, and should be
deprecated), there is a kind of "hack": use
http://www.freenet.org.nz/python/lufs-python/ to create a new filesystem
(damn, now I would like to have some time to join this game). I hope it's
possible to build it everywhere Gentoo should work, but it's no problem if
it's not: you can implement things in such a way that it's not needed.
I totally agree that the filesystem is a bottleneck, but this suffix trie
would check for directories first, I guess. Now, having this custom
filesystem, which actually serves the portage tree like some odd API, you can
have backwards compatibility and still create your own thing.

Consider classes like these (the numbers show implementation order; whether
the proxies are abstract classes, base classes or something else is not
specified here, this just shows some relations between some imaginary objects):

   - *1. PortageTreeApi* - proxy for different portage trees on the FS, in
   SQL or elsewhere.
   - *2. PortageTreeCachedApi* - same as the previous one, but with a fast
   in-memory cache. It should be able to save its state, which simply means
   writing its inner variables to a file.
   - *3. PortageTreeDaemon* - has an interface compatible with
   PortageTreeApi; this daemon serves the portage tree to PortageTreeFS and to
   portage itself. In reality, it should be the base class of
   *PortageTreeApi* and *PortageTreeCachedApi* so that they can be used
   directly as daemons. When the cached API is used as a daemon, it should be
   able to detect filesystem changes, so implementations should provide
   change trigger callbacks.
   - *4. PortageTreeFS* - filesystem that can map any of those onto a
   filesystem. Connectable to PortageTreeApi or PortageTreeDaemon. This
   creates filesystems that can be used for backwards compatibility. It
   cannot be used on architectures that don't support lufs-python or an
   analog.
   - *6. PortageTreeServer* - server that serves data from
   PortageTreeDaemon, PortageTreeCachedApi or PortageTreeApi to some other
   computer.
   - Implementations can be proxied through *PortageTreeApi*,
   *PortageTreeCachedApi* or *PortageTreeDaemon*.
      - *5. PortageTreeImplementationAsSqlDb*
      - *1. PortageTreeImplementationAsFilesystem*
      - *3. PortageTreeImplementationAsDaemon* - client, actually.
      - *6. PortageTreeImplementationAsServer* - client, too.
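
A rough sketch of how the proxy and the filesystem implementation from the
list above might relate, in Python. Only the class names PortageTreeApi and
PortageTreeImplementationAsFilesystem come from this mail; the base class
PortageTreeImplementation, the methods packages() and search(), and the
assumption that the tree is laid out as category/package/*.ebuild are
invented for illustration and are not portage's real API.

```python
import os
from abc import ABC, abstractmethod


class PortageTreeImplementation(ABC):
    """Hypothetical backend interface (not part of portage)."""

    @abstractmethod
    def packages(self):
        """Yield 'category/package' strings."""


class PortageTreeImplementationAsFilesystem(PortageTreeImplementation):
    """Reads package names from a category/package/*.ebuild directory layout."""

    def __init__(self, root):
        self.root = root

    def packages(self):
        for category in sorted(os.listdir(self.root)):
            cat_dir = os.path.join(self.root, category)
            if not os.path.isdir(cat_dir):
                continue
            for package in sorted(os.listdir(cat_dir)):
                pkg_dir = os.path.join(cat_dir, package)
                if not os.path.isdir(pkg_dir):
                    continue
                # Only directories holding at least one ebuild count as packages.
                if any(f.endswith(".ebuild") for f in os.listdir(pkg_dir)):
                    yield f"{category}/{package}"


class PortageTreeApi:
    """Proxy: callers talk to this object, never to the backend directly."""

    def __init__(self, implementation):
        self._impl = implementation

    def packages(self):
        return list(self._impl.packages())

    def search(self, text):
        # Plain case-insensitive substring match on the package name.
        text = text.lower()
        return [p for p in self.packages() if text in p.lower()]
```

Because callers only see PortageTreeApi, the backend could later be swapped
for an SQL or daemon client without touching them.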

So, *1* - creating PortageTreeApi and PortageTreeImplementationAsFilesystem
is a pure refactoring task at first. Then, adding more advanced functions to
PortageTreeApi is basically refactoring, too. PortageTreeApi should not
become too complex or contain any advanced tasks that are not purely
db-specific, so some common base class could implement the more high-level
things.
Then, *2* - this finishes your schoolwork, but not yet in the most powerful
way, as we only have an index at that point and the first search is still
slow. In the beginning this cache cannot learn about changes in the portage
tree (that could be implemented by some versioning once this new API is the
only place that updates it), so it should have an index update command and
be used only for search.
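
Step 2 could look roughly like the following Python sketch: an in-memory
index that is built on the first search, refreshed only by an explicit
update command, and saved to or loaded from a file. The method names
update_index, save_state and load_state, the JSON file format and the
ListBackend stand-in are all assumptions made for this example.

```python
import json


class PortageTreeCachedApi:
    """Search-only cache in front of any object that offers packages()."""

    def __init__(self, api):
        self._api = api
        self._index = None  # built lazily, so the first search is still slow

    def update_index(self):
        """The explicit 'index update command': rescan the backend."""
        self._index = sorted(self._api.packages())

    def search(self, text):
        if self._index is None:
            self.update_index()
        text = text.lower()
        return [p for p in self._index if text in p.lower()]

    def save_state(self, path):
        """Write the inner variables (here just the index) to a file."""
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"index": self._index}, fh)

    def load_state(self, path):
        with open(path, encoding="utf-8") as fh:
            self._index = json.load(fh)["index"]


class ListBackend:
    """Stand-in backend for the example; counts how often it is scanned."""

    def __init__(self, names):
        self.names = list(names)
        self.calls = 0

    def packages(self):
        self.calls += 1
        return list(self.names)
```

Until update_index() is called again, searches return whatever was indexed
last, which is exactly the stale-data problem the daemon in step 3 is meant
to remove.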
Then, *3* - having a portage tree daemon means that things can really be
cached now and this cache can be kept in memory; it also means updates on
filesystem changes.
Then, *4* - having PortageTreeFS means that you can now easily implement
the portage tree on a faster medium without losing backwards compatibility.
Now, *5* - an implementation as an SQL DB is logical, as SQL is a
standardized and common language for creating fast databases.
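
As an illustration of step 5, a minimal sketch of an SQL-backed
implementation using Python's sqlite3 module. The table layout, the
add_package and search methods and the choice of SQLite are assumptions for
this example; the mail only says "SQL DB".

```python
import sqlite3


class PortageTreeImplementationAsSqlDb:
    """Stores package names and descriptions in a single SQLite table."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS packages ("
            " category TEXT NOT NULL,"
            " name TEXT NOT NULL,"
            " description TEXT NOT NULL DEFAULT '',"
            " PRIMARY KEY (category, name))"
        )

    def add_package(self, category, name, description=""):
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO packages VALUES (?, ?, ?)",
                (category, name, description),
            )

    def search(self, text):
        # LIKE is case-insensitive for ASCII in SQLite by default.
        # '%' and '_' inside text act as wildcards in this simple version.
        pattern = f"%{text}%"
        rows = self._db.execute(
            "SELECT category || '/' || name FROM packages"
            " WHERE name LIKE ? OR description LIKE ?"
            " ORDER BY category, name",
            (pattern, pattern),
        )
        return [row[0] for row in rows]
```

The search here is still a scan with LIKE; whether it actually beats the
filesystem would depend on indexes or full-text search added on top.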
Eventually, *6* - this really has nothing to do with boosting search, but on
a fast network it could still boost emerge by removing the need for
emerge --sync on local networks.

I think synchronization should then also be considered for those classes:
CachedApi almost needs it to be faster over server-client connections.
After that, ImplementationAsSync and ImplementationAsWebRsSync could be
added and a sync server built onto this daemon. As I really suspect that
emerge --sync is currently also ultraslow (I see no point in waiting a long
time to get a few new items, as currently seems to happen), it would boost
another life-critical part of portage.

So, hope that helps a bit - good luck!

2008/11/23 René 'Necoro' Neumann <[EMAIL PROTECTED]>

> Mike Auty wrote:
> >     Finally there are overlays, and since these can change outside of an
> > "emerge --sync" (as indeed can the main tree), you'll have to reindex
> > these before each search request, or give the user stale data until they
> > manually reindex.
>
> Determining whether there has been a change to the ebuild system is a
> major point in the whole thing. What does a great index serves you, if
> it does not notice the changes the user made in his own local overlay?
> :) Manually re-indexing is not a good choice I think...
>
> If somebody comes up here with a good (and fast) solution, this would be
> a nice thing ;) (need it myself).
>
> Regards,
> René


-- 
tvali

From some forum: http://www.cooltests.com - if you know English.
By the way, over 120 you're very smart, over 140 you're a genius, and at
something like 170 your head is pretty much a trash can...
