Re: File search progress: database review and question on triggers

Julien Lepiller Wed, 12 Aug 2020 14:31:17 -0700

Why wouldn't it help? Can't you make it a trie from basename -> complete name? 
If I'm looking for "libcord.so" (which is a key in the trie), I don't think I 
need to look for every path. I only need to follow the trie until I find a 
pointer to some structure that contains the data I look for (ex: a list of 
complete filenames).


On 2020年8月12日 16:43:37 GMT-04:00, Pierre Neidhardt <m...@ambrevar.xyz> wrote:
>Julien Lepiller <jul...@lepiller.eu> writes:
>
>> Have you tried something more structured? I have some code for
>creating a binary search tree and even compressing/decompressing
>strings with huffman, as well as code to serialize all that (my
>deserialization is in Java though, so not very useful to you):
>https://framagit.org/nani-project/nani-website
>>
>> See modules/nani/trie.scm for instance.
>>
>> The idea is to have a binary search tree whose keys are filenames and
>value is a pointer in the file to a structure that holds data for this
>filerame (packages and versions of guix for instance). It's super fast
>to look it up, because you don't load the whole file, you only seek to
>the right position. It's a lot longer to build, and not so easy to
>update though.
>
>Assuming we'd have only 1 Guix generation per file, I'm not sure a trie
>would help because we _always_ need to search _all_ file paths, so in
>all cases we've got some 10,000+ entries to load in memory and loop
>over
>them.
>
>The total number of entries is the bottleneck here, both for the
>database load and the individual search queries.
>
>An obvious optimization is to memoize the database load.  This has a
>significant memory cost though.
>
>The trie could be helpful for multiple Guix generations in the same
>database, but I'm not sure it warrant the increased complexity.
>
>Thoughts?
>
>-- 
>Pierre Neidhardt
>https://ambrevar.xyz/

Re: File search progress: database review and question on triggers

Reply via email to