On Jun 15, 2012, at 11:58 PM, Alexey Tourbin wrote:

> On Fri, Jun 1, 2012 at 6:07 AM, Jeffrey Johnson <n3...@me.com> wrote:
>> I asked 2 very specific questions … the rest is quite important also,
>> but I need to understand precisely what properties set:versions have in order
>> to implement correctly (and I don't fully understand your reply).
>> 
>> Specifically:
>> 
>>        1) Is the set:versions VERSION independent of the order of the
>>        calls to rpmsetAdd()? (you know the routine as set_add())
> 
> Completely independent - you can add symbols in any order. The symbols
> are then hashed and sorted by their numeric values. The underlying
> idea is that a set-version is just a (sorted) set of numbers. You can
> add whatever symbol to it, possibly twice, the symbol will be hashed
> to a number, in a unique manner, and finally you can get the string
> representation of the set of numbers. This involves much fuss under
> the hood, but basically, you should think of the set of symbols, which
> is just the set of numbers, after each symbol has been hashed
> individually.
> 
>>        2) Can the set:versions encoding be compared for more than equality?
>>        What set/arithmetic property is the basis for the comparison? What
>>        circumstances/constraints are there related to
>>                        … You cannot always compare
>>                set-versions in terms of "greater or equal" (but when you 
>> can, it's
>>                important).
> 
> Set-versions compare as sets. There are Euler diagrams to visaulise
> set comparsion, which is an undergraduate matetrial. The idea is that,
> real numbers are linear order: you can always tell either V1<V2 or
> V1>=V2. Sets are quite another matter: you cannot always apply for
> "tertium non datur" (either lt or ge). Which is to say that sets can
> be quite different and do not compare easily. The order can be imposed
> on the sets, though, by requiring the "greater" sets to have at least
> the same elements they compare against (perhaps I'm starting to retell
> the undergraduate material, which is not going to last). To sum up,
> there IS a mathematical basis behind "Requires: foo >= set:asdf"
> dependencies.
> 
>> I can of course answer my own questions with try-and-see test cases.

Good: the above confirmation of the characteristics allows a set:versions
implementation to proceed.

What I am hearing is that set:versions is not fundamentally different from 
these transforms
        Create a Bloom filter (with acceptable false positive rate)
        Compress the Bloom filter parameters and bits.
        Encode the compression in base{16,32,62,64} for transport.

The "contained in" or subset semantic that applies to the operations "<" and 
">="
is rather easy to do as well. E.g. if (assuming on;y existence, not versioned 
inequality ranges)
        P == Bloom filter of Provides: tokens
        R == Bloom filter of a (possible) subset of tokens
then the subset computation is nothing more than
        (P & R) == R

The important difference is that Bloom filters have a false positive failure 
probability which
is likely not present in the Golomb-Rice coding in the current implementation.

I'm expecting that Bloom+Compression+Encoding is comparable in size to the 
Golomb-Rice
encoding. The only way to tell (that I know of) is to try-and-see, which I will 
likely also do in
order to tell how good the compression is with Golomb-Rice coding.

Meanwhile there's lots of usage cases for the set:versions implementation in 
RPM, not just
as a container of Elf symbols that is a de facto ABI sub-set.

73 de Jeff

> ______________________________________________________________________
> RPM Package Manager                                    http://rpm5.org
> Developer Communication List                        rpm-devel@rpm5.org

______________________________________________________________________
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org

Reply via email to