I agree that an optimized bitmap can come after the initial implementation. I'm more concerned with getting this rigged into mercurial correctly and with appropriate cache mechanics than I am about shrinking the 100k.

On 9/2/16 2:05 PM, Jun Wu wrote:
Given the fact that the plain bitmap is just 1/512 the size of changelog
revlog index (which is small) and we don't compress the latter, I won't
worry too much about the file size at this time. If we use the plain bitmap,
we can mmap it so the memory usage is easily optimized.

What we need here are a) "testing a bit" and b) "setting a bit". With a
plain bitmap, they are both O(1), ignoring file system fragmentation.

EWAH seems to bring a linear time complexity like O(size of the compressed
bitmap) for both operations, which is probably not what we want.

The roaringbitmap uses "sorted array" in 2 places, which makes insertion
(setting a bit from 0 to 1) *much* harder imo. In the worst case, you have
to "memmove" large part of the file.

If we decide to optimize file size, I'd strongly like both operations (but
especially the read) to remain O(1). Maybe use a fixed-sized first-level
index (like git packfile fanout table) for the first 8 bits, and have a
fixed-sized second-level index for the next 8 bits. And then have a plain
bitmap for the next 16 bits. Avoid sorted arrays, they make insertion
painful.

Excerpts from Bryan O'Sullivan's message of 2016-09-02 09:00:26 -0700:
I agree that it would make a lot of sense to not use a plain bitmap. The
current state of the art for compressed high-performance bitmaps is here:
https://urldefense.proofpoint.com/v2/url?u=http-3A__roaringbitmap.org_&d=DQIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=5shTFMQ7fvpISnJIrsSb2qnpiwW03Zd8gW2Ke-udef8&s=AywMSSY4J06X2Sd5CJc0LKswrdPngNYU57LWLYSspkc&e=

On Fri, Sep 2, 2016 at 8:09 AM, Augie Fackler <r...@durin42.com> wrote:

I've considered doing a bitmap index for hidden revs, but lacked the round
tuits. Have you looked at any of the more interesting compressed bitmap
storage schemes like EWAH? I know git uses that for some stuff now and it's
been very efficient both speed and space wise.

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=DQIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=5shTFMQ7fvpISnJIrsSb2qnpiwW03Zd8gW2Ke-udef8&s=zzzWKkfQeX7y17dc5vJSXKCPaQD5kMMSKlF3chAhuYc&e=

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Reply via email to