On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <qu...@fb.com> wrote: > Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000: > > As I said before we will load all non-public revs in one set and all > > The problem is, loading a Python set from disk is O(size-of-the-set). > > Bitmap's loading cost should be basically 0 (with mmap). I think that's why > we want bitmap at the first place. There are other choices like packfile > index, hash tables, but bitmap is the simplest and most efficient. >
Hey folks, I haven't yet seen mention of some considerations that seem very important in driving the decision-making, so I'd appreciate it if someone could fill me in. Firstly, what's our current understanding of the sizes and compositions of these sets of numbers? In theory, we have a lot of data from practical application at Facebook, but nobody's brought it up. To clarify why this matters, if an obsstore contains say 100 entries, then a very naive implementation should be fine. If it's related to something else, such as the size of, amount of activity in, or local age of a repo, then maybe we have a different set of decisions to make. Secondly, what operations are being performed on these sets today, and do we know if those operations are going to change? For instance, if a may be extended via new elements added to its end, then an mmap-based implementation immediately becomes awkward. If a set is traversed or we perform operations like union or intersection, then a bitmap might make sense if the set is very small and dense, but not if it's big and sparse. (Jun's idea that mmap is effectively free is also not borne out in practice, but that's a second-order issue here.) So tell us what the needed API is, what the characteristics of the data are, and that'll help understand how to make a good decision.
_______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel