Excerpts from Jun Wu's message of 2017-02-17 19:14:12 -0800: > Excerpts from Bryan O'Sullivan's message of 2017-02-17 13:29:58 -0800: > > On Fri, Feb 17, 2017 at 10:30 AM, Jun Wu <qu...@fb.com> wrote: > > > > > Excerpts from Stanislau Hlebik's message of 2017-02-17 11:24:34 +0000: > > > > As I said before we will load all non-public revs in one set and all > > > > > > The problem is, loading a Python set from disk is O(size-of-the-set). > > > > > > Bitmap's loading cost should be basically 0 (with mmap). I think that's > > > why > > > we want bitmap at the first place. There are other choices like packfile > > > index, hash tables, but bitmap is the simplest and most efficient. > > > > > > > Hey folks, > > > > I haven't yet seen mention of some considerations that seem very important > > in driving the decision-making, so I'd appreciate it if someone could fill > > me in. > > > > Firstly, what's our current understanding of the sizes and compositions of > > these sets of numbers? In theory, we have a lot of data from practical > > application at Facebook, but nobody's brought it up. > > > > To clarify why this matters, if an obsstore contains say 100 entries, then > > a very naive implementation should be fine. If it's related to something > > else, such as the size of, amount of activity in, or local age of a repo, > > then maybe we have a different set of decisions to make. > > First, I agree that if N is small, O(N) or O(log(N)), or O(1), do not matter > much in practice. I tend to always reason about future-proof solutions, > which may or may not be a good habit - I'll adjust in the future. > > I think there are multiple topics being discussed: > > 1. How to solve the overhead loading hiddenrevs with minimal changes? > 2. Why is the bitmap format interesting (future use-cases)? > > For 1, > > I once tried to make one bit of revlog flag as the source of truth of > "hidden", which looks a nice solution. But it got rejected because revlog > has to be append-only. Moving the flag bits out formed the bitmap idea > naturally. > > Bitmap fits here but it does not mean we have to use it to solve this > particular problem. For example, stateful chg will get us some easy wins, > and see the problem analysis and proposed solution below. > > Note that we actually have a simple caching format for the "hiddenrevs". > See repoview.py, computehidden. > > But what's wrong with "computehidden" is the choice of the cache key - it > calls "hideablerevs", which takes 0.5 seconds in obsolete.getrevs(repo, > 'obsolete'), which reads the giant obsstore (in hg-committed) and do some > complex computation. The result is then hashed and used as a cache key of > "computehidden". > > So, if we do not want to know the "obsolete()" revisions, but just want to > know what's hidden. Changing the cache key would be a very straightforward > and effective solution which does not require a new file format.
Hm, changing the cache key to smth like phaseroots + obsstore + changelog mtime + size may be an easy and good win. That's a good idea. Although it will still get invalidated after each write. > > That'll help commands like "hg st", "hg log -T foo", etc. But it won't > help complex "log"s which do require the "obsolete()" revisions. To speed > up the latter, we can probably just copy the caching infrastructure from > repoview.py. > > For 2, > > len(filteredrevs) in my hg-committed is 2155. It's small. I tend to think > about solutions that scale longer and are not too complex to build. That > may or may not be a good habit. > > That said, I agree if the set is small and perf, then whatever easier to > implement makes sense. > > I have some crazy ideas about the future where a giant bitmap is > practically useful. For example, source controlling "visible heads" and > the obsstore, then adding some C code to quickly fill a bitmap (requires > "set" and "setancestors/descendants"), then we can show the user whenever > their repo used to look like back in any time by using the bitmap filter. > That'll help investigate hard issues, and implement "hg undo". > > > Secondly, what operations are being performed on these sets today, and do > > we know if those operations are going to change? > > > > For instance, if a may be extended via new elements added to its end, then > > an mmap-based implementation immediately becomes awkward. If a set is > > Not that bad if we preallocate space. > > > traversed or we perform operations like union or intersection, then a > > bitmap might make sense if the set is very small and dense, but not if it's > > big and sparse. > > Not necessarily true if we use an advanced variant, like the "roaringbitmap" > mentioned by you last year [1]. > > [1]: > https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-September/087864.html > > > (Jun's idea that mmap is effectively free is also not borne out in > > practice, but that's a second-order issue here.) > > > > So tell us what the needed API is, what the characteristics of the data > > are, and that'll help understand how to make a good decision. _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel