On 02/14/2017 02:04 AM, Sean Farley wrote:
Jun Wu <qu...@fb.com> writes:

In general, I think this is a good direction. Some random thoughts:

  - general purpose

    I think the bitmap is not always a cache, so it should only have
    operations like set/unset/readfromdisk/writetodisk. In practice, I
    wouldn't couple cache invalidation with the bitmap implementation.

    In addition, I'll try to avoid using Python-only types in the
    interface, so that if we decide to rewrite part of the implementation
    in native C, we won't have trouble.

    See "revset" below for a possibility that bitmap is used as a non-set.

  - revset

    This is a possibility that probably won't happen any time soon.

    The revset currently uses Python set for maintaining its state. For huge
    sets, Python sets may not be a good option. And various operations could
    benefit from an always-topologically-sorted set, which is the bitmap.

  - mmap

    My intuition is that bitmaps fit better with mmap, which can reduce the
    loading cost. I think "vfs.mmapread" could be a thing, and various places
    could benefit from it: Gabor recently showed interest in loading revlog
    data by mmap, and I had patches that use mmap to read the revlog index.
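
To make the "general purpose" and "mmap" points above concrete, here is a minimal sketch, not code from this series. The class name RevBitmap and its layout (one bit per revision, least significant bit first) are assumptions for illustration only; the method names mirror the set/unset/readfromdisk/writetodisk operations listed above. It uses only the standard mmap module and does not touch vfs.

```python
import mmap
import os
import tempfile


class RevBitmap:
    """Hypothetical bitmap over revision numbers (illustration only)."""

    def __init__(self, buf):
        # buf is a bytearray (writable) or a read-only mmap object; both
        # support integer indexing, so lookups work the same on either.
        self._buf = buf

    @classmethod
    def empty(cls, nrevs):
        return cls(bytearray((nrevs + 7) // 8))

    def set(self, rev):
        self._buf[rev >> 3] |= 1 << (rev & 7)

    def unset(self, rev):
        self._buf[rev >> 3] &= ~(1 << (rev & 7)) & 0xFF

    def __contains__(self, rev):
        return bool(self._buf[rev >> 3] & (1 << (rev & 7)))

    def __iter__(self):
        # Walks the bits in order, so revisions come out sorted by number.
        for i in range(len(self._buf)):
            byte = self._buf[i]
            for bit in range(8):
                if byte & (1 << bit):
                    yield i * 8 + bit

    def writetodisk(self, path):
        with open(path, "wb") as fp:
            fp.write(bytes(self._buf))

    @classmethod
    def readfromdisk(cls, path):
        # Map the file instead of reading it; pages are loaded lazily.
        # Note: mmap refuses to map an empty file.
        with open(path, "rb") as fp:
            return cls(mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ))


# Usage: round-trip through a temporary file.
bm = RevBitmap.empty(20)
for rev in (17, 3, 9):
    bm.set(rev)
bm.unset(9)
path = os.path.join(tempfile.mkdtemp(), "bitmap")
bm.writetodisk(path)
loaded = RevBitmap.readfromdisk(path)
```

Nothing here knows about caches or invalidation, and the on-disk form is plain bytes, so the same file could be read from C. Iteration yields revisions in increasing revision-number order, which is the property the "revset" item relies on.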

In addition, not directly related to this series, I'm a big fan of
single-direction data flow, but the current code base does not seem to do a
good job in this area. As we are adding more caching layers to the code
base, it'd be nice to have some tiny framework maintaining the dependencies
between all kinds of data, to be able to understand the data flow easily, and
just to be more confident about loading order. I think people more
experienced with architecture may want to share some ideas here.

I was thinking about a more high-level approach (please feel free to
pick apart):

r = repo.filtered("bitmap1")
r2 = r.filtered("bitmap2")

So that r2 would be an intersection of bitmap1 and bitmap2 (haven't
thought about a union nor the inverse).


This double filtering idea is interesting. Maybe we could have the 'repoview' API understand repo.filtered("foo+bar") as a combination of the foo and bar filters. The smart parts of repoview (e.g. the filter hierarchy for cache inheritance, cache keys, etc.) should be able to automatically compute what to do for a combination.

Exposing the bitmap at that level seems strange. I think it is better to have the internal implementation of the filtering rely on a bitmap than to have the repository/repoview API expose the bitmap directly.
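
As a rough sketch of what I mean (this is not the existing repoview code): the FILTERS table, the hiddenrevs/visiblerevs helpers and the "+" syntax are all hypothetical. Each filter name maps to a bitmap of hidden revisions, stored here as a plain Python int for brevity, and callers only ever pass filter names.

```python
# Hypothetical registry: filter name -> bitmap of hidden revisions, stored
# as an int (bit N set means revision N is filtered out).
FILTERS = {
    "foo": 0b00110,  # hides revisions 1 and 2
    "bar": 0b10100,  # hides revisions 2 and 4
}


def hiddenrevs(filtername):
    """Return the hidden bitmap for 'foo', 'bar' or a combination 'foo+bar'."""
    hidden = 0
    for name in filtername.split("+"):
        # A revision hidden by any component stays hidden in the combination.
        hidden |= FILTERS[name]
    return hidden


def visiblerevs(filtername, nrevs):
    """List the revisions below nrevs that the (combined) filter leaves visible."""
    hidden = hiddenrevs(filtername)
    return [rev for rev in range(nrevs) if not (hidden >> rev) & 1]


combined = visiblerevs("foo+bar", 6)
```

OR-ing the hidden bitmaps gives the same visible set as intersecting what each filter leaves visible, which matches the r2 example above, and the bitmap stays an implementation detail behind the filter name.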

Cheers,

--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
