On Tue, 6 Jun 2017 18:17:49 -0700, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-06-07 00:53:09 +0900: > > > +class radixlink(object): > > > + """multimap. keys are variable-length bytes, values are lists of > > > uint32s. > > > > Do we really need to support variable-length key? IIUC, that's the main > > reason > > why both radix-entry and leaf-entry have link-offset field. > > For linkrev use-case, we might need to store paths.
Store path as a key? What I meant is "b) Do not allow key1 to be a prefix of key2." > > > + link-entry := next-link-offset (4B) + value (4B) > > Nit: I slightly prefer "link-entry := value + next-link-offset" for > > consistency with the other entries. > > (mentioned below that we might want to support multiple types in link-entry) Got why you made a separate entry for value node, thanks. > revlog.c does a really well job in terms of both performance and space > usage. It uses int[16] for an index entry. Stores +index or -rev. And does > not have a separate leaf format. Yea, it's crazily space efficient. > We might want to just dump indexObject->nt, if we put endian-ness in the > file name. > > (some format changing ideas mentioned below) > > > Suppose our data structure is at least 8-byte aligned, we have 3bits from > > LSB which can be (ab)used as flags (as glibc malloc does.) > > Space-wise, the leaf-entry padding is the main issue. Removing that might > cut the index file size in half. > > If we want to further reduce the size, possible changes are: > > a) Do not store full key, use int32. This makes the interface more > complex - people have to provide a function to convert int32 to a key. > b) Do not allow key1 to be a prefix of key2. This removes "link-offset" in > radix entry. > c) Support different kinds of values other than a linked list, like just > an int32 representing something. > > "a" will make us closer to revlog.c but it requires some low-level changes > to work with obsstore (that I don't want to do it now for fm1 obsstore). > > "b" seems a weird limitation. But may be worthy to catch up with revlog.c. > > "c" could be implemented later. "link" and "index" are relatively separate. I was thinking that the size of the obsstore index would be quite big, and reading/writing/jumping-around-in-memory it would be somewhat costly. If you have an idea to mitigate these costs, shrinking the cache size wouldn't be important. > > [...] > > Nit: I believe we'll soon need "hg debugradixlink" command. How about > > splitting > > the module into core radixlink + pure impl + cext impl? > > The radixlink type needs to be in C to make overhead minimal (so __getitem__ > won't need a hash lookup). I think it might be fine to duplicate struct > definitions in the debug command. I meant mercurial/radixlink.py could import pure/ attributes to implement dumpradixlink(). _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel