My notes. By the way, Matt, I reviewed your change; it is mostly OK, but at least one trivial change is needed.
On 9/20/11 11:23 AM, "Matt Corgan" <[email protected]> wrote:

>inline below:
>
>On Mon, Sep 19, 2011 at 10:08 PM, Stack <[email protected]> wrote:
>
>> Excellent summary Matt. Some notes in the below.
>>
>> On Sun, Sep 18, 2011 at 6:43 PM, Matt Corgan <[email protected]> wrote:
>> > ... All of this is relatively easy for the data
>> > and index blocks because they're immutable. Doing it for the memstore is
>> > another story...
>> >
>>
>> We'd need another data structure completely, wouldn't we? Have you
>> given it any thought?
>>
>
>i've given it some thought, but i think it can wait till after the block
>format stuff is in place. a lot of the utility methods from that can be
>used for the memstore implementation, and there should be some interfaces
>in place by then too. the memstore changes are easier than the block format
>changes in a few regards: the fact that it's not persisted anywhere and
>doesn't have to be backwards compatible.

Right now, the delta encoding integration aims to support that use case. There will be three options per column family:
- on-disk encoding
- in-block-cache encoding
- whether it should use custom seekers

As long as the on-disk encoding is set to none, there will be no change in the block format on disk. Moreover, no encoding/decoding will be used for compaction. There is a tiny penalty for encoding whenever a block is read from disk, but that way it is far more flexible. Of course, these options are currently designed primarily for testing and development. We may want to hide or disable some of the configuration for end users.

>
>the closest published thing i've found to what i envision is this paper
><http://www.google.com/url?sa=t&source=web&cd=1&sqi=2&ved=0CBwQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.85.3498%26rep%3Drep1%26type%3Dpdf&rct=j&q=cache%20efficient%20string%20sorting%20with%20copying&ei=HNd4TvLQGeWqsQKt65SsDQ&usg=AFQjCNFGNRlPcu9xykCGcnmCkn9tEhdtLg>
>about a copying burstsort.
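To make the three per-family options above concrete, here is a minimal sketch of how they interact (all class and field names here are hypothetical illustrations, not the actual patch):

```java
// Hypothetical sketch of the three per-column-family delta encoding
// options described above. Names are illustrative, not the real patch.
public class EncodingOptionsSketch {

    enum Encoding { NONE, PREFIX }  // placeholder algorithm list

    static class FamilyEncodingOptions {
        final Encoding onDisk;          // encoding used in HFiles on disk
        final Encoding inBlockCache;    // encoding used for cached blocks
        final boolean useCustomSeekers; // seek directly in encoded data

        FamilyEncodingOptions(Encoding onDisk, Encoding inBlockCache,
                              boolean useCustomSeekers) {
            this.onDisk = onDisk;
            this.inBlockCache = inBlockCache;
            this.useCustomSeekers = useCustomSeekers;
        }

        // With on-disk encoding set to NONE, the on-disk block format is
        // unchanged, so compaction needs no encode/decode step.
        boolean compactionNeedsCodec() {
            return onDisk != Encoding.NONE;
        }

        // The small per-read cost: encode when loading a block from disk
        // into the block cache.
        boolean encodeOnBlockRead() {
            return onDisk == Encoding.NONE && inBlockCache != Encoding.NONE;
        }
    }

    public static void main(String[] args) {
        FamilyEncodingOptions opts = new FamilyEncodingOptions(
                Encoding.NONE, Encoding.PREFIX, true);
        System.out.println(opts.compactionNeedsCodec());  // false
        System.out.println(opts.encodeOnBlockRead());     // true
    }
}
```

The point of the split is flexibility: the on-disk and in-cache settings can differ, so encoding can be exercised without touching the persisted format.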
>it builds the memstore into a trie structure,
>but where the leaves are big byte[]'s that hold many keys until they fill
>up. then you split the leaves (burst them). it's cpu, memory, and garbage
>collector friendly.

Hmm, interesting, but our use case is a bit different. We don't need a dynamic structure. Perhaps we have something else in mind (memstore vs. HFileBlock).

>>
>> > Here's some contrived data to illustrate. Row key is a (reversed
>> > domain) url. Column family is called "familyName". ColQualifier is a
>> > browser type. Value is the number of views from that browser type. Here's
>> > the current representation <http://pastebin.com/7ks8kzJ2>, the prefix trie's
>> > toString output <http://pastebin.com/cL4AkCPC> for the row keys, and removing
>> > the whitespace <http://pastebin.com/4qiSXNh9> from the toString output.
>>
>> Thats a big diff.

Right now I do something similar. A simple algorithm which just avoids storing the common prefix gives a very good compression ratio.

>>
>> > (random access inside HFileBlock)
>> >
>>
>> I'd imagine that we'd not want the index always, just when keys per
>> block went over some size.
>>
>> hfilev2 should help here because we don't load all indices all the time.
>>
>
>i dug into HFileV2 yesterday. it's really great, solving the biggest
>problem i currently face: the problem where all indexes are held in memory.
> it also addresses my point about lack of random access inside blocks by
>adding a "secondary index", but only for the index blocks as far as i could
>tell. adding a secondary index to the data blocks would be great, but it's
>not that interesting to me because i'd rather go a step further and fix the
>prefix compression problem at the same time

Right now, HFileV2 addresses that issue. As long as blocks are small enough, linear seeking inside a block shouldn't be a big deal. Potentially, prefix compression is another way to solve that problem, but it is not my main focus.
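For the curious, the "just avoid storing the common prefix" idea can be shown in a few lines. This is a toy round-trip over sorted string keys, not the actual encoder in the patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy sketch of common-prefix elision for a sorted list of keys: each
// key is stored as (length of prefix shared with the previous key,
// remaining suffix). Not the real HBase encoder.
public class PrefixElisionSketch {

    static List<String> encode(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int common = 0;
            int max = Math.min(prev.length(), key.length());
            while (common < max && prev.charAt(common) == key.charAt(common)) {
                common++;
            }
            // store only the shared-prefix length and the new suffix
            out.add(common + ":" + key.substring(common));
            prev = key;
        }
        return out;
    }

    static List<String> decode(List<String> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String entry : encoded) {
            int sep = entry.indexOf(':');
            int common = Integer.parseInt(entry.substring(0, sep));
            String key = prev.substring(0, common) + entry.substring(sep + 1);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        // Reversed-domain row keys, as in Matt's contrived example.
        List<String> keys = Arrays.asList(
                "com.example.www/a", "com.example.www/b", "com.exbmple.www/a");
        List<String> enc = encode(keys);
        System.out.println(enc);
        System.out.println(decode(enc).equals(keys));  // true: round-trip
    }
}
```

With long shared key prefixes (as with reversed-domain URLs), most of each key collapses to a couple of bytes, which is where the good compression ratio comes from.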
>
>
>> > But, the prefix trie should actually be way faster than a binary search
>> > because you don't have to keep comparing the beginning of the key to the one
>> > you're looking for. I'll save the details for another email or for code
>> > comments. In general, with a smarter block/index format we could be much
>> > more efficient than the current method of comparing full key byte[] over and
>> > over again.
>> >
>> > Same random access problem also applies to the block index i think (correct
>> > me if i'm wrong here too).
>> >
>>
>> You should stick the above in the issue Matt, or at least refer to
>> this mail in the issue; its great stuff.
>>
>
>thanks. i'll create some issues after i learn a little more from you guys

It seems that we have already started talking about a few different things. It would be great if we could have specific issues for each of them.

Jacek
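Matt's point about not re-comparing the beginning of the key can be sketched as follows: when seeking through prefix-encoded entries in sorted order, you can remember how many leading characters of the target are already known to match and resume comparison from there. A toy sketch with hypothetical names, not the real seeker:

```java
import java.util.Arrays;
import java.util.List;

// Toy sketch of seeking through prefix-encoded keys without re-comparing
// the beginning of the target key for every entry. Names and entry
// format are hypothetical, not the actual patch.
public class PrefixSeekSketch {

    // An entry as produced by common-prefix elision: prefixLen + suffix.
    static class Entry {
        final int prefixLen;  // chars shared with the previous key
        final String suffix;
        Entry(int p, String s) { prefixLen = p; suffix = s; }
    }

    // Returns the index of the last key <= target, or -1 if the target
    // sorts before every key in the block.
    static int seek(List<Entry> entries, String target) {
        String key = "";
        int matched = 0;  // target[0..matched) equals key[0..matched)
        int result = -1;
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            key = key.substring(0, e.prefixLen) + e.suffix;
            if (e.prefixLen < matched) {
                // The new key changes a char the target already matched;
                // in a sorted block that char can only grow, so every
                // remaining key is > target.
                return result;
            }
            if (e.prefixLen > matched) {
                // key[matched] equals the previous key's char there,
                // already known to be < target[matched]; key < target.
                result = i;
                continue;
            }
            // e.prefixLen == matched: compare from position matched only.
            int cmp = 0;
            while (matched < key.length() && matched < target.length()) {
                cmp = Character.compare(key.charAt(matched), target.charAt(matched));
                if (cmp != 0) break;
                matched++;
            }
            if (cmp == 0) cmp = Integer.compare(key.length(), target.length());
            if (cmp > 0) return result;   // key > target: stop here
            result = i;                   // key <= target: remember it
            if (cmp == 0) return result;  // exact match
        }
        return result;
    }

    public static void main(String[] args) {
        // Same encoded keys as the example above.
        List<Entry> entries = Arrays.asList(
                new Entry(0, "com.example.www/a"),
                new Entry(16, "b"),
                new Entry(6, "bmple.www/a"));
        System.out.println(seek(entries, "com.example.www/b"));  // 1
        System.out.println(seek(entries, "com.example.www/aa")); // 0
        System.out.println(seek(entries, "com.a"));              // -1
    }
}
```

Each entry costs either a constant-time check or a comparison that starts where the previous one left off, so the target's prefix is never re-scanned; that is the advantage over a plain binary search on full keys.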
