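I just chatted with Erik and realized I had missed his point altogether. His point was that we can drop the column qualifier length IF we know the overall key length, since the qualifier length can then be derived by subtracting everything else in the key from it. I think we will know the key length in nearly all cases.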
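A minimal sketch of Erik's point, assuming 4-byte int row and family lengths, an 8-byte long timestamp and a 1-byte type (these widths are assumptions, not something settled in this thread). Once the overall key length is known, the qualifier length falls out by subtraction, so it need not be stored. The class and method names are hypothetical.

```java
// Hypothetical sketch: derive the qualifier length instead of storing it.
// Field widths below are assumptions for illustration only.
class QualifierLengthSketch {
    static final int ROW_LEN_SIZE = 4;    // int row length
    static final int FAM_LEN_SIZE = 4;    // int family length
    static final int TIMESTAMP_SIZE = 8;  // long timestamp
    static final int TYPE_SIZE = 1;       // byte type

    // keyLength covers rowLen + famLen + row + family + qualifier + timestamp + type.
    static int qualifierLength(int keyLength, int rowLength, int familyLength) {
        int fixed = ROW_LEN_SIZE + FAM_LEN_SIZE + TIMESTAMP_SIZE + TYPE_SIZE;
        int q = keyLength - fixed - rowLength - familyLength;
        if (q < 0) {
            throw new IllegalArgumentException("key too short: " + keyLength);
        }
        return q;
    }

    public static void main(String[] args) {
        // row "row1" (4 bytes), family "info" (4), qualifier "name" (4):
        // key length = 17 fixed bytes + 12 = 29
        System.out.println(qualifierLength(29, 4, 4)); // prints 4
    }
}
```

The trade-off is a few bytes saved per entry against one subtraction on every read of the qualifier.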
Chatting further (Jon and Erik just came by the house), they argue that the native regionserver entity should be a KeyValue blob whose format is:

keylength valuelength key value

...where the key is then further decomposable as suggested below (rowlength-int, familylength-int, row, family, qualifier, timestamp, type). The blob would be carried in a ByteBuffer. On the way in, we'd make one of these out of the proffered row, column, etc. and shove it into the Memcache (Memcache would change from a TreeMap to a TreeSet). Flushing would be an append of this KeyValue to the hfile. On the way out, we'd pick the KeyValue blob from the hfile and move it through the system out to the RPC. (One day we might put the KeyValue blob on nio if we use something other than Hadoop's RPC.)

St.Ack

On Wed, Mar 4, 2009 at 9:06 AM, stack <[email protected]> wrote:

> On Wed, Mar 4, 2009 at 8:19 AM, Erik Holstad <[email protected]> wrote:
>
>> Was thinking this morning that we might have to make some adjustments to the
>> format. We wanted <int><int><int><row><fam><qf><ts><type> for the key and
>> <int><value> or something like that, right?
>> But what is stored in HFile right now is, if I'm not mistaken,
>> <int><int><key><val>, so if we want to match that I think we need to make
>> some small adjustments, probably to something like:
>> <keyLen><valLen><rowLen><famLen><row><fam><qf><ts><type><val>
>
> I think we are saying the same thing (if your omission of the column qualifier
> length was not intentional).
>
> In hfile currently it's as you say:
>
> keylength
> vallength
> key
> value
>
> where key expands to
>
> vint // Length of the row as vint
> row
> vint // Length of the column -- family + qualifier -- as vint
> column
> timestamp
>
> The proposal is that hfile stays as it was, only the key now expands to:
>
> int // rowlength as a short
> int // column family length in a byte
> int // column qualifier length in a short
> row
> columnfamily
> columnqualifier
> timestamp
> type
>
> St.Ack
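A rough sketch of the KeyValue blob proposed above (keylength, valuelength, key, value, with the key holding rowlength-int, familylength-int, row, family, qualifier, timestamp, type and no qualifier length), carried in a ByteBuffer and kept in a TreeSet standing in for the Memcache. Everything here is an assumption for illustration: the 8-byte long timestamp, the 1-byte type, the type code 4, the sort order (row, family, qualifier ascending, newest timestamp first, type ignored), and all class and method names are hypothetical, not HBase's actual API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical KeyValue blob layout:
// keyLength | valueLength | rowLen | famLen | row | family | qualifier | ts | type | value
class KeyValueSketch {
    static final int FIXED_KEY_BYTES = 4 + 4 + 8 + 1; // rowLen, famLen, ts, type (assumed widths)

    static ByteBuffer encode(byte[] row, byte[] family, byte[] qualifier,
                             long timestamp, byte type, byte[] value) {
        int keyLength = FIXED_KEY_BYTES + row.length + family.length + qualifier.length;
        ByteBuffer buf = ByteBuffer.allocate(8 + keyLength + value.length);
        buf.putInt(keyLength).putInt(value.length);
        buf.putInt(row.length).putInt(family.length);
        buf.put(row).put(family).put(qualifier);
        buf.putLong(timestamp).put(type);
        buf.put(value);
        buf.flip(); // ready for reading from position 0
        return buf;
    }

    // Copies bytes at an absolute offset without moving kv's own position.
    static byte[] slice(ByteBuffer kv, int offset, int length) {
        ByteBuffer dup = kv.duplicate();
        dup.position(offset);
        byte[] out = new byte[length];
        dup.get(out);
        return out;
    }

    static int keyLength(ByteBuffer kv) { return kv.getInt(0); }
    static int rowLength(ByteBuffer kv) { return kv.getInt(8); }
    static int familyLength(ByteBuffer kv) { return kv.getInt(12); }
    static byte[] row(ByteBuffer kv) { return slice(kv, 16, rowLength(kv)); }
    static byte[] family(ByteBuffer kv) { return slice(kv, 16 + rowLength(kv), familyLength(kv)); }

    // No stored qualifier length: it is derived from the overall key length.
    static byte[] qualifier(ByteBuffer kv) {
        int qLen = keyLength(kv) - FIXED_KEY_BYTES - rowLength(kv) - familyLength(kv);
        return slice(kv, 16 + rowLength(kv) + familyLength(kv), qLen);
    }

    static long timestamp(ByteBuffer kv) { return kv.getLong(8 + keyLength(kv) - 9); }
    static byte type(ByteBuffer kv) { return kv.get(8 + keyLength(kv) - 1); }
    static byte[] value(ByteBuffer kv) { return slice(kv, 8 + keyLength(kv), kv.getInt(4)); }

    // Lexicographic comparison treating bytes as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Assumed ordering: row, family, qualifier ascending; newer timestamps first.
    static final Comparator<ByteBuffer> KEY_ORDER = (x, y) -> {
        int c = compareBytes(row(x), row(y));
        if (c == 0) c = compareBytes(family(x), family(y));
        if (c == 0) c = compareBytes(qualifier(x), qualifier(y));
        if (c == 0) c = Long.compare(timestamp(y), timestamp(x));
        return c;
    };

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        TreeSet<ByteBuffer> memcache = new TreeSet<>(KEY_ORDER); // stand-in for the Memcache
        memcache.add(encode(b("row1"), b("info"), b("name"), 1L, (byte) 4, b("old")));
        memcache.add(encode(b("row1"), b("info"), b("name"), 2L, (byte) 4, b("new")));
        ByteBuffer first = memcache.first();
        System.out.println(new String(qualifier(first), StandardCharsets.UTF_8)); // name
        System.out.println(new String(value(first), StandardCharsets.UTF_8));     // new
    }
}
```

Because the whole entry lives in one buffer, flushing could in principle write the bytes straight out, and a TreeSet suffices since the key and value travel together rather than as a map's separate key and value objects.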
