+1 I like it. HFile would only require modest modifications to support this.
-ryan

On Wed, Mar 4, 2009 at 11:12 AM, stack <[email protected]> wrote:

> I just chatted with Erik and I missed what he was saying altogether. His
> point was that we can drop the columnqualifier length IF we know the key
> overall length, which I think is going to be true in nearly all cases.
>
> Chatting further (Jon and Erik just came by the house), they argue that the
> native regionserver entity should be a KeyValue blob whose format is:
>
> keylength
> valuelength
> key
> value
>
> ..where the key is then further decomposable as suggested below
> (rowlength-int, familylength-int, row, family, qualifier, timestamp, type).
> The blob would be carried in a ByteBuffer.
>
> On the way in, we'd make one of these out of the proffered row, column, etc.
> and shove it into the Memcache (Memcache would change from TreeMap to
> TreeSet). Flushing would be an append of this KeyValue to hfile.
>
> On the way out, we'd pick the KeyValue blob from hfile and move this through
> the system out to the RPC.
>
> (One day we might put the KeyValue blob on nio if we use something other
> than hadoop's RPC).
>
> St.Ack
>
> On Wed, Mar 4, 2009 at 9:06 AM, stack <[email protected]> wrote:
>
> > On Wed, Mar 4, 2009 at 8:19 AM, Erik Holstad <[email protected]> wrote:
> >
> >> Was thinking this morning that we might have to do some adjustments in
> >> the format,
> >> we wanted <int><int><int><row><fam><qf><ts><type> for the key and
> >> <int><value> or
> >> something like that, right?
> >> But what is stored in HFile right now is, if I'm not mistaken,
> >> <int><int><key><val>, so if
> >> we want to match that I think we need to do some small adjustments,
> >> probably to something like:
> >> <keyLen><valLen><rowLen><famLen><row><fam><qf><ts><type><val>
> >
> > I think we are saying the same thing (if your omission of columnqualifier
> > length was not intentional).
> >
> > In hfile currently it's as you say:
> >
> > keylength
> > vallength
> > key
> > value
> >
> > where key expands to
> >
> > vint // Length of the row as vint
> > row
> > vint // Length of the column -- family + qualifier -- as vint
> > column
> > timestamp
> >
> > The proposal is that hfile is as it was, only the key now expands to:
> >
> > int // rowlength as a short
> > int // column family length in a byte
> > int // column qualifier length in a short
> > row
> > columnfamily
> > columnqualifier
> > timestamp
> > type
> >
> > St.Ack
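
For reference, here is a rough Java sketch of the KeyValue blob layout discussed above, with the qualifier length dropped and recovered from the overall key length. It is only an illustration, not the actual HBase code: the class and method names are made up, and the field widths (4-byte ints for all lengths, 8-byte timestamp, 1-byte type) are assumptions taken from the "rowlength-int, familylength-int" wording in the thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed layout:
//   keylength | valuelength | key | value
// where key = rowlength | familylength | row | family | qualifier | timestamp | type
// Field widths are assumptions (int lengths, long timestamp, byte type).
class KeyValueSketch {
    // Bytes of fixed-size fields inside the key: two int lengths, timestamp, type.
    static final int KEY_FIXED = 4 + 4 + 8 + 1;

    static ByteBuffer encode(byte[] row, byte[] family, byte[] qualifier,
                             long timestamp, byte type, byte[] value) {
        int keyLength = KEY_FIXED + row.length + family.length + qualifier.length;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLength + value.length);
        buf.putInt(keyLength);
        buf.putInt(value.length);
        buf.putInt(row.length);
        buf.putInt(family.length);
        buf.put(row);
        buf.put(family);
        buf.put(qualifier);   // no explicit qualifier length is stored
        buf.putLong(timestamp);
        buf.put(type);
        buf.put(value);
        buf.flip();           // ready for reading from position 0
        return buf;
    }

    // Recover the qualifier: its length is whatever remains of the key
    // after the fixed fields, the row and the family are accounted for.
    static byte[] qualifier(ByteBuffer kv) {
        int keyLength = kv.getInt(0);
        int rowLength = kv.getInt(8);
        int familyLength = kv.getInt(12);
        int qualifierLength = keyLength - KEY_FIXED - rowLength - familyLength;
        byte[] out = new byte[qualifierLength];
        ByteBuffer view = kv.duplicate();   // leave the caller's position untouched
        view.position(16 + rowLength + familyLength);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer kv = encode(
            "r1".getBytes(StandardCharsets.UTF_8),
            "cf".getBytes(StandardCharsets.UTF_8),
            "qual".getBytes(StandardCharsets.UTF_8),
            1236194000000L, (byte) 4,
            "v".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(qualifier(kv), StandardCharsets.UTF_8));
    }
}
```

Because the whole entry is one contiguous buffer, the same bytes could go into the Memcache set, be appended to hfile on flush, and be handed to the RPC on the way out without re-serializing.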
