I wanted to try PREFIX_TREE because it is supposed to be the fastest on seek/reseek.

On Sat, Oct 19, 2013 at 9:12 PM, lars hofhansl <[email protected]> wrote:

> I found FAST_DIFF to be the fastest of the block encoders.
> (Prefix tree is in 0.96+ only as far as I know.)
>
> -- Lars
>
> ----- Original Message -----
> From: Vladimir Rodionov <[email protected]>
> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
> Cc:
> Sent: Saturday, October 19, 2013 9:08 PM
> Subject: Re: Beware of PREFIX_TREE block encoding
>
> *Now, which encoder did you test specifically? I've seen a 20-40% slowdown
> when everything is in the blockcache (which is the worst case scenario
> here), certainly not a 10x slowdown.*
>
> I have 1.3M rows (very small - 48 bytes) in a block cache which I read
> sequentially, using encoding NONE, PREFIX_TREE and
> StoreScanner/StoreFileScanner (close to metal - block cache :)
>
> Time to read all 1.3M rows reported in ms.
>
> encoding = NONE, scanner = StoreScanner; time = 300 ms
> encoding = PREFIX_TREE, scanner = StoreScanner; time = 860 ms
> encoding = NONE, scanner = StoreFileScanner; time = 52 ms
> encoding = PREFIX_TREE, scanner = StoreFileScanner; time = 545 ms
>
> -Vladimir
>
> On Sat, Oct 19, 2013 at 8:50 PM, lars hofhansl <[email protected]> wrote:
>
> > That is (unfortunately) a known issue. The main problem is that HBase
> > expects each KV to be backed by a contiguous byte[]. For any prefix
> > encoding it is thus necessary to rematerialize the KV (i.e. copy all the
> > partial bytes into a new location).
> > That is inefficient. Nobody has taken on fixing this (we're 1/2 there
> > with Cells in 0.96, though).
> >
> > There are JIRAs out there to fix this, like HBASE-7320 and more recently
> > HBASE-9794.
> >
> > Now, which encoder did you test specifically? I've seen a 20-40% slowdown
> > when everything is in the blockcache (which is the worst case scenario
> > here), certainly not a 10x slowdown.
> >
> > Note that with block encoding the blocks are stored encoded in the
> > blockcache, so more data fits into the cache, and (obviously) there's
> > less IO when the data is not in the cache. So the extra CPU cycles and
> > memory bandwidth used are offset by that.
> >
> > There're other problems too. I just filed an issue (HBASE-9807) where
> > with block encoders we make a copy of the key portion of the KV on each
> > reseek, just to compare it to the current scan key.
> >
> > -- Lars
> > ________________________________
> > From: Vladimir Rodionov <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Sent: Saturday, October 19, 2013 7:34 PM
> > Subject: RE: Beware of PREFIX_TREE block encoding
> >
> > What I wanted to say by this? HBase still does not have a block encoding
> > that is optimal for both scan and seek (re-seek).
> > I do not think these goals are mutually exclusive.
> >
> > Best regards,
> > Vladimir Rodionov
> > Principal Platform Engineer
> > Carrier IQ, www.carrieriq.com
> > e-mail: [email protected]
> >
> > ________________________________________
> > From: Vladimir Rodionov [[email protected]]
> > Sent: Saturday, October 19, 2013 7:32 PM
> > To: [email protected]
> > Subject: Beware of PREFIX_TREE block encoding
> >
> > The scan performance is bad. 10x slower in my tests than for blocks with
> > NONE encoding. I scan data directly from block cache through
> > StoreFileScanner (bypassing all StoreScanner/KeyValueHeap stuff). It
> > should be clearly stated that this encoding degrades overall performance
> > significantly in favor of data size reduction and is suitable only for
> > Gets - not for Scans.
> >
> > Best regards,
> > -Vladimir Rodionov
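
For readers following the thread: the rematerialization cost Lars describes (each KV must end up in one contiguous byte[], so a prefix-encoded key has to be copied back together on every read) can be illustrated with a small standalone sketch. This is not HBase code and it is not the PREFIX_TREE trie format; it is a plain "shared prefix length + suffix" scheme, and the class and method names (PrefixDecodeSketch, Entry, encode, decode) are made up for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of prefix encoding; NOT the HBase implementation. */
final class PrefixDecodeSketch {

    /** One encoded key: number of leading bytes shared with the previous key, plus the remainder. */
    static final class Entry {
        final int sharedLen;
        final byte[] suffix;

        Entry(int sharedLen, byte[] suffix) {
            this.sharedLen = sharedLen;
            this.suffix = suffix;
        }
    }

    /** Encodes sorted keys by storing only the bytes that differ from the previous key. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length, key.length);
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /**
     * Decodes every key. Because the caller wants one contiguous byte[] per key,
     * each key costs a fresh allocation and two copies (prefix, then suffix).
     * An unencoded block could hand out offsets into the block with no copy.
     */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedLen + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedLen);
            System.arraycopy(e.suffix, 0, key, e.sharedLen, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"row-0001", "row-0002", "row-0010"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        List<Entry> encoded = encode(keys);
        for (Entry e : encoded) {
            System.out.println(e.sharedLen + " shared, suffix="
                    + new String(e.suffix, StandardCharsets.UTF_8));
        }
        for (byte[] k : decode(encoded)) {
            System.out.println(new String(k, StandardCharsets.UTF_8));
        }
    }
}
```

The per-key allocation and copy in decode() is the kind of work a sequential scan pays for on every row, which is consistent with the overhead showing up most clearly when everything is already in the block cache and there is no IO to hide it behind.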
