*Now, which encoder did you test specifically? I've seen a 20-40% slowdown when everything is in the block cache (which is the worst-case scenario here), certainly not a 10x slowdown.*
I have 1.3M rows (very small, 48 bytes each) in the block cache, which I read sequentially with encoding NONE and PREFIX_TREE, through both StoreScanner and StoreFileScanner (close to the metal: block cache :). Time to read all 1.3M rows, in ms:

- encoding = NONE, scanner = StoreScanner: 300 ms
- encoding = PREFIX_TREE, scanner = StoreScanner: 860 ms
- encoding = NONE, scanner = StoreFileScanner: 52 ms
- encoding = PREFIX_TREE, scanner = StoreFileScanner: 545 ms

-Vladimir

On Sat, Oct 19, 2013 at 8:50 PM, lars hofhansl wrote:

> That is (unfortunately) a known issue. The main problem is that HBase expects each KV to be backed by a contiguous byte[]. For any prefix encoding it is thus necessary to rematerialize the KV (i.e. copy all the partial bytes into a new location). That is inefficient. Nobody has taken this on yet (we're halfway there with Cells in 0.96, though).
>
> There are JIRAs out there to fix this, like HBASE-7320 and, more recently, HBASE-9794.
>
> Now, which encoder did you test specifically? I've seen a 20-40% slowdown when everything is in the block cache (which is the worst-case scenario here), certainly not a 10x slowdown.
>
> Note that with block encoding the blocks are stored encoded in the block cache, so more data fits into the cache, and (obviously) there's less IO when the data is not in the cache. So the extra CPU cycles and memory bandwidth used are offset by that.
>
> There are other problems too. I just filed an issue (HBASE-9807) where, with block encoders, we make a copy of the key portion of the KV on each reseek, just to compare it to the current scan key.
>
> -- Lars
>
> ________________________________
> From: Vladimir Rodionov
> Sent: Saturday, October 19, 2013 7:34 PM
> Subject: RE: Beware of PREFIX_TREE block encoding
>
> What did I want to say by this?
> HBase still does not have a block encoding that is optimal for both scan and seek (re-seek). I do not think these goals are mutually exclusive.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
>
> ________________________________________
> From: Vladimir Rodionov
> Sent: Saturday, October 19, 2013 7:32 PM
> Subject: Beware of PREFIX_TREE block encoding
>
> The scan performance is bad: 10x slower in my tests than for blocks with NONE encoding. I scan data directly from the block cache through StoreFileScanner (bypassing all the StoreScanner/KeyValueHeap stuff). It should be clearly stated that this encoding significantly degrades overall performance in favor of data size reduction and is suitable only for Gets, not for Scans.
>
> Best regards,
> -Vladimir Rodionov
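
The four timings Vladimir reports in his reply can be turned into slowdown ratios and per-row costs with a little arithmetic. The sketch below is Python (not part of the original thread); it uses only the numbers reported above and assumes each timing covers exactly 1.3M rows. `summarize` and `timings_ms` are illustrative names.

```python
# Timings reported in the thread: 1.3M rows (~48 bytes each), all in the block cache.
# Assumption: each timing covers exactly ROWS rows.
ROWS = 1_300_000

timings_ms = {
    ("NONE", "StoreScanner"): 300,
    ("PREFIX_TREE", "StoreScanner"): 860,
    ("NONE", "StoreFileScanner"): 52,
    ("PREFIX_TREE", "StoreFileScanner"): 545,
}


def summarize(scanner: str) -> dict:
    """Compare PREFIX_TREE against NONE for one scanner level."""
    base = timings_ms[("NONE", scanner)]
    encoded = timings_ms[("PREFIX_TREE", scanner)]
    return {
        "slowdown": encoded / base,
        # 1 ms = 1e6 ns, so ms * 1e6 / ROWS gives nanoseconds per row.
        "ns_per_row_none": base * 1e6 / ROWS,
        "ns_per_row_prefix_tree": encoded * 1e6 / ROWS,
        "extra_ns_per_row": (encoded - base) * 1e6 / ROWS,
    }


if __name__ == "__main__":
    for scanner in ("StoreScanner", "StoreFileScanner"):
        s = summarize(scanner)
        print(f"{scanner:>16}: {s['slowdown']:.1f}x slower, "
              f"+{s['extra_ns_per_row']:.0f} ns/row")
    # Prints:
    #     StoreScanner: 2.9x slower, +431 ns/row
    # StoreFileScanner: 10.5x slower, +379 ns/row
```

One possible reading of these numbers (an interpretation, not something stated in the thread): the extra PREFIX_TREE cost per row is in the same range at both levels (roughly 380 to 430 ns), so the ratio looks far worse at the StoreFileScanner level mainly because the NONE baseline there is so cheap.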

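Lars's explanation (HBase expects each KV to be backed by a contiguous byte[], so a prefix-encoded KV must be rematerialized, and a reseek copies the key just to compare it) can be illustrated with a toy prefix-compression scheme. This is a hypothetical Python sketch: `encode`, `decode_all` and `seek` are illustrative names, the format (shared-prefix length plus suffix per key) is much simpler than PREFIX_TREE's trie-based layout, and none of it is HBase's DataBlockEncoder API.

```python
from typing import List, Tuple

# Hypothetical toy format: each entry is (length of the prefix shared with
# the previous key, remaining suffix bytes). Not HBase's block format.
Entry = Tuple[int, bytes]


def encode(sorted_keys: List[bytes]) -> List[Entry]:
    """Prefix-compress keys that are already in sorted order."""
    entries: List[Entry] = []
    prev = b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def decode_all(entries: List[Entry]) -> Tuple[List[bytes], int]:
    """Rematerialize every key as its own contiguous bytes object.

    Returns the keys plus the total number of bytes copied to build them.
    """
    keys: List[bytes] = []
    copied = 0
    prev = b""
    for shared, suffix in entries:
        # The prefix lives in the previous key and the suffix in the block,
        # so a contiguous key needs a fresh buffer holding both: a copy.
        key = prev[:shared] + suffix
        copied += len(key)
        keys.append(key)
        prev = key
    return keys, copied


def seek(entries: List[Entry], target: bytes) -> int:
    """Index of the first key >= target, found by a linear scan.

    Each visited key is rebuilt (copied) before it can be compared with
    target, mirroring the per-reseek key copy described in the thread.
    """
    prev = b""
    for i, (shared, suffix) in enumerate(entries):
        key = prev[:shared] + suffix
        if key >= target:
            return i
        prev = key
    return len(entries)


if __name__ == "__main__":
    keys = [b"row0001/cf:a", b"row0001/cf:b", b"row0002/cf:a"]
    entries = encode(keys)
    print(entries)  # [(0, b'row0001/cf:a'), (11, b'b'), (6, b'2/cf:a')]
    print(sum(len(suffix) for _, suffix in entries))  # 19 suffix bytes stored
    decoded, copied = decode_all(entries)
    print(decoded == keys, copied)  # True 36
    print(seek(entries, b"row0002"))  # 2
```

The trade-off matches the thread in miniature: the block stores only 19 suffix bytes instead of 36 raw key bytes (so more fits in the cache), but every scan or seek pays for copying full keys back out.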