On Mon, Nov 28, 2011 at 8:05 PM, Lars <[email protected]> wrote: > Hmm, interesting. It's used (among others) in server side scanners to hold > the current row. Could just keep a reference to the KeyValue around instead. > Need to make we don't hold on to the current blocks buffer forever, though. >
Yep - another option is for the scanner to have its own byte[] where it copies the current row key into - since it's already doing comparisons with it for every advance of the scanner, it should be in L2 cache if not L1, and the copy would be minimally expensive. -Todd > Todd Lipcon <[email protected]> schrieb: > >>If I recall correctly, we put this in more for the benefit of the >>client side, with the assumption that the server side would never call >>this API. Then, we ended up writing some bad code somewhere in the >>server which calls this function. >> >>-Todd >> >>On Mon, Nov 28, 2011 at 5:42 PM, lars hofhansl <[email protected]> wrote: >>> Did some (unscientific) tests... >>> The test scenario is 1m (wide) rows (50m cells) and then scanning along all >>> cells. >>> >>> The difference in runtime is within the noise. When I measure GC stats with >>> jstat I see a ~3% reduction is young collections, and a ~10% reduction in >>> overall GC time. >>> >>> At the same time I set a counting breakpoint on KeyValue.getRow that fires >>> when rowcache is found not-null. I found this triggered about every 16 key >>> values, which suggests >>> the optimization saves a lot of copying of the row key. >>> >>> It is not entirely clear under what circumstances the rowCache would a be >>> win, outweighing the extra static memory by every KV. >>> >>> So it looks like it is not worth making the change, although I suppose >>> anything reducing GC pressure is a win. >>> >>> -- Lars >>> >>> >>> ----- Original Message ----- >>> From: lars hofhansl <[email protected]> >>> To: Stack <[email protected]>; "[email protected]" <[email protected]> >>> Cc: >>> Sent: Thursday, November 24, 2011 11:57 AM >>> Subject: Re: Size of KeyValue >>> >>> Hmm... Might be hard to prove whether removing that would be a net win or >>> net loss in the current code base. >>> I'll do some tests and report back. >>> >>> >>> >>> ________________________________ >>> From: Stack <[email protected]> >>> To: [email protected]; lars hofhansl <[email protected]> >>> Sent: Wednesday, November 23, 2011 8:41 PM >>> Subject: Re: Size of KeyValue >>> >>> On Wed, Nov 23, 2011 at 3:40 PM, lars hofhansl <[email protected]> wrote: >>>> Looking at KeyValue I see three variable purely used for caching: >>>> timestampCache(long), rowCache(byte[]), and keyLength(int). >>>> >>>> From a quick glance over the code I do not see many spots where we >>>> repeatedly get the TS, rowKey, of keyLength from the same KV. >>>> Together these consume 24 bytes (almost 1/2 of KeyValue's constant memory >>>> overhead) on every key value created, and we create >>>> a *lot* KVs (real and "fake" ones) during scanning and seeking. >>>> >>>> Were these added to address specific performance concerns? If not, we >>>> might consider removing these. >>>> >>> >>> IIRC, I added them after watching stuff in a profiler (a long time >>> ago). Things change. Thats a lot of static mem to give up. >>> >>> St.Ack >>> >> >> >> >>-- >>Todd Lipcon >>Software Engineer, Cloudera > -- Todd Lipcon Software Engineer, Cloudera
