Hmm, interesting. It's used (among others) in server side scanners to hold the current row. Could just keep a reference to the KeyValue around instead. Need to make we don't hold on to the current blocks buffer forever, though.
Todd Lipcon <[email protected]> schrieb: >If I recall correctly, we put this in more for the benefit of the >client side, with the assumption that the server side would never call >this API. Then, we ended up writing some bad code somewhere in the >server which calls this function. > >-Todd > >On Mon, Nov 28, 2011 at 5:42 PM, lars hofhansl <[email protected]> wrote: >> Did some (unscientific) tests... >> The test scenario is 1m (wide) rows (50m cells) and then scanning along all >> cells. >> >> The difference in runtime is within the noise. When I measure GC stats with >> jstat I see a ~3% reduction is young collections, and a ~10% reduction in >> overall GC time. >> >> At the same time I set a counting breakpoint on KeyValue.getRow that fires >> when rowcache is found not-null. I found this triggered about every 16 key >> values, which suggests >> the optimization saves a lot of copying of the row key. >> >> It is not entirely clear under what circumstances the rowCache would a be >> win, outweighing the extra static memory by every KV. >> >> So it looks like it is not worth making the change, although I suppose >> anything reducing GC pressure is a win. >> >> -- Lars >> >> >> ----- Original Message ----- >> From: lars hofhansl <[email protected]> >> To: Stack <[email protected]>; "[email protected]" <[email protected]> >> Cc: >> Sent: Thursday, November 24, 2011 11:57 AM >> Subject: Re: Size of KeyValue >> >> Hmm... Might be hard to prove whether removing that would be a net win or >> net loss in the current code base. >> I'll do some tests and report back. >> >> >> >> ________________________________ >> From: Stack <[email protected]> >> To: [email protected]; lars hofhansl <[email protected]> >> Sent: Wednesday, November 23, 2011 8:41 PM >> Subject: Re: Size of KeyValue >> >> On Wed, Nov 23, 2011 at 3:40 PM, lars hofhansl <[email protected]> wrote: >>> Looking at KeyValue I see three variable purely used for caching: >>> timestampCache(long), rowCache(byte[]), and keyLength(int). >>> >>> From a quick glance over the code I do not see many spots where we >>> repeatedly get the TS, rowKey, of keyLength from the same KV. >>> Together these consume 24 bytes (almost 1/2 of KeyValue's constant memory >>> overhead) on every key value created, and we create >>> a *lot* KVs (real and "fake" ones) during scanning and seeking. >>> >>> Were these added to address specific performance concerns? If not, we might >>> consider removing these. >>> >> >> IIRC, I added them after watching stuff in a profiler (a long time >> ago). Things change. Thats a lot of static mem to give up. >> >> St.Ack >> > > > >-- >Todd Lipcon >Software Engineer, Cloudera
