Hmm, interesting. It's used (among others) in server side scanners to hold the 
current row. Could just keep a reference to the KeyValue around instead. Need 
to make we don't hold on to the current blocks buffer forever, though. 

Todd Lipcon <[email protected]> schrieb:

>If I recall correctly, we put this in more for the benefit of the
>client side, with the assumption that the server side would never call
>this API. Then, we ended up writing some bad code somewhere in the
>server which calls this function.
>
>-Todd
>
>On Mon, Nov 28, 2011 at 5:42 PM, lars hofhansl <[email protected]> wrote:
>> Did some (unscientific) tests...
>> The test scenario is 1m (wide) rows (50m cells) and then scanning along all 
>> cells.
>>
>> The difference in runtime is within the noise. When I measure GC stats with 
>> jstat I see a ~3% reduction is young collections, and a ~10% reduction in 
>> overall GC time.
>>
>> At the same time I set a counting breakpoint on KeyValue.getRow that fires 
>> when rowcache is found not-null. I found this triggered about every 16 key 
>> values, which suggests
>> the optimization saves a lot of copying of the row key.
>>
>> It is not entirely clear under what circumstances the rowCache would a be 
>> win, outweighing the extra static memory by every KV.
>>
>> So it looks like it is not worth making the change, although I suppose 
>> anything reducing GC pressure is a win.
>>
>> -- Lars
>>
>>
>> ----- Original Message -----
>> From: lars hofhansl <[email protected]>
>> To: Stack <[email protected]>; "[email protected]" <[email protected]>
>> Cc:
>> Sent: Thursday, November 24, 2011 11:57 AM
>> Subject: Re: Size of KeyValue
>>
>> Hmm... Might be hard to prove whether removing that would be a net win or 
>> net loss in the current code base.
>> I'll do some tests and report back.
>>
>>
>>
>> ________________________________
>> From: Stack <[email protected]>
>> To: [email protected]; lars hofhansl <[email protected]>
>> Sent: Wednesday, November 23, 2011 8:41 PM
>> Subject: Re: Size of KeyValue
>>
>> On Wed, Nov 23, 2011 at 3:40 PM, lars hofhansl <[email protected]> wrote:
>>> Looking at KeyValue I see three variable purely used for caching:
>>> timestampCache(long), rowCache(byte[]), and keyLength(int).
>>>
>>> From a quick glance over the code I do not see many spots where we 
>>> repeatedly get the TS, rowKey, of keyLength from the same KV.
>>> Together these consume 24 bytes (almost 1/2 of KeyValue's constant memory 
>>> overhead) on every key value created, and we create
>>> a *lot* KVs (real and "fake" ones) during scanning and seeking.
>>>
>>> Were these added to address specific performance concerns? If not, we might 
>>> consider removing these.
>>>
>>
>> IIRC, I added them after watching stuff in a profiler (a long time
>> ago).  Things change.  Thats a lot of static mem to give up.
>>
>> St.Ack
>>
>
>
>
>-- 
>Todd Lipcon
>Software Engineer, Cloudera

Reply via email to