Re: Size of KeyValue

Todd Lipcon Tue, 29 Nov 2011 23:26:28 -0800

On Mon, Nov 28, 2011 at 8:05 PM, Lars <[email protected]> wrote:
> Hmm, interesting. It's used (among others) in server side scanners to hold 
> the current row. Could just keep a reference to the KeyValue around instead. 
> Need to make we don't hold on to the current blocks buffer forever, though.
>


Yep - another option is for the scanner to have its own byte[] where
it copies the current row key into - since it's already doing
comparisons with it for every advance of the scanner, it should be in
L2 cache if not L1, and the copy would be minimally expensive.

-Todd

> Todd Lipcon <[email protected]> schrieb:
>
>>If I recall correctly, we put this in more for the benefit of the
>>client side, with the assumption that the server side would never call
>>this API. Then, we ended up writing some bad code somewhere in the
>>server which calls this function.
>>
>>-Todd
>>
>>On Mon, Nov 28, 2011 at 5:42 PM, lars hofhansl <[email protected]> wrote:
>>> Did some (unscientific) tests...
>>> The test scenario is 1m (wide) rows (50m cells) and then scanning along all 
>>> cells.
>>>
>>> The difference in runtime is within the noise. When I measure GC stats with 
>>> jstat I see a ~3% reduction is young collections, and a ~10% reduction in 
>>> overall GC time.
>>>
>>> At the same time I set a counting breakpoint on KeyValue.getRow that fires 
>>> when rowcache is found not-null. I found this triggered about every 16 key 
>>> values, which suggests
>>> the optimization saves a lot of copying of the row key.
>>>
>>> It is not entirely clear under what circumstances the rowCache would a be 
>>> win, outweighing the extra static memory by every KV.
>>>
>>> So it looks like it is not worth making the change, although I suppose 
>>> anything reducing GC pressure is a win.
>>>
>>> -- Lars
>>>
>>>
>>> ----- Original Message -----
>>> From: lars hofhansl <[email protected]>
>>> To: Stack <[email protected]>; "[email protected]" <[email protected]>
>>> Cc:
>>> Sent: Thursday, November 24, 2011 11:57 AM
>>> Subject: Re: Size of KeyValue
>>>
>>> Hmm... Might be hard to prove whether removing that would be a net win or 
>>> net loss in the current code base.
>>> I'll do some tests and report back.
>>>
>>>
>>>
>>> ________________________________
>>> From: Stack <[email protected]>
>>> To: [email protected]; lars hofhansl <[email protected]>
>>> Sent: Wednesday, November 23, 2011 8:41 PM
>>> Subject: Re: Size of KeyValue
>>>
>>> On Wed, Nov 23, 2011 at 3:40 PM, lars hofhansl <[email protected]> wrote:
>>>> Looking at KeyValue I see three variable purely used for caching:
>>>> timestampCache(long), rowCache(byte[]), and keyLength(int).
>>>>
>>>> From a quick glance over the code I do not see many spots where we 
>>>> repeatedly get the TS, rowKey, of keyLength from the same KV.
>>>> Together these consume 24 bytes (almost 1/2 of KeyValue's constant memory 
>>>> overhead) on every key value created, and we create
>>>> a *lot* KVs (real and "fake" ones) during scanning and seeking.
>>>>
>>>> Were these added to address specific performance concerns? If not, we 
>>>> might consider removing these.
>>>>
>>>
>>> IIRC, I added them after watching stuff in a profiler (a long time
>>> ago).  Things change.  Thats a lot of static mem to give up.
>>>
>>> St.Ack
>>>
>>
>>
>>
>>--
>>Todd Lipcon
>>Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Size of KeyValue

Reply via email to