mcvsubbu commented on issue #4362: Optimize MutableOffHeapByteArrayStore by 
directly calling the PinotDataBuffer API
URL: https://github.com/apache/incubator-pinot/pull/4362#issuecomment-506496149
 
 
   > @buchireddy Yes, the unit of the score is ms/op.
   > You are correct that batch read/write will create a buffer duplicate, 
which is extra garbage. That is why we put a threshold in place, so that small 
reads/writes don't create the duplicate (the old implementation always 
created the buffer). The duplicate of the buffer is always short-lived, tiny 
garbage, which should have negligible GC impact. The benchmark performs the 
read/write hundreds of millions of times, so the GC cost should already be 
included.
   
   The previous implementation created a byte buffer _once_ and used that for 
every read/write. It did not do a bulk copy, but it did not generate garbage on 
each get() either.
   Remember, get() is called multiple times during a dictionary value lookup 
(once on each hash collision). Given that the whole point of the off-heap effort 
is to save on GC, I am wondering whether that optimization is worth it. 
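
   To make the trade-off concrete, here is a minimal sketch of the two read 
styles under discussion. It uses plain java.nio.ByteBuffer rather than 
PinotDataBuffer, and the class and method names are hypothetical; it is not the 
code from the PR, only an illustration of where the per-call garbage comes from.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration; not the MutableOffHeapByteArrayStore code.
public class BufferReadSketch {

    // Bulk read through a per-call duplicate: each call allocates a small,
    // short-lived ByteBuffer view that shares the same backing memory.
    static byte[] readWithDuplicate(ByteBuffer shared, int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = shared.duplicate(); // extra object per call
        view.position(offset);
        view.get(out, 0, length);             // bulk copy
        return out;
    }

    // Absolute reads on the one shared buffer: no view is allocated,
    // but the bytes are copied one at a time instead of in bulk.
    static byte[] readAbsolute(ByteBuffer shared, int offset, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = shared.get(offset + i);  // absolute get, position untouched
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.allocateDirect(64);
        for (int i = 0; i < 64; i++) {
            shared.put(i, (byte) i);
        }
        byte[] a = readWithDuplicate(shared, 8, 16);
        byte[] b = readAbsolute(shared, 8, 16);
        System.out.println(Arrays.equals(a, b));
    }
}
```

   Both variants still allocate the result array; the only difference in 
garbage is the duplicate view, which is the object being debated here.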
   
   Isn't it better to benchmark the whole dictionary operation for some 
datasets taken from production?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
