EdColeman commented on issue #1664:
URL: https://github.com/apache/accumulo/issues/1664#issuecomment-666759157


   I may just be adding more words here, and Christopher's comment has it covered, but in case this helps:
   It would be interesting to know some of the ratios when you run different 
timing scenarios - for example, the number of candidates vs. the number of deletes.
   I think what we are trying to gauge is whether there is a bottleneck in the 
Accumulo GC processor, and whether different batch sizes have any impact on where / 
when a bottleneck could be triggered. Something that could help determine 
that: is there a timing difference between a large number of candidates 
with few deletes vs. a large number of candidates with lots of deletes?  The first 
case may provide insight into the Accumulo overhead, while the second could be 
dominated by HDFS.
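   As a rough illustration, here is a minimal timing sketch that measures the 
candidate-collection phase separately from the delete phase, so the two 
scenarios can be compared. The `Supplier`/`Consumer` hooks are placeholders 
for whatever the actual test harness calls, not real Accumulo GC APIs.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class GcPhaseTimer {

  // Times one GC cycle, reporting the collect and delete phases separately
  // so the candidate/delete ratio can be compared across scenarios.
  public static void timeCycle(String label,
      Supplier<List<String>> collectCandidates,
      Consumer<List<String>> deleteFiles) {

    long start = System.nanoTime();
    List<String> candidates = collectCandidates.get();
    long collectNanos = System.nanoTime() - start;

    start = System.nanoTime();
    deleteFiles.accept(candidates);
    long deleteNanos = System.nanoTime() - start;

    System.out.printf("%s: %d candidates, collect=%d ms, delete=%d ms%n",
        label, candidates.size(),
        TimeUnit.NANOSECONDS.toMillis(collectNanos),
        TimeUnit.NANOSECONDS.toMillis(deleteNanos));
  }
}
```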
   The overall goal is to determine if there are situations where Accumulo 
could get into a state where it just cannot keep up with deletes and would fall 
further and further behind. If that can be shown, then what are the triggering 
conditions, where are the bottlenecks, and does batch size have any impact?
   Some of this could be simulated, but there also need to be measurements that 
include HDFS interactions. If it can be shown that the Accumulo GC process is 
never a bottleneck regardless of batch size, then we know to focus on HDFS 
interactions for additional performance improvements. If there are times when 
the GC process dominates the GC cycle, then what is the performance 
difference with different batch sizes?
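   For the simulation side, a toy backlog model like the one below can show the 
falling-behind behavior; the arrival rate and batch sizes here are made-up 
parameters for illustration, not measured Accumulo numbers.

```java
public class GcBacklogSim {
  public static void main(String[] args) {
    int newCandidatesPerCycle = 10_000; // assumed arrival rate, not a measured value
    for (int batchSize : new int[] {2_000, 10_000, 50_000}) {
      long backlog = 0;
      for (int cycle = 0; cycle < 20; cycle++) {
        backlog += newCandidatesPerCycle;        // new candidates appear each cycle
        backlog -= Math.min(backlog, batchSize); // GC removes at most one batch per cycle
      }
      System.out.printf("batchSize=%d -> backlog after 20 cycles: %d%n",
          batchSize, backlog);
    }
  }
}
```

   A real measurement would replace the inner loop with actual GC cycles against 
HDFS, but the shape is the same: whenever the per-cycle delete capacity is below 
the candidate arrival rate, the backlog grows without bound.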

