[ https://issues.apache.org/jira/browse/HBASE-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107757#comment-15107757 ]
Hudson commented on HBASE-14058:
--------------------------------

SUCCESS: Integrated in HBase-1.2-IT #401 (See [https://builds.apache.org/job/HBase-1.2-IT/401/])
HBASE-14058 Stabilizing default heap memory tuner (eclark: rev e738e69f8cc59581a454207483aca42e7f314396)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultHeapMemoryTuner.java

> Stabilizing default heap memory tuner
> -------------------------------------
>
>                 Key: HBASE-14058
>                 URL: https://issues.apache.org/jira/browse/HBASE-14058
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 2.0.0, 1.2.0, 1.3.0
>            Reporter: Abhilash
>            Assignee: Abhilash
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: 0001-Stabilizing-default-heap-memory-tuner.patch, HBASE-14058-v1.patch, HBASE-14058.patch, after_modifications.png, before_modifications.png
>
>
> The memory tuner works well in general, but when the workload is both read-heavy and write-heavy it performs too many tuning operations. We should limit the number of tuner operations and stabilize it. The main problem is that the tuner considers itself in steady state after seeing just one neutral tuner period, so it performs too many tuning operations and too many reverts, and does so with large step sizes (the step size was set to the maximum even after a single neutral period). To stop this, I propose these steps:
> 1) The band between μ - δ/2 and μ + δ/2 is too narrow. Taking μ and δ as the mean and standard deviation of the observed metric, and assuming it is roughly normally distributed, ~62% of periods will lie outside this range, so 62% of the data points are classified as either high or low, which is too many. Use μ - δ*0.8 and μ + δ*0.8 instead. In expectation, this cuts the number of tuner operations per 100 periods from 19 to just 10. With δ/2, 31% of values are classified as high and 31% as low (2 * 0.31 * 0.31 = 0.19); with δ*0.8, 22% are low and 22% are high (2 * 0.22 * 0.22 ≈ 0.10).
> 2) Define steady state properly by looking at the past few periods (the number is set by hbase.regionserver.heapmemory.autotuner.lookup.periods) rather than only at the last tuner operation. The tuner is in steady state when the last few tuner periods were all NEUTRAL. We keep decreasing the step size unless it is extremely low, and then leave the system in that state for some time.
> 3) Rather than decreasing the step size only when reverting the last period's tuning, decrease its magnitude whenever we are trying to revert tuning done in the last few periods (sum the changes of the last few periods and compare the sum with the current step). When the magnitude gets too low, make the tuner step NEUTRAL (no operation). As a result, the step size keeps decreasing until we reach steady state, after which the tuning process restarts (the step size resets when steady state is reached).
> 4) The tuning done in the last few periods is a decaying sum of past tuner steps, with sign: positive for an increase in memstore and negative for an increase in block cache. Using this instead of the arithmetic mean gives more priority to recent tuner steps.
> Please see the attachments. One shows the sizes of memstore (green) and block cache (blue) as adjusted by the tuner without these modifications (before_modifications.png), the other with them (after_modifications.png). The x-axis is time and the y-axis is the fraction of heap memory given to memstore and block cache at that time (the two always sum to 80%). I configured the min/max range of both components as 0.1 and 0.7 respectively, so the y-axis runs from 0.1 to 0.7. In both cases the tuner tries to distribute memory by giving ~15% to memstore and ~65% to block cache, but the modified one does it much more smoothly.
> I got these results from a YCSB test doing approximately 5000 inserts and 500 reads per second (for one region server).
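Step 1 can be illustrated with a small Java sketch that classifies one period's metric against the band μ ± k·δ and reproduces the expected-operations arithmetic. It assumes, as the percentages above do, that the metric is normally distributed with mean μ and standard deviation δ, and it reads the factor 2·p·p as "one metric high while the other is low, in either order, with the two independent"; that reading is an assumption. The class and method names are hypothetical, not those in DefaultHeapMemoryTuner.java, and the error function uses the Abramowitz and Stegun 7.1.26 approximation because java.lang.Math has none.

```java
/**
 * Hypothetical sketch of step 1: classify a period against mu +/- k*delta and
 * estimate how often a tuning operation fires. Assumes a normally distributed
 * metric; not the actual DefaultHeapMemoryTuner code.
 */
final class BandThresholdSketch {

    enum Level { HIGH, LOW, NEUTRAL }

    /** Classify a value against the band [mu - k*delta, mu + k*delta]. */
    static Level classify(double value, double mu, double delta, double k) {
        if (value > mu + k * delta) {
            return Level.HIGH;
        }
        if (value < mu - k * delta) {
            return Level.LOW;
        }
        return Level.NEUTRAL;
    }

    /** erf(x) for x >= 0 via Abramowitz and Stegun 7.1.26 (abs error < 1.5e-7). */
    static double erf(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        return 1.0 - poly * Math.exp(-x * x);
    }

    /** P(Z > k) for a standard normal Z and k >= 0: share of periods called HIGH (or LOW). */
    static double upperTail(double k) {
        return 0.5 * (1.0 - erf(k / Math.sqrt(2.0)));
    }

    /**
     * Expected tuning operations per 100 periods, assuming a step fires only
     * when one metric is HIGH and the other LOW, with the metrics independent.
     */
    static double expectedOpsPer100(double k) {
        double p = upperTail(k);
        return 100.0 * 2.0 * p * p;
    }

    public static void main(String[] args) {
        for (double k : new double[] {0.5, 0.8}) {
            System.out.printf("k=%.1f  tail=%.3f  ops/100=%.1f%n",
                    k, upperTail(k), expectedOpsPer100(k));
        }
    }
}
```

Running main gives about 19 operations per 100 periods for k = 0.5 and about 9 for k = 0.8; the issue's ~10 comes from rounding the k = 0.8 tail probability (about 21.2%) up to 22%.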
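A minimal sketch of the steady-state rule in step 2, assuming the tuner keeps the direction of each of the last lookupPeriods periods in a bounded queue. Only the lookup-periods idea comes from the issue; the enum, class, and method names are hypothetical. The point of contrast is that a single NEUTRAL period is no longer enough.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of step 2: steady state means the last lookupPeriods
 * tuner periods were all NEUTRAL, not just the most recent one.
 */
final class SteadyStateSketch {

    enum StepDirection { INCREASE_MEMSTORE, INCREASE_BLOCK_CACHE, NEUTRAL }

    // Plays the role of hbase.regionserver.heapmemory.autotuner.lookup.periods.
    private final int lookupPeriods;
    private final Deque<StepDirection> history = new ArrayDeque<>();

    SteadyStateSketch(int lookupPeriods) {
        if (lookupPeriods < 1) {
            throw new IllegalArgumentException("lookupPeriods must be >= 1");
        }
        this.lookupPeriods = lookupPeriods;
    }

    /** Record one tuner period's outcome, keeping only the most recent window. */
    void record(StepDirection direction) {
        history.addLast(direction);
        if (history.size() > lookupPeriods) {
            history.removeFirst();
        }
    }

    /** True only once a full window has been seen and every entry is NEUTRAL. */
    boolean isSteadyState() {
        if (history.size() < lookupPeriods) {
            return false;
        }
        for (StepDirection d : history) {
            if (d != StepDirection.NEUTRAL) {
                return false;
            }
        }
        return true;
    }
}
```

Under this rule, steady state needs at least lookupPeriods × tuner period of neutral behavior: 10 × 20 s = 200 s with the test settings, or 60 × 60 s = one hour with the defaults.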
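Step 3 can be sketched as follows under one reading of the rule: a proposed step counts as a revert when its sign is opposite to the net tuning of the last few periods, and each revert halves a step size that persists across periods, so it keeps shrinking until steady state resets it. The halving factor, the names, and the sign convention (+1 grows the memstore, -1 grows the block cache, matching step 4) are assumptions, not details taken from the patch.

```java
/**
 * Hypothetical sketch of step 3: shrink the step whenever the proposed
 * direction would revert recent tuning; once the step is below the minimum,
 * return NEUTRAL (0.0). Halving is an assumed shrink factor.
 */
final class RevertDampingSketch {
    private final double maxStep;
    private final double minStep;
    private double stepSize;   // persists across periods, so it keeps shrinking

    RevertDampingSketch(double maxStep, double minStep) {
        this.maxStep = maxStep;
        this.minStep = minStep;
        this.stepSize = maxStep;
    }

    /**
     * @param direction +1 to grow the memstore, -1 to grow the block cache, 0 for none
     * @param recentNet signed net tuning of the last few periods
     * @return signed step to apply; 0.0 means NEUTRAL (no operation)
     */
    double nextStep(int direction, double recentNet) {
        if (direction == 0) {
            return 0.0;
        }
        // A revert: the proposed direction opposes what the recent periods did.
        if (Math.signum(recentNet) == -direction) {
            stepSize /= 2.0;
        }
        if (stepSize < minStep) {
            return 0.0;   // too small to be worth applying
        }
        return direction * stepSize;
    }

    /** Once steady state is reached, the step size resets to the maximum. */
    void resetOnSteadyState() {
        stepSize = maxStep;
    }
}
```

With maxStep = 0.08 and minStep = 0.01, alternating directions yield steps of 0.08, -0.04, 0.02, -0.01 and then NEUTRAL, which is the damped oscillation the issue aims for.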
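Step 4's decaying sum can be kept as a single running value: each period, multiply it by a decay factor and add the new signed step. The issue does not state the weighting, so the decay factor passed in (0.5 in the example) and the class name are assumptions.

```java
/**
 * Hypothetical sketch of step 4: tuning done over the last few periods, kept
 * as a decaying sum of signed steps so recent steps weigh more than old ones.
 * Sign convention: positive = memstore grew, negative = block cache grew.
 */
final class DecayingSumSketch {
    private final double decay;   // in (0, 1); smaller values forget faster
    private double sum = 0.0;

    DecayingSumSketch(double decay) {
        if (decay <= 0.0 || decay >= 1.0) {
            throw new IllegalArgumentException("decay must be in (0, 1)");
        }
        this.decay = decay;
    }

    /** Fold in one period's signed step; pass 0.0 for a NEUTRAL period. */
    void add(double signedStep) {
        sum = sum * decay + signedStep;
    }

    /** Current weighted net tuning; its sign tells which component was favored recently. */
    double value() {
        return sum;
    }
}
```

For example, with decay 0.5, steps of +0.08, 0.0 and -0.04 give a decaying sum of -0.02, so the most recent block-cache increase dominates, whereas the arithmetic mean (about +0.013) would still favor the memstore.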
> The results can be fine-tuned further, and the number of tuner operations reduced, with these configuration changes.
> For finer tuning:
> a) lower the max step size (suggested = 4%)
> b) lower the min step size (the default is also fine)
> To further decrease the frequency of tuning operations:
> c) increase the number of lookup periods (in the tests it was just 10; the default is 60)
> d) increase the tuner period (in the tests it was just 20 secs; the default is 60 secs)
> I used a smaller tuner period and fewer lookup periods to get more data points.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)