----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/32183/#review76862 -----------------------------------------------------------
ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/package/scripts/params.py <https://reviews.apache.org/r/32183/#comment124547> Safer to also do .lower() on values. ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py <https://reviews.apache.org/r/32183/#comment124549> Use pythonic variable name. ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py <https://reviews.apache.org/r/32183/#comment124548> Use pythonic function names with lowercase_and_underscore - Alejandro Fernandez On March 18, 2015, 12:58 a.m., Sid Wagle wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/32183/ > ----------------------------------------------------------- > > (Updated March 18, 2015, 12:58 a.m.) > > > Review request for Ambari, Alejandro Fernandez, Dmytro Sen, Mahadev Konar, > and Myroslav Papirkovskyy. > > > Bugs: AMBARI-10111 > https://issues.apache.org/jira/browse/AMBARI-10111 > > > Repository: ambari > > > Description > ------- > > After looking at HBase logs and GC logs, discovered that: > 1. RS stopped due to unreachable ZK, the cause based on logs is long gc pause: > {code} > 2015-03-17 15:34:33,810 WARN > [M:0;perf-400-10mar-b-ams-1:48488.oldLogCleaner] util.Sleeper: We slept > 78463ms instead of 60000ms, this is likely due to a long garbage collecting > pause and it's usually bad, see > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > 2015-03-17 15:34:33,810 WARN > [RS:0;perf-400-10mar-b-ams-1:38900.compactionChecker] util.Sleeper: We slept > 26609ms instead of 10000ms, this is likely due to a long garbage collecting > pause and it's usually bad, see > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > 2015-03-17 15:34:33,810 WARN > [RS:0;perf-400-10mar-b-ams-1:38900.periodicFlusher] util.Sleeper: We slept > 26608ms instead of 10000ms, this is likely due to a long garbage collecting > pause and it's usually bad, see > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > {code} > > 2. Looking at gc logs, it seems everything is nice and clean and a sudden > unexpected event as seen here: > {code} > 2015-03-17T15:34:10.908+0000: 47483.945: [GC2015-03-17T15:34:10.908+0000: > 47483.945: [ParNew: 613440K->68096K(613440K), 0.0645970 secs] > 10961428K->10444812K(12514816K), 0.0648180 secs] [Times: user=0.34 sys=0.01, > real=0.06 secs] > 2015-03-17T15:34:11.525+0000: 47484.562: [GC2015-03-17T15:34:11.525+0000: > 47484.562: [ParNew: 613424K->68096K(613440K), 0.0585420 secs] > 10956776K->10438129K(12514816K), 0.0590000 secs] [Times: user=0.31 sys=0.02, > real=0.06 secs] > 2015-03-17T15:34:12.133+0000: 47485.170: [GC2015-03-17T15:34:12.134+0000: > 47485.170: [ParNew (promotion failed): 613346K->613346K(613440K), 0.1492450 > secs]2015-03-17T15:34:12.283+0000: 47485.320: > [CMS2015-03-17T15:34:28.879+0000: 47501.916: [CMS-concurrent-sweep: > 33.271/35.389 secs] [Times: user=144.69 sys=3.76, real=35.39 secs] > (concurrent mode failure): 10365250K->4753155K(11901376K), 21.5240950 secs] > 10955450K->4753155K(12514816K), [CMS Perm : 41519K->41519K(69388K)], > 21.6736450 secs] [Times: user=21.93 sys=0.02, real=21.67 secs] > 2015-03-17T15:34:34.584+0000: 47507.621: [GC2015-03-17T15:34:34.584+0000: > 47507.621: [ParNew: 545344K->59138K(613440K), 0.0552630 secs] > 5298499K->4812293K(12514816K), 0.0554990 secs] [Times: user=0.32 sys=0.02, > real=0.06 secs] > {code} > > The possible reason is GCE VM paused for more that a minute leading to the > unexpected event. > > *Recommendations*: > - Tune GC settings for the HBase daemon. > - Create a Canary script to check health of the system and restart the > underlying daemons if needed. Creating a Jira for 2.1. > > > Diffs > ----- > > > ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/configuration/ams-hbase-env.xml > 88692a4 > > ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/package/scripts/hbase.py > 8c8f85a > > ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/package/scripts/params.py > ab8b56a > ambari-server/src/main/resources/stacks/HDP/2.0.6/services/stack_advisor.py > a933d98 > ambari-server/src/test/python/stacks/2.2/common/test_stack_advisor.py > 33bac19 > > Diff: https://reviews.apache.org/r/32183/diff/ > > > Testing > ------- > > Manually verified. Unit tests pass. > > > Thanks, > > Sid Wagle > >
