Here's what I did:

0. Launch minicluster out of dev tree with ./bin/hbase master start
1. Load 10M rows using YCSB
2. Snapshot
3. Shutdown and restart
4. Clone a test table from the snapshot
5. Run YCSB workload A for 2M operations

The profile was captured at step 5. HTH
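For anyone who wants to script steps 2 and 4, here is a minimal sketch using the Java client Admin API. It assumes the YCSB default target table "usertable"; the snapshot and clone table names are made up for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class SnapshotCloneSteps {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          // Step 2: snapshot the table YCSB loaded ("usertable" is the YCSB default)
          admin.snapshot("usertable-snap", TableName.valueOf("usertable"));
          // Step 4 (after the restart): clone a test table from that snapshot
          admin.cloneSnapshot("usertable-snap", TableName.valueOf("usertable_clone"));
        }
      }
    }

The same two steps can of course be done from the HBase shell; the sketch just shows the equivalent API calls.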
> On Nov 5, 2015, at 10:40 AM, Matteo Bertozzi <[email protected]> wrote:
>
> What operation do you have going on the HMaster?
> I did a simple test where there are no operations going on, and poll doesn't
> get out of the waitCond.await(), so unless you have operations on the master
> you'll never call queue.poll().
>
> Or maybe there's a bug and the previous ops were hanging and triggered a
> loop. What did you execute on the master before? Do you have a script or
> something that I can use to reproduce this?
>
> Matteo
>
> On Thu, Nov 5, 2015 at 10:25 AM, Matteo Bertozzi <[email protected]> wrote:
>
>> Let me take a look, but branch-1 and master should be the same on the proc
>> stuff.
>>
>> Matteo
>>
>>> On Thu, Nov 5, 2015 at 10:23 AM, Stack <[email protected]> wrote:
>>>
>>> You are looking at the master process only?
>>>
>>> I've been looking at branch-1 only of late.
>>>
>>> St.Ack
>>>
>>> On Thu, Nov 5, 2015 at 12:06 AM, Andrew Purtell <[email protected]> wrote:
>>>
>>>> I just profiled trunk running YCSB workload A. I was looking for something
>>>> else but found completely unexpected results.
>>>>
>>>> I see 60% of all allocation pressure in TLABs coming from the
>>>> ProcedureExecutors, allocating 16 GB of
>>>> java.util.AbstractMap$SimpleImmutableEntry in this code:
>>>>
>>>> Stack Trace                                                               TLABs   Total TLAB Size (bytes)  Pressure (%)
>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()         11,603  18,402,380,120           100
>>>> java.util.concurrent.ConcurrentSkipListMap.firstEntry()                  11,599  18,401,678,776           99.996
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()   11,599  18,401,678,776           99.996
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()         11,599  18,401,678,776           99.996
>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()     11,599  18,401,678,776           99.996
>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()          11,599  18,401,678,776           99.996
>>>>
>>>> The three ProcedureExecutor threads also consume about 60% of all CPU time,
>>>> each about 20%.
>>>> Here are the top hot code arcs from one of them:
>>>>
>>>> Stack Trace                                                                      Sample Count  Percentage (%)
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()                 1,521         39.191
>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()             1,521         39.191
>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()                  1,521         39.191
>>>> java.lang.String$CaseInsensitiveComparator.compare(String, String)               614           15.821
>>>> java.lang.String$CaseInsensitiveComparator.compare(Object, Object)               614           15.821
>>>> java.lang.String.compareToIgnoreCase(String)                                     614           15.821
>>>> org.apache.hadoop.hbase.ServerName.compareTo(ServerName)                         614           15.821
>>>> org.apache.hadoop.hbase.ServerName.compareTo(Object)                             614           15.821
>>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)       614           15.821
>>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)     614           15.821
>>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)                  614           15.821
>>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)                   614           15.821
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()           614           15.821
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()                 614           15.821
>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()             614           15.821
>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()                  614           15.821
>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()                 468           12.059
>>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)                  468           12.059
>>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)                   468           12.059
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()           468           12.059
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()                 468           12.059
>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()             468           12.059
>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()                  468           12.059
>>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)       402           10.358
>>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)     280           7.215
>>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)                  280           7.215
>>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)                   280           7.215
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()           280           7.215
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()                 280           7.215
>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()             280           7.215
>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()                  280           7.215
>>>> java.util.concurrent.ConcurrentSkipListMap.findPredecessor(Object, Comparator)   122           3.144
>>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)     122           3.144
>>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)                  122           3.144
>>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)                   122           3.144
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()           122           3.144
>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()                 122           3.144
>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()             122           3.144
>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()                  122           3.144
>>>>
>>>> Just me? Someone working on ProcedureV2 should have a look at WTH the
>>>> ProcedureExecutors are doing. They seem to be polling the run queue
>>>> constantly and burning up CPU. I'll try running more workloads at the next
>>>> opportunity and see if I can repeat this or gain more insight.
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>>    - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
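To make the suspected symptom concrete: the profile above looks like an executor loop that keeps re-entering poll() even when nothing is runnable, rather than parking on the condition Matteo mentions. The following is an illustrative, self-contained Java sketch only; the names runQueue and waitCond are borrowed from the thread, but this is not the actual ProcedureExecutor or MasterProcedureQueue code.

    import java.util.ArrayDeque;
    import java.util.Queue;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    public class ExecLoopSketch {
      private final ReentrantLock lock = new ReentrantLock();
      private final Condition waitCond = lock.newCondition();
      private final Queue<Runnable> runQueue = new ArrayDeque<>();

      // Pattern the profile suggests: every iteration calls poll(), which walks
      // the queue structures (and allocates) even when there is nothing to run.
      void busyExecLoop() {
        while (true) {
          Runnable proc = poll();
          if (proc != null) {
            proc.run();
          }
          // no blocking here, so an empty queue means a hot spin
        }
      }

      // Alternative: park on the condition until something is enqueued, so an
      // idle executor thread costs neither CPU nor allocation.
      void awaitingExecLoop() throws InterruptedException {
        while (true) {
          lock.lock();
          try {
            while (runQueue.isEmpty()) {
              waitCond.await();
            }
          } finally {
            lock.unlock();
          }
          Runnable proc = poll();
          if (proc != null) {
            proc.run();
          }
        }
      }

      void add(Runnable proc) {
        lock.lock();
        try {
          runQueue.add(proc);
          waitCond.signal();
        } finally {
          lock.unlock();
        }
      }

      Runnable poll() {
        lock.lock();
        try {
          return runQueue.poll();
        } finally {
          lock.unlock();
        }
      }
    }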

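On the allocation side, the 16 GB of java.util.AbstractMap$SimpleImmutableEntry is consistent with how ConcurrentSkipListMap hands back entries: firstEntry() and higherEntry() build a fresh immutable snapshot of the matching node on every call, so a poll loop that navigates the map this way allocates at least one entry object per iteration. A small stand-alone demonstration; the map contents and key names here are invented, not the real ProcedureFairRunQueues internals.

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;

    public class SkipListSnapshotAllocation {
      public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> fairQueues = new ConcurrentSkipListMap<>();
        fairQueues.put("table-a", "queue-a");
        fairQueues.put("table-b", "queue-b");

        // firstEntry()/higherEntry() return a new AbstractMap.SimpleImmutableEntry
        // snapshot each time they are called.
        Map.Entry<String, String> a = fairQueues.firstEntry();
        Map.Entry<String, String> b = fairQueues.firstEntry();
        System.out.println(a != b);                              // true: distinct snapshot objects
        System.out.println(fairQueues.higherEntry(a.getKey()));  // another fresh snapshot: table-b=queue-b
      }
    }

The key-returning variants (firstKey(), higherKey()) avoid building these entry snapshots, which is part of why the entry-returning calls stand out so sharply in an allocation profile.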