What is the shutdown and restart? A clean stop? A kill? Is the second YCSB job run on the cloned table?
The only master operations with procedures that I see here are:
- create the meta & namespace tables
- create the table for YCSB
- disable the table for the snapshot
- maybe enable the table after restart?

So, 5 procedures running, let's say 10 steps each (exaggerating), and you'll have 50 calls to poll(). Then we have the SSH procedure; how many servers does the cluster have? Maybe we keep spinning on the server actions after restart...

Matteo

On Thu, Nov 5, 2015 at 11:47 AM, Andrew Purtell <[email protected]> wrote:

> Here's what I did:
>
> 0. Launch the minicluster out of the dev tree with ./bin/hbase master start
> 1. Load 10M rows using YCSB
> 2. Snapshot
> 3. Shutdown and restart
> 4. Clone a test table from the snapshot
> 5. Run YCSB workload A for 2M operations
>
> The profile was captured at step 5.
>
> HTH
>
> > On Nov 5, 2015, at 10:40 AM, Matteo Bertozzi <[email protected]> wrote:
> >
> > What operations do you have going on in the HMaster?
> > I did a simple test where there are no operations going on, and poll
> > doesn't get out of the waitCond.await(), so unless you have operations
> > on the master you'll never call queue.poll().
> >
> > Or maybe there's a bug, and the previous ops were hanging and triggered
> > a loop. What did you execute on the master before? Do you have a script
> > or something that I can use to reproduce this?
> >
> > Matteo
> >
> > On Thu, Nov 5, 2015 at 10:25 AM, Matteo Bertozzi <[email protected]> wrote:
> >
> >> Let me take a look, but branch-1 and master should be the same on the
> >> proc stuff.
> >>
> >> Matteo
> >>
> >>> On Thu, Nov 5, 2015 at 10:23 AM, Stack <[email protected]> wrote:
> >>>
> >>> You are looking at the master process only?
> >>>
> >>> I've been looking at branch-1 only of late.
> >>>
> >>> St.Ack
> >>>
> >>> On Thu, Nov 5, 2015 at 12:06 AM, Andrew Purtell <[email protected]> wrote:
> >>>
> >>>> I just profiled trunk running YCSB workload A. I was looking for
> >>>> something else but found completely unexpected results.
> >>>>
> >>>> I see 60% of all allocation pressure in TLABs coming from
> >>>> ProcedureExecutors, allocating 16 GB of
> >>>> java.util.AbstractMap$SimpleImmutableEntry in this code:
> >>>>
> >>>> Stack Trace  TLABs  Total TLAB Size (bytes)  Pressure (%)
> >>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  11,603  18,402,380,120  100
> >>>> java.util.concurrent.ConcurrentSkipListMap.firstEntry()  11,599  18,401,678,776  99.996
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  11,599  18,401,678,776  99.996
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  11,599  18,401,678,776  99.996
> >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  11,599  18,401,678,776  99.996
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  11,599  18,401,678,776  99.996
> >>>>
> >>>> The three ProcedureExecutor threads also consume about 60% of all CPU
> >>>> time, each about 20%. Here are the top hot code arcs from one of them:
> >>>>
> >>>> Stack Trace  Sample Count  Percentage (%)
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  1,521  39.191
> >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  1,521  39.191
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  1,521  39.191
> >>>> java.lang.String$CaseInsensitiveComparator.compare(String, String)  614  15.821
> >>>> java.lang.String$CaseInsensitiveComparator.compare(Object, Object)  614  15.821
> >>>> java.lang.String.compareToIgnoreCase(String)  614  15.821
> >>>> org.apache.hadoop.hbase.ServerName.compareTo(ServerName)  614  15.821
> >>>> org.apache.hadoop.hbase.ServerName.compareTo(Object)  614  15.821
> >>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  614  15.821
> >>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  614  15.821
> >>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  614  15.821
> >>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  614  15.821
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  614  15.821
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  614  15.821
> >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  614  15.821
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  614  15.821
> >>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  468  12.059
> >>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  468  12.059
> >>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  468  12.059
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  468  12.059
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  468  12.059
> >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  468  12.059
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  468  12.059
> >>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  402  10.358
> >>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  280  7.215
> >>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  280  7.215
> >>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  280  7.215
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  280  7.215
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  280  7.215
> >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  280  7.215
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  280  7.215
> >>>> java.util.concurrent.ConcurrentSkipListMap.findPredecessor(Object, Comparator)  122  3.144
> >>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  122  3.144
> >>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  122  3.144
> >>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  122  3.144
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  122  3.144
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  122  3.144
> >>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  122  3.144
> >>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  122  3.144
> >>>>
> >>>> Just me? Someone working on ProcedureV2 should have a look at WTH the
> >>>> ProcedureExecutors are doing. They seem to be polling the run queue
> >>>> constantly and burning up CPU. I'll try running more workloads at the
> >>>> next opportunity and see if I can repeat this or gain more insight.
> >>>>
> >>>> --
> >>>> Best regards,
> >>>>
> >>>>   - Andy
> >>>>
> >>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>>> (via Tom White)
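[Editor's note: a quick way to see where those SimpleImmutableEntry allocations come from. The JDK's ConcurrentSkipListMap.firstEntry() and higherEntry() materialize a fresh AbstractMap.SimpleImmutableEntry snapshot on every call, even when the map hasn't changed, so a poll loop that keeps walking the fair-run-queue map allocates one entry object per iteration. A minimal standalone sketch against a plain JDK map, not the actual HBase ProcedureFairRunQueues code:]

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class FirstEntryAllocDemo {
    public static void main(String[] args) {
        // Stand-in for the per-table fair run queues; the real keys/values
        // in HBase are different, this just shows the JDK behavior.
        ConcurrentSkipListMap<String, String> fairQueues = new ConcurrentSkipListMap<>();
        fairQueues.put("table-a", "queueA");

        // Each call materializes a brand-new snapshot entry object,
        // which is exactly the type dominating the TLAB profile above.
        Map.Entry<String, String> e1 = fairQueues.firstEntry();
        Map.Entry<String, String> e2 = fairQueues.firstEntry();

        System.out.println(e1 instanceof AbstractMap.SimpleImmutableEntry); // true
        System.out.println(e1 == e2); // false: a distinct allocation per call
    }
}
```

If execLoop() spins on poll() instead of blocking on the waitCond.await() path Matteo describes, every pass through the loop pays this allocation plus the skip-list traversal, which would account for both the 16 GB of SimpleImmutableEntry and the ~20% CPU per executor thread.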
