On Thu, Nov 5, 2015 at 12:18 PM, Matteo Bertozzi <[email protected]> wrote:

> what is the shutdown and restart? a clean stop? kill?

The shutdown was a ^C of the minicluster as launched out of the dev tree.
The termination looked normal to me, but I did see the SSH handler kick in
upon restart.

> is the second YCSB job run on the cloned table?

Yes.

> the only master operation with procedures that I see here are
> - create meta & namespace table
> - create the table for ycsb
> - disable the table for snapshot
> - maybe enable for the table after restart?
>
> so, 5 procedures running, let's say 10 steps each (exaggerating), you'll
> have 50 calls to poll()
> then we have the SSH procedure, how many servers does the cluster have?

The minicluster in master embeds a zk quorum peer, the master, and two
regionservers.

> maybe we keep spinning on the server actions after restart...

That was a thought I had as well: spinning on poll of the master procedure
queue after restart.

> Matteo
>
> On Thu, Nov 5, 2015 at 11:47 AM, Andrew Purtell <[email protected]> wrote:
>
>> Here's what I did:
>>
>> 0. Launch minicluster out of dev tree with ./bin/hbase master start
>> 1. Load 10M rows using YCSB
>> 2. Snapshot
>> 3. Shutdown and restart
>> 4. Clone a test table from the snapshot
>> 5. Run YCSB workload A for 2M operations
>>
>> The profile was captured at step 5.
>>
>> HTH
>>
>>> On Nov 5, 2015, at 10:40 AM, Matteo Bertozzi <[email protected]> wrote:
>>>
>>> what operation do you have going on the HMaster?
>>> I did a simple test where there are no operations going on, and poll
>>> doesn't get out of the waitCond.await(), so unless you have operations
>>> on the master you'll never call queue.poll()
>>>
>>> or maybe there's a bug and the previous ops were hanging and triggered
>>> a loop.
>>> what did you execute on the master before?
>>> do you have a script or something that I can use to reproduce this?
>>>
>>> Matteo
>>>
>>> On Thu, Nov 5, 2015 at 10:25 AM, Matteo Bertozzi <[email protected]> wrote:
>>>
>>>> let me take a look, but branch-1 and master should be the same on the
>>>> proc stuff
>>>>
>>>> Matteo
>>>>
>>>>> On Thu, Nov 5, 2015 at 10:23 AM, Stack <[email protected]> wrote:
>>>>>
>>>>> You are looking at the master process only?
>>>>>
>>>>> I've been looking at branch-1 only of late.
>>>>>
>>>>> St.Ack
>>>>>
>>>>> On Thu, Nov 5, 2015 at 12:06 AM, Andrew Purtell <[email protected]> wrote:
>>>>>
>>>>>> I just profiled trunk running YCSB workload A. I was looking for
>>>>>> something else but found completely unexpected results.
>>>>>>
>>>>>> I see 60% of all allocation pressure in TLABs coming from
>>>>>> ProcedureExecutors, allocating 16 GB of
>>>>>> java.util.AbstractMap$SimpleImmutableEntry in this code:
>>>>>>
>>>>>> Stack Trace  TLABs  Total TLAB Size(bytes)  Pressure(%)
>>>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  11,603  18,402,380,120  100
>>>>>>  java.util.concurrent.ConcurrentSkipListMap.firstEntry()  11,599  18,401,678,776  99.996
>>>>>>   org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  11,599  18,401,678,776  99.996
>>>>>>    org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  11,599  18,401,678,776  99.996
>>>>>>     org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  11,599  18,401,678,776  99.996
>>>>>>      org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  11,599  18,401,678,776  99.996
>>>>>>
>>>>>> The three ProcedureExecutor threads also consume about 60% of all CPU
>>>>>> time, each about 20%. Here are the top hot code arcs from one of them:
>>>>>>
>>>>>> Stack Trace  Sample Count  Percentage(%)
>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  1,521  39.191
>>>>>>  org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  1,521  39.191
>>>>>>   org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  1,521  39.191
>>>>>> java.lang.String$CaseInsensitiveComparator.compare(String, String)  614  15.821
>>>>>>  java.lang.String$CaseInsensitiveComparator.compare(Object, Object)  614  15.821
>>>>>>   java.lang.String.compareToIgnoreCase(String)  614  15.821
>>>>>>    org.apache.hadoop.hbase.ServerName.compareTo(ServerName)  614  15.821
>>>>>>     org.apache.hadoop.hbase.ServerName.compareTo(Object)  614  15.821
>>>>>>      java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  614  15.821
>>>>>>       java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  614  15.821
>>>>>>        java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  614  15.821
>>>>>>         java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  614  15.821
>>>>>>          org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  614  15.821
>>>>>>           org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  614  15.821
>>>>>>            org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  614  15.821
>>>>>>             org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  614  15.821
>>>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  468  12.059
>>>>>>  java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  468  12.059
>>>>>>   java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  468  12.059
>>>>>>    org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  468  12.059
>>>>>>     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  468  12.059
>>>>>>      org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  468  12.059
>>>>>>       org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  468  12.059
>>>>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  402  10.358
>>>>>>  java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  280  7.215
>>>>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  280  7.215
>>>>>>    java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  280  7.215
>>>>>>     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  280  7.215
>>>>>>      org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  280  7.215
>>>>>>       org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  280  7.215
>>>>>>        org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  280  7.215
>>>>>> java.util.concurrent.ConcurrentSkipListMap.findPredecessor(Object, Comparator)  122  3.144
>>>>>>  java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  122  3.144
>>>>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  122  3.144
>>>>>>    java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  122  3.144
>>>>>>     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  122  3.144
>>>>>>      org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  122  3.144
>>>>>>       org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  122  3.144
>>>>>>        org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  122  3.144
>>>>>>
>>>>>> Just me? Someone working on ProcedureV2 should have a look at WTH the
>>>>>> ProcedureExecutors are doing. They seem to be polling the run queue
>>>>>> constantly and burning up CPU. I'll try running more workloads at the
>>>>>> next opportunity and see if I can repeat this or gain more insight.
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>>
>>>>>>    - Andy
>>>>>>
>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>>> (via Tom White)

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
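
P.S. On the allocation half of this: ConcurrentSkipListMap.firstEntry() (and
higherEntry()) materialize a fresh AbstractMap$SimpleImmutableEntry snapshot
on every call, so a poll loop that walks the map this way allocates at least
one entry object per step. Below is a minimal standalone sketch of that JDK
behavior, plain JDK code with made-up keys and class name, not the HBase
implementation:

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class FirstEntryAllocationSketch {
  public static void main(String[] args) {
    // Two fairness buckets, keyed case-insensitively like the ServerName /
    // table keys that show up in the comparator frames of the profile.
    ConcurrentSkipListMap<String, String> queues =
        new ConcurrentSkipListMap<String, String>(String.CASE_INSENSITIVE_ORDER);
    queues.put("ns:usertable", "queue-a");
    queues.put("srv-1,16020,1446712345", "queue-b");

    // firstEntry() builds a new AbstractMap.SimpleImmutableEntry snapshot on
    // every call, so a tight poll loop allocates one entry per iteration.
    Map.Entry<String, String> e1 = queues.firstEntry();
    Map.Entry<String, String> e2 = queues.firstEntry();
    System.out.println(e1 == e2);      // false: two distinct snapshot objects
    System.out.println(e1.equals(e2)); // true: same key/value contents

    // higherEntry(), used to advance through the buckets, snapshots as well.
    Map.Entry<String, String> next = queues.higherEntry(e1.getKey());
    System.out.println(next.getKey()); // srv-1,16020,1446712345

    // A key-only walk (firstKey()/higherKey() plus get()) avoids the per-step
    // Map.Entry allocation; noted only as a contrast, not as the fix to take.
    String k = queues.firstKey();
    System.out.println(k + " -> " + queues.get(k));
  }
}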
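
P.P.S. On the CPU half, the hot arcs are consistent with the "keep spinning
on the server actions after restart" idea: if the executor believes work is
queued but every fairness bucket turns out to be empty, poll() walks the skip
list, returns nothing, and gets called again immediately instead of parking
on waitCond.await(). Here is a much-simplified sketch of that suspected
failure mode, with hypothetical names and structure; it is not the actual
ProcedureExecutor / MasterProcedureQueue / ProcedureFairRunQueues code:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, much-simplified model of a fair run-queue poll loop, only to
// illustrate why the stacks would be dominated by poll()/nextObject().
public class SpinningPollSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition waitCond = lock.newCondition();
  // One bucket per fairness key (table, server name, ...), kept in a skip
  // list compared case-insensitively, as the ServerName frames suggest.
  private final ConcurrentSkipListMap<String, Deque<Runnable>> buckets =
      new ConcurrentSkipListMap<String, Deque<Runnable>>(String.CASE_INSENSITIVE_ORDER);
  private int queuedCount = 0; // guarded by lock

  public Runnable poll() {
    lock.lock();
    try {
      // Expected behavior (what Matteo describes): with nothing queued the
      // executor parks here and never reaches the skip-list walk below.
      while (queuedCount == 0) {
        waitCond.awaitUninterruptibly();
      }
      // Suspected failure mode: queuedCount stays above zero (say, a stale
      // server bucket left behind after restart) while every bucket yields
      // nothing runnable. Each call then walks firstEntry()/higherEntry(),
      // allocating an entry snapshot per step, returns null, and the caller
      // immediately calls poll() again: a busy spin.
      Map.Entry<String, Deque<Runnable>> e = buckets.firstEntry();
      while (e != null) {
        Runnable next = e.getValue().poll();
        if (next != null) {
          queuedCount--;
          return next;
        }
        e = buckets.higherEntry(e.getKey());
      }
      return null; // nothing runnable despite queuedCount > 0
    } finally {
      lock.unlock();
    }
  }

  public void add(String bucket, Runnable proc) {
    lock.lock();
    try {
      Deque<Runnable> q = buckets.get(bucket);
      if (q == null) {
        q = new ArrayDeque<Runnable>();
        buckets.put(bucket, q);
      }
      q.add(proc);
      queuedCount++;
      waitCond.signal();
    } finally {
      lock.unlock();
    }
  }
}

If something along those lines is happening, the interesting question is what
keeps the queue accounting non-empty after the restart, rather than the skip
list walk itself.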
