It was definitely something to do with SSH (the server shutdown handler). I was able to reproduce it, then ran through this again but manually cleared away all files in WALs/* after shutdown, and then the profile was what I'd expect.
On Thu, Nov 5, 2015 at 8:41 PM, Andrew Purtell <[email protected]> wrote:

> On Thu, Nov 5, 2015 at 12:18 PM, Matteo Bertozzi <[email protected]> wrote:
>
>> what is the shutdown and restart? a clean stop? kill?
>
> The shutdown was a ^C of the minicluster as launched out of the dev tree.
> The termination looked normal to me, but I did see the SSH handler kick in
> upon restart.
>
>> is the second YCSB job run on the cloned table?
>
> Yes
>
>> the only master operations with procedures that I see here are:
>> - create meta & namespace table
>> - create the table for ycsb
>> - disable the table for snapshot
>> - maybe enable the table after restart?
>>
>> so, 5 procedures running, let's say 10 steps each (exaggerating), and
>> you'll have 50 calls to poll(). Then we have the SSH procedure; how many
>> servers does the cluster have?
>
> The minicluster in master embeds a zk quorum peer, the master, and two
> regionservers.
>
>> maybe we keep spinning on the server actions after restart...
>
> That was a thought I had as well, spinning on poll of the master procedure
> queue after restart.
>
>> Matteo
>>
>> On Thu, Nov 5, 2015 at 11:47 AM, Andrew Purtell <[email protected]> wrote:
>>
>> > Here's what I did:
>> >
>> > 0. Launch minicluster out of dev tree with ./bin/hbase master start
>> > 1. Load 10M rows using YCSB
>> > 2. Snapshot
>> > 3. Shutdown and restart
>> > 4. Clone a test table from the snapshot
>> > 5. Run YCSB workload A for 2M operations
>> >
>> > The profile was captured at step 5.
>> >
>> > HTH
>> >
>> > > On Nov 5, 2015, at 10:40 AM, Matteo Bertozzi <[email protected]> wrote:
>> > >
>> > > what operations do you have going on the HMaster?
>> > > I did a simple test where there are no operations going on, and poll
>> > > doesn't get out of the waitCond.await(), so unless you have operations
>> > > on the master you'll never call queue.poll().
>> > >
>> > > Or maybe there's a bug and the previous ops were hanging and triggered
>> > > a loop. What did you execute on the master before?
>> > > Do you have a script or something that I can use to reproduce this?
>> > >
>> > > Matteo
>> > >
>> > > On Thu, Nov 5, 2015 at 10:25 AM, Matteo Bertozzi <[email protected]> wrote:
>> > >
>> > >> let me take a look, but branch-1 and master should be the same on the
>> > >> proc stuff
>> > >>
>> > >> Matteo
>> > >>
>> > >>> On Thu, Nov 5, 2015 at 10:23 AM, Stack <[email protected]> wrote:
>> > >>>
>> > >>> You are looking at the master process only?
>> > >>>
>> > >>> I've been looking at branch-1 only of late.
>> > >>>
>> > >>> St.Ack
>> > >>>
>> > >>> On Thu, Nov 5, 2015 at 12:06 AM, Andrew Purtell <[email protected]> wrote:
>> > >>>
>> > >>>> I just profiled trunk running YCSB workload A. I was looking for
>> > >>>> something else but found completely unexpected results.
>> > >>>>
>> > >>>> I see 60% of all allocation pressure in TLABs coming from
>> > >>>> ProcedureExecutors, allocating 16 GB of
>> > >>>> java.util.AbstractMap$SimpleImmutableEntry in this code:
>> > >>>>
>> > >>>> Stack Trace  TLABs  Total TLAB Size(bytes)  Pressure(%)
>> > >>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  11,603  18,402,380,120  100
>> > >>>>   java.util.concurrent.ConcurrentSkipListMap.firstEntry()  11,599  18,401,678,776  99.996
>> > >>>>     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  11,599  18,401,678,776  99.996
>> > >>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  11,599  18,401,678,776  99.996
>> > >>>>         org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  11,599  18,401,678,776  99.996
>> > >>>>           org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  11,599  18,401,678,776  99.996
>> > >>>>
>> > >>>> The three ProcedureExecutor threads also consume about 60% of all CPU
>> > >>>> time, each about 20%. Here are the top hot code arcs from one of them:
>> > >>>>
>> > >>>> Stack Trace  Sample Count  Percentage(%)
>> > >>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  1,521  39.191
>> > >>>>   org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  1,521  39.191
>> > >>>>     org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  1,521  39.191
>> > >>>> java.lang.String$CaseInsensitiveComparator.compare(String, String)  614  15.821
>> > >>>>   java.lang.String$CaseInsensitiveComparator.compare(Object, Object)  614  15.821
>> > >>>>     java.lang.String.compareToIgnoreCase(String)  614  15.821
>> > >>>>       org.apache.hadoop.hbase.ServerName.compareTo(ServerName)  614  15.821
>> > >>>>         org.apache.hadoop.hbase.ServerName.compareTo(Object)  614  15.821
>> > >>>>           java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  614  15.821
>> > >>>>             java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  614  15.821
>> > >>>>               java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  614  15.821
>> > >>>>                 java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  614  15.821
>> > >>>>                   org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  614  15.821
>> > >>>>                     org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  614  15.821
>> > >>>>                       org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  614  15.821
>> > >>>>                         org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  614  15.821
>> > >>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot()  468  12.059
>> > >>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  468  12.059
>> > >>>>     java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  468  12.059
>> > >>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  468  12.059
>> > >>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  468  12.059
>> > >>>>           org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  468  12.059
>> > >>>>             org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  468  12.059
>> > >>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object)  402  10.358
>> > >>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  280  7.215
>> > >>>>   java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  280  7.215
>> > >>>>     java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  280  7.215
>> > >>>>       org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  280  7.215
>> > >>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  280  7.215
>> > >>>>           org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  280  7.215
>> > >>>>             org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  280  7.215
>> > >>>> java.util.concurrent.ConcurrentSkipListMap.findPredecessor(Object, Comparator)  122  3.144
>> > >>>>   java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator)  122  3.144
>> > >>>>     java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int)  122  3.144
>> > >>>>       java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object)  122  3.144
>> > >>>>         org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject()  122  3.144
>> > >>>>           org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll()  122  3.144
>> > >>>>             org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll()  122  3.144
>> > >>>>               org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop()  122  3.144
>> > >>>>
>> > >>>> Just me? Someone working on ProcedureV2 should have a look at WTH the
>> > >>>> ProcedureExecutors are doing. They seem to be polling the run queue
>> > >>>> constantly and burning up CPU. I'll try running more workloads at the
>> > >>>> next opportunity and see if I can repeat this or gain more insight.

--
Best regards,

  - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
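The profile above points at ConcurrentSkipListMap entry navigation inside
ProcedureFairRunQueues.poll(). As a rough illustration of why that shows up as
AbstractMap$SimpleImmutableEntry allocation, here is a minimal, self-contained
sketch; the map key, comparator, and round-robin loop are assumptions made for
the example, not the actual HBase code. Each firstEntry()/higherEntry() call
materializes a fresh immutable snapshot entry (the
ConcurrentSkipListMap$Node.createSnapshot() frame in the table), so a poll loop
that never blocks allocates at least one entry object per pass:

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListPollAllocation {
  public static void main(String[] args) {
    // Hypothetical stand-in for a fair run queue map; the key strings and
    // values are illustrative only. CASE_INSENSITIVE_ORDER mirrors the
    // compareToIgnoreCase() calls visible in the CPU profile above.
    ConcurrentSkipListMap<String, String> queues =
        new ConcurrentSkipListMap<>(String.CASE_INSENSITIVE_ORDER);
    queues.put("server-1,16020,1", "runQueueA");
    queues.put("server-2,16020,1", "runQueueB");

    String current = null;
    for (int i = 0; i < 1_000_000; i++) {
      // Round-robin "next object": the first entry, or the entry after the
      // last key returned, wrapping at the end. Each firstEntry()/higherEntry()
      // call allocates a new AbstractMap.SimpleImmutableEntry snapshot, so
      // this loop allocates at least one entry object per iteration.
      Map.Entry<String, String> next =
          (current == null) ? queues.firstEntry() : queues.higherEntry(current);
      if (next == null) {
        next = queues.firstEntry();   // wrap around to the first queue
      }
      current = next.getKey();
      // Key-only navigation (firstKey()/higherKey()) would avoid the per-call
      // Map.Entry allocation, at the cost of a second lookup for the value.
    }
    System.out.println("last key polled: " + current);
  }
}

The allocation only dominates because the loop never blocks; that is the
separate question sketched below.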

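Matteo's description of the expected behavior (poll() stays parked in
waitCond.await() while the master has no operations) and the suspicion of
spinning on server actions after restart can be pictured with a second sketch.
The lock, condition, and method names here are hypothetical stand-ins, not the
real ProcedureExecutor or MasterProcedureQueue code: the worker blocks only
while the run queue is empty, so a procedure that is polled, cannot advance,
and is immediately re-queued keeps poll() returning on every pass and turns the
worker loop into a busy spin.

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class SpinOnPollSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition waitCond = lock.newCondition();
  private final Queue<Object> runQueue = new ArrayDeque<>();

  // Blocks only while the queue is empty; with nothing queued the worker
  // parks here, matching the idle-master behavior described above.
  Object poll() throws InterruptedException {
    lock.lock();
    try {
      while (runQueue.isEmpty()) {
        waitCond.await();
      }
      return runQueue.poll();
    } finally {
      lock.unlock();
    }
  }

  void add(Object proc) {
    lock.lock();
    try {
      runQueue.add(proc);
      waitCond.signal();
    } finally {
      lock.unlock();
    }
  }

  // Worker loop: if a procedure is polled, fails to advance, and is put
  // straight back, poll() returns immediately on every pass and the loop
  // spins at full CPU instead of waiting on the condition.
  void execLoop() throws InterruptedException {
    while (true) {
      Object proc = poll();
      boolean advanced = execOneStep(proc);
      if (!advanced) {
        add(proc);   // re-queue; the worker never blocks again
      }
    }
  }

  private boolean execOneStep(Object proc) {
    // Stand-in for running one procedure step; returns false when the
    // procedure cannot make progress (for example, a leftover server
    // shutdown procedure replayed from old WAL files after restart).
    return false;
  }
}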