Thanks. I do not know if it was a burst. When I get a chance I'll try to reproduce this again with a profiler attached to the process.
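[Editor's note: the "spin" discussed in the quoted thread below is the master's ProcedureExecutor threads calling poll() in a loop and getting nothing runnable back, instead of parking on the executor's wait condition. A minimal sketch of the two behaviors, with hypothetical names; this is not the actual ProcedureExecutor or MasterProcedureQueue code:

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.locks.Condition;
    import java.util.concurrent.locks.ReentrantLock;

    // Toy run queue contrasting a busy-spin poll with a condition-await
    // poll. Illustrative only; not the HBase implementation.
    public class RunQueueSketch<T> {
        private final Queue<T> queue = new ConcurrentLinkedQueue<>();
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition notEmpty = lock.newCondition();

        public void add(T item) {
            queue.add(item);
            lock.lock();
            try {
                notEmpty.signal(); // wake one waiting executor thread
            } finally {
                lock.unlock();
            }
        }

        // Busy-spin variant: if poll() keeps returning null (for example,
        // entries exist but are not yet runnable while server shutdown
        // handling is in flight), the executor loop pins a core and, with
        // a skip-list-backed queue, allocates on every probe.
        public T spinPoll() {
            T item;
            while ((item = queue.poll()) == null) {
                // hot spin: 100% CPU on this thread
            }
            return item;
        }

        // Await variant: the executor parks until work is signalled, which
        // is what the waitCond.await() mentioned below is meant to achieve.
        public T awaitPoll() throws InterruptedException {
            lock.lock();
            try {
                T item;
                while ((item = queue.poll()) == null) {
                    notEmpty.await();
                }
                return item;
            } finally {
                lock.unlock();
            }
        }
    }

The distinction matters here because a queue holding entries that are present but not yet pollable (parked procedures) makes the spin variant indistinguishable from the constant-poll behavior in the profile quoted below.]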
> On Nov 5, 2015, at 11:25 PM, Matteo Bertozzi <[email protected]> wrote:
>
> Do you know if that is just a burst on startup while the SSH is running,
> or if it goes on forever? In theory it should be just on startup while the
> SSH is running; that is by implementation, and I'm working on completing
> the procedure scheduler to remove that spin on startup.
>
> If it goes on forever, that is a bug that I'm still unable to reproduce.
>
>> On Thu, Nov 5, 2015 at 10:45 PM, Andrew Purtell <[email protected]> wrote:
>>
>> It was definitely something to do with SSH. I was able to reproduce, then
>> ran through this again but manually cleared away all files in WALs/* after
>> shutdown, and then the profile was what I'd expect.
>>
>>> On Thu, Nov 5, 2015 at 8:41 PM, Andrew Purtell <[email protected]> wrote:
>>>
>>>> On Thu, Nov 5, 2015 at 12:18 PM, Matteo Bertozzi <[email protected]> wrote:
>>>>
>>>> What is the shutdown and restart? A clean stop? A kill?
>>>
>>> The shutdown was a ^C of the minicluster as launched out of the dev
>>> tree. The termination looked normal to me, but I did see the SSH handler
>>> kick in upon restart.
>>>
>>>> Is the second YCSB job run on the cloned table?
>>>
>>> Yes.
>>>
>>>> The only master operations with procedures that I see here are:
>>>> - create meta & namespace table
>>>> - create the table for YCSB
>>>> - disable the table for snapshot
>>>> - maybe enable the table after restart?
>>>>
>>>> So, 5 procedures running, let's say 10 steps each (exaggerating): you'll
>>>> have 50 calls to poll(). Then we have the SSH procedure; how many
>>>> servers does the cluster have?
>>>
>>> The minicluster in master embeds a zk quorum peer, the master, and two
>>> regionservers.
>>>
>>>> Maybe we keep spinning on the server actions after restart...
>>>
>>> That was a thought I had as well: spinning on poll of the master
>>> procedure queue after restart.
>>>
>>>> Matteo
>>>>
>>>>> On Thu, Nov 5, 2015 at 11:47 AM, Andrew Purtell <[email protected]> wrote:
>>>>>
>>>>> Here's what I did:
>>>>>
>>>>> 0. Launch minicluster out of dev tree with ./bin/hbase master start
>>>>> 1. Load 10M rows using YCSB
>>>>> 2. Snapshot
>>>>> 3. Shutdown and restart
>>>>> 4. Clone a test table from the snapshot
>>>>> 5. Run YCSB workload A for 2M operations
>>>>>
>>>>> The profile was captured at step 5.
>>>>>
>>>>> HTH
>>>>>
>>>>>> On Nov 5, 2015, at 10:40 AM, Matteo Bertozzi <[email protected]> wrote:
>>>>>>
>>>>>> What operations do you have going on the HMaster? I did a simple test
>>>>>> where there are no operations going on, and poll doesn't get out of
>>>>>> the waitCond.await(), so unless you have operations on the master
>>>>>> you'll never call queue.poll().
>>>>>>
>>>>>> Or maybe there's a bug: the previous ops were hanging and triggered a
>>>>>> loop. What did you execute on the master before? Do you have a script
>>>>>> or something that I can use to reproduce this?
>>>>>>
>>>>>> Matteo
>>>>>>
>>>>>>> On Thu, Nov 5, 2015 at 10:25 AM, Matteo Bertozzi <[email protected]> wrote:
>>>>>>>
>>>>>>> Let me take a look, but branch-1 and master should be the same on
>>>>>>> the proc stuff.
>>>>>>>
>>>>>>> Matteo
>>>>>>>
>>>>>>>> On Thu, Nov 5, 2015 at 10:23 AM, Stack <[email protected]> wrote:
>>>>>>>>
>>>>>>>> You are looking at the master process only?
>>>>>>>>
>>>>>>>> I've been looking at branch-1 only of late.
>>>>>>>>
>>>>>>>> St.Ack
>>>>>>>>
>>>>>>>>> On Thu, Nov 5, 2015 at 12:06 AM, Andrew Purtell <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I just profiled trunk running YCSB workload A. I was looking for
>>>>>>>>> something else but found completely unexpected results.
>>>>>>>>>
>>>>>>>>> I see 60% of all allocation pressure in TLABs coming from
>>>>>>>>> ProcedureExecutors, allocating 16 GB of
>>>>>>>>> java.util.AbstractMap$SimpleImmutableEntry in this code:
>>>>>>>>>
>>>>>>>>> Stack Trace | TLABs | Total TLAB Size (bytes) | Pressure (%)
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot() | 11,603 | 18,402,380,120 | 100
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.firstEntry() | 11,599 | 18,401,678,776 | 99.996
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject() | 11,599 | 18,401,678,776 | 99.996
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll() | 11,599 | 18,401,678,776 | 99.996
>>>>>>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll() | 11,599 | 18,401,678,776 | 99.996
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop() | 11,599 | 18,401,678,776 | 99.996
>>>>>>>>>
>>>>>>>>> The three ProcedureExecutor threads also consume about 60% of all
>>>>>>>>> CPU time, each about 20%. Here are the top hot code arcs from one
>>>>>>>>> of them:
>>>>>>>>>
>>>>>>>>> Stack Trace | Sample Count | Percentage (%)
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll() | 1,521 | 39.191
>>>>>>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll() | 1,521 | 39.191
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop() | 1,521 | 39.191
>>>>>>>>> java.lang.String$CaseInsensitiveComparator.compare(String, String) | 614 | 15.821
>>>>>>>>> java.lang.String$CaseInsensitiveComparator.compare(Object, Object) | 614 | 15.821
>>>>>>>>> java.lang.String.compareToIgnoreCase(String) | 614 | 15.821
>>>>>>>>> org.apache.hadoop.hbase.ServerName.compareTo(ServerName) | 614 | 15.821
>>>>>>>>> org.apache.hadoop.hbase.ServerName.compareTo(Object) | 614 | 15.821
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object) | 614 | 15.821
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator) | 614 | 15.821
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int) | 614 | 15.821
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object) | 614 | 15.821
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject() | 614 | 15.821
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll() | 614 | 15.821
>>>>>>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll() | 614 | 15.821
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop() | 614 | 15.821
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap$Node.createSnapshot() | 468 | 12.059
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int) | 468 | 12.059
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object) | 468 | 12.059
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject() | 468 | 12.059
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll() | 468 | 12.059
>>>>>>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll() | 468 | 12.059
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop() | 468 | 12.059
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.cpr(Comparator, Object, Object) | 402 | 10.358
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator) | 280 | 7.215
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int) | 280 | 7.215
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object) | 280 | 7.215
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject() | 280 | 7.215
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll() | 280 | 7.215
>>>>>>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll() | 280 | 7.215
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop() | 280 | 7.215
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.findPredecessor(Object, Comparator) | 122 | 3.144
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.findNear(Object, int, Comparator) | 122 | 3.144
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.getNear(Object, int) | 122 | 3.144
>>>>>>>>> java.util.concurrent.ConcurrentSkipListMap.higherEntry(Object) | 122 | 3.144
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.nextObject() | 122 | 3.144
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureFairRunQueues.poll() | 122 | 3.144
>>>>>>>>> org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.poll() | 122 | 3.144
>>>>>>>>> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop() | 122 | 3.144
>>>>>>>>>
>>>>>>>>> Just me? Someone working on ProcedureV2 should have a look at WTH
>>>>>>>>> the ProcedureExecutors are doing. They seem to be polling the run
>>>>>>>>> queue constantly and burning up CPU. I'll try running more
>>>>>>>>> workloads at the next opportunity and see if I can repeat this or
>>>>>>>>> gain more insight.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>>    - Andy
>>>>>>>>>
>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>>>>>>> Hein (via Tom White)
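[Editor's note on where the allocation numbers above come from: ConcurrentSkipListMap never hands out live entries. firstEntry() and higherEntry() build a fresh immutable snapshot (an AbstractMap.SimpleImmutableEntry, created by the Node.createSnapshot() frame at the top of the TLAB profile) on every call, so a poll loop that probes the map millions of times produces one short-lived entry object per probe. A standalone demonstration, independent of the HBase code:

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Shows the per-call entry-snapshot allocation behind the profile
    // above, and a key-walking alternative that avoids it.
    public class SkipListEntryChurn {
        public static void main(String[] args) {
            ConcurrentSkipListMap<String, Integer> map = new ConcurrentSkipListMap<>();
            map.put("a", 1);
            map.put("b", 2);

            // Each call allocates a new AbstractMap.SimpleImmutableEntry
            // (Node.createSnapshot() in the TLAB profile). Run millions of
            // times in a poll loop, this is gigabytes of garbage.
            for (int i = 0; i < 3; i++) {
                Map.Entry<String, Integer> e = map.firstEntry();
                System.out.println(e.getKey() + "=" + e.getValue());
            }

            // Key-walking alternative: firstKey()/higherKey() return the
            // existing key objects and allocate no entry. A key may be
            // removed concurrently, so get() can return null and callers
            // must tolerate that.
            String k = map.firstKey();
            while (k != null) {
                Integer v = map.get(k);
                if (v != null) {
                    System.out.println(k + "=" + v);
                }
                k = map.higherKey(k);
            }
        }
    }

The trade-off: the key-walking variant does a second lookup per step and must handle concurrent removals, but it keeps a tight polling loop from generating entry garbage.]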

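[Editor's note: the arcs in the profile (higherEntry() to advance, firstEntry() to wrap around) suggest the fairness queue round-robins over a ConcurrentSkipListMap. What follows is a hedged reconstruction of that shape, not the actual ProcedureFairRunQueues source; the String keys with String.CASE_INSENSITIVE_ORDER stand in for the case-insensitive ServerName comparisons visible in the samples:

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Guess at the round-robin pattern implied by the stack traces above;
    // synchronization is omitted, so treat this as single-threaded.
    public class FairRoundRobinSketch<Q> {
        private final ConcurrentSkipListMap<String, Q> queues =
            new ConcurrentSkipListMap<>(String.CASE_INSENSITIVE_ORDER);
        private String current; // key of the queue served last

        public void addQueue(String key, Q queue) {
            queues.put(key, queue);
        }

        public Q nextQueue() {
            // Advance past the current key; wrap to the smallest key at
            // the end. Both higherEntry() and firstEntry() allocate a new
            // SimpleImmutableEntry snapshot per call, which is the
            // allocation source in the TLAB profile when this runs in a
            // tight loop.
            Map.Entry<String, Q> next =
                (current != null) ? queues.higherEntry(current) : null;
            if (next == null) {
                next = queues.firstEntry(); // null only if no queues exist
            }
            if (next == null) {
                return null; // nothing registered; callers must not spin on this
            }
            current = next.getKey();
            return next.getValue();
        }
    }

If nextQueue() can return null, or return queues whose procedures are not actually runnable, while callers retry immediately, the combination reproduces both findings in this thread: near-constant CPU in poll() and per-iteration entry-snapshot garbage.]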