Re: master performance

Jonathan Gray Mon, 01 Jun 2009 22:37:19 -0700

And in your MapReduce jobs, make sure you are not instantiating more than
one HTable per task (there is a cost associated with each instantiation,
and it can contribute to load on .META.)
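
A minimal sketch of the one-table-per-task pattern follows. It assumes a
hypothetical TableHandle class standing in for HTable so that it runs
without a cluster; the class and method names are illustrative and are not
the HBase API. The point is only that the handle is created once in the
task's setup step and then reused for every record.

```java
import java.util.Arrays;
import java.util.List;

public class OneTablePerTask {
    /** Hypothetical stand-in for HTable; it only counts constructions. */
    static class TableHandle {
        static int opened = 0;

        TableHandle(String tableName) {
            // Assumption for illustration: construction is the costly step
            // that can add load on .META., so we count how often it happens.
            opened++;
        }

        void put(String row) {
            // No-op in this sketch.
        }
    }

    /** A task that opens its table once and reuses it for every record. */
    static class Task {
        private TableHandle table;

        void setup() {
            table = new TableHandle("mytable"); // hypothetical table name
        }

        void map(String row) {
            table.put(row); // reuse the handle; never construct one per record
        }
    }

    /** Runs one task over the rows and returns how many handles it opened. */
    static int runTask(List<String> rows) {
        TableHandle.opened = 0;
        Task task = new Task();
        task.setup();
        for (String row : rows) {
            task.map(row);
        }
        return TableHandle.opened;
    }

    public static void main(String[] args) {
        System.out.println(runTask(Arrays.asList("a", "b", "c")));
    }
}
```

Constructing the handle inside map() instead would make the count grow with
the number of records, which is the situation to avoid.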


On Mon, June 1, 2009 8:53 pm, stack wrote:
> And check that you have block caching enabled on your .META. table.  Do
> "describe '.META.'" in the shell.  It's on by default but maybe you
> migrated from an older version or something else got in the way of its
> working.
>
> St.Ack
>
>
> On Mon, Jun 1, 2009 at 8:36 PM, stack <[email protected]> wrote:
>
>
>> What Ryan said, and then can you try the same test after a major
>> compaction?  Does it make a difference?  You can force it in the shell
>> by doing "hbase> major_compact '.META.'" IIRC (type 'tools' in the
>> shell to get help on syntax).  What size are your jobs?  Short-lived?
>> Seconds or minutes?  Each job needs to build up a cache of region
>> locations.  To do this, it's a trip to .META.  Longer-lived jobs will
>> save on trips to .META.  Also, take a thread dump when it's slow
>> ("kill -QUIT PID_OF_MASTER") and send it to us.  Do it a few times.
>> We'll take a look-see.
>>
>> Should be better in 0.20.0, but maybe there are a few things we can do
>> in the meantime.
>>
>>
>> St.Ack
>>
>>
>> On Mon, Jun 1, 2009 at 5:31 PM, Jeremy Pinkham <[email protected]>
>> wrote:
>>
>>
>>>
>>> sorry for the novel...
>>>
>>> I've been experiencing some problems with my hbase cluster and am
>>> hoping someone can point me in the right direction.  I have a 40 node
>>> cluster running 0.19.0.  Each node has 4 cores, 8GB (2GB dedicated to
>>> the regionserver), and 1TB data disk.  The master is on a dedicated
>>> machine separate from the namenode and the jobtracker.  There is a
>>> single table with 4 column families and 3700 regions evenly spread
>>> across the 40 nodes.  The TTLs match our loading pace well enough
>>> that we don't typically see too many splits anymore.
>>>
>>> In trying to troubleshoot some larger issues with bulk loads on this
>>> cluster I have created a test scenario to try to narrow down the
>>> problem based on various symptoms.  This test is a map/reduce job that
>>> uses the HRegionPartitioner (as an easy way to generate some traffic
>>> to the master for metadata).  I've been running this job with inputs
>>> of various sizes to gauge the effect of different numbers of mappers,
>>> and have found that as the number of concurrent mappers creeps up to
>>> what I think are still small numbers (<50 mappers), the performance of
>>> the master is dramatically impacted.  I'm judging the performance here
>>> simply by checking the response time of the UI on the master, since
>>> that has historically been a good indication of when the cluster is
>>> getting into trouble during our loads (which I'm sure could mean a lot
>>> of things), although I suppose it's possible the two are unrelated.
>>>
>>> The UI normally takes about 5-7 seconds to refresh master.jsp.
>>> Running a job with 5 mappers doesn't seem to impact it too much, but a
>>> job with 38 mappers makes the UI completely unresponsive for anywhere
>>> from 30 seconds to several minutes during the run.  During this time,
>>> there is nothing happening in the logs, scans/gets from within the
>>> shell continue to work fine, and ganglia/top show the box to be
>>> virtually idle.  All links off of master.jsp work fine, so I presume
>>> it's something about the master pulling info from the individual
>>> nodes, but those UIs are also perfectly responsive.
>>>
>>> This same cluster used to run on just 20 nodes without issue, so I'm
>>> curious if I've crossed some threshold of horizontal scalability or if
>>> there is just a tuning parameter that I'm missing that might take care
>>> of this, or if there is something known between 0.19.0 and 0.19.3 that
>>> might be a factor.
>>>
>>> Thanks
>>>
>>>
>>> jeremy
>>>
>>>
>>>
>>>
>>
>
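
The point quoted above, that each job has to build up its cache of region
locations and that longer-lived jobs save on trips to .META., can be
illustrated with a small sketch. This is not the HBase client code: the
Region and RegionLocationCacheSketch classes, the server names, and the
in-memory map standing in for .META. are all assumptions made for
illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLocationCacheSketch {
    /** Hypothetical region record; an empty endKey marks the last region. */
    static final class Region {
        final String startKey, endKey, server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String rowKey) {
            return rowKey.compareTo(startKey) >= 0
                && (endKey.isEmpty() || rowKey.compareTo(endKey) < 0);
        }
    }

    // Stand-in for .META.: regions keyed by start key (first key must be "").
    private final TreeMap<String, Region> meta;
    // Per-job cache of regions that have already been looked up.
    private final TreeMap<String, Region> cache = new TreeMap<>();
    int metaTrips = 0;

    RegionLocationCacheSketch(TreeMap<String, Region> meta) {
        this.meta = meta;
    }

    /** Returns the server for rowKey, going to "meta" only on a cache miss. */
    String locate(String rowKey) {
        Map.Entry<String, Region> hit = cache.floorEntry(rowKey);
        if (hit != null && hit.getValue().contains(rowKey)) {
            return hit.getValue().server;
        }
        metaTrips++; // cache miss: one simulated trip to .META.
        Region region = meta.floorEntry(rowKey).getValue();
        cache.put(region.startKey, region);
        return region.server;
    }

    /** Builds a two-region table for demonstration. */
    static RegionLocationCacheSketch twoRegionExample() {
        TreeMap<String, Region> meta = new TreeMap<>();
        meta.put("", new Region("", "m", "server1"));
        meta.put("m", new Region("m", "", "server2"));
        return new RegionLocationCacheSketch(meta);
    }

    public static void main(String[] args) {
        RegionLocationCacheSketch client = twoRegionExample();
        for (String row : new String[] {"a", "b", "z", "x"}) {
            System.out.println(row + " -> " + client.locate(row));
        }
        System.out.println("meta trips: " + client.metaTrips);
    }
}
```

A job that starts fresh pays one lookup per region it touches; a job that
lives long enough to revisit the same regions answers later lookups from
its own cache.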
