And in your MapReduce jobs, make sure you are not instantiating more than one HTable per task. Each instance has a cost, and creating extra ones can contribute to load on .META.
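As a rough illustration of the one-HTable-per-task advice, here is a minimal, self-contained Java sketch. It deliberately does not use the HBase API: TableHandle is a hypothetical stand-in for HTable, and Task, configure and map are illustrative names that only loosely mirror a mapper's setup and per-record methods. The point is just that the handle is created once during setup and reused for every record, rather than being created inside the per-record method.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class OneTablePerTask {

    /** Hypothetical stand-in for an expensive client handle such as HTable. */
    static final class TableHandle {
        static final AtomicInteger CREATED = new AtomicInteger();
        private final String tableName;

        TableHandle(String tableName) {
            this.tableName = tableName;
            // Assumption: in a real client, construction is where the
            // expensive work (such as region lookups) would start.
            CREATED.incrementAndGet();
        }

        void put(String row) {
            // The actual write is omitted; only the handle's lifetime matters here.
        }
    }

    /** A task that builds its handle once and reuses it for every record. */
    static final class Task {
        private TableHandle table;

        // Called once per task, before any records are processed.
        void configure(String tableName) {
            table = new TableHandle(tableName);
        }

        // Called once per record; note that it never constructs a new handle.
        void map(String row) {
            table.put(row);
        }
    }

    /** Runs one task over the given rows and reports how many handles it created. */
    static int handlesCreatedFor(List<String> rows) {
        TableHandle.CREATED.set(0);
        Task task = new Task();
        task.configure("mytable"); // "mytable" is a placeholder name
        for (String row : rows) {
            task.map(row);
        }
        return TableHandle.CREATED.get();
    }

    public static void main(String[] args) {
        // Three records, one handle.
        System.out.println(handlesCreatedFor(Arrays.asList("r1", "r2", "r3"))); // prints 1
    }
}
```

If the handle were constructed inside map instead, the count would grow with the number of records, which is the pattern to avoid.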

On Mon, June 1, 2009 8:53 pm, stack wrote:
> And check that you have block caching enabled on your .META. table. Do
> "describe '.META.'" in the shell. It's on by default, but maybe you
> migrated from an older version or something else got in the way of its
> working.
>
> St.Ack
>
> On Mon, Jun 1, 2009 at 8:36 PM, stack <[email protected]> wrote:
>
>> What Ryan said, and then can you try the same test after a major
>> compaction? Does it make a difference? You can force it in the shell
>> by doing "hbase> major_compact '.META.'" IIRC (type 'tools' in the
>> shell to get help on syntax). What size are your jobs? Short-lived?
>> Seconds or minutes? Each job needs to build up a cache of region
>> locations. To do this, it's a trip to .META. Longer-lived jobs will
>> save on trips to .META. Also, take a thread dump when it's slow
>> ("kill -QUIT PID_OF_MASTER") and send it to us. Do it a few times.
>> We'll take a look-see.
>>
>> Should be better in 0.20.0, but maybe there are a few things we can
>> do in the meantime.
>>
>> St.Ack
>>
>> On Mon, Jun 1, 2009 at 5:31 PM, Jeremy Pinkham <[email protected]>
>> wrote:
>>
>>> Sorry for the novel...
>>>
>>> I've been experiencing some problems with my hbase cluster and am
>>> hoping someone can point me in the right direction. I have a 40 node
>>> cluster running 0.19.0. Each node has 4 cores, 8GB (2GB dedicated to
>>> the regionserver), and a 1TB data disk. The master is on a dedicated
>>> machine separate from the namenode and the jobtracker. There is a
>>> single table with 4 column families and 3700 regions evenly spread
>>> across the 40 nodes. The TTLs match our loading pace well enough
>>> that we don't typically see too many splits anymore.
>>>
>>> In trying to troubleshoot some larger issues with bulk loads on this
>>> cluster, I have created a test scenario to try to narrow down the
>>> problem based on various symptoms.
>>> This test is a map/reduce job that uses the HRegionPartitioner (as
>>> an easy way to generate some traffic to the master for metadata).
>>> I've been running this job with inputs of various sizes to gauge the
>>> effect of different numbers of mappers, and have found that as the
>>> number of concurrent mappers creeps up to what I think are still
>>> small numbers (<50 mappers), the performance of the master is
>>> dramatically impacted. I'm judging the performance here simply by
>>> checking the response time of the UI on the master, since that has
>>> historically been a good indication of when the cluster is getting
>>> into trouble during our loads (which I'm sure could mean a lot of
>>> things), although I suppose it's possible the two are unrelated.
>>>
>>> The UI normally takes about 5-7 seconds to refresh master.jsp.
>>> Running a job with 5 mappers doesn't seem to impact it too much, but
>>> a job with 38 mappers makes the UI completely unresponsive for
>>> anywhere from 30 seconds to several minutes during the run. During
>>> this time, there is nothing happening in the logs, scans/gets from
>>> within the shell continue to work fine, and ganglia/top show the box
>>> to be virtually idle. All links off of master.jsp work fine, so I
>>> presume it's something about the master pulling info from the
>>> individual nodes, but those UIs are also perfectly responsive.
>>>
>>> This same cluster used to run on just 20 nodes without issue, so I'm
>>> curious whether I've crossed some threshold of horizontal
>>> scalability, whether there is just a tuning parameter I'm missing
>>> that might take care of this, or whether there is some known change
>>> between 0.19.0 and 0.19.3 that might be a factor.
>>>
>>> Thanks
>>>
>>> jeremy
