On Wed, 2010-02-17 at 15:26 -0600, Dan Washusen wrote:
> With a 12gb heap you now have 2.4GB for the block cache versus your original
> 800mb.  Out of interest, what are your block cache stats now?  Are they
> above the 72% you were seeing previously?  Now that you have the resources,
> it might also be worthwhile increasing the block cache from 0.2 to
> something larger (like 0.4).
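For reference, the knob Dan is describing here is the block cache fraction of
the regionserver heap.  A minimal sketch, assuming the stock
hfile.block.cache.size property in hbase-site.xml and the 12 GB heap mentioned
above (0.4 of 12 GB is roughly 4.8 GB; the default of 0.2 gives the 2.4 GB
figure quoted):

  <!-- hbase-site.xml: fraction of the regionserver heap used for the
       HFile block cache (default 0.2) -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
  </property>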
Here are the block cache hit ratio stats:

Region server 1: 99
Region server 2: 99
Region server 3: 99
Region server 4: 99

> You also have more memory available for the OS level file system cache so
> that probably helps.  Before adding more RAM you had pretty much all
> available memory allocated to processes and nothing in the file system cache
> (see the buffers section under Mem in top)...  Would you be able to send us
> through the top output for the machines?

Sure.  Here's 'top | head' for all region servers:

Region server 1: http://pastebin.com/m45e0bc2
Region server 2: http://pastebin.com/m31868bb6
Region server 3: http://pastebin.com/m5b1ce71f
Region server 4: http://pastebin.com/mae7326f

> Another random thought; if you are not running map reduce tasks then you
> could shut down the Job Trackers and put that memory to better use...

Thank you!  We were wondering if we could do that.  We thought HBase might
kick off some M/R jobs to do maintenance tasks or something, but it's good to
know that we can safely shut the M/R stuff down.

> Anyway, thanks for sharing!
>
> Cheers,
> Dan
>
> On 18 February 2010 06:31, Stack <st...@duboce.net> wrote:
>
> > On Wed, Feb 17, 2010 at 11:04 AM, James Baldassari <ja...@dataxu.com>
> > wrote:
> > > OK, I'll do my best to capture our changes here.  Ideally we would have
> > > changed one variable at a time, but since these performance problems
> > > were happening in our production environment, we finally had to just
> > > throw the kitchen sink at it.
> >
> > Been there.
> >
> > > So I'm not sure which combination of the
> > > following fixed the problems, but hopefully this will be useful
> > > nonetheless:
> > >
> > > - Upgraded Hadoop from 0.20 to 0.20.1 (Cloudera version
> > > hadoop-0.20-0.20.1+169.56-1).  This version apparently has some fixes
> > > for HDFS stability issues under load
> >
> > I took a look in here
> > http://archive.cloudera.com/cdh/2/hadoop-0.20.1+169.56.CHANGES.txt and
> > nothing obvious jumps out.  I'm asking the Cloudera lads if they think
> > there is anything that could have made a difference.
> >
> > > - Applied the HBASE-2180 patch
> >
> > I'd think this would have made a big difference (though you reported
> > in earlier mail it had no effect).
> >
> > > - Decreased the hfile block size from 64k to 4k in the configuration
> > > file, but we didn't alter our existing tables, so I'm not sure if this
> > > change had any effect.
> >
> > Did you restart after making this change?  If so, the cluster picked
> > it up.  You can verify by taking a look at a random hfile.  Do
> > something like:
> >
> > ./bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m PATH_TO_FILE_IN_HDFS
> >
> > This'll dump out meta stats on the chosen file, part of which will be the
> > block size used writing the file.
> >
> > > - Modified the Hadoop configuration to specify only one data directory
> > > for HDFS and one Hadoop temp directory since our servers only have one
> > > disk each.
> >
> > I'd think this'd make for no difference in perf.
> >
> > > - Threw lots of hardware at it.  We upgraded the region servers (which
> > > are also our HDFS data nodes) from 8-core/8G boxes to 16-core/16G.  Our
> > > final system configuration is as follows:
> > > 4x 16-core/16GB RAM/250GB SATAII HDFS data nodes / HBase region servers
> > > 1x 4-core/8GB RAM/250GB (RAID 1) SATAII namenode / HBase master
> > > 1x 4-core/8GB RAM/250GB (RAID 1) SATAII secondary namenode
> >
> > This probably did it (smile).
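A note on the hfile block size point above: the config-file change only
affects newly written files, so existing tables would need their column
families altered (and their store files rewritten) before the smaller block
size applies.  A sketch from the hbase shell, using a hypothetical table
'users' with family 'data' (names are placeholders, not from this thread):

  disable 'users'
  alter 'users', {NAME => 'data', BLOCKSIZE => '4096'}
  enable 'users'
  # existing store files keep the old block size until they are rewritten,
  # e.g. by a major compaction:
  major_compact 'users'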
> > > - Increased region server max heap from 5G to 12G
> >
> > So, all is in cache?  Is that what fixed the issue?
> >
> > Want to send us regionserver logs?
> >
> > What's the request rate you are servicing?  What's it say on the master
> > home page?
> >
> > > I think that's everything.  If I had to guess, I would say that
> > > upgrading Hadoop and moving to bigger hardware with more heap space for
> > > HBase was what did it.  If anyone wants more details, like some specific
> > > config setting, let me know and I'll try to get that for you.
> > >
> > > HBase is having no problems keeping up with all the gets/puts now.  The
> > > load on the region servers is evenly distributed and is very low (< 1).
> >
> > What's the utilization on these boxes?  Any CPU or I/O load?
> >
> > > Thanks again to everyone who helped me work through these issues.  I
> > > really appreciate it.
> >
> > Thanks for sticking with it.  Sorry for the loss of sleep.
> > St.Ack
> >
> > > On Wed, 2010-02-17 at 02:18 -0600, Daniel Washusen wrote:
> > >> Glad you sorted it out!  Please do tell...
> > >>
> > >> On 17/02/2010, at 4:59 PM, James Baldassari <ja...@dataxu.com> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I think we managed to solve our performance and load issues.
> > >> > Everything has been stable for about an hour now, but I'm not going
> > >> > to raise the victory flag until the morning because we've had short
> > >> > periods of stability in the past.
> > >> >
> > >> > I've been working on this problem non-stop for almost a week now, so I
> > >> > really need to get some sleep, but if everything looks good tomorrow
> > >> > I'll write up a summary of all the changes we made and share it with
> > >> > the group.  Hopefully this exercise in tuning for a high-throughput
> > >> > real-time environment will be useful to others.
> > >> >
> > >> > Thanks,
> > >> > James
> > >> >
> > >> > On Tue, 2010-02-16 at 23:18 -0600, Stack wrote:
> > >> >> When you look at top on the loaded server, is it the regionserver or
> > >> >> the datanode that is using up the CPU?
> > >> >>
> > >> >> I looked at your hdfs listing.  Some of the regions have 3 and 4
> > >> >> files but most look fine.  A good few are on the compaction verge,
> > >> >> so I'd imagine a lot of compaction going on, but this is background;
> > >> >> though it does suck CPU and I/O, it shouldn't be too bad.
> > >> >>
> > >> >> I took a look at the regionserver log.  The server is struggling
> > >> >> during which time period?  There is one log run at the start and
> > >> >> there it seems like nothing untoward.  Please enable DEBUG going
> > >> >> forward.  It'll shed more light on what's going on: see
> > >> >> http://wiki.apache.org/hadoop/Hbase/FAQ#A5 for how.  Otherwise, the
> > >> >> log doesn't have anything running long enough for it to have been
> > >> >> under serious load.
> > >> >>
> > >> >> This is a four node cluster now?  You don't seem to have too many
> > >> >> regions per server, yet you have a pretty high read/write rate going
> > >> >> by earlier requests postings.  Maybe you need to add more servers.
> > >> >> Are you going to add in those 16G machines?
> > >> >>
> > >> >> When you look at the master UI, can you see that the request rate
> > >> >> over time is about the same for all regionservers?  (Refresh the
> > >> >> master UI every so often to take a new sampling.)
> > >> >>
> > >> >> St.Ack
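For anyone following along, the two knobs referenced in this exchange (the
bigger regionserver heap and DEBUG-level logging) normally live in HBase's
conf directory.  A sketch, assuming the stock hbase-env.sh and
log4j.properties layout:

  # conf/hbase-env.sh -- maximum heap for the HBase daemons, in MB
  # (12000 corresponds to the 12G heap mentioned above)
  export HBASE_HEAPSIZE=12000

  # conf/log4j.properties -- enable DEBUG logging for HBase,
  # per the FAQ entry Stack links to
  log4j.logger.org.apache.hadoop.hbase=DEBUG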
> > >> >> On Tue, Feb 16, 2010 at 3:59 PM, James Baldassari <ja...@dataxu.com> wrote:
> > >> >>> Nope.  We don't do any map reduce.  We're only using Hadoop for
> > >> >>> HBase at the moment.
> > >> >>>
> > >> >>> That one node, hdfs02, still has a load of 16 with around 40% I/O
> > >> >>> and 120% CPU.  The other nodes are all around 66% CPU with 0-1% I/O
> > >> >>> and load of 1 to 3.
> > >> >>>
> > >> >>> I don't think all the requests are going to hdfs02 based on the
> > >> >>> status 'detailed' output.  It seems like that node is just having a
> > >> >>> much harder time getting the data or something.  Maybe we have some
> > >> >>> incorrect HDFS setting.  All the configs are identical, though.
> > >> >>>
> > >> >>> -James
> > >> >>>
> > >> >>> On Tue, 2010-02-16 at 17:45 -0600, Dan Washusen wrote:
> > >> >>>> You mentioned in a previous email that you have a Task Tracker
> > >> >>>> process running on each of the nodes.  Is there any chance there
> > >> >>>> is a map reduce job running?
> > >> >>>>
> > >> >>>> On 17 February 2010 10:31, James Baldassari <ja...@dataxu.com> wrote:
> > >> >>>>
> > >> >>>>> On Tue, 2010-02-16 at 16:45 -0600, Stack wrote:
> > >> >>>>>> On Tue, Feb 16, 2010 at 2:25 PM, James Baldassari <ja...@dataxu.com> wrote:
> > >> >>>>>>> On Tue, 2010-02-16 at 14:05 -0600, Stack wrote:
> > >> >>>>>>>> On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari <ja...@dataxu.com> wrote:
> > >> >>>>>>>
> > >> >>>>>>> Whether the keys themselves are evenly distributed is another
> > >> >>>>>>> matter.  Our keys are user IDs, and they should be fairly
> > >> >>>>>>> random.  If we do a status 'detailed' in the hbase shell we see
> > >> >>>>>>> the following distribution for the value of "requests" (not
> > >> >>>>>>> entirely sure what this value means):
> > >> >>>>>>> hdfs01: 7078
> > >> >>>>>>> hdfs02: 5898
> > >> >>>>>>> hdfs03: 5870
> > >> >>>>>>> hdfs04: 3807
> > >> >>>>>>>
> > >> >>>>>> That looks like they are evenly distributed.  Requests are how
> > >> >>>>>> many hits a second.  See the UI on master port 60010.  The
> > >> >>>>>> numbers should match.
> > >> >>>>>
> > >> >>>>> So the total across all 4 region servers would be 22,653/second?
> > >> >>>>> Hmm, that doesn't seem too bad.  I guess we just need a little
> > >> >>>>> more throughput...
> > >> >>>>>
> > >> >>>>>>> There are no order of magnitude differences here, and the
> > >> >>>>>>> request count doesn't seem to map to the load on the server.
> > >> >>>>>>> Right now hdfs02 has a load of 16 while the 3 others have loads
> > >> >>>>>>> between 1 and 2.
> > >> >>>>>>
> > >> >>>>>> This is interesting.  I went back over your dumps of cache stats
> > >> >>>>>> above and the 'loaded' server didn't have any attribute there
> > >> >>>>>> that differentiated it from the others.  For example, the number
> > >> >>>>>> of storefiles seemed about the same.
> > >> >>>>>>
> > >> >>>>>> I wonder what is making for the high load?  Can you figure it?
> > >> >>>>>> Is it high CPU use (unlikely)?  Is it then high I/O?
> > >> >>>>>> Can you try and figure
> > >> >>>>>> what's different about the layout under the loaded server and
> > >> >>>>>> that of an unloaded server?  Maybe do a ./bin/hadoop fs -lsr /hbase
> > >> >>>>>> and see if anything jumps out at you.
> > >> >>>>>
> > >> >>>>> It's I/O wait that is killing the highly loaded server.  The CPU
> > >> >>>>> usage reported by top is just about the same across all servers
> > >> >>>>> (around 100% on an 8-core node), but one server at any given time
> > >> >>>>> has a much higher load due to I/O.
> > >> >>>>>
> > >> >>>>>> If you want to post the above or a loaded server's log to
> > >> >>>>>> pastebin we'll take a looksee.
> > >> >>>>>
> > >> >>>>> I'm not really sure what to look for, but maybe someone else will
> > >> >>>>> notice something, so here's the output of hadoop fs -lsr /hbase:
> > >> >>>>> http://pastebin.com/m98096de
> > >> >>>>>
> > >> >>>>> And here is today's region server log from hdfs02, which seems to
> > >> >>>>> get hit particularly hard: http://pastebin.com/m1d8a1e5f
> > >> >>>>>
> > >> >>>>> Please note that we restarted it several times today, so some of
> > >> >>>>> those errors are probably just due to restarting the region server.
> > >> >>>>>
> > >> >>>>>>> Applying HBASE-2180 did not make any measurable difference.
> > >> >>>>>>> There are no errors in the region server logs.  However, looking
> > >> >>>>>>> at the Hadoop datanode logs, I'm seeing lots of these:
> > >> >>>>>>>
> > >> >>>>>>> 2010-02-16 17:07:54,064 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.24.183.165:50010, storageID=DS-1519453437-10.24.183.165-50010-1265907617548, infoPort=50075, ipcPort=50020):DataXceiver
> > >> >>>>>>> java.io.EOFException
> > >> >>>>>>>         at java.io.DataInputStream.readShort(DataInputStream.java:298)
> > >> >>>>>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
> > >> >>>>>>>         at java.lang.Thread.run(Thread.java:619)
> > >> >>>>>>
> > >> >>>>>> You upped xceivers on your hdfs cluster?  If you look at the
> > >> >>>>>> other end of the above EOFE, can you see why it died?
> > >> >>>>>
> > >> >>>>> Max xceivers = 3072; datanode handler count = 20; region server
> > >> >>>>> handler count = 100
> > >> >>>>>
> > >> >>>>> I can't find the other end of the EOFException.  I looked in the
> > >> >>>>> Hadoop and HBase logs on the server that is the name node and
> > >> >>>>> HBase master, as well as on the HBase client.
> > >> >>>>>
> > >> >>>>> Thanks for all the help!
> > >> >>>>>
> > >> >>>>> -James
> > >> >>>>>
> > >> >>>>>>> However, I do think it's strange that
> > >> >>>>>>> the load is so unbalanced on the region servers.
> > >> >>>>>>
> > >> >>>>>> I agree.
> > >> >>>>>>
> > >> >>>>>>> We're also going to try throwing some more hardware at the
> > >> >>>>>>> problem.  We'll set up a new cluster with 16-core, 16G nodes to
> > >> >>>>>>> see if they are better able to handle the large number of client
> > >> >>>>>>> requests.  We might also decrease the block size to 32k or lower.
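For reference, the xceiver and handler counts James quotes above are usually
set along these lines; a sketch, assuming a CDH-style split config (some
setups carry the dfs.* properties in hadoop-site.xml instead of
hdfs-site.xml):

  <!-- hdfs-site.xml; note "xcievers" is the property's historical spelling -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>3072</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>20</value>
  </property>

  <!-- hbase-site.xml -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>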
> > >> >>>>>>
> > >> >>>>>> Ok.
> > >> >>>>>>
> > >> >>>>>>>> Should only be a matter if you intend distributing the above.
> > >> >>>>>>>
> > >> >>>>>>> This is probably a topic for a separate thread, but I've never
> > >> >>>>>>> seen a legal definition for the word "distribution."  How does
> > >> >>>>>>> this apply to the SaaS model?
> > >> >>>>>>
> > >> >>>>>> Fair enough.
> > >> >>>>>>
> > >> >>>>>> Something is up.  Especially if hbase-2180 made no difference.
> > >> >>>>>>
> > >> >>>>>> St.Ack