Looks like your problem is #4: "dfs.datanode.max.xcievers". You need to set this in hadoop/conf/hdfs-site.xml.
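Something along these lines should do it (4096 below is just an example value -- tune it for your cluster -- and yes, the property name really is spelled "xcievers"):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <!-- example value; raise it well above the default of 256 -->
    <value>4096</value>
  </property>

Make sure the hdfs-site.xml you edit is in the conf directory the datanodes actually start from, and restart them afterward -- the daemons only read it at startup.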
Change this and I'm sure your experience will improve. Good luck.
-ryan

On Sat, Dec 5, 2009 at 11:07 PM, Adam Silberstein <[email protected]> wrote:
> Thanks for the suggestions. Let me run down what I tried:
> 1. My ulimit was already much higher than 1024, so no change there.
> 2. I was not using hdfs-127. I switched to that. I didn't use M/R to do my
> initial load, by the way.
> 3. I was a little unclear on which handler counts to increase and to what.
> I changed hbase.regionserver.handler.count, dfs.namenode.handler.count, and
> dfs.datanode.handler.count all from 10 to 100.
> 4. I did see the error that I was exceeding the dfs.datanode.max.xcievers
> value 256. What's odd is that I have that set to ~3000, but it's apparently
> not getting picked up by hdfs when it starts. Any ideas there (like is it
> really xceivers)?
> 5. I'm not sure how many regions per regionserver. What's a good way to
> check that?
> 6. Didn't get to checking for the missing block.
>
> Ultimately, either #2 or #3 or both helped. I was able to push throughput
> way up without seeing the error recur. So thanks a lot for the help! I'm
> still interested in getting the best performance possible, so if you think
> fixing the xciever problem will help, I'd like to spend some more time
> there.
>
> Thanks,
> Adam
>
> On 12/5/09 9:38 PM, "stack" <[email protected]> wrote:
>
>> See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6. Different hdfs
>> complaint, but make sure your ulimit is > 1024 (check the first or second
>> line in the master log -- it prints out what hbase is seeing for ulimit),
>> and check that hdfs-127 is applied to the first hadoop that hbase sees on
>> CLASSPATH (this is particularly important if your loading script is a
>> mapreduce task; clients might not be seeing the patched hadoop that hbase
>> ships with). Also up the handler count for hdfs (the referred-to timeout is
>> no longer pertinent, I believe) and, while you are at it, those for hbase
>> if you haven't changed them from the defaults. Also make sure you don't
>> suffer from http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5.
>>
>> How many regions per regionserver?
>>
>> Can you put a regionserver log somewhere I can pull it to take a look?
>>
>> For a "Could not obtain block" message, what happens if you take the
>> filename -- 2540865741541403627 in the below -- and grep the NameNode log?
>> Does it tell you anything?
>>
>> St.Ack
>>
>> On Sat, Dec 5, 2009 at 3:32 PM, Adam Silberstein <[email protected]> wrote:
>>
>>> Hi,
>>> I'm having problems doing client operations when my table is large. I did
>>> an initial test like this:
>>> 6 servers
>>> 6 GB heap size per server
>>> 20 million 1K recs (so ~3 GB per server)
>>>
>>> I was able to do at least 5,000 random read/write operations per second.
>>>
>>> I then increased my table size to
>>> 120 million 1K recs (so ~20 GB per server)
>>>
>>> I then put a very light load of random reads on the table: 20 reads per
>>> second. I'm able to do a few, but within 10-20 seconds, they all fail. I
>>> found many errors of the following type in the hbase master log:
>>>
>>> java.io.IOException: java.io.IOException: Could not obtain block:
>>> blk_-7409743019137510182_39869
>>> file=/hbase/.META./1028785192/info/2540865741541403627
>>>
>>> If I wait about 5 minutes, I can repeat this sequence (do a few operations,
>>> then get errors).
>>>
>>> If anyone has any suggestions or needs me to list particular settings, let
>>> me know.
>>> The odd thing is that I observe no problems and great performance with a
>>> smaller table.
>>>
>>> Thanks,
>>> Adam
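The handler-count changes from item 3 of Adam's reply would look roughly like the following (hbase.regionserver.handler.count goes in hbase-site.xml, the dfs.* properties in hdfs-site.xml; 100 is the value Adam used, not a general recommendation):

  <!-- hbase-site.xml -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>

  <!-- hdfs-site.xml; restart the NameNode/DataNodes to pick these up -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>100</value>
  </property>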
