Thank you very much Andy. Yes, it is really a difficult issue.

Schubert

On Fri, Mar 27, 2009 at 1:13 AM, Andrew Purtell <[email protected]> wrote:
> Hi Schubert,
>
> I set dfs.datanode.max.xcievers=4096 in my config. This was the
> only way I was able to bring > 7000 regions online on 25 nodes
> during cluster restart without DFS errors. Definitely the
> default is too low for HBase. HFile in 0.20 will have material
> impact here, which should help the situation. Also perhaps more
> can/will be done with regards to HBASE-24 to relieve the load on
> the DataNodes:
>
> https://issues.apache.org/jira/browse/HBASE-24?focusedCommentId=12613104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12613104
>
> The root cause of this is HADOOP-3856:
> https://issues.apache.org/jira/browse/HADOOP-3856
>
> I looked at helping out on this issue. There is so much
> reimplementation of such a fundamental component (to Hadoop)
> involved that it's difficult for a part-time volunteer to make
> progress on it. Even if the code can be changed, there is
> follow-up shepherding through Core review and release processes
> to consider... I hold out hope that a commercial user of Hadoop
> will have pain in this area and commit sponsored resources to
> address the issue of I/O scalability in DFS. I think when DFS
> was written the expectation was that 10,000 nodes would have
> only a few open files each -- very large mapreduce inputs,
> intermediates, and outputs -- not that 100s of nodes might
> have 1,000s of files open each. In any case, the issue is well
> known.
>
> I have found "dfs.datanode.socket.write.timeout=0" is not
> necessary for HBase 0.19.1 on Hadoop 0.19.1 in my testing.
>
> Best regards,
>
>   -Andy
>
>
> > From: schubert zhang <[email protected]>
> > Subject: Re: Data lost during intensive writes
> > To: [email protected], [email protected]
> > Date: Thursday, March 26, 2009, 4:58 AM
> >
> > I will set "dfs.datanode.max.xcievers=1024" (default is 256)
> >
> > I am using branch-0.19.
> > Do you think "dfs.datanode.socket.write.timeout=0" is
> > necessary in release-0.19?
> >
> > Schubert
>
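For reference, a minimal sketch of the settings discussed in this thread, as they might appear in hadoop-site.xml on the DataNodes (Hadoop 0.19-era config file name assumed; the 4096 value follows Andy's report and is a starting point, not a tuned recommendation):

  <!-- hadoop-site.xml on each DataNode; restart DataNodes after changing -->

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <!-- Default is 256; Andy reports 4096 was needed to bring
         > 7000 regions online on 25 nodes without DFS errors. -->
    <value>4096</value>
  </property>

  <!-- Per Andy's testing this one is NOT needed on HBase 0.19.1 /
       Hadoop 0.19.1; shown only because it is discussed above.
       A value of 0 disables the DataNode socket write timeout. -->
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>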
