> That is a different motivation. The document talks about why you should use
> federation. I am asking about motivation of supporting the code base while
> not using it. At least this is how I understand Allen's question and some of
> my colleagues'.
>
Namenode code is not changed at all. Datanode code changes to add the notion
of block pool and a thread per NN. For a single NN, the datanode is equivalent
to the current datanode. If you argue that there should not be any code
change - not sure how features like this can be added to HDFS. There is no
change from the user's perspective or in the performance of the system, and
no additional complexity compared to the existing system.

> If you could put some numbers in the jira for the reference.
>

Will do.

> Also it is interesting to know whether there is a benefit in splitting
> the namespace. Can I e.g. do more getBlockLocations per second?
> This is one of the aspects of scaling, right?
>

I do not understand your question. This feature does not scale
getBlockLocations per second for a single NN. When you use many NNs, total
requests per second does scale for the entire cluster.

> As we developed this feature, some significant improvements have been made
> to the system - fast snapshots (snapshot time down from 1hr 45 mins to 1
> min!), fast startup, cleanup of storage, fixing multi threading issues in
> several places, decommissioning improvements etc.
> > This is a valid concern. Hence the single namenode configuration that most
> > installations run today, will run as is. We put a lot of development and
> > testing effort to ensure this.
> >
>
> I don't know what you mean by "as is". My experience with this word in real
> estate tells me it can be anything.
>

I used the word with the following meaning:
http://www.merriam-webster.com/dictionary/as%20is
as is: in the presently existing condition without modification