[ https://issues.apache.org/jira/browse/HADOOP-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur resolved HADOOP-1306.
--------------------------------------

    Resolution: Duplicate

The slowness of the getAdditionalBlock RPC has been addressed by HADOOP-1269, HADOOP-1187, HADOOP-1149 and HADOOP-1073.

> DFS Scalability: Reduce the number of getAdditionalBlock RPCs on the namenode
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-1306
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1306
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>         Attachments: fineGrainLocks3.patch
>
>
> One of the most frequently invoked RPCs on the namenode is addBlock().
> The DFSClient uses this RPC to allocate one more block for a file that it
> is currently writing. The scalability of the namenode will improve if we
> can reduce the number of addBlock() RPCs. One idea that we want to discuss
> here is to make addBlock() return more than one block. This proposal came
> out of a discussion I had with Ben Reed.
>
> Let's say that addBlock() returns n blocks for the file. The namenode
> already tracks these blocks using the pendingCreates data structure. The
> client guarantees that it will use these n blocks in order. The client also
> guarantees that if it cannot use a block (for whatever reason), it will
> inform the namenode using the abandonBlock() RPC. Both of these RPCs are
> already supported.
>
> Another possible optimization: since the namenode has to allocate n blocks
> for a file, should it use the same set of datanodes for all of them? My
> proposal is that if n is small (e.g. 3), it is prudent to allocate the same
> set of datanodes to host all replicas of this set of blocks. This will
> reduce the CPU time spent in chooseTargets(), because targets are chosen
> once per batch instead of once per block.
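A minimal sketch of the proposal, assuming a toy in-memory namenode and client. Every name here (`BatchedAddBlockSketch`, `NameNode.addBlock(src, n, replication)`, `Client`, `simulate`, the `/user/demo/file` path, the five datanode names) is hypothetical and is not Hadoop's real `ClientProtocol` API. Target selection is a random stand-in for `chooseTargets()`, not the namenode's real placement policy. The sketch shows one RPC returning n blocks that share one target set, the client consuming them in order, and unused blocks being returned through `abandonBlock()` when the file is closed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of batched addBlock(); not Hadoop's actual API.
final class BatchedAddBlockSketch {

    // A block id plus the datanodes chosen to hold its replicas.
    record AllocatedBlock(long blockId, List<String> targets) {}

    // Summary of one simulated file write.
    record Result(int addBlockCalls, int chooseTargetsCalls, int abandoned,
                  int pendingBlocks, List<AllocatedBlock> written) {}

    // Toy namenode: pendingCreates maps a file to the ids of blocks allocated to it.
    static final class NameNode {
        final Map<String, Deque<Long>> pendingCreates = new HashMap<>();
        private final List<String> datanodes;
        private final Random random = new Random(1306);
        private long nextBlockId = 1;
        int addBlockCalls = 0;
        int chooseTargetsCalls = 0;

        NameNode(List<String> datanodes) { this.datanodes = datanodes; }

        // Random stand-in for chooseTargets(): pick `replication` distinct datanodes.
        private List<String> chooseTargets(int replication) {
            chooseTargetsCalls++;
            List<String> shuffled = new ArrayList<>(datanodes);
            Collections.shuffle(shuffled, random);
            return List.copyOf(shuffled.subList(0, replication));
        }

        // One RPC allocates n blocks; targets are chosen once and shared by the batch.
        List<AllocatedBlock> addBlock(String src, int n, int replication) {
            addBlockCalls++;
            List<String> targets = chooseTargets(replication);
            Deque<Long> pending = pendingCreates.computeIfAbsent(src, k -> new ArrayDeque<>());
            List<AllocatedBlock> batch = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                long id = nextBlockId++;
                pending.addLast(id);
                batch.add(new AllocatedBlock(id, targets));
            }
            return batch;
        }

        // The client reports a block it will not use; drop it from pendingCreates.
        void abandonBlock(String src, long blockId) {
            Deque<Long> pending = pendingCreates.get(src);
            if (pending != null) {
                pending.remove(blockId);
            }
        }
    }

    // Toy client: consumes prefetched blocks strictly in allocation order.
    static final class Client {
        private final NameNode namenode;
        private final String src;
        private final int batchSize;
        private final int replication;
        private final Deque<AllocatedBlock> prefetched = new ArrayDeque<>();
        int abandoned = 0;

        Client(NameNode namenode, String src, int batchSize, int replication) {
            this.namenode = namenode;
            this.src = src;
            this.batchSize = batchSize;
            this.replication = replication;
        }

        // Contacts the namenode only when the local batch is exhausted.
        AllocatedBlock nextBlock() {
            if (prefetched.isEmpty()) {
                prefetched.addAll(namenode.addBlock(src, batchSize, replication));
            }
            return prefetched.pollFirst();
        }

        // On close, every prefetched but unused block is returned via abandonBlock().
        void close() {
            while (!prefetched.isEmpty()) {
                namenode.abandonBlock(src, prefetched.pollFirst().blockId());
                abandoned++;
            }
        }
    }

    // Writes a file of `blocks` blocks, fetching `batchSize` blocks per RPC.
    static Result simulate(int blocks, int batchSize) {
        String src = "/user/demo/file";
        NameNode nn = new NameNode(List.of("dn1", "dn2", "dn3", "dn4", "dn5"));
        Client client = new Client(nn, src, batchSize, 3);
        List<AllocatedBlock> written = new ArrayList<>();
        for (int i = 0; i < blocks; i++) {
            written.add(client.nextBlock());
        }
        client.close();
        int pending = nn.pendingCreates.getOrDefault(src, new ArrayDeque<>()).size();
        return new Result(nn.addBlockCalls, nn.chooseTargetsCalls, client.abandoned, pending, written);
    }

    public static void main(String[] args) {
        Result r = simulate(7, 3);
        System.out.println("addBlock RPCs: " + r.addBlockCalls() + ", abandoned: " + r.abandoned());
    }
}
```

With n = 3, writing a 7-block file takes 3 addBlock() calls and 3 target selections instead of 7 of each, at the cost of 2 allocated blocks that are abandoned on close. Sharing one target set per batch is what saves the chooseTargets() work, but it also places all replicas of a batch on the same datanodes, which is why the proposal limits it to small n.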