[ http://issues.apache.org/jira/browse/HADOOP-158?page=comments#action_12413909 ]
Sameer Paranjpye commented on HADOOP-158:
-----------------------------------------

It can be sequential. In that case, the namenode would need to determine the lowest unused file-id at startup and start file-id assignments from that point.

Even sequential allocation of file-ids should probably do the collision check, because you don't need a trillion files in the system before you wrap around; you only need a trillion file-creation events. If you're doing the collision check in both schemes, random file-id assignment keeps things simpler. The possibility of collision with sequential assignment of file-ids is very remote, but why expose ourselves? I'm probably being paranoid, so ignore me on this one if you want.

> dfs should allocate a random blockid range to a file, then assign ids sequentially to blocks in the file
> ---------------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-158
>          URL: http://issues.apache.org/jira/browse/HADOOP-158
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.4
>
> A random number generator is used to allocate block ids in dfs. Sometimes a block id is allocated that is already used in the filesystem, which causes filesystem corruption.
>
> A short-term fix is to check, when allocating a block id, whether any file is already using the newly allocated id and, if so, generate another one. There can still be collisions in some rare conditions, but these are harder to fix and will wait, since this simple fix handles the vast majority of collisions.
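For illustration only, a minimal sketch of the short-term fix quoted above: random block-id allocation with a collision check against ids already in use. The class BlockIdAllocator, the in-memory usedIds set, and the method allocateBlockId are assumptions made for this sketch, not the namenode's actual data structures or API.

    import java.util.HashSet;
    import java.util.Random;
    import java.util.Set;

    // Sketch only: pick a random block id and retry until it does not
    // collide with an id already in use. The in-memory set stands in for
    // the namenode's real block map; none of these names are the real API.
    public class BlockIdAllocator {
        private final Random random = new Random();
        private final Set<Long> usedIds = new HashSet<Long>();

        public synchronized long allocateBlockId() {
            long id;
            do {
                id = random.nextLong();      // candidate block id
            } while (!usedIds.add(id));      // add() returns false on collision, so retry
            return id;
        }
    }

With a 64-bit id space and a collision check, retries stay extremely rare even after many creation events; the check simply removes the wraparound/duplication concern raised in the comment.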
