[ http://issues.apache.org/jira/browse/HADOOP-158?page=comments#action_12413922 ]
Sameer Paranjpye commented on HADOOP-158:
-----------------------------------------

Yes, random assignment of file-ids makes collisions more likely. However, collisions are possible even with sequential assignment, and if they are possible, they need to be detected. Since collision detection code is needed with both random and sequential assignment, random assignment makes the system simpler, because the namenode doesn't have to track the 'high watermark' file-id.

I don't think recently assigned file-ids that belong to incomplete files are a concern, since the namenode will be aware of all file-ids in use, whether they belong to incomplete files or not.

Wrap-around before a file completes is not the only collision scenario. In the sequential assignment scheme, suppose the first million files in the system get the file-ids 0-999999. These files hold archival data of some kind, so they are never deleted. Life goes on: lots of files are created and removed, and at any given time there are only a few million files in total (complete + incomplete) in the system. At some point the system will have gone through a trillion file creation events; the file-ids will then wrap and start to collide with the first million files.

> dfs should allocate a random blockid range to a file, then assign ids
> sequentially to blocks in the file
> --------------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-158
>          URL: http://issues.apache.org/jira/browse/HADOOP-158
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.4
>
> A random number generator is used to allocate block ids in dfs. Sometimes a
> block id is allocated that is already used in the filesystem, which causes
> filesystem corruption.
> A short-term fix for this is to simply check when allocating block ids
> whether any file is already using the newly allocated id, and, if it is,
> generate another one.
> There can still be collisions in some rare conditions,
> but these are harder to fix and will wait, since this simple fix will handle
> the vast majority of collisions.
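The comparison in the comment (random vs. sequential file-id assignment, both needing a collision check) can be sketched roughly as follows. This is a minimal Python illustration, not Hadoop code: the class names, the `id_bits` parameter, and the assumption that the namenode keeps an in-memory set of every file-id in use (complete and incomplete files alike) are all hypothetical. The demo uses a tiny 16-id space so that the wrap-around described above happens after a handful of allocations; the real id width is not stated here.

```python
import random


class RandomIdAllocator:
    """Random assignment: draw an id, retry if it is already in use.

    Needs only the set of ids currently in use (complete and incomplete
    files alike); there is no counter to persist. (Hypothetical sketch.)
    """

    def __init__(self, id_bits=64, rng=None):
        self.id_space = 1 << id_bits
        self.rng = rng if rng is not None else random.Random()
        self.in_use = set()

    def allocate(self):
        if len(self.in_use) >= self.id_space:
            raise RuntimeError("id space exhausted")
        while True:
            candidate = self.rng.randrange(self.id_space)
            if candidate not in self.in_use:  # collision check
                self.in_use.add(candidate)
                return candidate

    def release(self, file_id):
        self.in_use.discard(file_id)


class SequentialIdAllocator:
    """Sequential assignment: a counter (the 'high watermark') that wraps.

    Once the counter wraps, it can land on ids still held by long-lived
    files, so the same collision check is required here too.
    """

    def __init__(self, id_bits=64):
        self.id_space = 1 << id_bits
        self.next_id = 0  # extra state the namenode would have to track
        self.in_use = set()

    def allocate(self):
        if len(self.in_use) >= self.id_space:
            raise RuntimeError("id space exhausted")
        # At least one id is free, so this loop ends within id_space steps.
        while True:
            candidate = self.next_id
            self.next_id = (self.next_id + 1) % self.id_space
            if candidate not in self.in_use:  # same collision check
                self.in_use.add(candidate)
                return candidate

    def release(self, file_id):
        self.in_use.discard(file_id)


if __name__ == "__main__":
    # Tiny 16-id space so the wrap-around happens quickly.
    seq = SequentialIdAllocator(id_bits=4)
    archival = [seq.allocate() for _ in range(4)]  # ids 0..3, never deleted
    for _ in range(12):  # churn: create and delete short-lived files
        seq.release(seq.allocate())
    print(seq.next_id)     # 0: the counter wrapped onto an archival file's id
    print(seq.allocate())  # 4: the collision check skipped ids 0..3
```

Both allocators carry the identical collision check; the sequential one additionally has to keep `next_id` (the high watermark) consistent, which is the extra bookkeeping the comment argues random assignment avoids.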
