[ http://issues.apache.org/jira/browse/HADOOP-158?page=comments#action_12413922 ]

Sameer Paranjpye commented on HADOOP-158:
-----------------------------------------

Yes, random assignment of file-ids makes collisions more likely. However, 
collisions are possible even with sequential assignment, and if they are 
possible they need to be detected. Since collision detection code is needed 
with both random and sequential assignment, random assignment makes the system 
simpler, because the namenode doesn't have to track the 'high watermark' file-id.
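
To make the argument concrete, here is a minimal sketch of random file-id 
allocation with collision detection. The class name RandomFileIdAllocator, the 
in-memory usedIds set (standing in for the namenode's record of every file-id 
in use) and the configurable id space are all hypothetical, not Hadoop code. 
Note that the only state is the set of ids in use; there is no watermark.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch, not Hadoop's implementation: random file-ids with
// collision detection. "usedIds" stands in for the namenode's knowledge of
// every file-id in use, including ids held by incomplete files.
class RandomFileIdAllocator {
    private final Set<Long> usedIds = new HashSet<>();
    private final Random random;
    private final long idSpace; // valid ids are 0 .. idSpace - 1

    RandomFileIdAllocator(long seed, long idSpace) {
        this.random = new Random(seed);
        this.idSpace = idSpace;
    }

    // Draw random ids until one is not already in use.
    long allocate() {
        if (usedIds.size() >= idSpace) {
            throw new IllegalStateException("id space exhausted");
        }
        while (true) {
            long candidate = Math.floorMod(random.nextLong(), idSpace);
            if (usedIds.add(candidate)) { // add() returns false on a collision
                return candidate;
            }
        }
    }

    // Deleting a file makes its id available again.
    void release(long id) {
        usedIds.remove(id);
    }
}
```

The same check-and-retry loop is what the issue description below proposes as 
the short-term fix for block ids.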

I don't think recently assigned file-ids that belong to incomplete files are a 
concern, since the namenode will be aware of all file-ids in use, whether they 
belong to incomplete files or not.

Wrap-around before a file completes is not the only collision scenario. In the 
sequential assignment scheme, suppose the first million files in the system 
get the file-ids 0-999999. These files hold archival data of some kind, so they 
are never deleted. Life goes on: lots of files are created and removed, but at 
any given time there are only a few million files total (complete + incomplete) 
in the system. At some point the system will have gone through a trillion file 
creation events, the file-ids will wrap, and new ids will start to collide with 
the first million files.
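
The same scenario can be reproduced at toy scale with a sequential allocator. 
This is a hypothetical sketch (SequentialFileIdAllocator and its tiny id space 
are assumptions for illustration): the counter 'next' is the high-watermark 
state the namenode would have to track, and once it wraps it lands on ids 
still held by the long-lived "archival" files, so the collision check is 
needed anyway.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: sequential file-ids in a fixed-size id space.
class SequentialFileIdAllocator {
    private final Set<Long> usedIds = new HashSet<>();
    private final long idSpace;
    private long next = 0;              // the 'high watermark' counter
    private long collisionsSkipped = 0; // ids skipped because still in use

    SequentialFileIdAllocator(long idSpace) {
        this.idSpace = idSpace;
    }

    long allocate() {
        if (usedIds.size() >= idSpace) {
            throw new IllegalStateException("id space exhausted");
        }
        while (true) {
            long candidate = next;
            next = (next + 1) % idSpace; // wrap around at the end of the space
            if (usedIds.add(candidate)) {
                return candidate;
            }
            collisionsSkipped++; // id still held by a live (e.g. archival) file
        }
    }

    void release(long id) {
        usedIds.remove(id);
    }

    long collisionsSkipped() {
        return collisionsSkipped;
    }
}
```

For example, with an id space of 10, allocating ids 0-2 to permanent files and 
then repeatedly creating and deleting one file makes the counter wrap after 
seven short-lived files and collide with ids 0, 1 and 2 on every pass.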


> dfs should allocate a random blockid range to a file, then assign ids 
> sequentially to blocks in the file
> --------------------------------------------------------------------------------------------------------
>
>          Key: HADOOP-158
>          URL: http://issues.apache.org/jira/browse/HADOOP-158
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>      Fix For: 0.4
>
> A random number generator is used to allocate block ids in dfs.  Sometimes a 
> block id is allocated that is already used in the filesystem, which causes 
> filesystem corruption.
> A short-term fix for this is to simply check when allocating block ids 
> whether any file is already using the newly allocated id, and, if it is, 
> generate another one.  There can still be collisions in some rare conditions, 
> but these are harder to fix and will wait, since this simple fix will handle 
> the vast majority of collisions.
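
The scheme in the issue title (a random block-id range per file, with block 
ids assigned sequentially inside it) could be laid out as in the sketch below. 
The bit split (file-id in the high bits, a 20-bit block index in the low bits) 
and the BlockIdRanges name are assumptions for illustration, not anything 
decided in this issue. With such a layout, collision detection only has to 
happen once per file, when its random file-id is chosen.

```java
// Hypothetical sketch: block id = file-id in the high bits, sequential block
// index in the low bits. The 20-bit index width is an assumed parameter.
class BlockIdRanges {
    static final int BLOCK_INDEX_BITS = 20;
    static final long MAX_BLOCK_INDEX = (1L << BLOCK_INDEX_BITS) - 1;
    // Keep block ids non-negative: file-ids use at most 63 - 20 = 43 bits.
    static final long MAX_FILE_ID = (1L << (63 - BLOCK_INDEX_BITS)) - 1;

    static long blockId(long fileId, long blockIndex) {
        if (fileId < 0 || fileId > MAX_FILE_ID) {
            throw new IllegalArgumentException("file-id out of range");
        }
        if (blockIndex < 0 || blockIndex > MAX_BLOCK_INDEX) {
            throw new IllegalArgumentException("block index out of range");
        }
        return (fileId << BLOCK_INDEX_BITS) | blockIndex;
    }

    static long fileIdOf(long blockId) {
        return blockId >>> BLOCK_INDEX_BITS;
    }

    static long blockIndexOf(long blockId) {
        return blockId & MAX_BLOCK_INDEX;
    }
}
```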

-- 
This message is automatically generated by JIRA.