     [ http://issues.apache.org/jira/browse/HADOOP-146?page=all ]

Doug Cutting closed HADOOP-146:
-------------------------------

> potential conflict in block id's, leading to data corruption
> ------------------------------------------------------------
>
>          Key: HADOOP-146
>          URL: http://issues.apache.org/jira/browse/HADOOP-146
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>     Versions: 0.1.0, 0.1.1
>     Reporter: Yoram Arnon
>     Assignee: Konstantin Shvachko
>      Fix For: 0.3.0
>  Attachments: hadoop-146-random.patch
>
> Currently, block ids are generated randomly and are not tested for
> collisions with existing ids.
> While ids are 64 bits, given enough time and a large enough FS, collisions
> are expected.
> When a collision occurs, a random subset of the blocks with that id will be
> removed as extra replicas, and the contents of that portion of the containing
> file will be one random version of the block.
> To solve this, one could check for an id collision when creating a new block,
> getting a new id in case of conflict. This approach requires the name node to
> keep track of all existing block ids (rather than just the ones that have
> reported in), and to identify old versions of a block id as invalid (in case
> a data node dies, a file is deleted, then a block id is reused for a new
> file).
> Alternatively, one could simply use sequential block ids. Here the downsides
> are:
> 1. migration from an existing file system is hard, requiring compaction of
> the entire FS
> 2. once you cycle through 64 bits of ids (quite a few years at full blast),
> you're in trouble again (or run occasional/background compaction)
> 3. you must never lose the high watermark block id.
>
> synchronized Block allocateBlock(UTF8 src) {
>     Block b = new Block();
>     FileUnderConstruction v = (FileUnderConstruction) pendingCreates.get(src);
>     v.add(b);
>     pendingCreateBlocks.add(b);
>     return b;
> }
>
> static Random r = new Random();
> /**
>  */
> public Block() {
>     this.blkid = r.nextLong();
>     this.len = 0;
> }

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
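
For illustration only, the first approach described in the report (check for a collision when creating a block and draw a new id on conflict) might be sketched as below. This is not the attached hadoop-146-random.patch. The class BlockIdAllocator, its usedIds set, and the collisionProbability helper are hypothetical names introduced here. The sketch assumes the allocator can see every id that is still in use, which is exactly the bookkeeping the report says the name node would need, and it does not address stale replicas of a reused id. The helper uses the standard birthday approximation P ~ 1 - exp(-n(n-1)/2^65) for n uniformly random 64-bit ids.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.function.LongSupplier;

/** Hypothetical sketch of collision-checked block id allocation. */
class BlockIdAllocator {
    // Assumption: this set holds ALL ids still in use, not only those
    // that data nodes have reported so far.
    private final Set<Long> usedIds = new HashSet<>();
    private final LongSupplier source;

    BlockIdAllocator(LongSupplier source) {
        this.source = source;
    }

    /** Draws candidate ids until one is unused, records it, and returns it. */
    synchronized long allocate() {
        long id;
        do {
            id = source.getAsLong();
        } while (!usedIds.add(id)); // Set.add returns false if the id was already present
        return id;
    }

    /** Birthday approximation for at least one collision among n random 64-bit ids. */
    static double collisionProbability(double n) {
        return -Math.expm1(-n * (n - 1) / Math.pow(2, 65));
    }

    public static void main(String[] args) {
        Random r = new Random();
        BlockIdAllocator allocator = new BlockIdAllocator(r::nextLong);
        System.out.println("allocated id: " + allocator.allocate());
        System.out.println("P(collision) for 1e9 blocks: " + collisionProbability(1e9));
    }
}
```

Under that approximation, a file system that has created on the order of a billion random ids already has a collision chance of a few percent, which is why an unchecked r.nextLong() is risky at scale.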
