Hello, developers, I am just curious about the following two things which seems to be contradictory to each other, please help me find out my understanding mistakes:
1) Excerpted from sosp 2013 paper, "Then, when a node fails, the system detects all missing RDD partitions and launches tasks to recompute them from the last checkpoint. Many tasks can be launched at the same time to compute different RDD partitions, allowing the whole cluster to partake in recovery." 2) Excerpted from code, this function is called when there's one dead BlockManager, this function didn't launch tasks to recover lost partitions, instead, it updated many meta-data. private def removeBlockManager(blockManagerId: BlockManagerId) { val info = blockManagerInfo(blockManagerId) // Remove the block manager from blockManagerIdByExecutor. blockManagerIdByExecutor -= blockManagerId.executorId // Remove it from blockManagerInfo and remove all the blocks. blockManagerInfo.remove(blockManagerId) val iterator = info.blocks.keySet.iterator while (iterator.hasNext) { val blockId = iterator.next val locations = blockLocations.get(blockId) locations -= blockManagerId if (locations.size == 0) { blockLocations.remove(locations) } } } thanks, dachuan. -- Dachuan Huang Cellphone: 614-390-7234 2015 Neil Avenue Ohio State University Columbus, Ohio U.S.A. 43210