Todd Lipcon created KUDU-1538:
---------------------------------

             Summary: "Orphaned" block deletion can delete live blocks in use 
by other tablets
                 Key: KUDU-1538
                 URL: https://issues.apache.org/jira/browse/KUDU-1538
             Project: Kudu
          Issue Type: Bug
          Components: fs, tablet
    Affects Versions: 0.9.1
            Reporter: Todd Lipcon
            Priority: Blocker


Currently, we allocate block IDs using a random number generator, ensuring only 
that the chosen ID is not already in use. Of course, that doesn't preclude the 
ID of a block which was previously used and then deleted from being reused.
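
To make the reuse concrete, here is a minimal sketch of that allocation 
policy. BlockManager, create_block, delete_block and demo_reuse are 
hypothetical names for a toy model, not Kudu's actual API, and the tiny ID 
space is only there to force a collision deterministically:

```python
import random


class BlockManager:
    """Toy model of random block ID allocation (hypothetical, not Kudu's API)."""

    def __init__(self, id_space, seed=0):
        self._rng = random.Random(seed)
        self._id_space = id_space
        self.live = {}  # block ID -> owning tablet

    def create_block(self, owner):
        if len(self.live) >= self._id_space:
            raise RuntimeError("no free block IDs")
        while True:
            block_id = self._rng.randrange(self._id_space)
            # The only check is "not currently in use"; nothing remembers
            # IDs that were used in the past and have since been deleted.
            if block_id not in self.live:
                self.live[block_id] = owner
                return block_id

    def delete_block(self, block_id):
        self.live.pop(block_id, None)


def demo_reuse():
    mgr = BlockManager(id_space=2)
    a = mgr.create_block("tablet A")
    mgr.create_block("tablet B")      # both IDs are now in use
    mgr.delete_block(a)               # tablet A drops its block
    c = mgr.create_block("tablet B")  # the only free ID is the one A just freed
    return a, c
```

In this sketch demo_reuse() hands tablet B the same ID that tablet A just 
released. With a realistically large ID space the collision is rare, but 
nothing rules it out.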

This interacts quite poorly with the "orphaned block" processing we have in 
tablet metadata. As a refresher, the "orphaned block" thing is used as follows:
- during a compaction, we have the output blocks (newly written data) and the 
input blocks (data which has been compacted and is no longer relevant)
- when the compaction finishes, we write a new TabletMetadata which swaps in 
the new blocks and removes the old blocks
-- following that, we delete the old (input) blocks. Of course, we can't 
delete the old blocks until after we've flushed the metadata; otherwise, if we 
crashed before flushing the metadata, we'd have lost track of the new block IDs.
-- so, we defer the deletion of the input blocks until after the metadata has 
been flushed
- this leaves open the opposite hole: if we defer the deletion of the old 
blocks, and we crash just _after_ flushing metadata, we would leak those old 
blocks and their disk space, which is no good either.
-- so, when we flush metadata, we include the 'old blocks' in an 
'orphan_blocks' array. On loading of metadata, we try to 'roll forward' the 
deletion to prevent the above-mentioned leak from being permanent.
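
The sequence above can be sketched roughly as follows. This is a simplified 
model, not the real TabletMetadata code: the class layout, the function names 
and the 'store' set standing in for blocks on disk are all hypothetical, and 
the second flush that clears the orphan list after deletion is an assumption:

```python
class TabletMetadata:
    """Toy stand-in for tablet metadata (hypothetical layout)."""

    def __init__(self, blocks):
        self.blocks = set(blocks)
        self.orphan_blocks = set()
        self.on_disk = None  # last flushed (blocks, orphan_blocks) snapshot

    def flush(self):
        self.on_disk = (frozenset(self.blocks), frozenset(self.orphan_blocks))


def delete_orphans(meta, store):
    # Deferred deletion: only runs once the orphan list is durable.
    for block_id in list(meta.orphan_blocks):
        store.discard(block_id)
    meta.orphan_blocks.clear()
    meta.flush()  # assumption: the orphan list is cleared by a later flush


def finish_compaction(meta, store, inputs, outputs, crash_after_flush=False):
    # Swap in the new blocks and record the old ones as orphaned.
    meta.blocks = (meta.blocks - set(inputs)) | set(outputs)
    meta.orphan_blocks |= set(inputs)
    meta.flush()
    if crash_after_flush:
        return  # simulated crash: the input blocks are still in the store
    delete_orphans(meta, store)


def load_metadata(on_disk, store):
    # "Roll forward": finish any deletion that a crash interrupted.
    blocks, orphans = on_disk
    meta = TabletMetadata(blocks)
    meta.orphan_blocks = set(orphans)
    meta.on_disk = on_disk
    delete_orphans(meta, store)
    return meta
```

If finish_compaction() is run with crash_after_flush=True, the input blocks 
stay in the store until load_metadata() removes them, which is the leak 
prevention described above.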

The "roll forward" behavior mentioned above is what seems to be eating blocks. 
We can now have the following bad interleaving:
- a compaction in tablet A succeeds and lists block ID "X" as orphaned
- a different tablet B re-uses block ID "X"
- we restart the TS, or trigger a remote bootstrap (which also "cleans up" 
orphan blocks)
-- it deletes block "X" from underneath tablet "B"
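
The interleaving can be replayed with a small simulation. Everything here is 
a hypothetical toy model (plain dicts and sets, not Kudu code). It also 
assumes that tablet A has already deleted "X" after its compaction, while its 
flushed metadata still lists "X" as orphaned at the time of the restart:

```python
def roll_forward(tablets, live_blocks):
    """Delete every block listed as orphaned in any tablet's metadata."""
    for meta in tablets.values():
        for block_id in meta["orphan_blocks"]:
            # No check that the ID still belongs to this tablet.
            live_blocks.discard(block_id)
        meta["orphan_blocks"] = set()


def simulate_bad_interleaving():
    live_blocks = {"X"}
    tablets = {
        "A": {"blocks": {"X"}, "orphan_blocks": set()},
        "B": {"blocks": set(), "orphan_blocks": set()},
    }

    # 1. Compaction in tablet A: "X" is an input block, so it is recorded
    #    as orphaned and then deleted. Assumption: the flushed metadata
    #    keeps listing "X" as orphaned.
    tablets["A"]["blocks"].discard("X")
    tablets["A"]["orphan_blocks"].add("X")
    live_blocks.discard("X")

    # 2. Tablet B allocates a block. "X" is free again, so the random
    #    allocator is allowed to hand it out.
    live_blocks.add("X")
    tablets["B"]["blocks"].add("X")

    # 3. Restart (or remote bootstrap): orphans are rolled forward.
    roll_forward(tablets, live_blocks)

    # Blocks tablet B still references but which no longer exist.
    return tablets["B"]["blocks"] - live_blocks
```

In this model simulate_bad_interleaving() returns the set containing "X": 
tablet B's metadata still references the block, but the roll-forward on 
behalf of tablet A has removed it.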



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
