[ 
https://issues.apache.org/jira/browse/HDFS-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396248#comment-13396248
 ] 

Suresh Srinivas commented on HDFS-3510:
---------------------------------------

bq. Given the different intent of pre-allocation, is this jira still valid?
I am assuming, given the updates to the description, that the answer is yes.

bq. The idea is that if we're going to encounter an out-of-disk-space 
condition, we don't want it to happen in the middle of writing valid data.
Can you explain why this is important? Also, can you explain how you solve the 
problem? It would be good to add a brief description of the solution you are 
proposing. That way there is no need to look at the code to understand the 
proposal.



                
> Fix FSEditLog pre-allocation
> ----------------------------
>
>                 Key: HDFS-3510
>                 URL: https://issues.apache.org/jira/browse/HDFS-3510
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: 1.0.0, 2.0.1-alpha
>
>         Attachments: HDFS-3510-b1.001.patch, HDFS-3510-b1.002.patch, 
> HDFS-3510.001.patch, HDFS-3510.003.patch, HDFS-3510.004.patch, 
> HDFS-3510.004.patch, HDFS-3510.006.patch, HDFS-3510.007.patch, 
> HDFS-3510.008.patch
>
>
> In the FSEditLog, we want to avoid running out of space in the middle of 
> writing an edit log operation to the disk. We do this by a process called 
> "preallocation"-- reserving space on the disk for the upcoming edit log 
> entries before beginning to write them.
> The idea is that if we're going to encounter an out-of-disk-space condition, 
> we don't want it to happen in the middle of writing valid data.  Instead, we 
> want it to happen in the middle of writing padding bytes.  The edit log uses 
> bytes with the value 0xff (in decimal, -1) as padding.  These bytes 
> correspond to FSEditLogOp.OP_INVALID.
> The current preallocation strategy is flawed.  Although we preallocate a very 
> large chunk at a time-- 1 megabyte, in fact-- we only do this preallocation 
> when we are within 4096 bytes of the end of the file.  This means 
> that the effective preallocation length is only 4096 bytes.  A batch of edit 
> log entries could easily be more than this.  There is evidence that this has 
> caused problems in the field for end-users.
> Here is a visual illustration of the old preallocation strategy:
> {code}
> first write
> |
> V <----- 1 MB ----->
> +--+---------------+
> |__|FFFFFFFFFFFFFFF|
> +--+---------------+
>     second write
>     |
>     V
> +--+------+--------+
> |__|______|FFFFFFFF|
> +--+------+--------+
>            third write
>            |
>            V
> +--+------+------+-+
> |__|______|______|_|
> +--+------+------+-+
>                   fourth write
>                   | (NOT preallocated)
>                   V
> +--+------+------+-+
> |__|______|______|________
> +--+------+------+-+
>                           fifth write
>                           |
>                           V<--- 1 MB -->
> +--+------+------+--------+---+--------+
> |__|______|______|________|___|FFFFFFFF|
> +--+------+------+--------+---+--------+
> {code}
> And here is the new preallocation strategy:
> {code}
> first write
> |
> V <----- 1 MB ----->
> +--+---------------+
> |__|FFFFFFFFFFFFFFF|
> +--+---------------+
>     second write
>     |
>     V
> +--+------+--------+
> |__|______|FFFFFFFF|
> +--+------+--------+
>            third write
>            |
>            V
> +--+------+------+-+
> |__|______|______|_|
> +--+------+------+-+
>                   fourth write
>                   |
>                   V <------ 1MB-->
> +--+------+------+--------+------+
> |__|______|______|________|      |
> +--+------+------+--------+------+
>                           fifth write
>                           |
>                           V
> +--+------+------+--------+---+--+
> |__|______|______|________|___|  |
> +--+------+------+--------+---+--+
> {code}
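
To make the difference between the two diagrams concrete, here is a minimal Java sketch. It is not the code from any of the attached patches. It assumes the old check extends the file only when the write position is within 4096 bytes of the end of the file, and it models the new strategy as "extend in 1 MB chunks until the whole pending batch fits", which is my reading of the second diagram. The class name, method names, and the 16 KB batch size are hypothetical and chosen only for illustration.

```java
public class PreallocSketch {
    // Values taken from the issue description.
    static final long PREALLOC_CHUNK = 1024 * 1024; // 1 MB preallocation chunk
    static final long OLD_THRESHOLD = 4096;         // old distance-from-end check
    static final byte PADDING = (byte) 0xff;        // -1 as a signed Java byte

    /**
     * Old strategy, as assumed above: extend the file by one chunk only when
     * the position is within OLD_THRESHOLD bytes of the end of the file.
     * The size of the pending batch is never consulted.
     */
    static long oldFileSize(long position, long fileSize) {
        if (position + OLD_THRESHOLD >= fileSize) {
            return fileSize + PREALLOC_CHUNK;
        }
        return fileSize;
    }

    /**
     * Hypothetical batch-aware strategy: keep extending by one chunk until
     * the whole pending batch fits inside the preallocated region.
     */
    static long batchAwareFileSize(long position, long fileSize, long batchLength) {
        long size = fileSize;
        while (size - position < batchLength) {
            size += PREALLOC_CHUNK;
        }
        return size;
    }

    /** True if a batch written at position stays inside the preallocated file. */
    static boolean fits(long position, long fileSize, long batchLength) {
        return position + batchLength <= fileSize;
    }

    public static void main(String[] args) {
        long fileSize = PREALLOC_CHUNK;
        long position = fileSize - 8192; // 8 KB of padding still available
        long batch = 16384;              // a 16 KB batch of edit log entries

        long oldSize = oldFileSize(position, fileSize);
        long newSize = batchAwareFileSize(position, fileSize, batch);

        System.out.println("padding byte as int: " + PADDING);
        System.out.println("old strategy covers batch: " + fits(position, oldSize, batch));
        System.out.println("batch-aware covers batch:  " + fits(position, newSize, batch));
    }
}
```

With 8 KB of padding left, the old check does not fire (the position is more than 4096 bytes from the end), so a 16 KB batch runs past the preallocated region. That is the "NOT preallocated" fourth write in the first diagram. The batch-aware check extends the file first, so an out-of-space error would surface while writing padding rather than in the middle of valid data.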

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira