[ 
https://issues.apache.org/jira/browse/HDFS-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3510:
---------------------------------------

    Description: 
It is good to avoid running out of space in the middle of writing a batch of 
edits, because when it happens, we often get partial edits at the end of the 
log.
Edit log preallocation can solve this problem (see HADOOP-2330 for a full 
description of edit log preallocation).

The current pre-allocation code was introduced for performance reasons, not for 
preventing partial edits.  As a consequence, we sometimes do a write without 
using pre-allocation.  We should change the pre-allocation code so that it 
always preallocates at least enough space before writing out the edits.

  was:
In the FSEditLog, we want to avoid running out of space in the middle of 
writing an edit log operation to the disk. We do this by a process called 
"preallocation"-- reserving space on the disk for the upcoming edit log entries 
before beginning to write them.

The idea is that if we're going to encounter an out-of-disk-space condition, we 
don't want it to happen in the middle of writing valid data.  Instead, we want 
it to happen in the middle of writing padding bytes.  The edit log uses bytes 
with the value 0xff (-1 when read as a signed byte) as padding.  These bytes 
correspond to FSEditLogOp.OP_INVALID.
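
As a small illustration of the padding value, the sketch below fills a buffer 
with 0xff bytes and shows the signed and unsigned readings of that byte.  The 
class PaddingSketch and its helpers are hypothetical names used only for this 
example; only the value 0xff and its meaning as OP_INVALID come from the 
description above.

```java
import java.util.Arrays;

final class PaddingSketch {
    // Assumption: same value the description gives for FSEditLogOp.OP_INVALID.
    static final byte OP_INVALID = (byte) 0xff;

    /** Returns a buffer of the given length filled with padding bytes. */
    static byte[] padding(int length) {
        byte[] fill = new byte[length];
        Arrays.fill(fill, OP_INVALID);
        return fill;
    }

    /** Reads a Java (signed) byte as an unsigned value in the range 0..255. */
    static int asUnsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte[] fill = padding(8);
        System.out.println(fill[0]);             // -1  (signed byte)
        System.out.println(asUnsigned(fill[0])); // 255 (0xff)
    }
}
```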

The current preallocation strategy is flawed.  Although we preallocate a very 
large chunk at a time-- 1 megabyte, in fact-- we only do this preallocation 
when we are within 4096 bytes of the end of the file.  This means that the 
effective preallocation length is only 4096 bytes.  A batch of edit log 
entries could easily be more than this.  There is evidence that this has 
caused problems in the field for end-users.
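
The arithmetic behind this can be sketched as follows.  This is a simplified 
model, not the actual FSEditLog code: it assumes the check fires only once the 
write position is within 4096 bytes of the end of the file (the reading under 
which the effective preallocation length is 4096 bytes), and that a firing 
check adds one 1 MB chunk at the end of the file.  The class and method names 
are hypothetical.

```java
final class OldPreallocationSketch {
    static final long TRIGGER_BYTES = 4096;        // threshold from the description
    static final long CHUNK_BYTES = 1024 * 1024;   // 1 MB chunk from the description

    /** Assumed old check: preallocate only when close to the end of the file. */
    static boolean shouldPreallocate(long position, long fileSize) {
        return position + TRIGGER_BYTES >= fileSize;
    }

    /** Bytes of a batch that would be written past the preallocated region. */
    static long unprotectedBytes(long position, long fileSize, long batchLength) {
        long size = shouldPreallocate(position, fileSize)
            ? fileSize + CHUNK_BYTES   // modelling assumption: grow by one chunk
            : fileSize;
        return Math.max(0, position + batchLength - size);
    }

    public static void main(String[] args) {
        long fileSize = CHUNK_BYTES;
        // 5000 padding bytes remain, so the check does not fire, yet an
        // 8000-byte batch overruns the padding by 3000 bytes.
        System.out.println(unprotectedBytes(fileSize - 5000, fileSize, 8000)); // 3000
        // 4000 padding bytes remain: the check fires and the batch is covered.
        System.out.println(unprotectedBytes(fileSize - 4000, fileSize, 8000)); // 0
    }
}
```

The first call corresponds to the "NOT preallocated" write: the batch is larger 
than the remaining padding, but the remaining padding is still above the 
threshold, so nothing is reserved before the write.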

Here is a visual illustration of the old preallocation strategy:

{code}
first write
|
V <----- 1 MB ----->
+--+---------------+
|__|FFFFFFFFFFFFFFF|
+--+---------------+
    second write
    |
    V
+--+------+--------+
|__|______|FFFFFFFF|
+--+------+--------+
           third write
           |
           V
+--+------+------+-+
|__|______|______|_|
+--+------+------+-+
                  fourth write
                  | (NOT preallocated)
                  V
+--+------+------+-+
|__|______|______|________
+--+------+------+-+
                          fifth write
                          |
                          V<--- 1 MB -->
+--+------+------+--------+---+--------+
|__|______|______|________|___|FFFFFFFF|
+--+------+------+--------+---+--------+
{code}

And here is the new preallocation strategy:

{code}
first write
|
V <----- 1 MB ----->
+--+---------------+
|__|FFFFFFFFFFFFFFF|
+--+---------------+
    second write
    |
    V
+--+------+--------+
|__|______|FFFFFFFF|
+--+------+--------+
           third write
           |
           V
+--+------+------+-+
|__|______|______|_|
+--+------+------+-+
                  fourth write
                  |
                  V <------ 1MB-->
+--+------+------+--------+------+
|__|______|______|________|FFFFFF|
+--+------+------+--------+------+
                          fifth write
                          |
                          V
+--+------+------+--------+---+--+
|__|______|______|________|___|FF|
+--+------+------+--------+---+--+
{code}
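
A minimal sketch of the new strategy, under stated assumptions: before each 
batch is written, the file is grown until it can hold the whole batch.  
Growing in whole 1 MB chunks is an assumption made here for illustration; the 
description only requires that at least enough space is preallocated.  The 
class and method names are hypothetical and do not come from the patch.

```java
final class NewPreallocationSketch {
    static final long CHUNK_BYTES = 1024 * 1024;   // 1 MB chunk from the description

    /**
     * Returns the file size after preallocating for a batch of batchLength
     * bytes that will be written starting at position.
     */
    static long sizeAfterPreallocation(long position, long fileSize, long batchLength) {
        long needed = position + batchLength - fileSize;
        if (needed <= 0) {
            return fileSize;                        // the batch already fits
        }
        // Assumption: round the shortfall up to a whole number of chunks.
        long chunks = (needed + CHUNK_BYTES - 1) / CHUNK_BYTES;
        return fileSize + chunks * CHUNK_BYTES;
    }

    public static void main(String[] args) {
        long fileSize = CHUNK_BYTES;
        // An 8000-byte batch with 5000 padding bytes left: grow by one chunk.
        System.out.println(sizeAfterPreallocation(fileSize - 5000, fileSize, 8000)); // 2097152
        // A batch that fits in the existing padding leaves the size unchanged.
        System.out.println(sizeAfterPreallocation(0, fileSize, 100));                // 1048576
    }
}
```

With this rule an out-of-space condition can only surface while the padding is 
being reserved, never partway through the edits themselves.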


        Summary: Improve FSEditLog pre-allocation  (was: Fix FSEditLog 
pre-allocation)
    
> Improve FSEditLog pre-allocation
> --------------------------------
>
>                 Key: HDFS-3510
>                 URL: https://issues.apache.org/jira/browse/HDFS-3510
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: 1.0.0, 2.0.1-alpha
>
>         Attachments: HDFS-3510-b1.001.patch, HDFS-3510-b1.002.patch, 
> HDFS-3510.001.patch, HDFS-3510.003.patch, HDFS-3510.004.patch, 
> HDFS-3510.004.patch, HDFS-3510.006.patch, HDFS-3510.007.patch, 
> HDFS-3510.008.patch, HDFS-3510.009.patch
>
>
> It is good to avoid running out of space in the middle of writing a batch of 
> edits, because when it happens, we often get partial edits at the end of the 
> log.
> Edit log preallocation can solve this problem (see HADOOP-2330 for a full 
> description of edit log preallocation).
> The current pre-allocation code was introduced for performance reasons, not 
> for preventing partial edits.  As a consequence, we sometimes do a write 
> without using pre-allocation.  We should change the pre-allocation code so 
> that it always preallocates at least enough space before writing out the 
> edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
