[ https://issues.apache.org/jira/browse/HDFS-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208973#comment-13208973 ]

Colin Patrick McCabe commented on HDFS-1846:
--------------------------------------------

Let me clear up one misconception here:

Sparse files occur when you seek to a point past the end of a file and then 
write data.

Sparse files DO NOT occur when you write out data using write(2).  Writing 
zeros to a file will not result in a sparse file.  The kernel does not have 
time to check every buffer to see whether it consists of all zeros.  If you say 
that you have important data to write, it believes you.
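
To make the distinction concrete, here is a minimal C sketch (Linux assumed; 
the file names and sizes are illustrative, not from the HDFS code). The first 
file gets a hole via lseek(2) past the end; the second has the same range 
written out as zeros with write(2), and only the second consumes disk blocks 
for the full range.

{code}
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    char zeros[4096];
    memset(zeros, 0, sizeof(zeros));

    /* Sparse: seek 1 MB past the start of an empty file, then write one
     * byte.  The skipped-over range becomes a hole; no blocks are
     * allocated for it. */
    int sparse = open("sparse.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (sparse < 0) { perror("open sparse.dat"); return 1; }
    if (lseek(sparse, 1024 * 1024, SEEK_SET) < 0) { perror("lseek"); return 1; }
    if (write(sparse, "x", 1) != 1) { perror("write"); return 1; }
    close(sparse);

    /* Dense: write 1 MB of zeros with write(2).  The kernel allocates
     * blocks for all of it; it does not inspect the buffer for zeros. */
    int dense = open("dense.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dense < 0) { perror("open dense.dat"); return 1; }
    for (int i = 0; i < 256; i++) {
        if (write(dense, zeros, sizeof(zeros)) != (ssize_t)sizeof(zeros)) {
            perror("write");
            return 1;
        }
    }
    close(dense);
    return 0;
}
{code}

On a filesystem with sparse-file support, "du sparse.dat dense.dat" should 
show the first file occupying almost nothing and the second a full megabyte, 
even though "ls -l" reports roughly the same logical size for both.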

Using the fallocate system call is significantly faster than copying all of the 
data from userspace.  This is true for two reasons (see the sketch after the 
list):
1. The copying solution needs to copy all those zeros (or 0xdeadbeefs, or 
whatever) from userspace to kernel space.  The fallocate solution copies 
nothing.
2. Filesystems that support extents, like ext4, can optimize their space layout 
if you tell them ahead of time that you want a big contiguous chunk of data.
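
For illustration, a minimal sketch of preallocation via fallocate(2) 
(Linux-specific; the file name and size are made up). One system call reserves 
the whole range, nothing is copied from userspace, and an extent-based 
filesystem gets told up front how much contiguous space is wanted.

{code}
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("prealloc.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Reserve 1 MB of real blocks starting at offset 0.  Nothing is
     * copied from userspace, and the call fails with ENOSPC right away
     * if the volume cannot hold the whole range. */
    if (fallocate(fd, 0, 0, 1024 * 1024) != 0) {
        perror("fallocate");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}
{code}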

For performance reasons, it would be really good to add fallocate support.  I 
do not believe that it would require another configuration knob.  The native 
code would just need a compile-time check that falls back to a non-fallocate 
solution if the environment is too old.
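
As a rough sketch of what such a check might look like (HAVE_FALLOCATE here 
stands in for a hypothetical configure-time macro; it is not an existing flag 
in the HDFS build):

{code}
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Preallocate [offset, offset+len) in fd; returns 0 on success, -1 on
 * error.  HAVE_FALLOCATE is a hypothetical macro a build-time probe
 * would define when fallocate(2) is available. */
int preallocate(int fd, off_t offset, off_t len) {
#ifdef HAVE_FALLOCATE
    /* Fast path: the kernel reserves the blocks directly. */
    return fallocate(fd, 0, offset, len);
#else
    /* Fallback for older environments: write the fill data (zeros, in
     * today's code) from userspace, much as the existing preallocation
     * already does. */
    char buf[4096];
    memset(buf, 0, sizeof(buf));
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = write(fd, buf, chunk);
        if (n < 0)
            return -1;
        len -= n;
    }
    return 0;
#endif
}
{code}

With the macro defined, the function is a thin wrapper around fallocate(2); 
without it, it degrades to a userspace fill loop, so no new configuration knob 
is needed either way.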
                
> Don't fill preallocated portion of edits log with 0x00
> ------------------------------------------------------
>
>                 Key: HDFS-1846
>                 URL: https://issues.apache.org/jira/browse/HDFS-1846
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>             Fix For: 0.23.0
>
>         Attachments: editsStored, hdfs-1846-perf-analysis.0.patch, 
> hdfs-1846.0.txt, hdfs-1846.1.patch, hdfs-1846.2.patch, hdfs-1846.3.patch, 
> hdfs-1846.3.patch
>
>
> HADOOP-2330 added a feature to preallocate space in the local file system for 
> the NN transaction log. That change seeks past the current end of the file 
> and writes out some data, which on most systems results in the intervening 
> data in the file being filled with zeros. Most underlying file systems have 
> special handling for sparse files, and don't actually allocate blocks on disk 
> for blocks of a file which consist completely of 0x00.
>
> I've seen cases in the wild where the volume an edits dir is on fills up, 
> resulting in a partial final transaction being written out to disk. If you 
> examine the bytes of this (now corrupt) edits file, you'll see the partial 
> final transaction followed by a lot of zeros, suggesting that the 
> preallocation previously succeeded before the volume ran out of space. If we 
> fill the preallocated space with something other than zeros, we'd likely see 
> the failure at preallocation time, rather than transaction-writing time, and 
> so cause the NN to crash earlier, without a partial transaction being written 
> out.
>
> I also hypothesize that filling the preallocated space in the edits log with 
> something other than 0x00 will result in a performance improvement in NN 
> throughput. I haven't tested this yet, but I intend to as part of this JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
