[ 
https://issues.apache.org/jira/browse/HDFS-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865397#action_12865397
 ] 

Todd Lipcon commented on HDFS-1137:
-----------------------------------

I always assumed this was entirely on purpose. Because of the coarse-grained 
locking in FSNamesystem, "fixing" this would basically serialize all writes 1:1 
with syncs to the edit log, which would drastically decrease write throughput.

We already do sync() before returning to the writer, so any write that the 
writer thinks is successful is guaranteed to be durable. It's just that 
readers may see changes that have not yet been made durable.
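
To make the ordering concrete, here is a minimal sketch of the pattern as I 
understand it. ToyNamesystem and all of its members are hypothetical names 
invented for illustration; this is not the actual FSNamesystem or edit log 
code. The edit is applied and buffered under the coarse lock, and the sync 
happens afterwards without holding that lock.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; not the real FSNamesystem or edit log classes.
class ToyNamesystem {
    private final Set<String> namespace = new HashSet<>();   // in-memory state
    private final List<String> buffered = new ArrayList<>(); // edits not yet synced
    private final List<String> durable = new ArrayList<>();  // stands in for the disk
    private final Object syncLock = new Object();            // serializes syncs only

    // Writer path: apply and buffer under the namespace lock, then sync.
    public void mkdir(String path) {
        applyAndLog(path);
        logSync(); // the writer does not return until its edit is durable
    }

    // Step 1: short critical section under the coarse namespace lock.
    synchronized void applyAndLog(String path) {
        namespace.add(path);
        buffered.add("MKDIR " + path);
    }

    // Step 2: flush buffered edits. The namespace lock is held only long
    // enough to take the buffer, so one sync can cover several writers.
    void logSync() {
        synchronized (syncLock) {
            List<String> batch;
            synchronized (this) {
                batch = new ArrayList<>(buffered);
                buffered.clear();
            }
            durable.addAll(batch); // the slow disk write would happen here
        }
    }

    // Reader path: sees in-memory state whether or not it has been synced.
    public synchronized boolean exists(String path) {
        return namespace.contains(path);
    }

    public boolean isDurable(String path) {
        synchronized (syncLock) {
            return durable.contains("MKDIR " + path);
        }
    }
}
```

Between applyAndLog() and logSync(), a reader calling exists() can observe a 
path that is not yet durable, which is the window the issue describes. A 
writer going through mkdir() never returns inside that window.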

I think this is perfectly acceptable for a filesystem, and it's exactly what 
you see in systems like ext3 - writes to the metadata journal are not synced 
unless you explicitly call fsync(), so a reader can read data which will 
disappear after a crash.

> Name node is using the write-ahead log improperly
> -------------------------------------------------
>
>                 Key: HDFS-1137
>                 URL: https://issues.apache.org/jira/browse/HDFS-1137
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Benjamin Reed
>
> The Name node is doing the write-ahead log (WAL) (aka edit log) improperly. 
> Usually when using WAL, changes are written to the log before they are 
> applied to the state. Currently the Namenode does the WAL after applying the 
> change. This means that reads may see changes before they are durable. A 
> client may read information and the server may fail before the information 
> is written to the WAL, which results in the client reading state that 
> disappears. To fix this, the Namenode should write changes before (aka 
> ahead of) applying the change.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
