[
https://issues.apache.org/jira/browse/HDFS-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yongjun Zhang updated HDFS-9092:
--------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
Status: Resolved (was: Patch Available)
Thanks a lot, [~brandonli], for the review!
I committed to trunk and branch-2.
BTW, the test audit message is caused by my accidentally including CHANGES.txt in the
patch; my earlier whitespace cleanup with --whitespace=fix apparently did not catch
everything. The new test failure is unrelated to the fix here.
> Nfs silently drops overlapping write requests and causes data copying to fail
> -----------------------------------------------------------------------------
>
> Key: HDFS-9092
> URL: https://issues.apache.org/jira/browse/HDFS-9092
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nfs
> Affects Versions: 2.7.1
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Fix For: 2.8.0
>
> Attachments: HDFS-9092.001.patch, HDFS-9092.002.patch
>
>
> When the 'sync' option is NOT used, NFS writes may produce the following warning:
> org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Got an overlapping write (1248751616, 1249677312), nextOffset=1248752400. Silently drop it now
> As a result, the size of the data copied via NFS stays at 1248752400.
> What happens is as follows:
> 1. The client sends write requests asynchronously.
> 2. The NFS gateway has a handler that processes each incoming request by creating
> an internal write request structure and putting it into a cache.
> 3. In parallel, a separate thread in the NFS gateway takes requests out of the
> cache and writes the data to HDFS.
> The current offset is the amount of data the write thread in step 3 has written
> so far. Overlapping write requests are detected in step 2, but each request is
> only checked against the current offset and trimmed if necessary. Because the
> write requests are sent asynchronously, two requests can both start beyond the
> current offset and still overlap each other; this is not detected, and both are
> put into the cache. That causes the symptom reported here, at step 3.
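The offsets in the warning quoted above can be checked with a little arithmetic. The sketch below assumes the logged pair (1248751616, 1249677312) is a half-open byte range [start, end); that reading, and the class and method names, are assumptions for illustration only.

```java
// Hypothetical helper: arithmetic on the offsets from the OpenFileCtx warning.
// Assumes the logged pair (start, end) is a half-open byte range [start, end).
public class OverlapMath {
    /** Bytes of the request that lie below nextOffset (already written). */
    static long overlap(long start, long end, long nextOffset) {
        return Math.max(0, Math.min(end, nextOffset) - start);
    }

    /** Bytes of the request at or beyond nextOffset (not yet written). */
    static long newBytes(long start, long end, long nextOffset) {
        return Math.max(0, end - Math.max(start, nextOffset));
    }

    public static void main(String[] args) {
        long start = 1248751616L, end = 1249677312L, nextOffset = 1248752400L;
        System.out.println("request length  = " + (end - start));                     // 925696
        System.out.println("already written = " + overlap(start, end, nextOffset));   // 784
        System.out.println("not yet written = " + newBytes(start, end, nextOffset));  // 924912
    }
}
```

Under that reading, only 784 bytes of the dropped request overlap data already written, while 924912 bytes are new; dropping the whole request loses those new bytes, whereas trimming the overlapping prefix would keep them.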
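The write path described in steps 1 to 3 above can be modeled roughly as follows. This is a simplified sketch, not the gateway's actual code: the names WritePipeline, WriteCtx, receive and writeNext, the TreeMap cache keyed by offset, and tracking byte counts instead of payload data are all assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of the NFS gateway write path; it is not
// the real OpenFileCtx implementation.
public class WritePipeline {
    /** Internal write request structure built by the handler (step 2). */
    static final class WriteCtx {
        final long offset;
        final int count;
        WriteCtx(long offset, int count) { this.offset = offset; this.count = count; }
        long end() { return offset + count; }
    }

    // Cache of pending requests, ordered by starting offset.
    private final TreeMap<Long, WriteCtx> cache = new TreeMap<>();
    // Current offset: bytes written to HDFS so far by the writer (step 3).
    private long nextOffset = 0;

    /** Step 2: the request handler puts an incoming request into the cache. */
    synchronized void receive(long offset, int count) {
        cache.put(offset, new WriteCtx(offset, count));
    }

    /** Step 3: write the cached request that starts exactly at nextOffset. */
    synchronized boolean writeNext() {
        Map.Entry<Long, WriteCtx> first = cache.firstEntry();
        if (first == null || first.getKey() != nextOffset) {
            return false; // nothing cached, or a gap before the next request
        }
        cache.remove(first.getKey());
        nextOffset = first.getValue().end(); // stand-in for writing to HDFS
        return true;
    }

    synchronized long nextOffset() { return nextOffset; }

    public static void main(String[] args) throws InterruptedException {
        WritePipeline p = new WritePipeline();
        // Asynchronous client writes can arrive out of order (step 1).
        p.receive(4096, 4096);
        p.receive(0, 4096);
        // A separate writer thread drains the cache in offset order.
        Thread writer = new Thread(() -> { while (p.writeNext()) { } });
        writer.start();
        writer.join();
        System.out.println("nextOffset = " + p.nextOffset()); // 8192
    }
}
```

The ordered map lets the writer always look at the lowest pending offset and wait whenever there is a gap before it.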
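The failure mode described above can be reproduced in a small model. In this sketch (all names hypothetical, requests treated as [offset, offset + count)), the handler only compares each request with the current offset, so two overlapping requests that both start beyond it are cached. The writer then either drops the overlapping one, matching the reported behavior, or trims it. Trimming at write time is shown only for comparison; it is not necessarily what the HDFS-9092 patch does.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the overlap bug; names and structure are assumptions,
// not the real OpenFileCtx code. Requests are [offset, offset + count).
public class OverlapDemo {
    private final TreeMap<Long, Long> cache = new TreeMap<>(); // offset -> end
    private long nextOffset = 0;
    private final boolean trimAtWriteTime;

    OverlapDemo(boolean trimAtWriteTime) { this.trimAtWriteTime = trimAtWriteTime; }

    /** Step 2: the handler only compares the request against nextOffset. */
    void receive(long offset, long count) {
        long end = offset + count;
        if (end <= nextOffset) {
            return; // entirely written already
        }
        if (offset < nextOffset) {
            offset = nextOffset; // trim the already-written prefix
        }
        // Two requests beyond nextOffset that overlap each other are NOT
        // detected here; both go into the cache.
        cache.put(offset, end);
    }

    /** Step 3: the writer handles the lowest cached request. */
    boolean writeNext() {
        Map.Entry<Long, Long> first = cache.firstEntry();
        if (first == null || first.getKey() > nextOffset) {
            return false; // empty, or a gap before the next request
        }
        cache.remove(first.getKey());
        long start = first.getKey(), end = first.getValue();
        if (end <= nextOffset) {
            return true; // fully covered by earlier writes; nothing to do
        }
        if (start < nextOffset && !trimAtWriteTime) {
            // Reported behavior: the overlapping request is silently dropped,
            // so its bytes beyond nextOffset are never written.
            return true;
        }
        nextOffset = end; // write [max(start, nextOffset), end)
        return true;
    }

    long nextOffset() { return nextOffset; }

    static long run(boolean trim) {
        OverlapDemo d = new OverlapDemo(trim);
        d.receive(100, 100); // [100, 200) is beyond nextOffset = 0
        d.receive(150, 150); // [150, 300) overlaps it, also beyond nextOffset
        d.receive(0, 100);   // [0, 100) arrives last
        while (d.writeNext()) { }
        return d.nextOffset();
    }

    public static void main(String[] args) {
        System.out.println("drop: " + run(false)); // 200
        System.out.println("trim: " + run(true));  // 300
    }
}
```

With dropping, the written size stops at 200 even though the client sent data up to 300, which mirrors the reported symptom of the copied size getting stuck at nextOffset.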