[ 
https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407491#comment-13407491
 ] 

Suresh Srinivas commented on HADOOP-8564:
-----------------------------------------

+1 for the second option. This will also allow adding future optimizations at 
the stream level on Windows, similar to those done for Linux.
                
> Create a Windows native InputStream class to address datanode concurrent 
> reading and writing issue
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>
> HDFS files are made up of blocks. First, let’s look at writing. When data 
> is written to a datanode, an active or temporary file is created to receive 
> packets. After the last packet for the block is received, we finalize the 
> block. One step during finalization is to rename the block file into a new 
> directory. The relevant code can be found via the call sequence: 
> FSDataSet.finalizeBlockInternal -> FSDir.addBlock.
> {code} 
>         if ( ! metaData.renameTo( newmeta ) ||
>             ! src.renameTo( dest ) ) {
>           throw new IOException( "could not move files for " + b +
>                                  " from tmp to " + 
>                                  dest.getAbsolutePath() );
>         }
> {code}
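The quoted check works because java.io.File.renameTo reports failure through its boolean return value rather than by throwing. A minimal standalone sketch of the same finalize-style move (hypothetical names, not FSDir itself):

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the finalize-style move quoted above: both the
// meta file and the block file must rename successfully, and renameTo
// signals failure through its boolean return value, not by throwing.
public class FinalizeMoveDemo {
    static void moveBoth(File metaData, File src, File newmeta, File dest)
            throws IOException {
        if (!metaData.renameTo(newmeta) || !src.renameTo(dest)) {
            throw new IOException("could not move files from tmp to "
                    + dest.getAbsolutePath());
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        File src = File.createTempFile("blk_", ".tmp", dir);
        File meta = File.createTempFile("blk_", ".meta", dir);
        File dest = new File(dir, src.getName() + ".finalized");
        File newmeta = new File(dir, meta.getName() + ".finalized");
        moveBoth(meta, src, newmeta, dest);
        System.out.println("finalized: " + (dest.exists() && newmeta.exists()));
        dest.delete();
        newmeta.delete();
    }
}
```

On a POSIX filesystem both renames within the same directory succeed, so this prints `finalized: true`.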
> Let’s now switch to reading. On HDFS, clients are expected to be able to 
> read these unfinished blocks. So when a client’s read calls reach the 
> datanode, the datanode opens an input stream on the unfinished block file.
> The problem arises when the file is open for reading while the datanode 
> receives the last packet from the client and tries to rename the finished 
> block file. This rename succeeds on Linux, but not on Windows. The behavior 
> can be changed on Windows by opening the file with the FILE_SHARE_DELETE 
> flag, i.e. sharing the delete (including rename) permission with other 
> processes while the file is open. There is also a Java bug ([id 
> 6357433|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6357433]) reported 
> a while back on this. However, since this behavior has existed for Java on 
> Windows since JDK 1.0, the Java developers do not want to break backward 
> compatibility. Instead, a new file system API is proposed in 
> JDK 7.
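The rename-while-reading pattern itself is easy to exercise with the JDK 7 java.nio.file API. The demo below is a hypothetical sketch, not datanode code: on Linux the rename succeeds with either stream type, which is exactly the behavior the datanode relies on, while on Windows a stream opened via java.io.FileInputStream would make the rename fail.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo (not Hadoop code): rename a file while a reader
// still holds it open -- the situation FSDir.addBlock creates during
// block finalization. On Linux/POSIX the rename succeeds and the
// already-open stream keeps reading.
public class RenameWhileReading {
    public static boolean renameWhileOpen() throws IOException {
        Path tmp = Files.createTempFile("blk_", ".tmp");
        Files.write(tmp, new byte[]{1, 2, 3});
        Path dest = tmp.resolveSibling(tmp.getFileName() + ".finalized");
        try (InputStream in = Files.newInputStream(tmp)) {
            Files.move(tmp, dest);   // succeeds on POSIX despite the open reader
            return in.read() == 1;   // the open stream still reads the data
        } finally {
            Files.deleteIfExists(dest);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("rename while open ok: " + renameWhileOpen());
    }
}
```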
> As outlined in the [Java forum|http://www.java.net/node/645421] by the Java 
> developer (kbr), there are three ways to fix the problem:
> # Use a different mechanism in the application for dealing with files.
> # Create a new implementation of the InputStream abstract class using 
> Windows native code.
> # Patch the JDK with a private patch that alters FileInputStream behavior.
> The third option cannot fix the problem for users running the stock Oracle 
> JDK.
> We discussed some options for the first approach. For example, one option is 
> two-phase renaming: first create a hard link, then remove the old link when 
> the read is finished. This option was thought to be rather invasive.
> Another option discussed was to change HDFS behavior on Windows by not 
> allowing clients to read unfinished blocks. However, this behavior change is 
> thought to be problematic and may affect other applications built on top of 
> HDFS.
> For all the reasons discussed above, we will use the second approach to 
> address the problem.
> If there are better options to fix the problem, we would also like to hear 
> about them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

