[ 
https://issues.apache.org/jira/browse/HDFS-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HDFS-14229:
------------------------------
    Description: 
Currently, the create call on HDFS is blocking.  The write call can also 
block once the write buffer has reached its limit.

However, for most applications, the only requirement is that when "close" on a 
file is called, the file is persisted and visible in HDFS.  There is no need to 
make "create" visible right after the "create" call returns.

One such use case is storing shuffle data in HDFS (for Spark, MapReduce, or 
other loosely coupled applications).

 

This Jira proposes that we add a new "async-hdfs://" protocol that maps to a 
new AsyncDistributedFileSystem class, whose create call is nonblocking but 
still returns an FSOutputStream that is never blocked on write (even when the 
file has not been physically created on HDFS yet).  The close call on the 
FSOutputStream will block until the creation and all previous writes are 
completed and the file is closed.
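
As a rough illustration of the semantics proposed above, here is a minimal 
sketch in plain Java.  It is not Hadoop code: DeferredOutputStream, IoStep, and 
the blockingCreate parameter are hypothetical names invented for this example, 
and the real AsyncDistributedFileSystem might be built quite differently.  The 
sketch assumes that a single background thread runs the blocking create and 
then replays queued writes in order, so that only close waits.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch, not a Hadoop class: defers create and write, blocks only in close. */
class DeferredOutputStream extends OutputStream {
    /** One queued operation against the real stream. */
    private interface IoStep { void run(OutputStream out) throws IOException; }

    // A single worker thread keeps create, writes, and close in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Future<OutputStream> target;  // completes when the blocking create finishes
    private volatile Exception failure;         // first error seen by the worker, if any
    private boolean closed;

    DeferredOutputStream(Callable<OutputStream> blockingCreate) {
        // Returns immediately; the (assumed) blocking create runs in the background.
        this.target = worker.submit(blockingCreate);
    }

    private Future<?> enqueue(IoStep step) {
        return worker.submit(() -> {
            if (failure != null) return;        // skip remaining steps after the first failure
            try {
                step.run(target.get());         // create was queued first, so it has finished by now
            } catch (Exception e) {
                failure = e;
            }
        });
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public synchronized void write(byte[] buf, int off, int len) throws IOException {
        if (closed) throw new IOException("stream is closed");
        // Copy the bytes because the caller may reuse buf before the worker gets to it.
        byte[] copy = Arrays.copyOfRange(buf, off, off + len);
        enqueue(out -> out.write(copy));        // queued on an unbounded queue; does not wait for I/O
    }

    @Override
    public synchronized void close() throws IOException {
        if (closed) return;
        closed = true;
        Future<?> done = enqueue(OutputStream::close);
        worker.shutdown();
        try {
            done.get();                         // the only blocking point: wait for create, writes, close
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while closing", e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        if (failure != null) {
            throw new IOException("deferred create, write, or close failed", failure);
        }
    }
}
```

One consequence of this design is that errors from create or write only 
surface when close is called, and that a write that never blocks implies 
unbounded buffering in the client.  A real implementation would probably need 
to bound or spill that buffer.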

 

Note that this Jira is related to 
https://issues.apache.org/jira/browse/HDFS-9924 but not the same.  HDFS-9924 
talks about async rename etc.  This Jira talks about async create|write. 

> Nonblocking HDFS create|write
> -----------------------------
>
>                 Key: HDFS-14229
>                 URL: https://issues.apache.org/jira/browse/HDFS-14229
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs-client
>            Reporter: Zheng Shao
>            Priority: Major


