[ https://issues.apache.org/jira/browse/HDFS-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HDFS-14229:
------------------------------
Description:
Right now, the create call on HDFS is blocking. The write call can also block
if the write buffer has reached its limit.
However, for most applications, the only requirement is that once "close" is
called on a file, the file is persisted and visible in HDFS. The file does not
need to be visible as soon as the "create" call returns.
One such use case is using HDFS to store shuffle data (in Spark, MapReduce, or
other loosely coupled applications).
This Jira proposes adding a new "async-hdfs://" URI scheme that maps to a new
AsyncDistributedFileSystem class. Its create call is nonblocking and returns
an FSOutputStream whose write calls are also nonblocking, even when the file
has not yet been physically created on HDFS; a write may block only when a
write buffer limit is specified and reached. The close call on the
FSOutputStream blocks until the creation and all previous writes have
completed and the file is closed.
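To make the intended semantics concrete, here is a minimal sketch in plain
Java. It is not the proposed AsyncDistributedFileSystem and uses no Hadoop
API; the class DeferredOutputStream and the parameters slowCreate and
bufferLimitBytes are hypothetical names chosen for illustration. It assumes a
single background thread is enough to order the create, the writes, and the
close.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: defers a slow "create" and queues writes behind it. */
class DeferredOutputStream extends OutputStream {
    // One worker thread runs create, writes, and close in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Future<OutputStream> target; // completes when the create finishes
    private final Semaphore budget;            // bytes allowed to wait in the queue
    private final int limit;
    private volatile IOException failure;      // a failure seen by a queued write

    DeferredOutputStream(Callable<OutputStream> slowCreate, int bufferLimitBytes) {
        if (bufferLimitBytes <= 0) {
            throw new IllegalArgumentException("bufferLimitBytes must be positive");
        }
        this.limit = bufferLimitBytes;
        this.budget = new Semaphore(bufferLimitBytes);
        this.target = worker.submit(slowCreate); // returns immediately
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        while (len > 0) {
            int n = Math.min(len, limit);
            byte[] copy = Arrays.copyOfRange(buf, off, off + n);
            budget.acquireUninterruptibly(n); // blocks only when the limit is reached
            worker.execute(() -> {
                try {
                    // The create task ran earlier on this thread, so get() does not wait.
                    if (failure == null) target.get().write(copy);
                } catch (Exception e) {
                    failure = new IOException(e);
                } finally {
                    budget.release(n);
                }
            });
            off += n;
            len -= n;
        }
    }

    @Override
    public void close() throws IOException {
        if (worker.isShutdown()) return; // already closed
        Future<?> done = worker.submit(() -> { target.get().close(); return null; });
        worker.shutdown();
        try {
            done.get(); // blocks until create, all queued writes, and close are done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
        if (failure != null) throw failure;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeferredOutputStream out = new DeferredOutputStream(() -> {
                TimeUnit.MILLISECONDS.sleep(200); // stand-in for a slow create RPC
                return sink;
            }, 1 << 20)) {
            // Returns while the simulated create is still in progress.
            out.write("shuffle block".getBytes(StandardCharsets.UTF_8));
        } // close() waits for the create and the queued write
        System.out.println(sink.size() + " bytes persisted");
    }
}
```

In this sketch, errors from the deferred create or from a queued write only
surface when close is called, which matches the requirement that only close
has to guarantee the file is persisted.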
Note that this Jira is related to, but not the same as,
https://issues.apache.org/jira/browse/HDFS-9924. HDFS-9924 covers async
rename etc.; this Jira covers async create|write.
was:
Right now, the create call on HDFS is blocking. The write call can also block
if the write buffer has reached its limit.
However, for most applications, the only requirement is that once "close" is
called on a file, the file is persisted and visible in HDFS. The file does not
need to be visible as soon as the "create" call returns.
One such use case is using HDFS to store shuffle data (in Spark, MapReduce, or
other loosely coupled applications).
This Jira proposes adding a new "async-hdfs://" URI scheme that maps to a new
AsyncDistributedFileSystem class. Its create call is nonblocking and returns
an FSOutputStream that is never blocked on write, even when the file has not
yet been physically created on HDFS. The close call on the FSOutputStream
blocks until the creation and all previous writes have completed and the file
is closed.
Note that this Jira is related to, but not the same as,
https://issues.apache.org/jira/browse/HDFS-9924. HDFS-9924 covers async
rename etc.; this Jira covers async create|write.
> Nonblocking HDFS create|write
> -----------------------------
>
> Key: HDFS-14229
> URL: https://issues.apache.org/jira/browse/HDFS-14229
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: hdfs-client
> Reporter: Zheng Shao
> Priority: Major