[
https://issues.apache.org/jira/browse/HDFS-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114859#comment-15114859
]
Dinesh S. Atreya commented on HDFS-9607:
----------------------------------------
Continuing on the semantics from the parent (umbrella) JIRA before re-visiting
the API.
The proposed enhancements to core Hadoop include the capability to perform
“updates-in-place” in HDFS:
• Support seeks for writes (in addition to reads).
• After a seek, if the new byte length is the same as the existing (old) byte
length, an in-place update is allowed.
• A delete is an update with an appropriate delete marker.
• If the byte lengths differ, the old entry can be marked as deleted (as
decided by the higher-level API of the calling application, such as Hive/ORC)
and the new one appended as before.
• It is at the client’s discretion to perform an update, an append, or both;
the API changes in the various Hadoop components should provide these
capabilities.
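To make the semantics listed above concrete, here is a minimal sketch. It is not a proposed HDFS API: HDFS has no seekable write stream today, so java.io.RandomAccessFile on a local file stands in for one. The class name, the method names, and the '#' delete marker are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Illustration only: a local file stands in for a (hypothetical) seekable HDFS stream.
final class InPlaceUpdateSketch {
    // Hypothetical delete marker; a real format such as ORC would define its own.
    static final byte DELETE_MARKER = (byte) '#';

    /**
     * Updates the entry stored at [offset, offset + oldLength).
     * Same length: overwrite in place. Different length: mark the old entry
     * as deleted and append the new value. Returns the offset of the new value.
     */
    static long update(RandomAccessFile file, long offset, int oldLength, byte[] newValue)
            throws IOException {
        if (newValue.length == oldLength) {
            file.seek(offset);          // seek for write
            file.write(newValue);       // in-place update, file length unchanged
            return offset;
        }
        markDeleted(file, offset, oldLength);
        long appendOffset = file.length();
        file.seek(appendOffset);        // append the new entry at the end
        file.write(newValue);
        return appendOffset;
    }

    /** A delete is itself a same-length in-place update that writes the marker. */
    static void markDeleted(RandomAccessFile file, long offset, int length) throws IOException {
        byte[] tombstone = new byte[length];
        Arrays.fill(tombstone, DELETE_MARKER);
        file.seek(offset);
        file.write(tombstone);
    }

    /** Writes `initial` to a temp file, updates it to `replacement`, returns the file content. */
    static String demo(String initial, String replacement) {
        try {
            Path path = Files.createTempFile("in-place-update", ".dat");
            try {
                byte[] old = initial.getBytes(StandardCharsets.UTF_8);
                Files.write(path, old);
                try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
                    update(file, 0, old.length, replacement.getBytes(StandardCharsets.UTF_8));
                }
                return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("Hello World", "Hello HDFS!"));  // Hello HDFS!
        System.out.println(demo("Hello World", "Hello Hadoop")); // ###########Hello Hadoop
    }
}
```

The first call takes the in-place path because both strings are 11 bytes long. The second takes the mark-and-append path because the replacement is 12 bytes. How a reader later skips entries marked as deleted is left to the higher-level format and is not modeled here.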
An expanded set of APIs is being advocated to ensure data integrity, starting
at the HDFS layer itself.
This is echoed by other comments, such as this one from [[email protected]]:
{quote}
HDFS is the most critical part of the Hadoop stack; data integrity is the one
thing the team cares about more than anything else. Something at the YARN layer
could impact availability or performance —but it shouldn't lose or corrupt
data. Things at the HDFS layer do, and every time something has gone in there
have been surprises downstream.
{quote}
> Advance Hadoop Architecture (AHA) - HDFS Update (write-in-place)
> ----------------------------------------------------------------
>
> Key: HDFS-9607
> URL: https://issues.apache.org/jira/browse/HDFS-9607
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Dinesh S. Atreya
>
> Link to Umbrella JIRA
> https://issues.apache.org/jira/browse/HADOOP-12620
> Provide the capability to carry out in-place writes/updates. Only in-place
> writes that leave the existing length unchanged are supported.
> For example, "Hello World" can be replaced by "Hello HDFS!".
> See
> https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300
> for more details.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)