[
https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038807#comment-14038807
]
Thomas Demoor commented on HADOOP-9565:
---------------------------------------
I support the enum set to indicate system properties such as consistency and
operation guarantees, as well as the other ideas proposed above, but I believe
this will be a daunting task.
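As a rough illustration of what such a capability set could look like (the
{{FSCapability}} names below are hypothetical, not an existing Hadoop API):
{code:java}
import java.util.EnumSet;

/**
 * Hypothetical sketch: capability flags a FileSystem could advertise.
 * Names are illustrative only, not an existing Hadoop API.
 */
public class CapabilitySketch {

  enum FSCapability {
    ATOMIC_RENAME,        // rename() is atomic
    ATOMIC_DELETE,        // recursive delete() is atomic
    READ_AFTER_WRITE,     // newly written data is immediately visible
    CONSISTENT_LISTING,   // listStatus() reflects all completed writes
    SERVER_SIDE_COPY      // copy without moving data through the client
  }

  public static void main(String[] args) {
    // HDFS-like semantics vs. an eventually consistent object store.
    EnumSet<FSCapability> hdfs = EnumSet.of(
        FSCapability.ATOMIC_RENAME, FSCapability.ATOMIC_DELETE,
        FSCapability.READ_AFTER_WRITE, FSCapability.CONSISTENT_LISTING);
    EnumSet<FSCapability> blobstore = EnumSet.of(FSCapability.SERVER_SIDE_COPY);

    // Callers would check capabilities before relying on rename-based commits.
    if (!blobstore.contains(FSCapability.ATOMIC_RENAME)) {
      System.out.println("pick a commit strategy that does not rely on rename");
    }
  }
}
{code}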
HDFS is a consistent, single-writer filesystem that uses temp files + rename for
concurrency. Many object stores are not consistent and, being distributed
systems, they are inherently multi-writer. Therefore, as Steve pointed out
in the pdf, rename and delete are difficult to implement atomically. Some
remarks:
Consistency:
# Eventual consistency is an ambiguous term (read-after-write consistency, atomic
reads, etc.), so more detail will have to be provided. Fortunately, the fact that
no concurrent writes are performed makes things easier. However, writing
meaningful tests seems very difficult: will one impose a time limit to define
"eventual" (see the sketch after this list)? Data loss is bound to happen:
[HADOOP-9577|https://issues.apache.org/jira/browse/HADOOP-9577]
# We have chosen strong consistency (even for multi-geo sites) over
availability.
# I believe Azure is consistent under certain conditions (single geo site, ...);
thus, as Steve pointed out, the enum set would also be endpoint-dependent for
Azure.
# Documenting (cf. the pdf) where consistency and atomicity are required by the
MapReduce (and other?) applications running on top of YARN is super interesting.
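To make the "time limit" question concrete, a read-after-write probe could look
roughly like the sketch below (the 30-second deadline and the probe path are
arbitrary assumptions, not an agreed contract):
{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Minimal sketch of a time-bounded read-after-write probe.
 * Takes a filesystem URI as its single argument.
 */
public class EventualConsistencyProbe {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
    Path probe = new Path("/tmp/consistency-probe-" + System.nanoTime());

    fs.create(probe).close();                      // write an empty object

    long deadline = System.currentTimeMillis() + 30_000;  // "eventual" = 30s?
    while (!fs.exists(probe)) {                    // read-after-write check
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("object not visible within the time limit");
      }
      Thread.sleep(500);
    }
    fs.delete(probe, false);
  }
}
{code}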
MapReduce-HDFS integration:
# Other filesystems struggle because the hadoop.mapreduce code is written with
HDFS in mind. For instance, as discussed in (the comments of)
[HADOOP-9577|https://issues.apache.org/jira/browse/HADOOP-9577], a job writes
to a different temp dir for each attempt and then renames one of them to the
output dir upon committing. An object store would rather let all speculative
copies write to the same filename and let one of them "win". However, this
requires implementing a custom OutputCommitter (a sketch follows this list),
which is not part of hadoop.fs, and things quickly get quite messy.
# Another example is storing the intermediate output (shuffle phase) on local
disk, to spare the HDFS NameNode, rather than in the distributed storage system
(cf. MapR Hadoop).
# This tight integration of MapReduce and HDFS is completely understandable,
but perhaps, just as resource management and job tracking were pulled out into
YARN, these semantics, which lie very close to the FileSystem, might be unified
in the future.
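For illustration, a minimal "direct write" committer along the lines sketched
above could look like this (class name and behaviour are assumptions; conflict
resolution and job-level commit are deliberately omitted):
{code:java}
import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

/**
 * Sketch of a "direct write" committer: attempts write straight to the final
 * object names and one of them simply wins, so no temp dirs and no rename are
 * needed. Illustrative only, not an existing Hadoop class.
 */
public class DirectWriteCommitter extends OutputCommitter {

  @Override
  public void setupJob(JobContext context) throws IOException {
    // nothing to set up: no temporary job directory
  }

  @Override
  public void setupTask(TaskAttemptContext context) throws IOException {
    // nothing to set up: attempts write directly to the final paths
  }

  @Override
  public boolean needsTaskCommit(TaskAttemptContext context) throws IOException {
    return false;   // no rename step, so there is nothing to commit
  }

  @Override
  public void commitTask(TaskAttemptContext context) throws IOException {
    // no-op: the attempt's output is already at its final location
  }

  @Override
  public void abortTask(TaskAttemptContext context) throws IOException {
    // a losing speculative attempt is overwritten by the winner,
    // so there is nothing to clean up here either
  }
}
{code}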
We picked up an interest in Hadoop fairly recently and would like to join the
"HCFS" effort. If all goes well, we plan on contributing to
[HADOOP-9361|https://issues.apache.org/jira/browse/HADOOP-9361] in the future.
> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>
> Key: HADOOP-9565
> URL: https://issues.apache.org/jira/browse/HADOOP-9565
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 2.0.4-alpha
> Reporter: Steve Loughran
> Priority: Minor
>
> We can make explicit the fact that some {{FileSystem}} implementations are
> really blobstores, with different atomicity and consistency guarantees, by
> adding a {{Blobstore}} interface to them.
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that
> all blobstores implement a server-side copy operation as a substitute for
> rename.
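A hypothetical sketch of what such an interface could look like (the name and
the copy() signature are assumptions, not committed Hadoop API):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical marker interface for FileSystems that are really blobstores;
 * the name and the copy() signature are assumptions.
 */
public interface Blobstore {

  /**
   * Server-side copy of an object, offered as a substitute for rename on
   * stores where rename is neither atomic nor cheap.
   */
  void copy(Path source, Path destination) throws IOException;
}
{code}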
--
This message was sent by Atlassian JIRA
(v6.2#6252)