[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038807#comment-14038807 ]
Thomas Demoor commented on HADOOP-9565:
---------------------------------------

I support the enum set to indicate system properties such as consistency and operation guarantees, as well as the other ideas proposed above, but I believe this will be a daunting task. HDFS is a consistent, single-writer filesystem that uses temp files + rename for concurrency. Many object stores are not consistent, and, being distributed systems, they are evidently multi-writer. Therefore, as Steve pointed out in the PDF, rename and delete are difficult to implement atomically.

Some remarks:

Consistency:
# Eventual consistency is an ambiguous term (read-after-write, atomic reads, etc.), so more detail will have to be provided. Fortunately, the fact that no concurrent writes are performed makes things easier. However, writing meaningful tests seems very difficult: will one impose a time limit to define "eventual"? Data loss is destined to happen: [HADOOP-9577|https://issues.apache.org/jira/browse/HADOOP-9577]
# We have chosen strong consistency (even for multi-geo sites) over availability.
# I believe Azure is consistent under certain conditions (single geo site, ...); thus, as Steve pointed out, the enum set would also be endpoint-dependent for Azure.
# Documenting (cf. the PDF) where consistency and atomicity are required in the MapReduce (and other?) applications running on top of YARN is very interesting.

MapReduce-HDFS integration:
# Other filesystems struggle because the hadoop.mapreduce code is written with HDFS in mind. For instance, as discussed in (the comments of) [HADOOP-9577|https://issues.apache.org/jira/browse/HADOOP-9577], a job writes to a different temp dir for each job attempt and then renames one to the output dir upon committing. An object store would rather let all speculative copies write to the same filename and let one of them "win".
However, this requires implementing a custom OutputCommitter, which is not part of hadoop.fs, and things quickly get quite messy.
# Another example is storing the intermediate output (shuffle phase) on local disk, to spare the HDFS NameNode, rather than in the distributed storage system (cf. MapR Hadoop).
# Evidently, this tight integration of MapReduce and HDFS is completely understandable, but maybe, similar to resource management and job tracking being pulled out into YARN, these semantics, which lie very close to the FileSystem, might be unified in the future.

We fairly recently picked up an interest in Hadoop and would like to join the "HCFS" effort. If all goes well, we plan on contributing to [HADOOP-9361|https://issues.apache.org/jira/browse/HADOOP-9361] in the future.

> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>
>                 Key: HADOOP-9565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9565
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.0.4-alpha
>            Reporter: Steve Loughran
>            Priority: Minor
>
> We can mark the fact that some {{FileSystem}} implementations are really blobstores, with different atomicity and consistency guarantees, by adding a {{Blobstore}} interface to them.
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that all blobstores implement a server-side copy operation as a substitute for rename.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
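To make the enum-set idea from the comment and the {{Blobstore}} interface from the issue description concrete, here is a minimal self-contained sketch. All names here ({{StoreSemantics}}, {{Blobstore}}, the individual flags, and the use of plain {{String}} paths instead of {{org.apache.hadoop.fs.Path}}) are illustrative assumptions, not actual Hadoop APIs; HADOOP-9565 is still a proposal.

```java
import java.util.EnumSet;

// Hypothetical sketch only: "StoreSemantics" and "Blobstore" are
// illustrative names, not part of any shipped Hadoop release.
public class BlobstoreSketch {

    /** Per-store guarantees an application could query at runtime. */
    enum StoreSemantics {
        ATOMIC_RENAME,      // rename() is all-or-nothing
        ATOMIC_DELETE,      // recursive delete() is all-or-nothing
        READ_AFTER_WRITE,   // a newly created object is immediately readable
        LIST_AFTER_WRITE,   // a newly created object immediately appears in listings
        SERVER_SIDE_COPY    // copy is performed server-side, no data round-trip
    }

    /** Marker interface a blobstore-backed FileSystem could implement. */
    interface Blobstore {
        EnumSet<StoreSemantics> semantics();
        // Substitute for rename, per the issue description's Copy(Path,Path) idea.
        void copy(String src, String dst);
    }

    /** What a consistent, single-region store might advertise. */
    static EnumSet<StoreSemantics> consistentStore() {
        return EnumSet.of(StoreSemantics.READ_AFTER_WRITE,
                          StoreSemantics.LIST_AFTER_WRITE,
                          StoreSemantics.SERVER_SIDE_COPY);
    }

    public static void main(String[] args) {
        // A committer could branch on advertised guarantees instead of
        // hard-coding HDFS's rename-based behaviour:
        if (!consistentStore().contains(StoreSemantics.ATOMIC_RENAME)) {
            System.out.println("use copy-based commit");
        }
    }
}
```

The point of the enum set is that callers (e.g. an OutputCommitter) branch on declared guarantees rather than on the concrete FileSystem class, which keeps the endpoint-dependent cases (such as the Azure one mentioned above) expressible per instance rather than per implementation.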