[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038807#comment-14038807 ]

Thomas Demoor commented on HADOOP-9565:
---------------------------------------

I support the enum set to indicate system properties such as consistency and 
operation guarantees, as well as the other ideas proposed above, but I believe 
this will be a daunting task. 
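
To make that concrete, here is a minimal sketch of what such an enum set might 
look like on the Java side. The type and constant names are purely illustrative 
assumptions on my part, not an existing hadoop.fs API:

{code:java}
import java.util.EnumSet;

// Hypothetical sketch only; nothing like this exists in hadoop.fs today.
public interface StoreCapabilities {

  /** Guarantees a concrete FileSystem implementation can declare. */
  enum StoreSemantics {
    ATOMIC_RENAME,        // rename() is atomic
    ATOMIC_DELETE,        // recursive delete() is atomic
    READ_AFTER_WRITE,     // a completed write is immediately visible to readers
    CONSISTENT_LISTING,   // listings reflect all completed creates/deletes
    SERVER_SIDE_COPY      // copy runs server-side, no data through the client
  }

  /** The set of guarantees this particular store actually provides. */
  EnumSet<StoreSemantics> getSemantics();
}
{code}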

HDFS is a consistent, single-writer filesystem that uses temp files + rename for 
concurrency. Many object stores are not strongly consistent and, being 
distributed systems, they are inherently multi-writer. Therefore, as Steve 
pointed out in the pdf, rename and delete are difficult to implement atomically. 
Some remarks:

Consistency:
# Eventual consistency is an ambiguous term (read-after-write, atomic reads, 
etc.), so more detail will have to be provided. Fortunately, the fact that no 
concurrent writes are performed makes things easier. However, writing 
meaningful tests seems very difficult: will one impose a time limit to define 
"eventual"? (See the probe sketch after this list.) Data loss is bound to 
happen: [HADOOP-9577|https://issues.apache.org/jira/browse/HADOOP-9577] 
# We have chosen strong consistency (even for multi-geo sites) over 
availability. 
# I believe Azure is consistent under certain conditions (single geo site, ...); 
thus, as Steve pointed out, the enum set would also be end-point dependent for 
Azure.  
# Documenting (cf. the pdf) where consistency and atomicity are required in the 
MapReduce (and other?) applications running on top of YARN would be very 
interesting.
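
To make the "time limit" question from point 1 concrete, a consistency test 
would presumably have to poll until a freshly written object becomes visible 
and fail after some configured bound. A rough sketch follows; the class name, 
the 30-second bound and the polling interval are all assumptions, which is 
exactly the problem:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough sketch of a read-after-write probe against an eventually
// consistent store: whatever bound we pick effectively defines "eventual".
public class ReadAfterWriteProbe {

  public static boolean becomesVisible(FileSystem fs, Path path, long timeoutMillis)
      throws IOException, InterruptedException {
    // Write a small object and close it so the write is "completed".
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.write(new byte[]{1, 2, 3});
    }
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (fs.exists(path)) {      // may lag behind the write on some stores
        return true;
      }
      Thread.sleep(500);          // poll until visible or the bound expires
    }
    return false;                 // not visible within the chosen bound
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    System.out.println(becomesVisible(fs, new Path("/tmp/probe"), 30000L));
  }
}
{code}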

MapReduce-HDFS integration:
# Other filesystems struggle because the hadoop.mapreduce code is written with 
HDFS in mind. For instance, as discussed in (the comments of) 
[HADOOP-9577|https://issues.apache.org/jira/browse/HADOOP-9577], a job writes 
to a different temp dir for each attempt and then renames one to the output 
dir upon committing. An object store would rather let all speculative copies 
write to the same filename and let one of them "win". However, this requires 
implementing a custom OutputCommitter, which is not part of hadoop.fs, and 
things quickly get quite messy (see the sketch after this list).
# Another example is storing the intermediate output (shuffle phase) on local 
disk, to spare the HDFS NameNode, rather than in the distributed storage system 
(cf. MapR Hadoop).
# Evidently, this tight integration of MapReduce and HDFS is completely 
understandable, but perhaps, just as resource management and job tracking were 
pulled out into YARN, these semantics, which lie very close to the FileSystem, 
might be unified in the future. 
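
For the OutputCommitter point above, here is a very rough sketch of what a 
"direct write" committer for an object store could look like. This is purely 
an illustrative assumption, not code from any existing connector, and it is 
only correct if last-writer-wins is acceptable for speculative task attempts:

{code:java}
import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Sketch of a committer where every task attempt writes straight to the
// final object key and one attempt simply "wins": no temp dirs, no rename.
public class DirectWriteCommitter extends OutputCommitter {

  @Override
  public void setupJob(JobContext context) throws IOException {
    // no temporary job directory is created
  }

  @Override
  public void setupTask(TaskAttemptContext context) throws IOException {
    // attempts write directly to the final key
  }

  @Override
  public boolean needsTaskCommit(TaskAttemptContext context) throws IOException {
    return false;   // nothing to promote; the object is already in place
  }

  @Override
  public void commitTask(TaskAttemptContext context) throws IOException {
    // no-op: data was written in place
  }

  @Override
  public void abortTask(TaskAttemptContext context) throws IOException {
    // best effort: a real implementation might delete the attempt's object here
  }
}
{code}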

We have fairly recently taken an interest in Hadoop and would like to join the 
"HCFS" effort. If all goes well, we plan on contributing to 
[HADOOP-9361|https://issues.apache.org/jira/browse/HADOOP-9361] in the future. 

> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>
>                 Key: HADOOP-9565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9565
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.0.4-alpha
>            Reporter: Steve Loughran
>            Priority: Minor
>
> We can make explicit the fact that some {{FileSystem}} implementations are really 
> blobstores, with different atomicity and consistency guarantees, by adding a 
> {{Blobstore}} interface to them. 
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that 
> all blobstores implement a server-side copy operation as a substitute for 
> rename.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
