[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409534#comment-15409534 ]
Thomas Demoor commented on HADOOP-9565:
---------------------------------------

Steve, the "avoid data write" thing you mention is exactly why these direct output committers (and what I did for the FileOutputCommitter) work on object stores.

Multiple writers can write to the same object concurrently. At any point, the last-started successfully-completed write is what is visible.

Regular put:
* Content length (=N) is communicated at the start of the request.
* Once N bytes hit S3, the object becomes visible.
* If the Hadoop task aborts before writing N bytes, the upload will time out and the object version is garbage collected by S3.

MultipartUpload:
* Requires an explicit API call to complete (or abort).
* The object becomes visible only when the complete API call is used.
* If the Hadoop task fails, the upload remains active (s3a has the purge functionality to automatically clean these up after a certain period), but the object is NOT visible.

The interesting thing to think about is network partitions.

> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>
>                 Key: HADOOP-9565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9565
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3, fs/swift
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>            Assignee: Pieter Reuse
>         Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch,
> HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch,
> HADOOP-9565-006.patch, HADOOP-9565-branch-2-007.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are really
> blobstores, with different atomicity and consistency guarantees, by adding a
> {{Blobstore}} interface to add to them.
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that
> all blobstores implement a server-side copy operation as a substitute for
> rename.
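
The regular put and multipart visibility rules in the comment above can be sketched as a toy in-memory model. This is only an illustration in Python, not the S3 API and not s3a code: `ToyObjectStore` and all of its method names are hypothetical, and the "last-started successfully-completed write wins" rule is taken from the comment as stated rather than checked against S3 itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class _Upload:
    key: str
    started: int                      # start order of this write
    parts: List[bytes] = field(default_factory=list)


class ToyObjectStore:
    """Hypothetical in-memory model of the visibility rules in the comment."""

    def __init__(self) -> None:
        self._clock = 0
        # key -> (start order of the winning write, data)
        self._visible: Dict[str, Tuple[int, bytes]] = {}
        # upload id -> multipart upload that is neither completed nor aborted
        self._pending: Dict[int, _Upload] = {}

    def _tick(self) -> int:
        self._clock += 1
        return self._clock

    def _publish(self, key: str, started: int, data: bytes) -> None:
        # Assumption from the comment: the last-started completed write wins.
        current = self._visible.get(key)
        if current is None or started > current[0]:
            self._visible[key] = (started, data)

    def put(self, key: str, declared_length: int, data: bytes) -> bool:
        """Regular put: visible only once all declared bytes have arrived."""
        started = self._tick()
        if len(data) < declared_length:
            # Models a task dying mid-upload: nothing becomes visible.
            return False
        self._publish(key, started, data[:declared_length])
        return True

    def start_multipart(self, key: str) -> int:
        upload_id = self._tick()
        self._pending[upload_id] = _Upload(key, started=upload_id)
        return upload_id

    def upload_part(self, upload_id: int, data: bytes) -> None:
        self._pending[upload_id].parts.append(data)

    def complete(self, upload_id: int) -> None:
        """Only this explicit call makes a multipart object visible."""
        upload = self._pending.pop(upload_id)
        self._publish(upload.key, upload.started, b"".join(upload.parts))

    def abort(self, upload_id: int) -> None:
        self._pending.pop(upload_id, None)

    def purge_pending(self) -> int:
        """Models cleaning up uploads left active by failed tasks."""
        count = len(self._pending)
        self._pending.clear()
        return count

    def get(self, key: str) -> Optional[bytes]:
        entry = self._visible.get(key)
        return None if entry is None else entry[1]
```

In this model, a task that fails between `upload_part` and `complete` leaves nothing readable under the key, which is the property the direct committers rely on. The model has no notion of time or of the network, so it says nothing about the partition case raised at the end of the comment.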