[ 
https://issues.apache.org/jira/browse/HDFS-13713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651665#comment-16651665
 ] 

Ewan Higgs commented on HDFS-13713:
-----------------------------------

HDFS-13713.008.patch (using the HDFS prefix, not the HADOOP prefix, even though 
this also concerns S3AFileSystem)

008 
- Allow concurrent uploads for the local file system and HDFS.
- Reorder uploads in the concurrent case.
- The finalization methods (complete, abort) are not idempotent on HDFS (upload 
IDs are consumed), but they are, briefly, on S3, where a server-side GC reaps 
the upload IDs at a later time. Added an implementation-dependent boolean to 
determine which behaviour is expected when complete or abort is repeated with 
an already-burned upload ID.
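To make the idea concrete, here is a toy sketch (not the real contract test or the Hadoop API; the class and method names are hypothetical) of gating repeated complete/abort behaviour on an implementation-dependent flag:

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: an upload-id lifecycle where the first complete
 * consumes the id, and whether a repeat is tolerated depends on a flag.
 * HDFS-like stores would set completeIsIdempotent = false; S3-like stores,
 * which keep the id around until a server-side GC reaps it, would set true.
 */
public class UploadIdLifecycle {
    private final boolean completeIsIdempotent;
    private final Set<String> burned = new HashSet<>();

    public UploadIdLifecycle(boolean completeIsIdempotent) {
        this.completeIsIdempotent = completeIsIdempotent;
    }

    /** First complete consumes the id; repeats succeed only if idempotent. */
    public boolean complete(String uploadId) {
        if (burned.contains(uploadId)) {
            if (!completeIsIdempotent) {
                // HDFS-like: the upload id was consumed by the first complete
                throw new IllegalStateException(
                    "upload id already consumed: " + uploadId);
            }
            return true; // S3-like: tolerated until GC reaps the id
        }
        burned.add(uploadId);
        return true;
    }

    public static void main(String[] args) {
        UploadIdLifecycle s3Like = new UploadIdLifecycle(true);
        s3Like.complete("u1");
        System.out.println(s3Like.complete("u1")); // prints true

        UploadIdLifecycle hdfsLike = new UploadIdLifecycle(false);
        hdfsLike.complete("u1");
        try {
            hdfsLike.complete("u1");
        } catch (IllegalStateException e) {
            System.out.println("second complete rejected"); // prints this
        }
    }
}
```

A contract test could then read the flag from the per-filesystem contract options and assert the matching branch.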

{quote}We could maybe be vague about what happens, i.e. {quote}

We may need to leave this open because S3 has a behaviour that is not 
consistent with HDFS, and it is not obvious that we would prefer one over the 
other. Let me explain:

1. In the contract tests it becomes obvious that in S3 the last-started 
successful upload is 'the winner'.

example: Given upload1 and upload2:

init 1
init 2 <-- last started upload
putpart 1
putpart 2
complete 2 <-- last started upload is complete - 'the winner'
complete 1 <-- never to be seen unless versioning is enabled

2. In HDFS the last completed upload is 'the winner'.

Example: given upload1 and upload2:

init 1
init 2
putpart 1
putpart 2
complete 2 <-- concat and copy into place - visible until complete1
complete 1 <-- concat and copy into place - 'the winner'
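The two traces above can be contrasted with a small self-contained model (hypothetical names, not the Hadoop MultipartUploader API): under S3 semantics the winner is the completed upload with the latest init, under HDFS semantics it is simply the latest complete.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of which upload is visible at the destination path. */
public class MultipartWinnerModel {
    public enum Semantics { S3_LAST_STARTED, HDFS_LAST_COMPLETED }

    /**
     * Replays a trace of "init N" / "complete N" events and returns the
     * id of the upload whose data is visible afterwards (-1 if none).
     */
    public static int winner(Semantics s, List<String> events) {
        Map<Integer, Integer> initOrder = new HashMap<>();
        int seq = 0;
        int visible = -1; // upload currently visible at the path
        for (String e : events) {
            String[] parts = e.split(" ");
            int id = Integer.parseInt(parts[1]);
            if (parts[0].equals("init")) {
                initOrder.put(id, seq++);
            } else { // complete
                if (s == Semantics.HDFS_LAST_COMPLETED
                        || visible < 0
                        || initOrder.get(id) > initOrder.get(visible)) {
                    visible = id; // this upload's output replaces the file
                }
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
            "init 1", "init 2", "complete 2", "complete 1");
        System.out.println(winner(Semantics.S3_LAST_STARTED, trace));   // prints 2
        System.out.println(winner(Semantics.HDFS_LAST_COMPLETED, trace)); // prints 1
    }
}
```

For the same event trace the model returns different winners, which is exactly why the spec may have to stay vague here.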

3. I don't know what WASB or GCS do, so specifying the behaviour based on S3's 
at this time could be undesirable.



> Add specification of Multipart Upload API to FS specification, with contract 
> tests
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-13713
>                 URL: https://issues.apache.org/jira/browse/HDFS-13713
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: fs, test
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Assignee: Ewan Higgs
>            Priority: Blocker
>         Attachments: HADOOP-13713-004.patch, HADOOP-13713-004.patch, 
> HADOOP-13713-005.patch, HADOOP-13713-006.patch, HADOOP-13713-007.patch, 
> HDFS-13713.001.patch, HDFS-13713.002.patch, HDFS-13713.003.patch, 
> HDFS-13713.008.patch, multipartuploader.md
>
>
> There's nothing in the FS spec covering the new API. Add it in a new .md file
> * add FS model with the notion of a function mapping (uploadID -> Upload), 
> the operations (list, commit, abort). The [TLA+ 
> model|https://issues.apache.org/jira/secure/attachment/12865161/objectstore.pdf] 
> of HADOOP-13786 shows how to do this.
> * Contract tests of not just the successful path, but all the invalid ones.
> * implementations of the contract tests of all FSs which support the new API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
