[jira] [Commented] (HADOOP-11452) Revisit FileSystem.rename(path, path, options)

Sanjay Radia (JIRA) Thu, 05 Jan 2017 15:42:07 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802888#comment-15802888
 ]


Sanjay Radia commented on HADOOP-11452:
---------------------------------------

Steve suggested:
bq. note that we could consider adding a new enum operation 
Rename.ATOMIC_REQUIRED which will fail if atomicity is not supported

We have considered such flags (including this specific one) several times in 
the past, in the context of S3 and of the local file system, and not just for 
rename but for other methods as well. Neither the local fs nor S3 has exactly 
the same semantics as HDFS for each method.   *Here is the main issue:* File 
systems like LocalFileSystem are used for testing apps, and for a long time S3 
was used simply for testing or for non-critical usage in the cloud. Folks were 
willing to live with the occasional inconsistency or with the performance 
overhead of, say, copy-delete for rename on S3.  If applications like Hive or 
Spark used Rename.ATOMIC_REQUIRED, then the app would just fail on S3; those 
use cases (testing, non-critical, or willing to live with the performance 
overhead) would not be supported and the app's users would be unhappy.

Now that users want to run production apps on cloud storage like S3, apps like 
Hive are being modified to run well on S3 by changing how they do commit (say, 
via the metastore or a manifest file instead of the rename). 

So adding the Rename.ATOMIC_REQUIRED flag is easy. But is it going to be 
useful? Please articulate how it will be used. For example if we were to change 
Hive to use Rename.ATOMIC_REQUIRED then Hive will just fail on S3.
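
To make the concern concrete, here is a minimal sketch of the failure mode. 
Everything in it is hypothetical: {{RenameOption}}, {{ToyFs}} and {{commit}} are 
stand-ins invented for illustration and are not Hadoop classes; 
ATOMIC_REQUIRED is only the flag proposed above.

```java
import java.io.IOException;
import java.util.Arrays;

public class AtomicRequiredDemo {
    /** Hypothetical options; ATOMIC_REQUIRED is the proposed flag, not an existing one. */
    enum RenameOption { OVERWRITE, ATOMIC_REQUIRED }

    /** Toy file system that only knows whether its rename is atomic. */
    interface ToyFs {
        boolean renameIsAtomic();

        default void rename(String src, String dst, RenameOption... options)
                throws IOException {
            boolean atomicRequired =
                Arrays.asList(options).contains(RenameOption.ATOMIC_REQUIRED);
            if (atomicRequired && !renameIsAtomic()) {
                // The store cannot honour the flag, so the call fails outright.
                throw new IOException(
                    "Atomic rename not supported: " + src + " -> " + dst);
            }
            // A real implementation would do the rename (or copy-delete) here.
        }
    }

    /** An app-level commit that insists on atomicity. Paths are made up. */
    static boolean commit(ToyFs fs) {
        try {
            fs.rename("/tmp/job/_temporary", "/warehouse/table",
                      RenameOption.ATOMIC_REQUIRED);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ToyFs hdfsLike = () -> true;   // rename is atomic
        ToyFs s3Like = () -> false;    // rename is copy-delete
        System.out.println(commit(hdfsLike)); // prints true
        System.out.println(commit(s3Like));   // prints false
    }
}
```

The flag tells the app that it cannot commit this way, but it gives the app 
nothing else to do, so the commit simply fails on the non-atomic store.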

So I think we should continue to make progress on getting Hive, Spark and 
others to run first class on S3. I don't think Rename.ATOMIC_REQUIRED helps. I 
believe it makes sense to have an FS.whatFeaturesDoYouSupport() API so that an 
app like Hive could be implemented to run first class on HDFS, S3, 
AzureBlobStorage etc. by querying the FS features and then using a different 
implementation for, say, committing the output of a job. In some cases it may 
be better to use a totally different approach that works on all FSs, such as a 
manifest file, or to depend on the Hive Metastore to commit. (It turns out Hive 
needs to be able to commit multiple tables, and hence even the rename-dir is 
not good enough.)
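
A rough sketch of how an app might use such a feature query to pick a commit 
implementation. All names here ({{FsFeature}}, {{FeatureAwareFs}}, 
{{CommitStrategy}}, {{chooseCommitStrategy}}) are assumptions made up for 
illustration; no such API is defined in this issue.

```java
import java.util.EnumSet;
import java.util.Set;

public class CommitStrategyDemo {
    /** Hypothetical feature flags a file system could advertise. */
    enum FsFeature { ATOMIC_RENAME_FILE, ATOMIC_RENAME_DIR }

    /** Hypothetical stand-in for the suggested FS.whatFeaturesDoYouSupport(). */
    interface FeatureAwareFs {
        Set<FsFeature> supportedFeatures();
    }

    /** Two ways an app could commit job output. */
    enum CommitStrategy { RENAME_DIRECTORY, MANIFEST_FILE }

    /** Pick the commit path from what the FS says it supports. */
    static CommitStrategy chooseCommitStrategy(FeatureAwareFs fs) {
        return fs.supportedFeatures().contains(FsFeature.ATOMIC_RENAME_DIR)
            ? CommitStrategy.RENAME_DIRECTORY
            : CommitStrategy.MANIFEST_FILE;
    }

    public static void main(String[] args) {
        FeatureAwareFs hdfsLike = () -> EnumSet.allOf(FsFeature.class);
        FeatureAwareFs objectStoreLike = () -> EnumSet.noneOf(FsFeature.class);
        System.out.println(chooseCommitStrategy(hdfsLike));        // prints RENAME_DIRECTORY
        System.out.println(chooseCommitStrategy(objectStoreLike)); // prints MANIFEST_FILE
    }
}
```

Unlike a flag that only fails, the query lets the app fall back to a commit 
path that works on the store it is actually running against.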

> Revisit FileSystem.rename(path, path, options)
> ----------------------------------------------
>
>                 Key: HADOOP-11452
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11452
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs
>    Affects Versions: 2.7.3
>            Reporter: Yi Liu
>            Assignee: Steve Loughran
>         Attachments: HADOOP-11452-001.patch, HADOOP-11452-002.patch
>
>
> Currently in {{FileSystem}}, {{rename}} with _Rename options_ is protected 
> and with _deprecated_ annotation. And the default implementation is not 
> atomic.
> So this method is not able to be used outside. On the other hand, HDFS has a 
> good and atomic implementation. (Also an interesting thing in {{DFSClient}}, 
> the _deprecated_ annotations for these two methods are opposite).
> It makes sense to make public for {{rename}} with _Rename options_, since 
> it's atomic for rename+overwrite, also it saves RPC calls if user desires 
> rename+overwrite.



