[ 
https://issues.apache.org/jira/browse/HADOOP-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993932#comment-15993932
 ] 

Lei (Eddy) Xu commented on HADOOP-14365:
----------------------------------------

Hi, [~steve_l] and [~andrew.wang].

Thanks for raising the concerns here. I would like to have a more thoughtful 
and stable {{FileSystem#create}} API. IIUC, this {{OutputStreamBuilder}} was 
introduced alongside the trunk EC development. In that sense, should we mark it 
as {{@InterfaceStability.Unstable}} instead of treating it as a 2.9 branch 
blocker? Additionally, it is neither a major feature nor a dependency of such a 
feature in branch-2. 

I like the idea that it provides a generic interface for setting options, 
especially because it avoids {{if (fs instanceof FooFileSystem)}} checks as much 
as possible; in classic OOP terms, such checks are a sign of an insufficient 
interface design. I feel that, to support all the capabilities of the current 
{{FileSystem#create()}}, the {{Builder}} might need a large API surface, much 
like the {{o.a.h.conf.Configuration}} interface. I have a few questions about 
this interface:

* To make the interface as generic as possible, it seems that all the 
output-stream-specific configurations should be set via this {{setOption(String, 
...)}} interface. [~steve_l], in your experience, is that sufficient to support 
all cases in the S3A/Azure/Google GCE connectors? Do these connectors have 
options that are not string/int/boolean, e.g., the {{Progressable}} or 
{{ChecksumOpt}} used in {{DFS}}? 
* Options such as {{favoredNodes}} are very HDFS-specific and are difficult to 
represent as string/int/bool values. 
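To make the two questions above concrete, here is a minimal Java sketch of the generic-option approach. Everything in it is hypothetical: {{CreateBuilderSketch}}, {{HdfsOptionsSketch}} and the option keys are illustrative names, not the API added in HDFS-11170, and {{Runnable}} merely stands in for {{Progressable}}. It shows that string/int/boolean values fit naturally into a {{Configuration}}-style map, while a list such as favored nodes has to be flattened into a string, and a callback needs its own typed setter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Hadoop's FSDataOutputStreamBuilder: generic
 * string-keyed options, stored much like o.a.h.conf.Configuration stores them.
 */
class CreateBuilderSketch {
    private final String path;
    private final Map<String, String> options = new HashMap<>();
    // A Progressable-style callback has no string form, so it stays a typed
    // setter. Runnable is only a stand-in for Hadoop's Progressable here.
    private Runnable progress = () -> { };

    CreateBuilderSketch(String path) {
        this.path = path;
    }

    CreateBuilderSketch setOption(String key, String value) {
        options.put(key, value);
        return this;
    }

    CreateBuilderSketch setOption(String key, int value) {
        return setOption(key, Integer.toString(value));
    }

    CreateBuilderSketch setOption(String key, boolean value) {
        return setOption(key, Boolean.toString(value));
    }

    CreateBuilderSketch progress(Runnable callback) {
        this.progress = callback;
        return this;
    }

    String path() {
        return path;
    }

    String get(String key, String dflt) {
        return options.getOrDefault(key, dflt);
    }

    int getInt(String key, int dflt) {
        String v = options.get(key);
        return v == null ? dflt : Integer.parseInt(v);
    }

    boolean getBoolean(String key, boolean dflt) {
        String v = options.get(key);
        return v == null ? dflt : Boolean.parseBoolean(v);
    }
}

/** Hypothetical HDFS-side reader for an option that is really a list. */
class HdfsOptionsSketch {
    // Hypothetical key name, chosen for illustration only.
    static final String FAVORED_NODES = "fs.hdfs.create.favored-nodes";

    // The "host:port" list must be flattened into one string by the caller
    // and split back here; the format is an unchecked convention between them.
    static List<String> favoredNodes(CreateBuilderSketch b) {
        String raw = b.get(FAVORED_NODES, "");
        return raw.isEmpty()
            ? Collections.<String>emptyList()
            : Arrays.asList(raw.split(","));
    }
}

class CreateBuilderDemo {
    public static void main(String[] args) {
        // The caller never tests "fs instanceof FooFileSystem"; in this design
        // a connector would read only the keys it understands.
        CreateBuilderSketch b = new CreateBuilderSketch("/tmp/out")
            .setOption("fs.example.upload.part-size", 8 * 1024 * 1024)
            .setOption("fs.example.fast-upload", true)
            .setOption(HdfsOptionsSketch.FAVORED_NODES, "dn1:9866,dn2:9866")
            .progress(() -> System.out.println("progress"));
        System.out.println(b.getInt("fs.example.upload.part-size", 0));
        System.out.println(b.getBoolean("fs.example.fast-upload", false));
        System.out.println(HdfsOptionsSketch.favoredNodes(b));
    }
}
```

The trade-off is the one raised above: the map keeps the builder's surface small, but structured HDFS options lose type checking unless they get dedicated methods, for instance on an HDFS-specific builder subclass.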

[~steve_l], have you started this work yet? If not, I'd like to help work on 
this issue. 


> Stabilise FileSystem builder-based create API 
> ----------------------------------------------
>
>                 Key: HADOOP-14365
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14365
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Lei (Eddy) Xu
>            Priority: Blocker
>
> HDFS-11170 added a builder-based create API for file creation, which has a few 
> issues to work out before it can be considered ready for use:
> 1. There is no specification in filesystem.md of what it is meant to do, 
> which means there's no public documentation of the expected behaviour except 
> the Javadocs, which consist of the sentences "Create a new 
> FSDataOutputStreamBuilder for the file with path" and "Base of specific file 
> system FSDataOutputStreamBuilder".
> I propose:
> # Give the new method a relevant name, e.g. {{createFile()}}, rather than one 
> that just names the return type. 
> # Extend {{filesystem.md}} to cover this method and, sadly for the authors, 
> the semantics of {{FSDataOutputStreamBuilder.build()}}.
> 2. There are only tests for HDFS and local, neither of them perfect. 
> Proposed: move to {{AbstractContractCreateTest}}, test for all filesystems, 
> fix tests and FS where appropriate. 
> 3. Add more tests to generate the failure conditions implied by the updated 
> filesystem spec, e.g. create over an existing file, create over a directory, 
> create with a negative buffer size, a negative block size, an empty dest path, 
> etc. 
> This will clarify whether precondition checks are made, as well as when. For 
> example: should {{newFSDataOutputStreamBuilder()}} validate the path 
> immediately?
> 4. Add to {{FileContext}}.
> 5. Take the opportunity to look at the flaws in today's {{create()}} calls 
> and address them, rather than replicate them. In particular, I'd like to end 
> the "create all parent dirs" behaviour.
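The precondition question in item 3 of the description (when, and whether, checks are made) can be illustrated with a minimal Java sketch in which the setters only record values and {{build()}} performs every check. This is one possible design, not what HDFS-11170 implements: {{FakeNamespace}}, {{LazyCreateBuilder}} and {{CreateContractSketch}} are hypothetical, the JDK exception types stand in for whatever the filesystem spec settles on, and the missing-parent failure assumes item 5's proposal to stop creating parent directories is adopted.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileAlreadyExistsException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory namespace standing in for a real filesystem. */
class FakeNamespace {
    final Set<String> files = new HashSet<>();
    final Set<String> dirs = new HashSet<>();

    FakeNamespace() {
        dirs.add("/");
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }
}

/** Hypothetical builder: setters only record values, build() checks them all. */
class LazyCreateBuilder {
    private final FakeNamespace ns;
    private final String path;  // deliberately not validated in the constructor
    private int bufferSize = 4096;
    private long blockSize = 128L * 1024 * 1024;
    private boolean overwrite = false;

    LazyCreateBuilder(FakeNamespace ns, String path) {
        this.ns = ns;
        this.path = path;
    }

    LazyCreateBuilder bufferSize(int n) { bufferSize = n; return this; }
    LazyCreateBuilder blockSize(long n) { blockSize = n; return this; }
    LazyCreateBuilder overwrite(boolean b) { overwrite = b; return this; }

    OutputStream build() throws IOException {
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("empty destination path");
        }
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
        }
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive: " + blockSize);
        }
        if (ns.dirs.contains(path)) {
            throw new FileAlreadyExistsException(path, null, "is a directory");
        }
        if (ns.files.contains(path) && !overwrite) {
            throw new FileAlreadyExistsException(path, null, "file exists");
        }
        // Assumption (item 5): missing parent directories are NOT created.
        String parent = FakeNamespace.parentOf(path);
        if (!ns.dirs.contains(parent)) {
            throw new FileNotFoundException("no parent directory: " + parent);
        }
        ns.files.add(path);
        return new ByteArrayOutputStream();
    }
}

class CreateContractSketch {
    /** Returns the exception type build() throws, or null if it succeeds. */
    static Class<? extends Exception> failureOf(LazyCreateBuilder b) {
        try {
            b.build().close();
            return null;
        } catch (Exception e) {
            return e.getClass();
        }
    }

    public static void main(String[] args) {
        FakeNamespace ns = new FakeNamespace();
        ns.dirs.add("/data");
        ns.files.add("/data/existing");
        System.out.println(failureOf(new LazyCreateBuilder(ns, "/data/existing")));
        System.out.println(failureOf(new LazyCreateBuilder(ns, "/data")));
        System.out.println(failureOf(new LazyCreateBuilder(ns, "/data/new").bufferSize(-1)));
        System.out.println(failureOf(new LazyCreateBuilder(ns, "/data/new").blockSize(-1)));
        System.out.println(failureOf(new LazyCreateBuilder(ns, "")));
        System.out.println(failureOf(new LazyCreateBuilder(ns, "/missing/child")));
        System.out.println(failureOf(new LazyCreateBuilder(ns, "/data/new")));
    }
}
```

With all checks deferred to {{build()}}, each failure condition listed in item 3 becomes a single assertion on one call; validating the path eagerly would instead move the empty-path failure to the builder factory method.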



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
