[
https://issues.apache.org/jira/browse/HADOOP-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993932#comment-15993932
]
Lei (Eddy) Xu commented on HADOOP-14365:
----------------------------------------
Hi, [~steve_l] and [~andrew.wang].
Thanks for raising the concerns here. I would also like to have a more
thoughtful and stable {{FileSystem#create}} API. IIUC, this
{{OutputStreamBuilder}} was introduced to align with trunk EC development. In
that sense, should we mark it as {{InterfaceStability#unstable}} instead of
treating it as a 2.9 branch blocker? Additionally, it is neither a major
feature nor a dependency of such a feature in branch-2.
I like the idea that it provides a generic interface to set the options,
especially since it avoids {{if (fs instanceof FooFileSystem)}} checks as much
as possible; such checks are a sign of insufficient interface design in
old-school OOP terms. I feel that to support the capabilities of the current
{{FileSystem#create()}}, the {{Builder}} might need a large surface, much like
the {{o.a.h.conf.Configuration}} interface. I have a few questions regarding
this interface:
* To make the interface as generic as possible, it seems that all the
output-stream-specific configuration should be set via this
{{setOption(String, ...)}} interface. [~steve_l], in your experience, is it
sufficient to support all cases in the S3A/Azure/Google GCE connectors? Do
these connectors have options that are not string/int/boolean, e.g.
{{Progressable}} or {{ChecksumOpt}} as used in {{DFS}}?
* Options like {{favoredNodes}} are very HDFS-specific and are difficult to
represent as string/int/bool.
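To make the trade-off behind these two questions concrete, here is a minimal sketch of a string-keyed option builder. It is not the actual builder added by HDFS-11170: the class {{StreamBuilder}}, its getters, and the {{fs.example.*}} keys are all hypothetical names made up for illustration, and it does not touch any real filesystem.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; not the Hadoop FSDataOutputStreamBuilder API. */
public class OptionBuilderSketch {

    /** Generic builder: every option is a string key with a string/int/boolean value. */
    static class StreamBuilder {
        private final String path;
        private final Map<String, String> options = new LinkedHashMap<>();

        StreamBuilder(String path) {
            this.path = path;
        }

        StreamBuilder setOption(String key, String value) {
            options.put(key, value);
            return this;
        }

        StreamBuilder setOption(String key, int value) {
            return setOption(key, Integer.toString(value));
        }

        StreamBuilder setOption(String key, boolean value) {
            return setOption(key, Boolean.toString(value));
        }

        int getInt(String key, int defaultValue) {
            String v = options.get(key);
            return v == null ? defaultValue : Integer.parseInt(v);
        }

        boolean getBoolean(String key, boolean defaultValue) {
            String v = options.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }

        String describe() {
            return path + " " + options;
        }
    }

    public static void main(String[] args) {
        StreamBuilder builder = new StreamBuilder("/tmp/example")
            // Primitive options fit the generic interface without trouble.
            .setOption("fs.example.buffer.size", 4096)
            .setOption("fs.example.overwrite", true)
            // A structured option (here a made-up list of favored nodes) has
            // to be flattened into a string by the caller and parsed again
            // by the filesystem, which is the awkward part.
            .setOption("fs.example.favored.nodes",
                String.join(",", "host1:9866", "host2:9866"));

        System.out.println(builder.describe());
    }
}
```

Callers never need an {{instanceof}} check with this shape, but object-valued options such as {{Progressable}} or {{ChecksumOpt}} have no natural string form, so they would need either typed setters on the base builder or filesystem-specific builder subclasses.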
[~steve_l], have you started this work yet? If not, I'd like to offer to help
with this issue.
> Stabilise FileSystem builder-based create API
> ----------------------------------------------
>
> Key: HADOOP-14365
> URL: https://issues.apache.org/jira/browse/HADOOP-14365
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Lei (Eddy) Xu
> Priority: Blocker
>
> HDFS-11170 added a builder-based create API for file creation which has a few
> issues to work out before it can be considered ready for use
> 1. There is no specification in the filesystem.md of what it is meant to do,
> which means there's no public documentation on expected behaviour except in
> the Javadocs, which consist of the sentences "Create a new
> FSDataOutputStreamBuilder for the file with path" and "Base of specific file
> system FSDataOutputStreamBuilder".
> I propose:
> # Give the new method a relevant name rather than just define the return
> type, e.g. {{createFile()}}.
> # `Filesystem.md` to be extended with coverage of this method, and, sadly for
> the authors, coverage of what the semantics of
> {{FSDataOutputStreamBuilder.build()}} are.
> 2. There are only tests for HDFS and local, neither of them perfect.
> Proposed: move to {{AbstractContractCreateTest}}, test for all filesystems,
> fix tests and FS where appropriate.
> 3. Add more tests to generate the failure conditions implied by the updated
> filesystem spec, e.g. create over an existing file, create over a directory,
> create with negative buffer size, negative block size, empty dest path, etc,
> etc.
> This will clarify whether precondition checks are made, and when. For
> example: should {{newFSDataOutputStreamBuilder()}} validate the path
> immediately?
> 4. Add to {{FileContext}}.
> 5. Take the opportunity to look at the flaws in today's {{create()}} calls
> and address them, rather than replicate. In particular, I'd like to end the
> behaviour "create all parent dirs".