[ 
https://issues.apache.org/jira/browse/HADOOP-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990799#comment-15990799
 ] 

Steve Loughran edited comment on HADOOP-14365 at 5/1/17 6:16 PM:
-----------------------------------------------------------------

Also, I want some generic setOption calls which let me set bool, long, int
and string values:
{code}
builder = fs.createFile("s3a://stevel/datasets/set1");
builder.setOption("s3a:encryption", true);
builder.setOption("s3a:encryption.kms.key", "AAZIF");
builder.setOption("s3a:acls", "aclInfo1", "aclInfo2", "aclInfo3");
{code}

Today we can only set those options on an FS-by-FS basis; indeed, it's only 
since HADOOP-13336 that we've had per-bucket config. Making it per-file would 
be one more step change.
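
As a rough illustration of what such generic calls could look like, here is a 
minimal, self-contained Java sketch. Everything in it is an assumption made for 
the example: the names CreateFileOptionsSketch, OptionBuilder, getBoolean, 
getLong and getStrings, and the "s3a:block.size" key, are hypothetical and are 
not Hadoop APIs. Values are simply stored as strings under scheme-prefixed keys.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Hadoop.
final class CreateFileOptionsSketch {

  /** Collects per-file options, stored as strings under "scheme:name" keys. */
  static final class OptionBuilder {
    private final String path;
    private final Map<String, List<String>> options = new LinkedHashMap<>();

    OptionBuilder(String path) {
      this.path = path;
    }

    /** Boolean overload; delegates to the String varargs overload. */
    OptionBuilder setOption(String key, boolean value) {
      return setOption(key, Boolean.toString(value));
    }

    /** Long overload; int arguments widen to long and land here too. */
    OptionBuilder setOption(String key, long value) {
      return setOption(key, Long.toString(value));
    }

    /** One or more string values, e.g. a list of ACL entries. */
    OptionBuilder setOption(String key, String... values) {
      options.put(key, Arrays.asList(values.clone()));
      return this;
    }

    boolean getBoolean(String key, boolean defVal) {
      List<String> v = options.get(key);
      return (v == null || v.isEmpty()) ? defVal : Boolean.parseBoolean(v.get(0));
    }

    long getLong(String key, long defVal) {
      List<String> v = options.get(key);
      return (v == null || v.isEmpty()) ? defVal : Long.parseLong(v.get(0));
    }

    List<String> getStrings(String key) {
      return options.getOrDefault(key, Collections.emptyList());
    }

    String describe() {
      return path + " " + options;
    }
  }

  public static void main(String[] args) {
    OptionBuilder builder = new OptionBuilder("s3a://stevel/datasets/set1")
        .setOption("s3a:encryption", true)
        .setOption("s3a:encryption.kms.key", "AAZIF")
        .setOption("s3a:block.size", 1024)   // made-up key, shows the long overload
        .setOption("s3a:acls", "aclInfo1", "aclInfo2", "aclInfo3");
    System.out.println(builder.describe());
  }
}
```

Storing everything as strings keeps the builder independent of any one store's 
types; whether a real API would do that, or keep typed values, is an open 
design choice.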

If we can set options this way, there's no need to have separate methods for 
every feature which is added. Equally critically, it stops me having to cast 
the FS to the specific FS client class I need in order to set an option. For 
example, to play with favored nodes in HDFS I have to write:

{code}
FileSystem fs = destPath.getFileSystem(conf);
FSDataOutputStream out;
if (fs instanceof DistributedFileSystem) {
  // HDFS only: the concrete class is needed to reach setFavoredNodes()
  DistributedFileSystem dfs = (DistributedFileSystem) fs;
  out = dfs.newFSDataOutputStreamBuilder(destPath)
      .setFavoredNodes(dns)
      .build();
} else {
  // every other filesystem: plain builder, no store-specific options
  out = fs.newFSDataOutputStreamBuilder(destPath).build();
}
{code}
It just gets too convoluted fast, especially if the options for Azure differ 
from those for S3A, which differ again from those for HDFS.
Even worse: if we did add object-store-specific builders, you'd need them on 
the classpath before your code could use them. Maybe you can get away with that 
assumption for HDFS, but we can't for the others, especially when there are 
some (Google GCS) which aren't even in the Hadoop codebase.
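
To make the contrast concrete, here is a small self-contained Java sketch of 
calling code that needs neither an instanceof check nor any store-specific 
class on the classpath. It is only a sketch under assumptions: the Store 
interface, the store() helper and the "hdfs:favored.nodes" key are hypothetical, 
and it assumes each store acts on keys carrying its own scheme prefix and 
silently ignores the rest, which is one possible policy and not something 
settled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: Store is a stand-in, not Hadoop's FileSystem.
final class GenericOptionsSketch {

  interface Store {
    String scheme();

    /** Returns the options this store would act on when creating the file. */
    default Map<String, String> create(String path, Map<String, String> options) {
      Map<String, String> applied = new TreeMap<>();
      String prefix = scheme() + ":";
      for (Map.Entry<String, String> e : options.entrySet()) {
        // Assumed policy: keys for other stores are skipped, not rejected.
        if (e.getKey().startsWith(prefix)) {
          applied.put(e.getKey().substring(prefix.length()), e.getValue());
        }
      }
      return applied;
    }
  }

  /** Builds a stand-in store for the given scheme. */
  static Store store(String scheme) {
    return () -> scheme;
  }

  public static void main(String[] args) {
    // The caller states every option once, whatever store it ends up on.
    Map<String, String> options = new LinkedHashMap<>();
    options.put("hdfs:favored.nodes", "dn1,dn2");
    options.put("s3a:encryption", "true");

    for (Store s : new Store[] {store("hdfs"), store("s3a"), store("gs")}) {
      System.out.println(s.scheme() + " -> " + s.create("/datasets/set1", options));
    }
  }
}
```

The calling code is identical for all three stand-in stores, including one 
("gs") for which no options were supplied.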

I really like this idea; if it works we could think of adding a similar 
openFile() operation, which would let us set an fadvise = random option, a 
retry policy, etc.









> Stabilise FileSystem builder-based create API 
> ----------------------------------------------
>
>                 Key: HADOOP-14365
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14365
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Priority: Blocker
>
> HDFS-11170 added a builder-based create API for file creation which has a few 
> issues to work out before it can be considered ready for use
> 1. There is no specification in filesystem.md of what it is meant to do, 
> which means there's no public documentation on expected behaviour except in 
> the Javadocs, which consist of the sentences "Create a new 
> FSDataOutputStreamBuilder for the file with path" and "Base of specific file 
> system FSDataOutputStreamBuilder".
> I propose:
> # Give the new method a relevant name rather than one which just describes 
> the return type, e.g. {{createFile()}}. 
> # Extend {{filesystem.md}} with coverage of this method, and, sadly for 
> the authors, coverage of what the semantics of 
> {{FSDataOutputStreamBuilder.build()}} are.
> 2. There are only tests for HDFS and local, neither of them perfect. 
> Proposed: move to {{AbstractContractCreateTest}}, test for all filesystems, 
> fix tests and FS where appropriate. 
> 3. Add more tests to generate the failure conditions implied by the updated 
> filesystem spec. E.g. create over an existing file, create over a directory, 
> create with negative buffer size, negative block size, empty dest path, etc. 
> This will clarify when precondition checks are made, as well as whether they 
> are made at all. For example: should {{newFSDataOutputStreamBuilder()}} 
> validate the path immediately?
> 4. Add to {{FileContext}}.
> 5. Take the opportunity to look at the flaws in today's {{create()}} calls 
> and address them, rather than replicate them. In particular, I'd like to end 
> the behaviour "create all parent dirs".



