[ 
https://issues.apache.org/jira/browse/BEAM-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983958#comment-15983958
 ] 

Stephen Sisk commented on BEAM-2031:
------------------------------------

Yeah, in that doc, I think that "2. Construct FileSystemConfig (conceptually a 
serializable map)" is the world I'm hoping to live in :)

Luke and I were talking, and we think there's a possible way to make multiple 
HadoopFileSystem configurations work, if the assumptions below are true.

Assumptions:
* fs.default.name is always set on the Hadoop Configurations used to connect to 
filesystems
* fs.default.name always represents a unique prefix for each of the different 
servers/configurations that are useful for the user's purposes
* the user always uses URIs whose prefix matches one of the fs.default.name 
values
(I'm not sure whether those assumptions are true, given my naivete in the 
Hadoop ecosystem)

Given those, we could:
* Allow the user to provide a list of configurations (via PipelineOptions)
* Register for the unique set of schemes present in the configurations (might 
require some small changes to allow this to work)
* Inside of HadoopFileSystem, maintain a map of fs.default.name -> configuration
* When HadoopFileSystem is given a URI, it would just look up the 
configuration based on the URI's prefix, and then use that configuration.
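
To make the steps above concrete, here is a minimal sketch of the lookup, 
not an actual Beam or Hadoop implementation. ConfigurationRegistry and its 
methods are hypothetical names, and a plain Map<String, String> stands in for 
a Hadoop Configuration so the example runs without Hadoop on the classpath. 
It relies on the assumptions listed above and picks the longest matching 
fs.default.name prefix.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: ConfigurationRegistry is not a Beam or Hadoop class.
// A Map<String, String> stands in for org.apache.hadoop.conf.Configuration.
final class ConfigurationRegistry {
  private static final String DEFAULT_NAME_KEY = "fs.default.name";

  // Map of fs.default.name -> configuration.
  private final Map<String, Map<String, String>> byDefaultName = new HashMap<>();

  ConfigurationRegistry(List<Map<String, String>> configurations) {
    for (Map<String, String> conf : configurations) {
      String defaultName = conf.get(DEFAULT_NAME_KEY);
      // Assumption 1: fs.default.name is always set (and carries a scheme).
      if (defaultName == null || URI.create(defaultName).getScheme() == null) {
        throw new IllegalArgumentException(DEFAULT_NAME_KEY + " must be a URI with a scheme");
      }
      // Assumption 2: each fs.default.name is unique.
      if (byDefaultName.putIfAbsent(defaultName, conf) != null) {
        throw new IllegalArgumentException("Duplicate " + DEFAULT_NAME_KEY + ": " + defaultName);
      }
    }
  }

  // The unique set of schemes a filesystem registrar would register for.
  Set<String> schemes() {
    Set<String> schemes = new TreeSet<>();
    for (String defaultName : byDefaultName.keySet()) {
      schemes.add(URI.create(defaultName).getScheme());
    }
    return schemes;
  }

  // Assumption 3: the URI starts with one of the registered fs.default.name values.
  // The longest matching prefix wins. Plain string prefixes can still be
  // ambiguous (for example host "nn1" versus "nn10").
  Map<String, String> lookup(String uri) {
    String best = null;
    for (String defaultName : byDefaultName.keySet()) {
      if (uri.startsWith(defaultName)
          && (best == null || defaultName.length() > best.length())) {
        best = defaultName;
      }
    }
    if (best == null) {
      throw new IllegalArgumentException("No configuration registered for " + uri);
    }
    return byDefaultName.get(best);
  }
}
```

With two configurations whose fs.default.name values are 
hdfs://cluster-a:8020 and hdfs://cluster-b:8020 (made-up hosts), looking up 
hdfs://cluster-b:8020/data/part-0 returns the second configuration, and only 
the hdfs scheme needs to be registered.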

This is aspirational for the first stable release, but if anyone has insights 
into whether or not those assumptions are true, that'd be useful.

This may be moot if we use option 2 (Construct FileSystemConfig) in Davor's doc.

> Hadoop FileSystem needs to receive Hadoop Configuration
> -------------------------------------------------------
>
>                 Key: BEAM-2031
>                 URL: https://issues.apache.org/jira/browse/BEAM-2031
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-java-extensions
>            Reporter: Stephen Sisk
>            Assignee: Stephen Sisk
>             Fix For: First stable release
>
>
> Since Beam FileSystem objects are configured via PipelineOptions, we need to 
> pass a Hadoop Configuration through PipelineOptions. I think that's very 
> solvable, but it does seem semi-complicated.
> cc [[email protected]] I believe you mentioned in the past that you had an 
> answer to this - is that written down anywhere?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
