For now I think we should stick with registering different configurations of filesystems under different schemes, so we would use s3a://, s3b://, and s3c://.
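Concretely, that could look something like the sketch below: a hypothetical MultiRegionS3Options (by analogy with HadoopFileSystemOptions) plus a FileSystemRegistrar that registers one S3 filesystem per configured scheme. The scheme-to-region map, both class names, and the commented-out constructor are made up for illustration; today's S3FileSystem hard-codes the "s3" scheme, while FileSystemRegistrar and options.as() are the real Beam SPI.

```java
import com.google.auto.service.AutoService;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.io.FileSystem;
import org.apache.beam.sdk.io.FileSystemRegistrar;
import org.apache.beam.sdk.io.aws.options.S3Options;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

// (In practice these two types would live in separate files.)

/** Hypothetical: S3Options extended with a scheme-to-region map, like HadoopFileSystemOptions. */
public interface MultiRegionS3Options extends S3Options {
  @Description("Map of URI scheme to AWS region, e.g. {\"s3a\": \"us-west-2\", \"s3b\": \"eu-west-1\"}")
  Map<String, String> getSchemeToRegion();

  void setSchemeToRegion(Map<String, String> value);
}

/** Hypothetical registrar that registers one S3 filesystem per configured scheme. */
@AutoService(FileSystemRegistrar.class)
public class MultiRegionS3FileSystemRegistrar implements FileSystemRegistrar {
  @Override
  public Iterable<FileSystem<?>> fromOptions(PipelineOptions options) {
    Map<String, String> schemeToRegion =
        options.as(MultiRegionS3Options.class).getSchemeToRegion();
    List<FileSystem<?>> fileSystems = new ArrayList<>();
    for (Map.Entry<String, String> entry : schemeToRegion.entrySet()) {
      // S3FileSystem currently hard-codes getScheme() == "s3" behind a
      // package-private constructor, so a real change would add a variant
      // that takes the scheme and region, e.g.:
      // fileSystems.add(new S3FileSystem(entry.getKey(), entry.getValue(), options));
    }
    return fileSystems;
  }
}
```

Because this configuration lives on PipelineOptions, it would reach the workers through the existing options-serialization path with no runner changes.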
If you go down the route of enhancing S3Options (similar to HadoopFileSystemOptions) to register multiple S3 filesystems under different schemes, you would get, for free, the fact that all runners can transport this configuration to the workers via PipelineOptions. Going the route of saving the set of registered FileSystems so that they are available on workers is more work but, in my opinion, much more flexible: it would decouple FileSystems from PipelineOptions, so they could be constructed and registered directly.

What were you thinking?

On Tue, Mar 13, 2018 at 9:44 AM Jacob Marble <jacobmar...@gmail.com> wrote:

> Starting a new thread just for dealing with AWS regions better; context: S3
> and Redshift.
>
> The way S3FileSystem.amazonS3 is built could be refactored to select a
> region based on [1]:
> 1. the region flag value
> 2. the EC2 region, if found in the environment (running in an EC2 VM)
> 3. the default region (us-east-1)
>
> For actually moving data, a Map<String, AmazonS3> could be used to hold an
> S3 client per region, with new S3 clients created as needed. The "master"
> client can be used to find a bucket's region [2].
>
> I think this is pragmatic; looking for feedback before I write a PR. Also,
> if someone is already making progress, let me know.
>
> Jacob
>
> [1] https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-region-selection.html
>
> [2] https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Client.html#getBucketLocation-java.lang.String-
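For concreteness, Jacob's region-selection order and per-region client map might look something like this sketch against the AWS SDK for Java v1. Regions.getCurrentRegion(), AmazonS3ClientBuilder, and getBucketLocation are real SDK calls; the class name, the regionFlag parameter, and the field names are made up for the example.

```java
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: pick a default region, then keep one AmazonS3 client per region. */
public class RegionAwareS3Clients {

  private final Map<String, AmazonS3> clientsByRegion = new ConcurrentHashMap<>();
  private final String defaultRegion;

  public RegionAwareS3Clients(String regionFlag) {
    if (regionFlag != null) {
      // 1. the flag value, when set
      this.defaultRegion = regionFlag;
    } else {
      // 2. the EC2 region, when running in an EC2 VM (null elsewhere)
      Region ec2Region = Regions.getCurrentRegion();
      // 3. the default region
      this.defaultRegion = (ec2Region != null) ? ec2Region.getName() : "us-east-1";
    }
  }

  /** Returns a client pinned to the region the bucket actually lives in. */
  public AmazonS3 clientForBucket(String bucket) {
    // The "master" client (default region) answers the location query [2].
    String location = clientFor(defaultRegion).getBucketLocation(bucket);
    // getBucketLocation returns legacy values for two regions:
    // "US" for us-east-1 and "EU" for eu-west-1.
    String region =
        "US".equals(location) ? "us-east-1"
            : "EU".equals(location) ? "eu-west-1"
            : location;
    return clientFor(region);
  }

  private AmazonS3 clientFor(String region) {
    // New S3 clients are created lazily, one per region.
    return clientsByRegion.computeIfAbsent(
        region, r -> AmazonS3ClientBuilder.standard().withRegion(r).build());
  }
}
```

The SDK also ships DefaultAwsRegionProviderChain, which covers the environment/EC2 fallback if we would rather not hand-roll steps 2 and 3.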