I think when I wrote the S3 code, I couldn't see how to set the storage class
per bucket, so I put it in a flag. It's easy to imagine a use case where the
storage class differs per filespec, not only per bucket.
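To illustrate the per-filespec idea, here is a minimal, hypothetical sketch (none of these names are Beam or AWS APIs, and the buckets and rule table are made up) of resolving a storage class from the first matching filespec pattern rather than from a single global flag:

```java
// Hypothetical sketch only: per-filespec storage class resolution.
// Rules are regex patterns tried in insertion order; first match wins.
import java.util.LinkedHashMap;
import java.util.Map;

class StorageClassPerSpec {

    // Insertion order matters, so use LinkedHashMap. Bucket names are made up.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("s3://archive-bucket/.*", "GLACIER");
        RULES.put("s3://hot-bucket/tmp/.*", "REDUCED_REDUNDANCY");
        RULES.put(".*", "STANDARD"); // fallback default
    }

    static String storageClassFor(String path) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            if (path.matches(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "STANDARD";
    }

    public static void main(String[] args) {
        System.out.println(storageClassFor("s3://archive-bucket/logs/a.gz")); // GLACIER
        System.out.println(storageClassFor("s3://hot-bucket/tmp/part-0000")); // REDUCED_REDUNDANCY
        System.out.println(storageClassFor("s3://other-bucket/x"));           // STANDARD
    }
}
```

The point is just that the mapping lives alongside the write, not in a pipeline-wide flag, so two writes in one pipeline can use different classes.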

Jacob

On Fri, Mar 9, 2018 at 9:51 AM, Jacob Marble <jacobmar...@gmail.com> wrote:

> Yes, I agree with all of this.
>
> Jacob
>
> On Thu, Mar 8, 2018 at 9:52 PM, Robert Bradshaw <rober...@google.com>
> wrote:
>
>> On Thu, Mar 8, 2018 at 9:38 PM Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> I think it may have been an API design mistake to put the S3 region into
>>> PipelineOptions.
>>>
>>
>> +1, IMHO it's generally a mistake to put any transform configuration into
>> PipelineOptions for exactly this reason.
>>
>>
>>> PipelineOptions are global per pipeline, whereas it's totally reasonable
>>> to access S3 files in different regions even from the code of a single DoFn
>>> running on a single element. The same applies to "setS3StorageClass".
>>>
>>> Jacob: what do you think? Why is it necessary to specify the S3 region
>>> at all - can AWS infer it automatically? Per
>>> https://github.com/aws/aws-sdk-java/issues/1107 it seems that this is
>>> possible via a setting on the client, so that the specified region is
>>> used as the default, but if the bucket is in a different region things
>>> still work.
>>>
>>> As for the storage class: so far nobody complained ;) but it should
>>> probably be specified via
>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/CreateOptions.java
>>> instead of a pipeline option.
>>>
>>> On Thu, Mar 8, 2018 at 9:16 PM Romain Manni-Bucau <rmannibu...@gmail.com>
>>> wrote:
>>>
>>>> The "hint" would probably be to use hints :) - indeed, this joke refers
>>>> to the hint thread.
>>>>
>>>> Long story short, with hints you should be able to say "use that
>>>> specialized config here".
>>>>
>>>> Now, personally, I'd like to see a way to specialize config per
>>>> transform. With a hint, an easy way is to use a prefix: --s3-region
>>>> would become --prefix_transform1-s3-region. But to implement it, I have
>>>> https://github.com/apache/beam/pull/4683 which needs to be merged
>>>> first ;).
>>>>
>>>>> On Mar 8, 2018 at 23:03, "Ismaël Mejía" <ieme...@gmail.com> wrote:
>>>>
>>>>> I was trying to create a really simple pipeline that read from a
>>>>> bucket in a filesystem (s3) and writes to a different bucket in the
>>>>> same filesystem.
>>>>>
>>>>>     S3Options options =
>>>>> PipelineOptionsFactory.fromArgs(args).create().as(S3Options.class);
>>>>>     Pipeline pipeline = Pipeline.create(options);
>>>>>     pipeline
>>>>>       .apply("ReadLines", TextIO.read().from("s3://src-bucket/*"))
>>>>>       // .apply("AllOtherMagic", ...)
>>>>>       .apply("WriteCounts", TextIO.write().to("s3://dst-bucket/"));
>>>>>     pipeline.run().waitUntilFinish();
>>>>>
>>>>> I discovered that my original bucket was in a different region so I
>>>>> needed to pass a different S3Options object to the Write
>>>>> ‘options.setAwsRegion(“dst-region”)’, but I could not find a way to do
>>>>> it. Can somebody give me a hint on how to do this?
>>>>>
>>>>> I was wondering, since file-based IOs take their configuration from
>>>>> the FileSystem rather than from the transform itself, whether this is
>>>>> possible at all. With non-file-based IOs all the configuration details
>>>>> are explicit in each specific transform, but this is not the case for
>>>>> these file-based transforms.
>>>>>
>>>>> Note: I know this question probably belongs on user@, but since I
>>>>> couldn't find an easy way to do it, I was wondering whether this is an
>>>>> issue we should consider on dev@ from an API point of view.
>>>>>
>>>>
>
