Agree. We need file-system abstractions in all languages since (1) users
may need to access file systems directly from DoFns, and (2) common file-based
sources/sinks will probably be available in multiple languages even
with the portability API and cross-language IO (these are usually the first
sources/sinks that get implemented in an SDK and serve as reference
implementations for other sources/sinks).

- Cham

On Mon, Mar 12, 2018 at 10:48 AM Lukasz Cwik <lc...@google.com> wrote:

> There is still a lot of work before we support cross-language transforms
> and hence get access to filesystems written in different languages. Still,
> how the options are passed from one language to the other needs to be well
> understood, and it would be best if the way a user defines these filesystems
> is the same in all languages: it would be annoying to provide the same
> configuration (in slightly different ways) for Java, Python, Go, ...
>
> On Fri, Mar 9, 2018 at 2:01 PM, Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>>
>>
>> On Mar 9, 2018 at 21:35, "Lukasz Cwik" <lc...@google.com> wrote:
>>
>> The blocker is to get someone to follow through on the original design or
>> to get a new design (with feedback) and have it implemented.
>>
>>
>> If the PipelineOptionsFactory-related PRs are merged, I can do a
>> PR/proposal based on this thread's draft this month.
>>
>>
>> Note that this impacts more than just Java, as it also exists in Python
>> and Go.
>>
>>
>> Clearly outside my knowledge, but since it is mainly Java-backed it should
>> be almost transparent, no? If not, should it be part of the portable API on
>> top of runners?
>>
>>
>> On Fri, Mar 9, 2018 at 12:18 PM, Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Hmm, it doesn't solve the issue that Beam doesn't let you configure a
>>> transform from its "config" (let's say the CLI).
>>>
>>> So if I have a generic pipeline taking one file as input and another as
>>> output, then I must register 2 filesystems in all cases? If the pipeline is
>>> dynamic, I must make that dynamic too?
>>>
>>> Sounds pretty bad for end users and not generic - all transforms hit this
>>> issue since Beam can't assume the implementation. Using a prefix (a
>>> namespace, which can be implicit or not) is simple, straightforward, and
>>> enables all cases to be handled smoothly for end users.
>>>
>>> What is the blocker to fixing this design issue? I kind of fail to see why
>>> we end up with workarounds for a few particular cases right now :s.
>>>
>>> On Mar 9, 2018 at 19:00, "Jacob Marble" <jacobmar...@gmail.com> wrote:
>>>
>>>> I think when I wrote the S3 code, I couldn't see how to set storage
>>>> class per-bucket, so put it in a flag. It's easy to imagine a use case
>>>> where storage class differs per filespec, not only per bucket.
>>>>
>>>> Jacob
>>>>
>>>> On Fri, Mar 9, 2018 at 9:51 AM, Jacob Marble <jacobmar...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, I agree with all of this.
>>>>>
>>>>> Jacob
>>>>>
>>>>> On Thu, Mar 8, 2018 at 9:52 PM, Robert Bradshaw <rober...@google.com>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Mar 8, 2018 at 9:38 PM Eugene Kirpichov <kirpic...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I think it may have been an API design mistake to put the S3 region
>>>>>>> into PipelineOptions.
>>>>>>>
>>>>>>
>>>>>> +1, IMHO it's generally a mistake to put any transform configuration
>>>>>> into PipelineOptions for exactly this reason.
>>>>>>
>>>>>>
>>>>>>> PipelineOptions are global per pipeline, whereas it's totally
>>>>>>> reasonable to access S3 files in different regions even from the code
>>>>>>> of a single DoFn running on a single element. The same applies to
>>>>>>> "setS3StorageClass".
>>>>>>>
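To make the point above concrete, here is a minimal plain-Java sketch of resolving the region per path instead of once per pipeline. It assumes a configured default region plus optional per-bucket overrides. BucketRegionResolver and its methods are hypothetical names invented for this illustration; they are not part of Beam or the AWS SDK, and the region values are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not a Beam or AWS API): picks a region per bucket, with a default. */
final class BucketRegionResolver {
  private static final String SCHEME = "s3://";

  private final String defaultRegion;
  private final Map<String, String> regionByBucket = new HashMap<>();

  BucketRegionResolver(String defaultRegion) {
    this.defaultRegion = defaultRegion;
  }

  /** Registers an override for one bucket and returns this resolver for chaining. */
  BucketRegionResolver withBucketRegion(String bucket, String region) {
    regionByBucket.put(bucket, region);
    return this;
  }

  /** Returns the region for an s3://bucket/key path, or the default for unknown buckets. */
  String regionFor(String path) {
    if (!path.startsWith(SCHEME)) {
      throw new IllegalArgumentException("Not an s3:// path: " + path);
    }
    // The bucket is everything between "s3://" and the next '/', if there is one.
    int slash = path.indexOf('/', SCHEME.length());
    String bucket =
        slash < 0 ? path.substring(SCHEME.length()) : path.substring(SCHEME.length(), slash);
    return regionByBucket.getOrDefault(bucket, defaultRegion);
  }
}
```

With a resolver like this, code running on a single element could look up the region for s3://src-bucket/... and for s3://dst-bucket/... separately, which a single global option cannot express.
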
>>>>>>> Jacob: what do you think? Why is it necessary to specify the S3
>>>>>>> region at all - can AWS infer it automatically? Per
>>>>>>> https://github.com/aws/aws-sdk-java/issues/1107 it seems that this
>>>>>>> is possible via a setting on the client, so that the specified region is
>>>>>>> used as the default but if the bucket is in a different region things
>>>>>>> still work.
>>>>>>>
>>>>>>> As for the storage class: so far nobody complained ;) but it should
>>>>>>> probably be specified via
>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/CreateOptions.java
>>>>>>> instead of a pipeline option.
>>>>>>>
>>>>>>> On Thu, Mar 8, 2018 at 9:16 PM Romain Manni-Bucau <
>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> The "hint" would probably be to use hints :) - indeed, this joke refers
>>>>>>>> to the hint thread.
>>>>>>>>
>>>>>>>> Long story short, with hints you should be able to say "use that
>>>>>>>> specialized config here".
>>>>>>>>
>>>>>>>> Now, personally, I'd like to see a way to specialize config per
>>>>>>>> transform. With a hint, an easy way is to use a prefix: --s3-region
>>>>>>>> would become --prefix_transform1-s3-region. But to implement it I have
>>>>>>>> https://github.com/apache/beam/pull/4683, which needs to be merged
>>>>>>>> first ;).
>>>>>>>>
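A rough illustration of the prefix idea above, written as plain Java rather than Beam code: a transform-specific flag overrides the global one, and the global flag remains the fallback. The PrefixedOptions class, the "read" and "write" prefixes, and the exact flag spelling are assumptions made up for this sketch; nothing here is an existing Beam API or a description of what the linked PR implements.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of Beam): options with an optional per-transform prefix. */
final class PrefixedOptions {
  private final Map<String, String> values = new LinkedHashMap<>();

  /** Parses arguments of the form --name=value; anything else is ignored. */
  static PrefixedOptions fromArgs(String... args) {
    PrefixedOptions options = new PrefixedOptions();
    for (String arg : args) {
      int eq = arg.indexOf('=');
      if (arg.startsWith("--") && eq > 2) {
        options.values.put(arg.substring(2, eq), arg.substring(eq + 1));
      }
    }
    return options;
  }

  /** Looks up "prefix-name" first, then falls back to the unprefixed "name". */
  Optional<String> resolve(String prefix, String name) {
    String specific = values.get(prefix + "-" + name);
    return Optional.ofNullable(specific != null ? specific : values.get(name));
  }
}
```

For example, given --s3-region=us-east-1 --write-s3-region=eu-west-1, resolving "s3-region" for the "write" prefix yields eu-west-1, while the "read" prefix falls back to us-east-1. Users who never need per-transform settings keep passing the unprefixed flag only.
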
>>>>>>>> On Mar 8, 2018 at 23:03, "Ismaël Mejía" <ieme...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I was trying to create a really simple pipeline that reads from a
>>>>>>>>> bucket in a filesystem (s3) and writes to a different bucket in the
>>>>>>>>> same filesystem.
>>>>>>>>>
>>>>>>>>>     S3Options options =
>>>>>>>>>         PipelineOptionsFactory.fromArgs(args).create().as(S3Options.class);
>>>>>>>>>     Pipeline pipeline = Pipeline.create(options);
>>>>>>>>>     pipeline
>>>>>>>>>       .apply("ReadLines", TextIO.read().from("s3://src-bucket/*"))
>>>>>>>>>       // .apply("AllOtherMagic", ...)
>>>>>>>>>       .apply("WriteCounts", TextIO.write().to("s3://dst-bucket/"));
>>>>>>>>>     pipeline.run().waitUntilFinish();
>>>>>>>>>
>>>>>>>>> I discovered that my original bucket was in a different region so I
>>>>>>>>> needed to pass a different S3Options object to the Write
>>>>>>>>> 'options.setAwsRegion("dst-region")', but I could not find a way
>>>>>>>>> to do it. Can somebody give me a hint on how to do this?
>>>>>>>>>
>>>>>>>>> I was wondering whether this is possible, since file-based IOs use
>>>>>>>>> the configuration implied by the filesystem. With non-file-based
>>>>>>>>> IOs all the configuration details are explicit in each specific
>>>>>>>>> transform, but this is not the case for these file-based
>>>>>>>>> transforms.
>>>>>>>>>
>>>>>>>>> Note: I know this question probably belongs more to user@, but
>>>>>>>>> since I couldn't find an easy way to do it I was wondering if this
>>>>>>>>> is an issue we should consider at dev@ from an API point of view.
>>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>
>>
>
