On Wed, Mar 27, 2024 at 9:12 AM Reuven Lax <re...@google.com> wrote:

> This does seem like the best compromise, though I think there will still
> end up being performance issues. A common pattern I've seen is that there
> is a long common prefix to the dynamic destination followed by the dynamic
> component, e.g. the destination might be
> long/common/path/to/destination/files/<per-user-file>. In this case, the
> prefix is often much larger than the messages themselves and is what gets
> effectively encoded in the lambda.
>

The idea here is that the destination would be given as a format string,
say, "long/common/path/to/destination/files/{dest_info.user}". Another way
to put this is that we support (only) "lambdas" that are represented as
string substitutions. (The fact that dest_info does not have to be part of
the record, and can be the output of an arbitrary map if need be, makes
this restriction not so bad.)
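A minimal sketch of the idea, assuming Python's str.format semantics for the
substitution and rows shaped as {"record": ..., "dest_info": ...}. The names
DESTINATION_PATTERN, resolve_destination and group_by_destination are
hypothetical illustrations, not an existing Beam API. The long prefix lives
once in the pattern, and each element only carries its small dest_info.

```python
from collections import defaultdict
from types import SimpleNamespace

# Hypothetical pattern: the long common prefix is stated once, here,
# rather than being baked into a per-element lambda.
DESTINATION_PATTERN = "long/common/path/to/destination/files/{dest_info.user}"


def resolve_destination(pattern, dest_info):
    """Substitute fields of dest_info into the pattern (assumes str.format rules)."""
    return pattern.format(dest_info=dest_info)


def group_by_destination(rows, pattern):
    """Group each row's record under its resolved destination.

    Assumes each row looks like {"record": ..., "dest_info": ...}; dest_info
    is used only to pick the destination and is not written with the record.
    """
    grouped = defaultdict(list)
    for row in rows:
        dest = resolve_destination(pattern, row["dest_info"])
        grouped[dest].append(row["record"])
    return dict(grouped)


rows = [
    {"record": {"msg": "hi"}, "dest_info": SimpleNamespace(user="alice")},
    {"record": {"msg": "yo"}, "dest_info": SimpleNamespace(user="bob")},
    {"record": {"msg": "ok"}, "dest_info": SimpleNamespace(user="alice")},
]

for dest, records in group_by_destination(rows, DESTINATION_PATTERN).items():
    print(dest, records)
```

Because the pattern is plain data rather than a callable, it can be shipped
across languages and inspected as-is; dest_info could equally be produced by
an upstream map rather than being a field of the original record.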

As well as solving the performance issues, I think this is actually a
pretty convenient and natural way for the user to name their destination
(for the common use case, even easier than providing a lambda), and it has
the benefit of being much more transparent than an arbitrary callable for
introspection (by both machines and humans that may look at the resulting
pipeline).


> I'm not entirely sure how to address this in a portable context. We might
> simply have to accept the extra overhead when going cross language.
>
> Reuven
>
> On Wed, Mar 27, 2024 at 8:51 AM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> Thanks for putting this together, it will be a really useful feature to
>> have.
>>
>> I am in favor of the string-pattern approaches. I think we need to
>> support both the {record=..., dest_info=...} and the elide-fields
>> approaches: the former is nicer when one has a fixed representation for
>> the output record (e.g. a proto or Avro schema), and the flattened form is
>> easier to use in more free-form contexts (e.g. when producing records from
>> YAML and SQL).
>>
>> Also left some comments on the doc.
>>
>>
>> On Wed, Mar 27, 2024 at 6:51 AM Ahmed Abualsaud via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hey all,
>>>
>>> There have been some conversations lately about how best to enable
>>> dynamic destinations in a portable context. Usually, this comes up for
>>> cross-language transforms and more recently for Beam YAML.
>>>
>>> I've started a short doc outlining some routes we could take. The
>>> purpose is to establish a good standard for supporting dynamic destinations
>>> with portability, one that can be applied to most use cases and IOs. Please
>>> take a look and add any thoughts!
>>>
>>> https://s.apache.org/portable-dynamic-destinations
>>>
>>> Best,
>>> Ahmed
>>>
>>
