On Wed, Mar 27, 2024 at 10:20 AM Reuven Lax <re...@google.com> wrote:

> Can the prefix still be generated programmatically at graph creation time?
>

Yes. It's just a property of the transform passed by the user at
configuration time.
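
A rough sketch of how that might look, using plain Python string formatting rather than any actual Beam API. The names DestInfo, make_destination_pattern and resolve_destination are hypothetical and only for illustration. The prefix is computed in ordinary code while the pipeline is being built, and only the resulting format string would be handed to the transform:

```python
from typing import NamedTuple


class DestInfo(NamedTuple):
    """Hypothetical per-element destination info (not a Beam type)."""
    user: str


def make_destination_pattern(*path_parts: str) -> str:
    """Build the destination format string at graph construction time.

    The prefix is computed by ordinary code; only the final placeholder
    is left to be substituted per element.
    """
    prefix = "/".join(part.strip("/") for part in path_parts)
    # Assumes the prefix itself contains no literal '{' or '}' characters.
    return prefix + "/{dest_info.user}"


def resolve_destination(pattern: str, dest_info: DestInfo) -> str:
    """Per-element step: a pure string substitution, no user callable."""
    return pattern.format(dest_info=dest_info)


# Construction time: the prefix is generated programmatically.
pattern = make_destination_pattern("long/common/path", "to/destination", "files")

# Execution time: only the dynamic component varies per element.
path = resolve_destination(pattern, DestInfo(user="alice"))
```

Since the pattern is just a string, it can be serialized with the transform's configuration and inspected without running any user code.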


> On Wed, Mar 27, 2024 at 9:40 AM Robert Bradshaw <rober...@google.com>
> wrote:
>
>> On Wed, Mar 27, 2024 at 9:12 AM Reuven Lax <re...@google.com> wrote:
>>
>>> This does seem like the best compromise, though I think there will still
>>> end up being performance issues. A common pattern I've seen is that there
>>> is a long common prefix to the dynamic destination followed by the dynamic
>>> component. e.g. the destination might be
>>> long/common/path/to/destination/files/<per-user-file>. In this case, the
>>> prefix is often much larger than the messages themselves and is what gets
>>> effectively encoded in the lambda.
>>>
>>
>> The idea here is that the destination would be given as a format string,
>> say, "long/common/path/to/destination/files/{dest_info.user}". Another way
>> to put this is that we support (only) "lambdas" that are represented as
>> string substitutions. (The fact that dest_info does not have to be part of
>> the record, and can be the output of an arbitrary map if need be, makes
>> this restriction not so bad.)
>>
>> As well as solving the performance issues, I think this is actually a
>> pretty convenient and natural way for the user to name their destination
>> (for the common use case, even easier than providing a lambda), and has the
>> benefit of being much more transparent than an arbitrary callable for
>> introspection (by both machines and humans who may look at the
>> resulting pipeline).
>>
>>
>>> I'm not entirely sure how to address this in a portable context. We
>>> might simply have to accept the extra overhead when going cross language.
>>>
>>> Reuven
>>>
>>> On Wed, Mar 27, 2024 at 8:51 AM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Thanks for putting this together, it will be a really useful feature to
>>>> have.
>>>>
>>>> I am in favor of the string-pattern approaches. I think we need to
>>>> support both the {record=..., dest_info=...} and the elide-fields
>>>> approaches: the former is nicer when one has a fixed representation for
>>>> the output record (e.g. a proto or Avro schema), and the latter, flattened
>>>> form is easier to use in more free-form contexts (e.g. when producing
>>>> records from YAML and SQL).
>>>>
>>>> Also left some comments on the doc.
>>>>
>>>>
>>>> On Wed, Mar 27, 2024 at 6:51 AM Ahmed Abualsaud via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>> There have been some conversations lately about how best to enable
>>>>> dynamic destinations in a portable context. Usually, this comes up for
>>>>> cross-language transforms and more recently for Beam YAML.
>>>>>
>>>>> I've started a short doc outlining some routes we could take. The
>>>>> purpose is to establish a good standard for supporting dynamic destinations
>>>>> with portability, one that can be applied to most use cases and IOs. Please
>>>>> take a look and add any thoughts!
>>>>>
>>>>> https://s.apache.org/portable-dynamic-destinations
>>>>>
>>>>> Best,
>>>>> Ahmed
>>>>>
>>>>
