I really like this proposal. I think it narrows down and solves the essential problem, which is to avoid shuffling redundant data. It also provides the vast majority of the functionality that a lambda would, with significantly better debuggability and usability, since the dynamic destination pattern string can appear in display data, etc.

Kenn

On Wed, Mar 27, 2024 at 1:58 PM Robert Bradshaw via dev <dev@beam.apache.org> wrote:

> On Wed, Mar 27, 2024 at 10:20 AM Reuven Lax <re...@google.com> wrote:
>
>> Can the prefix still be generated programmatically at graph creation time?
>>
>
> Yes. It's just a property of the transform passed by the user at
> configuration time.
>
>
>> On Wed, Mar 27, 2024 at 9:40 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Wed, Mar 27, 2024 at 9:12 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> This does seem like the best compromise, though I think there will
>>>> still end up being performance issues. A common pattern I've seen is that
>>>> there is a long common prefix to the dynamic destination followed the
>>>> dynamic component. e.g. the destination might be
>>>> long/common/path/to/destination/files/<per-user-file>. In this case, the
>>>> prefix is often much larger than messages themselves and is what gets
>>>> effectively encoded in the lambda.
>>>>
>>>
>>> The idea here is that the destination would be given as a format string,
>>> say, "long/common/path/to/destination/files/{dest_info.user}". Another way
>>> to put this is that we support (only) "lambdas" that are represented as
>>> string substitutions. (The fact that dest_info does not have to be part of
>>> the record, and can be the output of an arbitrary map if need be, makes
>>> this restriction not so bad.)
>>>
>>> As well as solving the performance issues, I think this is actually a
>>> pretty convenient and natural way for the user to name their destination
>>> (for the common usecase, even easier than providing a lambda), and has the
>>> benefit of being much more transparent than an arbitrary callable as well
>>> for introspection (for both machine and human that may look at the
>>> resulting pipeline).
>>>
>>>
>>>> I'm not entirely sure how to address this in a portable context. We
>>>> might simply have to accept the extra overhead when going cross language.
>>>>
>>>> Reuven
>>>>
>>>> On Wed, Mar 27, 2024 at 8:51 AM Robert Bradshaw via dev <dev@beam.apache.org> wrote:
>>>>
>>>>> Thanks for putting this together, it will be a really useful feature
>>>>> to have.
>>>>>
>>>>> I am in favor of the string-pattern approaches. I think we need to
>>>>> support both the {record=..., dest_info=...} and the elide-fields
>>>>> approaches, as the former is nicer when one has a fixed representation for
>>>>> the output record (e.g. a proto or avro schema) and the flattened form for
>>>>> ease of use in more free-form contexts (e.g. when producing records from
>>>>> YAML and SQL).
>>>>>
>>>>> Also left some comments on the doc.
>>>>>
>>>>>
>>>>> On Wed, Mar 27, 2024 at 6:51 AM Ahmed Abualsaud via dev <dev@beam.apache.org> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> There have been some conversations lately about how best to enable
>>>>>> dynamic destinations in a portable context. Usually, this comes up for
>>>>>> cross-language transforms and more recently for Beam YAML.
>>>>>>
>>>>>> I've started a short doc outlining some routes we could take. The
>>>>>> purpose is to establish a good standard for supporting dynamic
>>>>>> destinations with portability, one that can be applied to most use
>>>>>> cases and IOs. Please take a look and add any thoughts!
>>>>>>
>>>>>> https://s.apache.org/portable-dynamic-destinations
>>>>>>
>>>>>> Best,
>>>>>> Ahmed
>>>>>>
>>>>>
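
For anyone skimming the thread, the format-string idea quoted above can be sketched in plain Python. This is only an illustration under stated assumptions and not Beam's API: DEST_PATTERN, resolve_destination, and group_by_destination are hypothetical names, and dest_info is modelled as a simple object with a "user" attribute. The point it tries to show is that the long common prefix lives once in the pattern string, while each element carries only its small dest_info.

```python
from collections import defaultdict
from types import SimpleNamespace

# Hypothetical pattern, modelled on the example string in the thread.
# It is configured once on the transform, not shipped with every element.
DEST_PATTERN = "long/common/path/to/destination/files/{dest_info.user}"


def resolve_destination(pattern, dest_info):
    """Substitute dest_info fields into the pattern using str.format."""
    return pattern.format(dest_info=dest_info)


def group_by_destination(elements, pattern=DEST_PATTERN):
    """Group records by their resolved destination.

    Each element is a (record, dest_info) pair, so dest_info does not
    have to be part of the record that is eventually written.
    """
    grouped = defaultdict(list)
    for record, dest_info in elements:
        grouped[resolve_destination(pattern, dest_info)].append(record)
    return dict(grouped)


# Assumed sample data, for illustration only.
elements = [
    ({"msg": "a"}, SimpleNamespace(user="alice")),
    ({"msg": "b"}, SimpleNamespace(user="bob")),
    ({"msg": "c"}, SimpleNamespace(user="alice")),
]
result = group_by_destination(elements)
```

Because the destination is a plain string rather than a callable, it can also be printed or inspected directly, which is the transparency benefit mentioned in the thread.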