On Wed, Apr 3, 2024 at 4:15 AM Kenneth Knowles <k...@apache.org> wrote:

> Let me summarize the most recent proposal on-list to frame my question
> about this last suggestion. It looks like this:
>
> 1. user has an element, call it `data`
> 2. user maps `data` to an arbitrary metadata row, call it `dest`
> 3. we can do things like shuffle on `dest` because it isn't too big
> 4. we map `dest` to a concrete destination (aka URL) to write to by a
> string format that uses fields of `dest`
>
> I believe steps 1-3 are identical in expressivity to non-portable
> DynamicDestinations. So, Reuven, the question is for step 4: what are the
> mappings from `dest` to URL that cannot be expressed by string formatting
> but need SQL or Lua, etc? That would be a useful guide to consideration of
> those possibilities.
>

I think any non-trivial mapping can be done in step 2. It may be possible
to come up with a case where something other than string substitution is
needed to be done to make dest small enough to shuffle, but I think that'd
be a really rare corner case, and then it's just an optimization rather
than a feature-completeness question.
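A plain-Python sketch of steps 2-4 from the summary above, under some assumptions: the element shape (`user_id`, `country`) and the pattern are hypothetical, `to_dest` and `group_by_destination` are illustrative names rather than Beam APIs, and the dict grouping only stands in for a shuffle keyed on `dest`. The point it tries to show is that all the non-trivial logic can live in step 2, leaving step 4 as plain substitution.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical destination pattern, fixed at pipeline-construction time. The long
# common prefix lives here once instead of being carried by every element.
PATTERN = "long/common/path/to/destination/files/{user}/{region}"

# A dest row as a tuple of (field, value) pairs so it is hashable for grouping.
DestRow = Tuple[Tuple[str, str], ...]


def to_dest(data: Dict[str, Any]) -> DestRow:
    """Step 2: arbitrary user logic that maps an element to a small metadata row."""
    region = "eu" if data["country"] in {"DE", "FR"} else "other"
    return (("user", data["user_id"].lower()), ("region", region))


def group_by_destination(
    elements: Iterable[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    # Step 3: group on the small dest row (a stand-in for a shuffle on `dest`).
    groups: Dict[DestRow, List[Dict[str, Any]]] = defaultdict(list)
    for data in elements:
        groups[to_dest(data)].append(data)
    # Step 4: turn each dest row into a concrete URL by plain substitution only.
    return {PATTERN.format(**dict(dest)): rows for dest, rows in groups.items()}


if __name__ == "__main__":
    elements = [
        {"user_id": "Alice", "country": "DE"},
        {"user_id": "alice", "country": "FR"},
        {"user_id": "Bob", "country": "US"},
    ]
    for url, rows in group_by_destination(elements).items():
        print(url, len(rows))
```

Note that the lowercasing and the country-to-region bucketing happen before the shuffle, so the pattern itself never needs anything beyond `{field}` substitution.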


> FWIW I think that even if we add a mini-language, string formatting has
> better ease of use (it can easily be displayed in a UI, etc.), so it would
> be the first choice, and more advanced stuff would be a fallback for rare
> cases. So they are both valuable, and I'd be happy to implement the
> easier-to-use path right away while we discuss.
>

+1. Note that this even lets us share the config "path/table/..." field
that is a static string for non-dynamic destinations.

In light of the above, let's avoid a complex mini-language. I'd start with
nothing but plugging things in w/o any formatting options.
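A minimal sketch of "plugging things in w/o any formatting options", assuming dotted field access such as `{dest_info.user}` is the only placeholder syntax we allow. `render_destination` is a hypothetical helper, not an existing Beam function; it uses Python's `string.Formatter().parse` to locate placeholders and rejects format specs, conversions and indexing. A pattern with no placeholders comes back unchanged, which is what lets the same config field hold a static path or table name.

```python
import string
from typing import Any, Mapping


def render_destination(pattern: str, row: Mapping[str, Any]) -> str:
    """Substitute {field} / {field.subfield} placeholders from a row.

    Hypothetical helper: plain substitution only. Format specs ({x:>5}),
    conversions ({x!r}), indexing ({x[0]}) and empty fields ({}) are rejected.
    """
    parts = []
    for literal, field, spec, conversion in string.Formatter().parse(pattern):
        parts.append(literal)
        if field is None:
            continue  # literal text with no placeholder after it
        if not field or spec or conversion or "[" in field:
            raise ValueError(f"unsupported placeholder in {pattern!r}: {{{field}}}")
        # Walk dotted names through nested mappings, e.g. dest_info.user.
        value: Any = row
        for name in field.split("."):
            value = value[name]
        parts.append(str(value))
    return "".join(parts)


if __name__ == "__main__":
    print(render_destination(
        "long/common/path/to/destination/files/{dest_info.user}",
        {"dest_info": {"user": "alice"}},
    ))
    # A static destination contains no placeholders and passes through as-is.
    print(render_destination("project.dataset.static_table", {}))
```

Keeping the syntax this small means it can be extended later (or swapped for a real expression language) without having to reinterpret patterns users already wrote.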


> On Tue, Apr 2, 2024 at 2:59 PM Reuven Lax via dev <dev@beam.apache.org>
> wrote:
>
>> I do suspect that over time we'll find more and more cases we can't
>> express, and will be asked to extend this little templating in more
>> directions. To head that off, could we easily just reuse an existing
>> language (SQL, Lua, something of that form?) instead of creating something
>> new?
>>
>> On Tue, Apr 2, 2024 at 8:55 AM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> I really like this proposal. I think it has narrowed down and solved the
>>> essential problem of not shuffling excess redundant data, and also provides
>>> the vast majority of the functionality that a lambda would, with
>>> significantly better debuggability and usability too, since the dynamic
>>> destination pattern string can be in display data, etc.
>>>
>>> Kenn
>>>
>>> On Wed, Mar 27, 2024 at 1:58 PM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> On Wed, Mar 27, 2024 at 10:20 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> Can the prefix still be generated programmatically at graph creation
>>>>> time?
>>>>>
>>>>
>>>> Yes. It's just a property of the transform passed by the user at
>>>> configuration time.
>>>>
>>>>
>>>>> On Wed, Mar 27, 2024 at 9:40 AM Robert Bradshaw <rober...@google.com>
>>>>> wrote:
>>>>>
>>>>>> On Wed, Mar 27, 2024 at 9:12 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> This does seem like the best compromise, though I think there will
>>>>>>> still end up being performance issues. A common pattern I've seen is
>>>>>>> that there is a long common prefix to the dynamic destination followed
>>>>>>> by the dynamic component, e.g. the destination might be
>>>>>>> long/common/path/to/destination/files/<per-user-file>. In this case, the
>>>>>>> prefix is often much larger than the messages themselves and is what
>>>>>>> gets effectively encoded in the lambda.
>>>>>>>
>>>>>>
>>>>>> The idea here is that the destination would be given as a format
>>>>>> string, say, "long/common/path/to/destination/files/{dest_info.user}".
>>>>>> Another way to put this is that we support (only) "lambdas" that are
>>>>>> represented as string substitutions. (The fact that dest_info does not
>>>>>> have to be part of the record, and can be the output of an arbitrary map
>>>>>> if need be, makes this restriction not so bad.)
>>>>>>
>>>>>> As well as solving the performance issues, I think this is actually a
>>>>>> pretty convenient and natural way for the user to name their destination
>>>>>> (for the common use case, even easier than providing a lambda), and has
>>>>>> the benefit of being much more transparent than an arbitrary callable for
>>>>>> introspection as well (for both machines and humans that may look at the
>>>>>> resulting pipeline).
>>>>>>
>>>>>>
>>>>>>> I'm not entirely sure how to address this in a portable context. We
>>>>>>> might simply have to accept the extra overhead when going
>>>>>>> cross-language.
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Wed, Mar 27, 2024 at 8:51 AM Robert Bradshaw via dev <
>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>
>>>>>>>> Thanks for putting this together, it will be a really
>>>>>>>> useful feature to have.
>>>>>>>>
>>>>>>>> I am in favor of the string-pattern approaches. I think we need to
>>>>>>>> support both the {record=..., dest_info=...} and the elide-fields
>>>>>>>> approaches: the former is nicer when one has a fixed representation
>>>>>>>> for the output record (e.g. a proto or Avro schema), and the flattened
>>>>>>>> form is easier to use in more free-form contexts (e.g. when producing
>>>>>>>> records from YAML and SQL).
>>>>>>>>
>>>>>>>> Also left some comments on the doc.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 27, 2024 at 6:51 AM Ahmed Abualsaud via dev <
>>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hey all,
>>>>>>>>>
>>>>>>>>> There have been some conversations lately about how best to enable
>>>>>>>>> dynamic destinations in a portable context. Usually, this comes up for
>>>>>>>>> cross-language transforms and more recently for Beam YAML.
>>>>>>>>>
>>>>>>>>> I've started a short doc outlining some routes we could take. The
>>>>>>>>> purpose is to establish a good standard for supporting dynamic
>>>>>>>>> destinations with portability, one that can be applied to most use
>>>>>>>>> cases and IOs. Please take a look and add any thoughts!
>>>>>>>>>
>>>>>>>>> https://s.apache.org/portable-dynamic-destinations
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Ahmed
>>>>>>>>>
>>>>>>>>
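Roughly how the two input shapes Robert mentions further up, {record=..., dest_info=...} versus elided fields, might each be split into "what to write" and "where to write it". This assumes "elide-fields" means the destination fields sit alongside the data and are dropped before writing; that is my reading, and the doc may define it differently. `split_nested` and `split_flat` are hypothetical names in plain Python, not Beam APIs.

```python
from typing import Any, Dict, Iterable, Tuple

Row = Dict[str, Any]


def split_nested(row: Row) -> Tuple[Row, Row]:
    """{record=..., dest_info=...} shape: write `record` as-is, route on `dest_info`."""
    return row["record"], row["dest_info"]


def split_flat(row: Row, dest_fields: Iterable[str]) -> Tuple[Row, Row]:
    """Flattened shape (assumed): dest fields sit next to the data and are elided."""
    fields = set(dest_fields)
    dest = {k: row[k] for k in fields}
    record = {k: v for k, v in row.items() if k not in fields}
    return record, dest


if __name__ == "__main__":
    nested = {"record": {"id": 1, "payload": "x"}, "dest_info": {"user": "alice"}}
    flat = {"id": 1, "payload": "x", "user": "alice"}
    for record, dest in (split_nested(nested), split_flat(flat, ["user"])):
        # Either way, the same pattern can be filled from the destination row.
        print("files/{user}".format(**dest), record)
```

The nested shape suits a fixed output schema since `record` is passed through untouched, while the flat shape lets free-form YAML or SQL output name destination fields directly.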
