Sounds like consensus. (I do think option 2 might be useful for other contextful params, but that can be deferred.)
I'll put together a PR.

On Fri, Feb 21, 2025 at 11:41 AM Kenneth Knowles <k...@apache.org> wrote:

> +1 to option 1
>
> On Fri, Feb 21, 2025 at 11:06 AM XQ Hu via dev <dev@beam.apache.org>
> wrote:
>
>> +1 to ExtractWindowingInfo
>>
>> On Fri, Feb 21, 2025 at 10:55 AM Danny McCormick via dev
>> <dev@beam.apache.org> wrote:
>>
>>> +1 to `ReifyWindowingInfo` (or maybe `ExtractWindowingInfo` or
>>> `GetWindowing` is a little more understandable to the average user). I
>>> definitely prefer something which doesn't require extending the set of
>>> concepts/advanced usages we're exposing through YAML, especially for a
>>> feature that I think will not be heavily used (but if you need it, you
>>> need it).
>>>
>>> As a rule, I think we should prefer a simple base language here with
>>> higher-level capabilities available through transforms when possible.
>>> It will be a little more verbose, but more
>>> readable/searchable/learnable, and it will preserve the base
>>> simplicity for the bulk of use cases.
>>>
>>> Thanks,
>>> Danny
>>>
>>> On Thu, Feb 20, 2025 at 3:21 PM Robert Bradshaw via dev
>>> <dev@beam.apache.org> wrote:
>>>
>>>> Currently our YAML API supports basic streaming, including setting
>>>> windowing for aggregations, but there's no way to retrieve the
>>>> windowing/timestamp metadata (short of stepping out of YAML proper
>>>> and using a Python, Java, etc. DoFn). It would probably be quite
>>>> useful to have a more native way of getting this.
>>>>
>>>> One option would be to add a built-in transform to extract this
>>>> information, e.g. something like
>>>>
>>>> - type: ReifyWindowingInfo
>>>>   config:
>>>>     new_field1: timestamp
>>>>     new_field2: window
>>>>     new_field3: window.end
>>>>     ...
>>>>
>>>> The possible values on the RHS of the map would be a fixed list;
>>>> supporting things like window.end or pane_info.index would be
>>>> desirable as their types are schema-compatible (unlike a raw Window
>>>> or PaneInfo object). One could then use this information in
>>>> downstream transforms.
>>>>
>>>> A second option would be to enhance MapToFields to make this
>>>> information available. Currently this transform looks like
>>>>
>>>> - type: MapToFields
>>>>   config:
>>>>     language: python  # java is also supported; javascript, etc. conceivable
>>>>     fields:
>>>>       output_field1: input_field + another_input_field
>>>>       output_field2:
>>>>         callable: |
>>>>           def my_inline_function(row):
>>>>             return row.input_field + row.another_input_field
>>>>       ...
>>>>
>>>> The first case, called the "expression" case, is syntactic sugar that
>>>> roughly reifies all[1] input fields as locals and translates to the
>>>> second.
>>>>
>>>> For the second case, one could treat this similarly to the process
>>>> method of a DoFn and allow additional annotated arguments (e.g.
>>>> DoFn.TimestampParam in Python, the @Timestamp annotation in Java). We
>>>> would detect and propagate this up to the generated DoFn.
>>>>
>>>> We could consider supporting the "expression" case via some magic
>>>> variables (or a special namespace) or require the second form for
>>>> this capability.
>>>>
>>>> We could, of course, offer both options as well.
>>>>
>>>> Anyone have any opinions or other ideas here?
>>>>
>>>> - Robert
>>>>
>>>> [1] As an optimization we only capture those locals that appear
>>>> textually in the body of the expression.
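
To make the "fixed list of RHS values" idea from option 1 concrete, here is a rough pure-Python sketch of how a config map might be resolved against per-element metadata. This is not Beam code and not the planned implementation: `Window`, `PaneInfo`, `EXTRACTORS`, and `reify_windowing_info` are hypothetical stand-ins, and which values appear on the list is an assumption (bare `window` is left out here only because the thread notes a raw Window object is not schema-compatible).

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the metadata attached to each element.
# These are NOT Beam classes; they only model the fields discussed above.
@dataclass(frozen=True)
class Window:
    start: float
    end: float


@dataclass(frozen=True)
class PaneInfo:
    index: int
    is_first: bool
    is_last: bool


# Assumed fixed list of allowed right-hand-side values. Each one yields a
# plain, schema-compatible value rather than a raw Window or PaneInfo.
EXTRACTORS = {
    "timestamp": lambda timestamp, window, pane_info: timestamp,
    "window.start": lambda timestamp, window, pane_info: window.start,
    "window.end": lambda timestamp, window, pane_info: window.end,
    "pane_info.index": lambda timestamp, window, pane_info: pane_info.index,
}


def reify_windowing_info(row, config, timestamp, window, pane_info):
    """Return a copy of `row` with one new field per entry in `config`.

    `config` maps new field names to entries of the fixed list above,
    mirroring the `new_field: source` shape of the YAML example.
    """
    unknown = sorted(set(config.values()) - set(EXTRACTORS))
    if unknown:
        raise ValueError(f"Unsupported windowing info: {unknown}")
    out = dict(row)  # leave the input row untouched
    for new_field, source in config.items():
        out[new_field] = EXTRACTORS[source](timestamp, window, pane_info)
    return out
```

Rejecting anything off the list up front keeps the output schema predictable, which seems to be the point of restricting the RHS to a fixed set in the first place.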