On Wed, May 14, 2025 at 3:49 PM Kenneth Knowles <k...@apache.org> wrote:
> I wonder, then, if the phantom type trick might be less "extra burden for
> type safety" and more unambiguously helpful. It might look something like
> this (I'm sure someone has written how to do this in Python properly but
> I'm just winging it)
>
> class OutputTag[T]:
>     # a string, basically, and I will omit the boilerplate
>
> U = TypeVar("U")
> class TaggedOutput:
>     @classmethod
>     def of(cls, tag: OutputTag[U], value: U):
>         return TaggedOutput(tag, value)
>
> outputTag = OutputTag[int]("number")
>
> def process(self, x) -> str|TaggedOutput:
>     yield "x"
>     yield TaggedOutput.of(outputTag, 5)
>

For better or for worse, this hides the typing information from the
signature of the process method itself. Something like

    def process(self, x: str) -> (
        str
        | TaggedOutput[Literal["tag1"], int]
        | TaggedOutput[Literal["tag2"], str]):
      ...
      yield TaggedOutput.of("tag1", len(x))
      ...

could be validated by standard (static) type checkers and also inform our
own typechecking. Having this available in the signature lets it be used
for parameters to Map and potentially in other places as well.

Note that using strings makes it so one can write result.tag1 rather than
result.get(outputTag1) as well as avoiding this "annotated string" type. (I
suppose one could argue that it's still good practice, as it would
encourage shared constants rather than string literals. I can see both
sides of this, but I also think it's somewhat orthogonal to how to extract
the type information at pipeline construction time.)

> On Wed, May 14, 2025 at 2:59 PM Jack McCluskey <jrmcclus...@google.com>
> wrote:
>
>> Ah yeah, unfortunately tagged outputs currently inherit the output typing
>> of the parent DoFn. It's a bit of a pain, since a "correct" output type
>> hint becomes the union of all of the possible output types of the DoFn (and
>> becomes really hard to scope back down for DoFns that consume specific
>> tagged outputs!) I looked into it *very* briefly while wrangling some
>> type hinting breakages, but didn't spend too much time on it given other
>> work on my plate. It's definitely on the list of type hinting improvements
>> though.
>>
>> On Wed, May 14, 2025 at 2:14 PM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> Replying to break the silence - in Java the DoFn is done according to
>>> the main output type, then phantom types on the output tag are used to make
>>> sure non-main outputs are type safe (I wouldn't expect this sort of
>>> technique in Python)
>>>
>>> Anyone who is more expert in Beam Python typing stuff? +Jack McCluskey
>>> <jrmcclus...@google.com> perhaps?
>>>
>>> Kenn
>>>
>>> On Fri, May 9, 2025 at 6:14 PM Joey Tran <joey.t...@schrodinger.com>
>>> wrote:
>>>
>>>> Seems like a hard problem. I suppose it could look something like:
>>>> ```
>>>> def process(self, x) -> Iterable[str | TaggedOutputs[{"numbers": int}]]
>>>> ```
>>>> A little ugly...
>>>>
>>>>
>>>>
>>>> On Fri, May 9, 2025 at 6:00 PM Robert Bradshaw <rober...@waymo.com>
>>>> wrote:
>>>>
>>>>> Unfortunately type hints have not yet been implemented for
>>>>> multiple-output Fns (though I think perhaps Jack was looking into this?)
>>>>>
>>>>> On Fri, May 9, 2025 at 2:40 PM Joey Tran <joey.t...@schrodinger.com>
>>>>> wrote:
>>>>>
>>>>>> Is it to just type it based on the main output?
>>>>>> def process(self, x) -> str:
>>>>>>     yield "x"
>>>>>>     yield TaggedOutput("numbers", 5)
>>>>>>
>>>>>>
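
To make the signature-based idea from the top reply a bit more concrete, here is a minimal sketch of how per-tag types might be pulled out of a return annotation at pipeline construction time. Everything in it is an assumption for illustration: the generic `TaggedOutput` below is a hypothetical stand-in (as far as I know Beam's existing `TaggedOutput` is not parameterized this way), `SplitWords` is a plain class rather than a real DoFn, and `split_output_types` is a made-up helper name. It relies only on the standard `typing` introspection functions.

```python
import collections.abc
import types
from typing import (Generic, Iterator, Literal, TypeVar, Union, get_args,
                    get_origin, get_type_hints)

TagT = TypeVar("TagT")
ValueT = TypeVar("ValueT")


class TaggedOutput(Generic[TagT, ValueT]):
    """Hypothetical generic stand-in, NOT apache_beam.pvalue.TaggedOutput."""

    def __init__(self, tag: TagT, value: ValueT) -> None:
        self.tag = tag
        self.value = value


class SplitWords:
    """Plain class standing in for a DoFn so the sketch runs without Beam."""

    # Union[...] is spelled out for older interpreters; on Python 3.10+ the
    # `str | TaggedOutput[...]` spelling from the thread is equivalent.
    def process(self, x: str) -> Iterator[Union[
            str,
            TaggedOutput[Literal["tag1"], int],
            TaggedOutput[Literal["tag2"], str]]]:
        yield x
        yield TaggedOutput("tag1", len(x))
        yield TaggedOutput("tag2", x.upper())


# Origins that mean "union": typing.Union, plus types.UnionType if present.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))
_ITERABLE_ORIGINS = (collections.abc.Iterator, collections.abc.Iterable,
                     collections.abc.Generator)


def split_output_types(fn):
    """Return (main_types, {tag: value_type}) read from fn's return hint."""
    hint = get_type_hints(fn)["return"]
    if get_origin(hint) in _ITERABLE_ORIGINS:
        hint = get_args(hint)[0]  # unwrap Iterator[...] to the element type
    if get_origin(hint) in _UNION_ORIGINS:
        members = get_args(hint)
    else:
        members = (hint,)
    main_types, tagged_types = [], {}
    for member in members:
        if get_origin(member) is TaggedOutput:
            literal_tag, value_type = get_args(member)
            (tag,) = get_args(literal_tag)  # Literal["tag1"] -> "tag1"
            tagged_types[tag] = value_type
        else:
            main_types.append(member)
    return main_types, tagged_types


main_types, tagged_types = split_output_types(SplitWords.process)
print(main_types)
print(tagged_types)
```

A helper along these lines could, in principle, give each tagged output its own element type instead of the union of everything the DoFn yields, which is the pain point described earlier in the thread. Whether Beam's typehint machinery would want to consume annotations in this shape is a separate question.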