That's cool! I was unaware of the Literal[T] type in Python. That's a very handy construct for easing into fancy types.
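For anyone else who hadn't run into it, here is a rough sketch of how a Literal-tagged output type could be spelled. The generic `TaggedOutput` dataclass below is a hypothetical stand-in written for illustration, not Beam's `apache_beam.pvalue.TaggedOutput`, and it has not been checked against mypy or pyright. The intent is only to show that the tag string and the value type can sit side by side in the signature.

```python
from dataclasses import dataclass
from typing import Generic, Iterator, Literal, TypeVar, Union

TagT = TypeVar("TagT", bound=str)
ValueT = TypeVar("ValueT")


@dataclass(frozen=True)
class TaggedOutput(Generic[TagT, ValueT]):
    """Hypothetical stand-in for illustration; not Beam's TaggedOutput."""
    tag: TagT
    value: ValueT


def process(x: str) -> Iterator[
        Union[str,
              TaggedOutput[Literal["tag1"], int],
              TaggedOutput[Literal["tag2"], str]]]:
    # Main output: a plain str.
    yield x
    # Side outputs: each tag literal is paired with its own value type.
    yield TaggedOutput("tag1", len(x))
    yield TaggedOutput("tag2", x.upper())
```

Because the tags are ordinary string literals, no separate tag object has to be declared before writing the signature.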
Not sure how non-language-nerds feel about this level of specificity, but I really like yours.

Kenn

On Wed, May 14, 2025 at 8:02 PM Robert Bradshaw <rober...@waymo.com> wrote:

> On Wed, May 14, 2025 at 3:49 PM Kenneth Knowles <k...@apache.org> wrote:
>
>> I wonder, then, if the phantom type trick might be less "extra burden for
>> type safety" and more unambiguously helpful. It might look something like
>> this (I'm sure someone has written how to do this in Python properly but
>> I'm just winging it)
>>
>> class OutputTag[T]:
>>     # a string, basically, and I will omit the boilerplate
>>
>> U = TypeVar("U")
>> class TaggedOutput:
>>     @classmethod
>>     def of(cls, tag: OutputTag[U], value: U):
>>         return TaggedOutput(tag, value)
>>
>> outputTag = OutputTag[int]("number")
>>
>> def process(self, x) -> str|TaggedOutput:
>>     yield "x"
>>     yield TaggedOutput.of(outputTag, 5)
>>
>
> For better or for worse, this hides the typing information from the
> signature of the process method itself. Something like
>
>     def process(self, x: str) -> (
>         str
>         | TaggedOutput[Literal["tag1"], int]
>         | TaggedOutput[Literal["tag2"], str]):
>       ...
>       yield TaggedOutput.of("tag1", len(x))
>       ...
>
> could be validated by standard (static) type checkers and also inform our
> own typechecking. Having this available in the signature lets it be used
> for parameters to Map and potentially in other places as well.
>
> Note that using strings makes it so one can write
>
>     result.tag1
>
> rather than
>
>     result.get(outputTag1)
>
> as well as avoiding this "annotated string" type. (I suppose one could
> argue that it's still good practice, as it would encourage shared constants
> rather than string literals. I can see both sides of this, but I also think
> it's somewhat orthogonal to how to extract the type information at pipeline
> construction time.)
>
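On extracting the type information at pipeline construction time: a minimal sketch of how a tag-to-type mapping could be read back out of such a signature with the standard `typing` introspection helpers. `TaggedOutput` here is again a hypothetical generic class and `output_types` is a made-up helper name; this is not how Beam's type hinting currently works, just one way the annotation might be unpacked.

```python
from typing import (Generic, Iterator, Literal, TypeVar, Union,
                    get_args, get_origin, get_type_hints)

TagT = TypeVar("TagT")
ValueT = TypeVar("ValueT")


class TaggedOutput(Generic[TagT, ValueT]):
    """Hypothetical generic stand-in; not Beam's TaggedOutput."""

    def __init__(self, tag, value):
        self.tag = tag
        self.value = value


def process(x: str) -> Iterator[
        Union[str,
              TaggedOutput[Literal["tag1"], int],
              TaggedOutput[Literal["tag2"], str]]]:
    yield x
    yield TaggedOutput("tag1", len(x))
    yield TaggedOutput("tag2", x.upper())


def output_types(fn):
    """Return (main_types, {tag: value_type}) from fn's return annotation."""
    ret = get_type_hints(fn)["return"]
    # Unwrap Iterator[...] to get the element type.
    (element,) = get_args(ret)
    # A single non-union element type is treated as a one-member union.
    members = get_args(element) if get_origin(element) is Union else (element,)
    main, tagged = [], {}
    for member in members:
        if get_origin(member) is TaggedOutput:
            tag_literal, value_type = get_args(member)
            (tag,) = get_args(tag_literal)  # Literal["tag1"] -> "tag1"
            tagged[tag] = value_type
        else:
            main.append(member)
    return main, tagged
```

Calling `output_types(process)` separates the main output type from the per-tag types, which is the information a consumer of a specific tagged output would need.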
>> On Wed, May 14, 2025 at 2:59 PM Jack McCluskey <jrmcclus...@google.com>
>> wrote:
>>
>>> Ah yeah, unfortunately tagged outputs currently inherit the output
>>> typing of the parent DoFn. It's a bit of a pain, since a "correct" output
>>> type hint becomes the union of all of the possible output types of the DoFn
>>> (and becomes really hard to scope back down for DoFns that consume specific
>>> tagged outputs!) I looked into it *very* briefly while wrangling some
>>> type hinting breakages, but didn't spend too much time on it given other
>>> work on my plate. It's definitely on the list of type hinting improvements
>>> though.
>>>
>>> On Wed, May 14, 2025 at 2:14 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> Replying to break the silence - in Java the DoFn is done according to
>>>> the main output type, then phantom types on the output tag are used to make
>>>> sure non-main outputs are type safe (I wouldn't expect this sort of
>>>> technique in Python)
>>>>
>>>> Anyone who is more expert in Beam Python typing stuff? +Jack McCluskey
>>>> <jrmcclus...@google.com> perhaps?
>>>>
>>>> Kenn
>>>>
>>>> On Fri, May 9, 2025 at 6:14 PM Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> Seems like a hard problem. I suppose it could look something like:
>>>>> ```
>>>>> def process(self, x) -> Iterable[str | TaggedOutputs[{"numbers": int}]]
>>>>> ```
>>>>> A little ugly...
>>>>>
>>>>> On Fri, May 9, 2025 at 6:00 PM Robert Bradshaw <rober...@waymo.com>
>>>>> wrote:
>>>>>
>>>>>> Unfortunately type hints have not yet been implemented for
>>>>>> multiple-output Fns (though I think perhaps Jack was looking into this?)
>>>>>>
>>>>>> On Fri, May 9, 2025 at 2:40 PM Joey Tran <joey.t...@schrodinger.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Is it to just type it based on the main output?
>>>>>>> def process(self, x) -> str:
>>>>>>>     yield "x"
>>>>>>>     yield TaggedOutput("numbers", 5)
>>>>>>>