That's cool! I was unaware of the Literal[T] type in Python. That's a very
handy construct for easing into fancy types.

Not sure how non-language-nerds feel about any of this level of specificity
but I really like yours.
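
For concreteness, here is a minimal sketch of the Literal-keyed signature
from the quoted message. It is only an illustration under assumptions: the
generic TaggedOutput below is a hypothetical stand-in defined in the snippet
(not Beam's own class), and tagged_output_types is a made-up helper showing
one way the per-tag types could be read back from the annotation at pipeline
construction time.

```python
from dataclasses import dataclass
from typing import (Generic, Iterator, Literal, TypeVar, Union,
                    get_args, get_origin, get_type_hints)

TagT = TypeVar("TagT")      # meant to be filled with a Literal["..."] type
ValueT = TypeVar("ValueT")  # element type carried by that tag


@dataclass(frozen=True)
class TaggedOutput(Generic[TagT, ValueT]):
    """Hypothetical generic stand-in; not Beam's actual TaggedOutput."""
    tag: TagT
    value: ValueT


def process(x: str) -> Iterator[Union[
        str,
        TaggedOutput[Literal["tag1"], int],
        TaggedOutput[Literal["tag2"], str]]]:
    yield x.upper()                      # main output
    yield TaggedOutput("tag1", len(x))   # int-valued side output
    yield TaggedOutput("tag2", x[::-1])  # str-valued side output


def tagged_output_types(fn) -> dict:
    """Hypothetical helper: map each Literal tag to its declared value type."""
    # Unwrap Iterator[...] to get the union of possible element types.
    (element_type,) = get_args(get_type_hints(fn)["return"])
    result = {}
    for alternative in get_args(element_type):
        # Plain types such as str have no origin and are skipped.
        if get_origin(alternative) is TaggedOutput:
            tag_literal, value_type = get_args(alternative)
            (tag,) = get_args(tag_literal)
            result[tag] = value_type
    return result
```

A static checker sees the same annotation that the helper reads at runtime,
so the signature stays the single source of truth for both.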

Kenn

On Wed, May 14, 2025 at 8:02 PM Robert Bradshaw <rober...@waymo.com> wrote:

> On Wed, May 14, 2025 at 3:49 PM Kenneth Knowles <k...@apache.org> wrote:
>
>> I wonder, then, if the phantom type trick might be less "extra burden for
>> type safety" and more unambiguously helpful. It might look something like
>> this (I'm sure someone has written how to do this in Python properly but
>> I'm just winging it)
>>
>> class OutputTag[T]:
>>   # a string, basically, and I will omit the boilerplate
>>
>> U = TypeVar("U")
>> class TaggedOutput:
>>   @classmethod
>>   def of(cls, tag: OutputTag[U], value: U):
>>     return TaggedOutput(tag, value)
>>
>> outputTag = OutputTag[int]("number")
>>
>> def process(self, x) -> str|TaggedOutput:
>>     yield "x"
>>     yield TaggedOutput.of(outputTag, 5)
>>
>
> For better or for worse, this hides the typing information from the
> signature of the process method itself. Something like
>
> def process(self, x: str) -> (
>     str
>     | TaggedOutput[Literal["tag1"], int]
>     | TaggedOutput[Literal["tag2"], str]):
>   ...
>   yield TaggedOutput.of("tag1", len(x))
>   ...
>
> could be validated by standard (static) type checkers and also inform our
> own typechecking. Having this available in the signature lets it be used
> for parameters to Map and potentially in other places as well.
>
> Note that using strings makes it so one can write
>
>   result.tag1
>
> rather than
>
>   result.get(outputTag1)
>
> as well as avoiding this "annotated string" type. (I suppose one could
> argue that it's still good practice, as it would encourage shared constants
> rather than string literals. I can see both sides of this, but I also think
> it's somewhat orthogonal to how to extract the type information at pipeline
> construction time.)
>
>
> On Wed, May 14, 2025 at 2:59 PM Jack McCluskey <jrmcclus...@google.com>
>> wrote:
>>
>>> Ah yeah, unfortunately tagged outputs currently inherit the output
>>> typing of the parent DoFn. It's a bit of a pain, since a "correct" output
>>> type hint becomes the union of all of the possible output types of the DoFn
>>> (and becomes really hard to scope back down for DoFns that consume specific
>>> tagged outputs!) I looked into it *very* briefly while wrangling some
>>> type hinting breakages, but didn't spend too much time on it given other
>>> work on my plate. It's definitely on the list of type hinting improvements
>>> though.
>>>
>>> On Wed, May 14, 2025 at 2:14 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> Replying to break the silence - in Java the DoFn is done according to
>>>> the main output type, then phantom types on the output tag are used to make
>>>> sure non-main outputs are type safe (I wouldn't expect this sort of
>>>> technique in Python)
>>>>
>>>> Anyone who is more expert in Beam Python typing stuff? +Jack McCluskey
>>>> <jrmcclus...@google.com> perhaps?
>>>>
>>>> Kenn
>>>>
>>>> On Fri, May 9, 2025 at 6:14 PM Joey Tran <joey.t...@schrodinger.com>
>>>> wrote:
>>>>
>>>>> Seems like a hard problem. I suppose it could look something like:
>>>>> ```
>>>>> def process(self, x) -> Iterable[str | TaggedOutputs[{"numbers": int}]]
>>>>> ```
>>>>> A little ugly...
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 9, 2025 at 6:00 PM Robert Bradshaw <rober...@waymo.com>
>>>>> wrote:
>>>>>
>>>>>> Unfortunately type hints have not yet been implemented for
>>>>>> multiple-output Fns (though I think perhaps Jack was looking into this?)
>>>>>>
>>>>>> On Fri, May 9, 2025 at 2:40 PM Joey Tran <joey.t...@schrodinger.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Is it to just type it based on the main output?
>>>>>>> def process(self, x) -> str:
>>>>>>>     yield "x"
>>>>>>>     yield TaggedOutput("numbers", 5)
>>>>>>>
>>>>>>>
