I see. The problem is that you need to know certain properties of the
coder so that a downstream transform such as GroupByKey, which requires
its coder to be deterministic, can use it.
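
As a rough sketch of that failure mode, in plain Python: the names
CoderSpec, OpaqueExternalCoder and check_group_by_key_input are
hypothetical stand-ins for illustration, not Beam's actual classes, and
the is_deterministic field is the proposed addition rather than something
the Coder proto has today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoderSpec:
    """Hypothetical stand-in for the Coder proto (not Beam's real message)."""
    urn: str
    # Proposed metadata field. None models the current situation, where the
    # proto carries no determinism information at all.
    is_deterministic: Optional[bool] = None


class OpaqueExternalCoder:
    """Hypothetical fallback used when the SDK cannot recreate the coder."""

    def __init__(self, spec: CoderSpec):
        self.spec = spec

    def is_deterministic(self) -> bool:
        # With no metadata, the only safe answer is False.
        return bool(self.spec.is_deterministic)


def check_group_by_key_input(coder: OpaqueExternalCoder) -> None:
    """Mimics a GroupByKey-style check that the key coder is deterministic."""
    if not coder.is_deterministic():
        raise ValueError(f"key coder {coder.spec.urn!r} is not deterministic")
```

With no metadata, check_group_by_key_input rejects the coder even if it
is deterministic in the SDK that produced it. With the flag set to True,
the same check passes without the downstream SDK understanding the coder.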

In all the scenarios I have seen so far, we have required both SDKs to
understand the coder. How do you have a cross-language pipeline that
works when the downstream SDK doesn't understand the coder?

Also, an alternative strategy would be to tell the expansion service that
it needs to choose a deterministic coder for the output. This would
require building the pipeline first and then, before submission to the
job server, performing the expansion while telling the expansion service
all the constraints that the SDK imposes.
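
To make that concrete, here is a minimal sketch of the selection step on
the expansion side. Everything in it is hypothetical: I am assuming the
expansion service has a list of candidate output coders and receives a
require_deterministic constraint from the calling SDK. Nothing like this
exists in the expansion request today as far as this thread goes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CandidateCoder:
    """Hypothetical description of a coder the expansion could emit."""
    urn: str
    deterministic: bool


def choose_output_coder(
    candidates: Iterable[CandidateCoder],
    require_deterministic: bool,
) -> Optional[CandidateCoder]:
    """Return the first candidate that satisfies the caller's constraint.

    Returns None when no candidate qualifies, so the expansion can fail
    early instead of handing back a coder the caller cannot use.
    """
    for candidate in candidates:
        if require_deterministic and not candidate.deterministic:
            continue
        return candidate
    return None
```

The point of the sketch is only that the constraint travels with the
expansion request, so the calling SDK never has to infer determinism from
a coder it cannot recreate.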




On Tue, May 19, 2020 at 3:45 PM Sam Rohde <sro...@google.com> wrote:

> Hi all,
>
> Should there be more metadata in the Coder Proto? For example, adding an
> "is_deterministic" boolean field. This will allow for a language-agnostic
> way for SDKs to infer properties about a coder received from the expansion
> service.
>
> My motivation for this is that I recently ran into a problem in which an
> "ExternalCoder" in the Python SDK was erroneously marked as
> non-deterministic. The reason is that the Coder proto doesn't have an
> "is_deterministic" field, so when the coder fails to be recreated in
> Python, the ExternalCoder defaults to False.
>
> Regards,
> Sam
>
>
