Yes, I'm unclear on how a PCollection with ExternalCoder made it into a
downstream transform that enforces is_deterministic. My understanding of
ExternalCoder (admittedly just based on a quick look at commit history) is
that it's a shim added so the Python SDK can handle coders that are
internal to cross-language transforms.
I think that if the Python SDK is trying to introspect an ExternalCoder
instance then something is wrong.
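
To make the failure mode concrete, here is a minimal sketch in plain Python.
It is not Beam's actual implementation: OpaqueExternalCoder,
require_deterministic, and the URN are hypothetical stand-ins. It only
illustrates how a placeholder coder that reports is_deterministic() as False
gets rejected by a determinism check like the one GroupByKey applies.

```python
class OpaqueExternalCoder:
    """Hypothetical stand-in for a coder the Python SDK cannot reconstruct."""

    def __init__(self, urn):
        # URN of a coder that only the other SDK understands (made up here).
        self.urn = urn

    def is_deterministic(self):
        # Nothing in the coder proto states whether the coder is
        # deterministic, so the conservative default is False.
        return False


def require_deterministic(coder, step_label):
    """Hypothetical check, similar in spirit to what a grouping transform does."""
    if not coder.is_deterministic():
        raise ValueError(
            "Step %r requires a deterministic coder, got %s"
            % (step_label, coder.urn))
    return coder


try:
    require_deterministic(OpaqueExternalCoder("example:coder:v1"), "GroupByKey")
    rejected = False
except ValueError:
    # The placeholder is rejected even if the real coder is deterministic.
    rejected = True
```

The placeholder is rejected regardless of what the real coder does, which is
why I'd expect it never to reach such a check in the first place.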

Brian

On Tue, May 19, 2020 at 4:01 PM Luke Cwik <lc...@google.com> wrote:

> I see. The problem is that you are trying to learn certain properties of
> the coder for use in a downstream transform, such as GroupByKey, that
> requires the coder to be deterministic.
>
> In all the scenarios I have seen so far, we have required both SDKs to
> understand the coder. How do you have a cross-language pipeline that works
> when the downstream SDK doesn't understand the coder?
>
> Also, an alternative strategy would be to tell the expansion service that
> you need it to choose a deterministic coder for the output. This would
> require building the pipeline first and then, before submission to the job
> server, performing the expansion while telling the expansion service all
> the limitations that the SDK has imposed on it.
>
> On Tue, May 19, 2020 at 3:45 PM Sam Rohde <sro...@google.com> wrote:
>
>> Hi all,
>>
>> Should there be more metadata in the Coder proto, for example an
>> "is_deterministic" boolean field? This would give SDKs a
>> language-agnostic way to infer properties of a coder received from the
>> expansion service.
>>
>> My motivation is that I recently ran into a problem in which an
>> "ExternalCoder" in the Python SDK was erroneously marked as
>> non-deterministic. The reason is that the Coder proto doesn't have an
>> "is_deterministic" field, so when the coder fails to be recreated in
>> Python, the ExternalCoder's is_deterministic defaults to False.
>>
>> Regards,
>> Sam
>>
>>
