Yichi Zhang created BEAM-11154:
----------------------------------

             Summary: Missing coder in pipeline components with Dataflow Runner v2
                 Key: BEAM-11154
                 URL: https://issues.apache.org/jira/browse/BEAM-11154
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
            Reporter: Yichi Zhang
            Assignee: Yichi Zhang


When running a pipeline that uses the Top combine function on Dataflow Runner 
v2, the backend complains about a missing coder ID, for example a missing 
BoundedHeapCoder1.

After some troubleshooting, this problem seems more generic:

The step context translation phase does not recognize an already registered 
Coder whose hashCode() function is incorrect, and instead tries to assign it a 
new, uniquified name as the pipeline_proto_coder_id.

Code pointer:
https://github.com/apache/beam/blob/5675108933de6eb601ca2e4f21870d2ababe0ec7/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L268

In this case, since the comparator field in BoundedHeapCoder often does not 
implement hashCode() and equals(), each new BoundedHeapCoder instance also has 
a different hashCode(). The duplicated coder does not exist in the already 
translated pipeline proto, which leads to the aforementioned missing coder ID 
issue.
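
The effect can be reproduced without Beam. The following is only a minimal 
sketch, assuming the component registry looks up coder IDs by the coder 
object's equals() and hashCode(). HeapCoderLike, IdentityComparator, 
ValueComparator, and Registry are hypothetical stand-ins for illustration, not 
Beam classes, and the generated ID format is made up.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CoderRegistrySketch {

  // Hypothetical stand-in for a coder that holds a comparator field.
  static final class HeapCoderLike {
    final int maxSize;
    final Comparator<String> compareFn;

    HeapCoderLike(int maxSize, Comparator<String> compareFn) {
      this.maxSize = maxSize;
      this.compareFn = compareFn;
    }

    // Equality delegates to the comparator, so it is only as good as the
    // comparator's own equals()/hashCode().
    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof HeapCoderLike)) return false;
      HeapCoderLike that = (HeapCoderLike) o;
      return maxSize == that.maxSize && compareFn.equals(that.compareFn);
    }

    @Override
    public int hashCode() {
      return Objects.hash(maxSize, compareFn);
    }
  }

  // Inherits identity-based equals()/hashCode() from Object.
  static final class IdentityComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
      return a.compareTo(b);
    }
  }

  // Same ordering, but every instance is equal to every other instance.
  static final class ValueComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
      return a.compareTo(b);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ValueComparator;
    }

    @Override
    public int hashCode() {
      return ValueComparator.class.hashCode();
    }
  }

  // Hypothetical registry: reuses an ID only if an equal coder is already known.
  static final class Registry {
    private final Map<HeapCoderLike, String> ids = new HashMap<>();

    String register(HeapCoderLike coder) {
      String existing = ids.get(coder);
      if (existing != null) {
        return existing;
      }
      String id = "HeapCoderLike" + ids.size();
      ids.put(coder, id);
      return id;
    }
  }

  public static void main(String[] args) {
    Registry broken = new Registry();
    // Two logically identical coders are treated as different keys.
    System.out.println(broken.register(new HeapCoderLike(10, new IdentityComparator())));
    System.out.println(broken.register(new HeapCoderLike(10, new IdentityComparator())));

    Registry fixed = new Registry();
    // With value-based comparator equality, the second lookup reuses the first ID.
    System.out.println(fixed.register(new HeapCoderLike(10, new ValueComparator())));
    System.out.println(fixed.register(new HeapCoderLike(10, new ValueComparator())));
  }
}
```

In this sketch the first registry hands out two different IDs for what is 
logically the same coder, which mirrors a second translation pass producing an 
ID that the already translated proto does not contain. Giving the comparator 
value-based equals() and hashCode() makes the second registration return the 
existing ID.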



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
