[ 
https://issues.apache.org/jira/browse/BEAM-11154?focusedWorklogId=506444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506444
 ]

ASF GitHub Bot logged work on BEAM-11154:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Oct/20 22:40
            Start Date: 29/Oct/20 22:40
    Worklog Time Spent: 10m 
      Work Description: kennknowles commented on a change in pull request #13225:
URL: https://github.com/apache/beam/pull/13225#discussion_r514607907



##########
File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java
##########
@@ -264,10 +264,18 @@ public String registerCoder(Coder<?> coder) throws IOException {
     if (existing != null) {
       return existing;
     }
+    // Unlike StructuredCoder, custom coders may not have proper implementation of hashCode() and
+    // equals(), this lead to unnecessary duplications. In order to avoid this we examine already
+    // registered coders and see if we can find a matching proto, and consider them same coder.
+    RunnerApi.Coder coderProto = CoderTranslation.toProto(coder, this);
+    for (Map.Entry<String, RunnerApi.Coder> entry : componentsBuilder.getCodersMap().entrySet()) {

Review comment:
       Could we keep an additional `Map<RunnerApi.Coder, String> coderProtoToId` for
this lookup instead of iterating over the coders map? The proto objects in both
maps would be shared by reference, so the overhead should be very small.
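
A minimal sketch of the reverse-index idea suggested above. It is not the Beam implementation: `CoderRegistrySketch`, `register`, and `size` are hypothetical names, and a plain `String` stands in for `RunnerApi.Coder`, on the assumption that the proto compares by value in `equals()` and `hashCode()`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the coder part of SdkComponents; not Beam code. */
class CoderRegistrySketch {
  // Forward map: coder id -> proto (stand-in for componentsBuilder.getCodersMap()).
  private final Map<String, String> idToProto = new HashMap<>();
  // Reverse index from the review comment: proto -> coder id.
  // It holds the same proto objects as the forward map, so only references are duplicated.
  private final Map<String, String> protoToId = new HashMap<>();

  /** Returns the id of an equal, already registered proto, or registers it under a new id. */
  String register(String proto, String nameHint) {
    String existing = protoToId.get(proto);
    if (existing != null) {
      return existing; // constant-time hit, no scan over the forward map
    }
    // Pick an id that is not in use yet: nameHint, nameHint1, nameHint2, ...
    String id = nameHint;
    int suffix = 1;
    while (idToProto.containsKey(id)) {
      id = nameHint + suffix++;
    }
    idToProto.put(id, proto);
    protoToId.put(proto, id);
    return id;
  }

  int size() {
    return idToProto.size();
  }
}
```

With this shape, registering two equal protos returns the same id, and the lookup cost no longer grows with the number of registered coders.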




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 506444)
    Time Spent: 0.5h  (was: 20m)

> Missing coder in pipeline components with dataflow runner v2
> ------------------------------------------------------------
>
>                 Key: BEAM-11154
>                 URL: https://issues.apache.org/jira/browse/BEAM-11154
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Yichi Zhang
>            Assignee: Yichi Zhang
>            Priority: P2
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When running pipelines that use the Top combine function on Dataflow runner v2, the 
> backend complains about a missing coder id, for example a missing 
> BoundedHeapCoder1.
> After some troubleshooting, the problem appears to be more general:
> The step context translation phase does not recognize an already registered 
> Coder whose hashCode() is incorrect, and instead assigns it a new 
> uniquified name as the pipeline_proto_coder_id.
> Code pointer:
> https://github.com/apache/beam/blob/5675108933de6eb601ca2e4f21870d2ababe0ec7/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L268
> In this case, because the comparator field in BoundedHeapCoder often does not 
> implement hashCode() and equals(), a BoundedHeapCoder also gets a 
> different hashCode() each time a new instance is created. The duplicated 
> coder does not exist in the already translated pipeline proto, which leads to 
> the missing coder id issue described above.
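
The failure mode described in the issue can be reproduced in isolation. The sketch below is an illustration under stated assumptions, not the actual BoundedHeapCoder: `HeapCoderSketch` and `ReverseOrder` are hypothetical classes, and the coder is assumed to delegate `equals()` and `hashCode()` to its comparator field.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical comparator that does NOT override equals()/hashCode(). */
class ReverseOrder implements Comparator<Integer> {
  @Override
  public int compare(Integer a, Integer b) {
    return b.compareTo(a);
  }
  // equals() and hashCode() fall back to Object identity.
}

/** Hypothetical coder whose equality depends on its comparator field. */
class HeapCoderSketch {
  private final int maxSize;
  private final Comparator<Integer> comparator;

  HeapCoderSketch(int maxSize, Comparator<Integer> comparator) {
    this.maxSize = maxSize;
    this.comparator = comparator;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof HeapCoderSketch)) {
      return false;
    }
    HeapCoderSketch that = (HeapCoderSketch) o;
    return maxSize == that.maxSize && comparator.equals(that.comparator);
  }

  @Override
  public int hashCode() {
    return Objects.hash(maxSize, comparator);
  }

  /** Registers one coder, then looks up a logically identical second instance. */
  static String lookupSecondInstance() {
    Map<HeapCoderSketch, String> coderIds = new HashMap<>();
    coderIds.put(new HeapCoderSketch(3, new ReverseOrder()), "BoundedHeapCoder");
    // A fresh comparator instance makes the second coder unequal to the first,
    // so the lookup misses and a caller would register a duplicate id.
    return coderIds.get(new HeapCoderSketch(3, new ReverseOrder()));
  }
}
```

Here `lookupSecondInstance()` returns `null` even though both coders behave identically, which is the situation that makes the translation phase invent a second coder id.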



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
