[ https://issues.apache.org/jira/browse/BEAM-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Cwik updated BEAM-3160:
----------------------------
    Description: 
We should prevent coder inference from assuming that two coders for the same 
type are interchangeable.

Two Avro GenericRecord coders with different schemas are considered identical 
and an arbitrary one is returned by the Coder/Type inference system if the 
GenericRecord type appears multiple times.
For example, when 
*KvCoder.of(IterableCoder.of(AvroCoder.of(SchemaA)), 
IterableCoder.of(AvroCoder.of(SchemaB)))* goes through coder inference for the 
type *KV<Iterable<GenericRecord>, Iterable<GenericRecord>>*, the result is 
*KvCoder.of(IterableCoder.of(AvroCoder.of(SchemaX)), 
IterableCoder.of(AvroCoder.of(SchemaX)))*, where SchemaX is either SchemaA or 
SchemaB.

Code:
The Type -> Coder map at 
https://github.com/apache/beam/blob/v2.1.1/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L420
 and the other Type -> Coder maps in the same file should prevent insertion if 
the type already exists and the coders aren't equal.

  was:
We should prevent coder inference from assuming that two coders for the same 
type are interchangeable.

Two Avro GenericRecord coders with different schemas are considered identical 
and an arbitrary one is returned by the Coder/Type inference system if the 
GenericRecord type appears multiple times.
e.g.
*KvCoder.of(AvroCoder.of(SchemaA), AvroCoder.of(SchemaB))* after coder 
inference for the type *KV<GenericRecord, GenericRecord>* will return 
*KvCoder.of(AvroCoder.of(SchemaX), AvroCoder.of(SchemaX))* where SchemaX is 
either SchemaA or SchemaB.

Code:
https://github.com/apache/beam/blob/v2.1.1/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L420
 and other Type -> Coder maps in the same file should prevent insertion if the 
type already exists.


> Type based coder inference incorrectly assumes that a coder for one type is 
> equivalent to every other coder for that type.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-3160
>                 URL: https://issues.apache.org/jira/browse/BEAM-3160
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Luke Cwik
>             Fix For: 2.3.0
>
>
> We should prevent coder inference from assuming that two coders for the same 
> type are interchangeable.
> Two Avro GenericRecord coders with different schemas are considered identical 
> and an arbitrary one is returned by the Coder/Type inference system if the 
> GenericRecord type appears multiple times.
> For example, when 
> *KvCoder.of(IterableCoder.of(AvroCoder.of(SchemaA)), 
> IterableCoder.of(AvroCoder.of(SchemaB)))* goes through coder inference for 
> the type *KV<Iterable<GenericRecord>, Iterable<GenericRecord>>*, the result is 
> *KvCoder.of(IterableCoder.of(AvroCoder.of(SchemaX)), 
> IterableCoder.of(AvroCoder.of(SchemaX)))*, where SchemaX is either SchemaA or 
> SchemaB.
> Code:
> The Type -> Coder map at 
> https://github.com/apache/beam/blob/v2.1.1/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/CoderRegistry.java#L420
>  and the other Type -> Coder maps in the same file should prevent insertion 
> if the type already exists and the coders aren't equal.
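
The check proposed above can be sketched roughly as follows. This is a minimal 
illustration under stated assumptions, not the code in CoderRegistry.java: the 
class name StrictCoderMap and its put/get methods are hypothetical, coders are 
modelled as arbitrary objects compared with equals(), and throwing on a 
conflict is only one possible reading of "prevent insertion".

```java
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical stand-in for a Type -> Coder map (not Beam's actual class).
 * C is whatever object represents a coder; only equals() is required of it.
 */
final class StrictCoderMap<C> {
  private final Map<Type, C> coders = new HashMap<>();

  /**
   * Records the coder for a type. Re-inserting an equal coder is a no-op.
   * A different coder for an already-known type is rejected, so the first
   * coder is never silently replaced or treated as interchangeable.
   */
  void put(Type type, C coder) {
    Objects.requireNonNull(type, "type");
    Objects.requireNonNull(coder, "coder");
    // putIfAbsent returns the previous value, or null if the type was new.
    C existing = coders.putIfAbsent(type, coder);
    if (existing != null && !existing.equals(coder)) {
      throw new IllegalStateException(
          "Conflicting coders for type " + type.getTypeName()
              + ": " + existing + " vs " + coder);
    }
  }

  /** Returns the coder recorded for the type, or null if there is none. */
  C get(Type type) {
    return coders.get(type);
  }
}
```

With a map like this, recording AvroCoder.of(SchemaA) and then 
AvroCoder.of(SchemaB) for GenericRecord would fail loudly instead of resolving 
both positions to one arbitrary schema, provided the two coders do not compare 
equal.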



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
