[ 
https://issues.apache.org/jira/browse/BEAM-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10277:
---------------------------------
    Description: 
h3. Problem/Status
The schema proto has an [encoding_position 
field|https://github.com/apache/beam/blob/2c619c81082839e054f16efee9311b9f74b6e436/model/pipeline/src/main/proto/schema.proto#L55]
 that is currently unused in every row coder implementation. This field is 
intended to specify an alternative order in which fields are encoded by 
[beam:coder:row:v1 
implementations|https://github.com/apache/beam/blob/1e60f383fb39b9ff8d44edcbe5357da4c1e52378/model/pipeline/src/main/proto/beam_runner_api.proto#L937-L990].
 Currently all the implementations ignore this field, and always encode the 
fields in the order that they appear in the schema.
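
To make the intended behavior concrete, here is a minimal Python sketch of encoding fields by encoding_position rather than by schema order. It is not the row_coder.py implementation: Field, encoding_order, encode_row and decode_row are hypothetical names, values are only reordered in a list instead of being written as bytes, and the fallback to schema order when positions are missing is an assumption.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Field:
    # Hypothetical stand-in for a schema field, not the Beam proto class.
    name: str
    encoding_position: Optional[int] = None


def encoding_order(fields: List[Field]) -> List[int]:
    """Return schema indices in the order their values should be encoded."""
    if any(f.encoding_position is None for f in fields):
        # Assumption: fall back to schema order unless every position is set.
        return list(range(len(fields)))
    return sorted(range(len(fields)), key=lambda i: fields[i].encoding_position)


def encode_row(fields: List[Field], values: List[Any]) -> List[Any]:
    # A real coder would write bytes; this only shows the ordering.
    return [values[i] for i in encoding_order(fields)]


def decode_row(fields: List[Field], encoded: List[Any]) -> List[Any]:
    # Put each encoded value back at its schema index.
    values: List[Any] = [None] * len(fields)
    for slot, schema_index in enumerate(encoding_order(fields)):
        values[schema_index] = encoded[slot]
    return values
```

With fields a, b, c carrying encoding positions 2, 0, 1, the values would be encoded in the order b, c, a, and decoding restores schema order.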

h3. Motivation
The idea with the encoding position is that it will give runners a way to 
enforce schema compatibility (BEAM-9502), by re-ordering the way fields are 
encoded when the schema changes between two job submissions. Schema changes 
could be due to field re-ordering, or field additions/deletions.
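
As a rough illustration of the idea, a runner could carry encoding positions over from the previous submission by field name, so existing fields keep their place in the encoding even if the updated schema lists them in a different order. The helper below is hypothetical (assign_encoding_positions is not a Beam API) and shows only one possible policy: added fields are appended after the existing positions, and a deleted field simply leaves a gap, which a real implementation would still need to handle.

```python
from typing import Dict, List


def assign_encoding_positions(
    old_positions: Dict[str, int], new_field_names: List[str]
) -> Dict[str, int]:
    """Map each field of the updated schema to an encoding position.

    Fields that existed before keep their old position; new fields are
    appended after the highest position used so far (an assumed policy).
    """
    next_position = max(old_positions.values(), default=-1) + 1
    positions: Dict[str, int] = {}
    for name in new_field_names:
        if name in old_positions:
            positions[name] = old_positions[name]
        else:
            positions[name] = next_position
            next_position += 1
    return positions
```

For example, if the previous schema encoded id at position 0 and name at position 1, an updated schema listing name, email, id would keep name at 1 and id at 0 and give email position 2.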

h3. Code pointers
The Python beam:coder:row:v1 implementation lives in 
[row_coder.py|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py].
The Java implementation is a little more complicated, distributed between 
[SchemaCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java],
 
[RowCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java],
 and 
[RowCoderGenerator|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java].
 RowCoderGenerator contains the code relevant to this jira; it uses bytebuddy to 
generate the code for 


row coder implementations (Java, Python, and Go) should use the encoding 
position to determine the order in which to encode/decode fields. This will 
allow runners to re-order fields to maintain compatibility when schemas change 
in a pipeline update.

  was:row coder implementations (Java, Python, and Go) should use the encoding 
position to determine the order in which to encode/decode fields. This will 
allow runners to re-order fields to maintain compatibility when schemas change 
in a pipeline update.


> beam:coder:row:v1 implementations should respect encoding_position
> ------------------------------------------------------------------
>
>                 Key: BEAM-10277
>                 URL: https://issues.apache.org/jira/browse/BEAM-10277
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core, sdk-py-core
>            Reporter: Brian Hulette
>            Priority: P3
>              Labels: Clarified
>
> h3. Problem/Status
> The schema proto has an [encoding_position 
> field|https://github.com/apache/beam/blob/2c619c81082839e054f16efee9311b9f74b6e436/model/pipeline/src/main/proto/schema.proto#L55]
>  that is currently unused in every row coder implementation. This field is 
> intended to specify an alternative order in which fields are encoded by 
> [beam:coder:row:v1 
> implementations|https://github.com/apache/beam/blob/1e60f383fb39b9ff8d44edcbe5357da4c1e52378/model/pipeline/src/main/proto/beam_runner_api.proto#L937-L990].
>  Currently all the implementations ignore this field, and always encode the 
> fields in the order that they appear in the schema.
> h3. Motivation
> The idea with the encoding position is that it will give runners a way to 
> enforce schema compatibility (BEAM-9502), by re-ordering the way fields are 
> encoded when the schema changes between two job submissions. Schema changes 
> could be due to field re-ordering, or field additions/deletions.
> h3. Code pointers
> The Python beam:coder:row:v1 implementation lives in 
> [row_coder.py|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py].
> The Java implementation is a little more complicated, distributed between 
> [SchemaCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java],
>  
> [RowCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java],
>  and 
> [RowCoderGenerator|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java].
>  RowCoderGenerator contains the code relevant to this jira; it uses bytebuddy 
> to generate the code for 
> row coder implementations (Java, Python, and Go) should use the encoding 
> position to determine the order in which to encode/decode fields. This will 
> allow runners to re-order fields to maintain compatibility when schemas 
> change in a pipeline update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
