[
https://issues.apache.org/jira/browse/BEAM-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Beam JIRA Bot updated BEAM-10277:
---------------------------------
Labels: Clarified stale-assigned (was: Clarified)
> beam:coder:row:v1 implementations should respect encoding_position
> ------------------------------------------------------------------
>
> Key: BEAM-10277
> URL: https://issues.apache.org/jira/browse/BEAM-10277
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Irwin Alejandro Rodirguez Ramirez
> Priority: P3
> Labels: Clarified, stale-assigned
> Time Spent: 31h 50m
> Remaining Estimate: 0h
>
> h3. Problem/Status
> The schema proto has an [encoding_position
> field|https://github.com/apache/beam/blob/2c619c81082839e054f16efee9311b9f74b6e436/model/pipeline/src/main/proto/schema.proto#L55]
> that no row coder implementation currently uses. The field is meant to
> specify an alternative order in which [beam:coder:row:v1
> implementations|https://github.com/apache/beam/blob/1e60f383fb39b9ff8d44edcbe5357da4c1e52378/model/pipeline/src/main/proto/beam_runner_api.proto#L937-L990]
> encode fields. Today every implementation ignores it and always encodes
> fields in the order that they appear in the schema.
> h3. Motivation
> The encoding position is meant to give runners a way to enforce schema
> compatibility (BEAM-9502) by re-ordering how fields are encoded when the
> schema changes between two job submissions. Schema changes could be due to
> field re-ordering or to field additions/deletions.
> h3. Code pointers
> The Python beam:coder:row:v1 implementation lives in
> [row_coder.py|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py].
> The Java implementation is a little more complicated, distributed between
> [SchemaCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java],
> [RowCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java],
> and
> [RowCoderGenerator|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java].
> RowCoderGenerator contains the code relevant to this issue: it uses
> bytebuddy to generate Java bytecode for the coder. We need it to generate
> code that puts fields in the order specified by encoding_position.
> h3. Testing
> Python and Java implementations should be tested with unit tests
> ([RowCoderTest|https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/RowCoderTest.java],
> [row_coder_test|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder_test.py]).
> We should also test them for compatibility by adding test cases that
> exercise the encoding_position in
> [standard_coders.yaml|https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml].
> These tests will be executed by
> [CommonCoderTest|https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java]
> and
> [standard_coders_test|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/standard_coders_test.py].
> There's some example code for generating a new test case
> [here|https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml#L387-L400].
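
To illustrate the behavior requested in the issue above, here is a minimal Python sketch. It is not Beam's row coder and does not use Beam's wire format: the `Field` class, the helper functions, the one-byte length prefix, and the fallback to schema order when a position is missing are all assumptions made for illustration. It only shows the core idea, that writing fields in `encoding_position` order makes the encoded bytes independent of the order in which the schema declares its fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not Beam's Field proto."""
    name: str
    encoding_position: Optional[int] = None


def encoding_order(schema: List[Field]) -> List[Field]:
    """Return the fields in the order they are written to the wire."""
    # Assumption: if any field lacks a position, fall back to schema order.
    if any(f.encoding_position is None for f in schema):
        return list(schema)
    return sorted(schema, key=lambda f: f.encoding_position)


def encode_row(schema: List[Field], row: Dict[str, str]) -> bytes:
    """Encode string values in encoding order (toy format, not Beam's)."""
    out = bytearray()
    for field in encoding_order(schema):
        data = row[field.name].encode("utf-8")
        out.append(len(data))  # toy 1-byte length prefix, values < 256 bytes
        out += data
    return bytes(out)


def decode_row(schema: List[Field], payload: bytes) -> Dict[str, str]:
    """Decode a payload written by encode_row for a compatible schema."""
    values = {}
    offset = 0
    for field in encoding_order(schema):
        size = payload[offset]
        offset += 1
        values[field.name] = payload[offset:offset + size].decode("utf-8")
        offset += size
    # Return the values in the schema's declared field order.
    return {f.name: values[f.name] for f in schema}


# Two submissions of the "same" schema with the fields declared in a
# different order, but with unchanged encoding positions.
schema_v1 = [Field("id", 0), Field("name", 1)]
schema_v2 = [Field("name", 1), Field("id", 0)]
row = {"id": "42", "name": "beam"}

encoded_v1 = encode_row(schema_v1, row)
encoded_v2 = encode_row(schema_v2, row)
decoded = decode_row(schema_v2, encoded_v1)
```

In this sketch `encoded_v1` and `encoded_v2` are identical, so a row written under the first schema can be read under the second. A real implementation would also have to handle nullable fields, nested rows, and field additions/deletions, which this sketch does not attempt.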