[ 
https://issues.apache.org/jira/browse/BEAM-10277?focusedWorklogId=590061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-590061
 ]

ASF GitHub Bot logged work on BEAM-10277:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Apr/21 00:07
            Start Date: 28/Apr/21 00:07
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on pull request #14639:
URL: https://github.com/apache/beam/pull/14639#issuecomment-827991186


   proto3 does not provide has_xxx. the previous check I had is not perfectly
   reliable since it tries to deduce whether encoding_position is set, so
   there are edge cases that could cause data corruption. For that reason, I
   preferred having an explicit bit.
   
   On Tue, Apr 27, 2021 at 3:08 PM Brian Hulette ***@***.***>
   wrote:
   
   > *@TheNeuralBit* commented on this pull request.
   > ------------------------------
   >
   > In model/pipeline/src/main/proto/schema.proto
   > <https://github.com/apache/beam/pull/14639#discussion_r621638446>:
   >
   > > @@ -37,6 +37,8 @@ message Schema {
   >    // REQUIRED. An RFC 4122 UUID.
   >    string id = 2;
   >    repeated Option options = 3;
   > +  // Indicates that encoding positions have been overridden.
   > +  bool encoding_positions_set = 4;
   >
   > Why do we need to add this to the proto? Can't we continue to rely on
   > checking if encoding_position is set on the fields?
   >
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/beam/pull/14639#pullrequestreview-646400569>,
   > or unsubscribe
   > 
<https://github.com/notifications/unsubscribe-auth/AFAYJVPK3GLWPJ4JZZTRQZTTK4YVBANCNFSM43RPJLCQ>
   > .
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 590061)
    Time Spent: 2h 50m  (was: 2h 40m)

> beam:coder:row:v1 implementations should respect encoding_position
> ------------------------------------------------------------------
>
>                 Key: BEAM-10277
>                 URL: https://issues.apache.org/jira/browse/BEAM-10277
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core, sdk-py-core
>            Reporter: Brian Hulette
>            Priority: P3
>              Labels: Clarified
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> h3. Problem/Status
> The schema proto has an [encoding_position 
> field|https://github.com/apache/beam/blob/2c619c81082839e054f16efee9311b9f74b6e436/model/pipeline/src/main/proto/schema.proto#L55]
>  that is currently unused in every row coder implementation. The intention of 
> this field is that it indicates an alternative order for the fields to be 
> encoded in by [beam:coder:row:v1 
> implementations|https://github.com/apache/beam/blob/1e60f383fb39b9ff8d44edcbe5357da4c1e52378/model/pipeline/src/main/proto/beam_runner_api.proto#L937-L990].
>  Currently all the implementations ignore this field, and always encode the 
> fields in the order that they appear in the schema.
> h3. Motivation
> The idea with the encoding position is that it will give runners away to 
> enforce schema compatibility (BEAM-9502), by re-ordering the way fields are 
> encoded when the schema changes between two job submissions. Schema changes 
> could be due to fields re-ordering, or field additions/deletions.
> h3. Code pointers
> The Python beam:coder:row:v1 implementation lives in 
> [row_coder.py|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder.py]
> The Java implementation is a little more complicated, distributed between 
> [SchemaCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java],
>  
> [RowCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoder.java],
>  and 
> [RowCoderGenerator|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java].
>  RowCoderGenerator contains the code relevant to this jira - it uses 
> bytebuddy to generate Java code for the coder. We need it to generate code 
> that puts fields in the order specified by encoding_position.
> h3. Testing
> Python and Java implementations should be tested with unit tests 
> ([RowCoderTest|https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/coders/RowCoderTest.java],
>  
> [row_coder_test|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/row_coder_test.py]).
>  We should also test them for compatibility by adding test cases that 
> exercise the encoding_position in 
> [standard_coders.yaml|https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml].
>  These tests will be executed by 
> [CommonCoderTest|https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java]
>  and 
> [standard_coders_test|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/standard_coders_test.py].
>  There's some example code for generating a new test case 
> [here|https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml#L387-L400].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to