[GitHub] [flink] twalthr opened a new pull request #14420: [FLINK-19981][core][table] Add name-based field mode for Row

GitBox Fri, 18 Dec 2020 00:40:33 -0800


twalthr opened a new pull request #14420:
URL: https://github.com/apache/flink/pull/14420



   ## What is the purpose of the change
   
   This adds a name-based field mode to the `Row` class. It simplifies the handling of large rows (possibly with hundreds of fields) and will make it easier to switch between DataStream API and Table API.
   
   See the documentation of Row:
   
   ```
    * <p>Fields of a row can be accessed either position-based or name-based. An implementer can decide
    * in which field mode a row should operate during creation. Rows that were produced by the framework
    * support a hybrid of both field modes (i.e. named positions):
    *
    * <h1>Position-based field mode</h1>
    *
    * <p>{@link Row#withPositions(int)} creates a fixed-length row. The fields can be accessed by position
    * (zero-based) using {@link #getField(int)} and {@link #setField(int, Object)}. Every field is initialized
    * with {@code null} by default.
    *
    * <h1>Name-based field mode</h1>
    *
    * <p>{@link Row#withNames()} creates a variable-length row. The fields can be accessed by name using
    * {@link #getField(String)} and {@link #setField(String, Object)}. Every field is initialized during
    * the first call to {@link #setField(String, Object)} for the given name. However, the framework will
    * initialize missing fields with {@code null} and reorder all fields once more type information is
    * available during serialization or input conversion. Thus, even name-based rows eventually become
    * fixed-length composite types with a deterministic field order.
    *
    * <h1>Hybrid / named-position field mode</h1>
    *
    * <p>Rows that were produced by the framework (after deserialization or output conversion) are fixed-length
    * rows with a deterministic field order that can map static field names to field positions. Thus, fields
    * can be accessed both via {@link #getField(int)} and {@link #getField(String)}. Both {@link #setField(int, Object)}
    * and {@link #setField(String, Object)} are supported for existing fields. However, adding new field
    * names via {@link #setField(String, Object)} is not allowed. A hybrid row's {@link #equals(Object)}
    * supports comparing to all kinds of rows. A hybrid row's {@link #hashCode()} is only valid for position-based
    * rows.
   ```
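
The three field modes in the Javadoc above can be illustrated with a small, self-contained toy model. This is only a sketch and not Flink's `Row` implementation: the class `SketchRow`, the helper `toHybrid` (a stand-in for the framework step that fixes the field order once the schema is known), and the use of `IllegalArgumentException` are assumptions made for illustration. Only the method names `withPositions`, `withNames`, `getField`, and `setField` are taken from the Javadoc.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the three field modes. NOT Flink's Row class. */
final class SketchRow {
    private final Object[] byPosition;                  // position-based and hybrid mode
    private final Map<String, Object> byName;           // name-based mode only
    private final Map<String, Integer> positionOfName;  // hybrid mode only

    private SketchRow(Object[] byPosition, Map<String, Object> byName,
                      Map<String, Integer> positionOfName) {
        this.byPosition = byPosition;
        this.byName = byName;
        this.positionOfName = positionOfName;
    }

    /** Fixed-length row; every field starts out as null. */
    static SketchRow withPositions(int arity) {
        return new SketchRow(new Object[arity], null, null);
    }

    /** Variable-length row; a field exists once it has been set by name. */
    static SketchRow withNames() {
        return new SketchRow(null, new LinkedHashMap<>(), null);
    }

    /** Hypothetical conversion: reorder by schema and fill missing fields with null. */
    static SketchRow toHybrid(SketchRow named, List<String> schema) {
        if (named.byName == null) {
            throw new IllegalArgumentException("expects a name-based row");
        }
        Object[] values = new Object[schema.size()];
        Map<String, Integer> positions = new LinkedHashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            positions.put(schema.get(i), i);
            values[i] = named.byName.get(schema.get(i)); // null if never set
        }
        return new SketchRow(values, null, positions);
    }

    Object getField(int pos) {
        if (byPosition == null) {
            throw new IllegalArgumentException("name-based row has no positions yet");
        }
        return byPosition[pos];
    }

    Object getField(String name) {
        return byName != null ? byName.get(name) : byPosition[requirePosition(name)];
    }

    void setField(int pos, Object value) {
        if (byPosition == null) {
            throw new IllegalArgumentException("name-based row has no positions yet");
        }
        byPosition[pos] = value;
    }

    void setField(String name, Object value) {
        if (byName != null) {
            byName.put(name, value); // name-based mode: new names are allowed
        } else {
            byPosition[requirePosition(name)] = value; // hybrid mode: existing names only
        }
    }

    private int requirePosition(String name) {
        Integer pos = positionOfName == null ? null : positionOfName.get(name);
        if (pos == null) {
            throw new IllegalArgumentException("unknown field name: " + name);
        }
        return pos;
    }
}

final class RowModesDemo {
    public static void main(String[] args) {
        SketchRow positional = SketchRow.withPositions(2);
        positional.setField(0, "alice");
        System.out.println(positional.getField(1)); // unset field is still null

        SketchRow named = SketchRow.withNames();
        named.setField("age", 42);
        named.setField("name", "alice");

        SketchRow hybrid = SketchRow.toHybrid(named, Arrays.asList("name", "age", "city"));
        System.out.println(hybrid.getField(0));      // same value as getField("name")
        System.out.println(hybrid.getField("city")); // missing field was filled with null
    }
}
```

In this sketch, calling `hybrid.setField("zip", ...)` throws because the name is not part of the fixed schema, which mirrors the rule that a hybrid row accepts updates to existing fields but no new field names.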
   
   
   ## Brief change log
   
   - Update `Row`
   - Update `RowSerializer`
   - Update `RowRowConverter`
   
   ## Verifying this change
   
   This change adds tests and can be verified by running the following test classes:
   
   - `RowTest`
   - `RowSerializerTest`
   - `DataStructureConverterTest`
   - `RowFunctionTest`
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes
     - The serializers: yes
     - The runtime per-record code paths (performance sensitive): yes
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? JavaDocs
   


