twalthr opened a new pull request #14420:
URL: https://github.com/apache/flink/pull/14420
## What is the purpose of the change
This adds a name-based field mode to the `Row` class. It simplifies the
handling of large rows (possibly with hundreds of fields) and will make it
easier to switch between the DataStream API and the Table API.
See the documentation of `Row`:
```
* <p>Fields of a row can be accessed either position-based or name-based. An implementer can decide
* in which field mode a row should operate during creation. Rows that were produced by the framework
* support a hybrid of both field modes (i.e. named positions):
*
* <h1>Position-based field mode</h1>
*
* <p>{@link Row#withPositions(int)} creates a fixed-length row. The fields can be accessed by position
* (zero-based) using {@link #getField(int)} and {@link #setField(int, Object)}. Every field is initialized
* with {@code null} by default.
*
* <h1>Name-based field mode</h1>
*
* <p>{@link Row#withNames()} creates a variable-length row. The fields can be accessed by name using
* {@link #getField(String)} and {@link #setField(String, Object)}. Every field is initialized during
* the first call to {@link #setField(String, Object)} for the given name. However, the framework will
* initialize missing fields with {@code null} and reorder all fields once more type information is
* available during serialization or input conversion. Thus, even name-based rows eventually become
* fixed-length composite types with a deterministic field order.
*
* <h1>Hybrid / named-position field mode</h1>
*
* <p>Rows that were produced by the framework (after deserialization or output conversion) are fixed-length
* rows with a deterministic field order that can map static field names to field positions. Thus, fields
* can be accessed both via {@link #getField(int)} and {@link #getField(String)}. Both {@link #setField(int, Object)}
* and {@link #setField(String, Object)} are supported for existing fields. However, adding new field
* names via {@link #setField(String, Object)} is not allowed. A hybrid row's {@link #equals(Object)}
* supports comparing to all kinds of rows. A hybrid row's {@link #hashCode()} is only valid for position-based
* rows.
```
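To make the three field modes concrete, here is a minimal, self-contained sketch of the access rules the javadoc describes. `MiniRow` and its `hybrid(...)` factory are hypothetical names invented for illustration; this is a toy model, not the `Row` implementation in this pull request, and details such as the exception types are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of the three field modes; NOT Flink's Row implementation.
final class MiniRow {
    private final Object[] byPosition;            // set in position-based and hybrid mode
    private final Map<String, Object> byName;     // set in name-based mode only
    private final Map<String, Integer> nameToPos; // set in hybrid mode only

    private MiniRow(Object[] byPosition, Map<String, Object> byName, Map<String, Integer> nameToPos) {
        this.byPosition = byPosition;
        this.byName = byName;
        this.nameToPos = nameToPos;
    }

    // Position-based mode: fixed length, every field starts as null.
    static MiniRow withPositions(int arity) {
        return new MiniRow(new Object[arity], null, null);
    }

    // Name-based mode: variable length, a field exists after its first setField call.
    static MiniRow withNames() {
        return new MiniRow(null, new LinkedHashMap<>(), null);
    }

    // Hybrid mode (assumed factory): fixed length with a static name-to-position mapping.
    static MiniRow hybrid(String... fieldNames) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            positions.put(fieldNames[i], i);
        }
        return new MiniRow(new Object[fieldNames.length], null, positions);
    }

    private Object[] positions() {
        if (byPosition == null) {
            throw new IllegalArgumentException("Position access is not supported in name-based mode");
        }
        return byPosition;
    }

    private int positionOf(String name) {
        if (nameToPos == null) {
            throw new IllegalArgumentException("Name access is not supported in position-based mode");
        }
        Integer pos = nameToPos.get(name);
        if (pos == null) {
            // Hybrid rows reject names that are not part of the static mapping.
            throw new IllegalArgumentException("Unknown field name: " + name);
        }
        return pos;
    }

    Object getField(int pos) {
        return positions()[pos];
    }

    void setField(int pos, Object value) {
        positions()[pos] = value;
    }

    Object getField(String name) {
        return byName != null ? byName.get(name) : byPosition[positionOf(name)];
    }

    void setField(String name, Object value) {
        if (byName != null) {
            byName.put(name, value);
        } else {
            byPosition[positionOf(name)] = value;
        }
    }

    public static void main(String[] args) {
        MiniRow named = MiniRow.withNames();
        named.setField("id", 42);
        System.out.println(named.getField("id"));
    }
}
```

In this sketch a hybrid row accepts both `setField(1, ...)` and `setField("name", ...)` for existing fields, while a name outside the static mapping is rejected, which mirrors the restriction stated in the javadoc.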
## Brief change log
- Update `Row`
- Update `RowSerializer`
- Update `RowRowConverter`
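
The `Row` javadoc quoted earlier says that name-based rows get missing fields filled with `null` and are reordered once type information is available during serialization or input conversion. The following is a rough sketch of that idea only, assuming the field order arrives as a plain list of names. `NameToPositionSketch` and `toPositions` are hypothetical and do not reflect the actual `RowSerializer` or `RowRowConverter` code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper; illustrates the reordering idea, not Flink's serializer logic.
final class NameToPositionSketch {

    // Places each named value at the position dictated by fieldNames.
    // Fields that were never set remain null; unknown names are rejected (an assumption).
    static Object[] toPositions(Map<String, Object> namedFields, List<String> fieldNames) {
        Object[] out = new Object[fieldNames.size()];
        for (Map.Entry<String, Object> entry : namedFields.entrySet()) {
            int pos = fieldNames.indexOf(entry.getKey());
            if (pos < 0) {
                throw new IllegalArgumentException("Unknown field name: " + entry.getKey());
            }
            out[pos] = entry.getValue();
        }
        return out;
    }
}
```

The result is a fixed-length array with a deterministic order regardless of the order in which the named fields were set.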
## Verifying this change
This change added tests and can be verified as follows:
- `RowTest`
- `RowSerializerTest`
- `DataStructureConverterTest`
- `RowFunctionTest`
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: yes
- The serializers: yes
- The runtime per-record code paths (performance sensitive): yes
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? JavaDocs