twalthr opened a new pull request #16793:
URL: https://github.com/apache/flink/pull/16793
## What is the purpose of the change
This enables batch mode for `StreamTableEnvironment`.
Both `StreamExecutionEnvironment`, `TableEnvironment`, and
`StreamTableEnvironment` use `StreamGraphGenerator` with the same
configuration. Previous work ensured that when `execution.runtime-mode` is set
to `BATCH` all batch properties are either set consistently (e.g. shuffle mode)
or have no impact on the pipeline (e.g. auto watermark interval, state
backends).
Most of the changes are removing checks and ensuring that internal (e.g.
values) and external (e.g. data stream, table source) source transformations
are set to `BOUNDED`. The latter is a complex topic as we currently use 4
different ways of expressing external sources:
- `InputFormat`: Boundedness needs to be explicitly set by the planner due
to custom formats that don't extend from `FileInputFormat`.
- `SourceFunctionProvider`: Boundedness needs to be explicitly set by the
planner via custom transformation to also disable progressive watermarks.
- `DataStreamScanProvider`: Boundedness needs to be explicitly set by the
planner to ensure old behavior again. New source interfaces + `FileInputFormat`
are fine.
- `TransformationScanProvider`: Boundedness can be derived automatically and
will only work with new source interfaces + `FileInputFormat`.
## Brief change log
- Fully rely on `StreamGraphGenerator` to enable batch mode and batch
properties
- Set boundedness of sources to satisfy checks in `StreamGraphGenerator`
## Verifying this change
This change added tests and can be verified as follows:
`DataStreamJavaITCase` and manual tests and experiments
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: yes
- The serializers: no
- The runtime per-record code paths (performance sensitive): yes
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? not documented yet
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]