TsReaper opened a new pull request #12867:
URL: https://github.com/apache/flink/pull/12867
## What is the purpose of the change
Currently `TableResult#collect` and `DataStreamUtils#collect` can only
produce results for unbounded streaming jobs if users explicitly enable
checkpointing. Requiring this is unreasonable when users just want to take a
look at their data.
This PR introduces two collect iterators: one with at-least-once semantics and
one with exactly-once semantics but without fault tolerance. When the collect
method is called, we automatically pick an iterator for the user:
* If the user does not explicitly enable checkpointing, we use the
exactly-once iterator without fault tolerance. That is to say, the iterator
throws an exception as soon as the job restarts.
* If the user explicitly enables exactly-once checkpointing, we use the
current implementation of the collect iterator.
* If the user explicitly enables at-least-once checkpointing, we use the
at-least-once iterator. That is to say, the iterator ignores both checkpoint
information and job restarts.
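
The selection rule above can be sketched as a small decision function. This is only an illustration of the three cases, not the code in this PR: the class, enum, and method names below are hypothetical stand-ins, and the real implementation reads the checkpoint configuration from the job rather than taking it as arguments.

```java
// Hypothetical sketch of the iterator selection described above.
// None of these names come from the Flink code base.
final class CollectIteratorSelection {

    /** Stand-in for the checkpointing mode the user configured. */
    enum Mode { EXACTLY_ONCE, AT_LEAST_ONCE }

    /** The three iterator flavours named in the description. */
    enum IteratorKind {
        UNCHECKPOINTED_EXACTLY_ONCE, // throws once the job restarts
        CHECKPOINTED_EXACTLY_ONCE,   // the existing collect iterator
        AT_LEAST_ONCE                // ignores checkpoints and restarts
    }

    /** mode is only consulted when checkpointing is enabled. */
    static IteratorKind select(boolean checkpointingEnabled, Mode mode) {
        if (!checkpointingEnabled) {
            return IteratorKind.UNCHECKPOINTED_EXACTLY_ONCE;
        }
        return mode == Mode.EXACTLY_ONCE
                ? IteratorKind.CHECKPOINTED_EXACTLY_ONCE
                : IteratorKind.AT_LEAST_ONCE;
    }

    public static void main(String[] args) {
        System.out.println(select(false, null));
        System.out.println(select(true, Mode.EXACTLY_ONCE));
        System.out.println(select(true, Mode.AT_LEAST_ONCE));
    }
}
```

The point of the sketch is that the user never picks an iterator directly; the choice follows entirely from whether and how checkpointing was configured.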
## Brief change log
- Refactor tests for datastream / table collect
- Introduce collect iterators with at-least-once semantics and with
exactly-once semantics without fault tolerance
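
To make the difference between the two new iterators concrete, here is a minimal sketch of how a client-side buffer might react to a job restart. It assumes, purely for illustration, that every response from the job carries a version token that changes whenever the job restarts; the class `RestartAwareBuffer`, its methods, and the token are hypothetical and are not taken from this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a buffer that either fails on restart
// (exactly-once without fault tolerance) or keeps going (at-least-once).
final class RestartAwareBuffer {
    private final boolean tolerateRestarts;
    private final List<String> results = new ArrayList<>();
    private String knownVersion; // null until the first response arrives

    RestartAwareBuffer(boolean tolerateRestarts) {
        this.tolerateRestarts = tolerateRestarts;
    }

    /** Accepts one batch of results tagged with the job's current version. */
    void onResponse(String version, List<String> batch) {
        boolean restarted = knownVersion != null && !knownVersion.equals(version);
        if (restarted && !tolerateRestarts) {
            // Exactly-once without fault tolerance: results already handed to
            // the user cannot be reconciled with the restarted job, so fail.
            throw new IllegalStateException("Job restarted; results are no longer exactly-once");
        }
        // At-least-once: ignore the restart; replayed records may appear again.
        knownVersion = version;
        results.addAll(batch);
    }

    List<String> results() {
        return results;
    }
}
```

With `tolerateRestarts` set to `true`, a restart that replays records simply produces duplicates in `results()`, which is what at-least-once permits. With `false`, the first response after a restart raises an exception instead.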
## Verifying this change
This change is already covered by the existing datastream / table collect
tests. It also adds new unit tests, which can be run to verify the new
iterators.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? not applicable