[GitHub] [flink] stevenzwu edited a comment on pull request #13574: [FLINK-18323] Add a Kafka source implementation based on FLIP-27.

GitBox Wed, 28 Oct 2020 09:47:19 -0700


stevenzwu edited a comment on pull request #13574:
URL: https://github.com/apache/flink/pull/13574#issuecomment-718047126



   > adding a failJob() method to the SplitEnumeratorContext so the 
SplitEnumerator implementations can decide by themselves what to do in each 
method. And any exception thrown from the method invocation will just result in 
the job failure.
   
   @becketqin I was going to ask for this feature, although it still depends on 
our design choice. Let me paste an email question regarding 
StaticFileSplitEnumerator that I shared with @StephanEwen offline.
   
   Right now, for the static mode, FileSource discovers the splits in the 
createEnumerator step. We are trying to evaluate whether we should follow the 
same pattern for the Iceberg source. Here are the pros and cons as I understand 
them:
   * Pro: job will fail fast if split enumeration fails, which may be required 
for batch jobs. Is this the reason for the decision?
   * Con: job creation/submission can be slow since split enumeration can be 
slow (dozens of seconds or longer) for large table scans.
   
   Alternatively, Static*Enumerator could initiate split discovery during 
SplitEnumerator.start(). This is how the Kafka source is implemented. Depending 
on the `partitionDiscoveryIntervalMs` config, `KafkaSourceEnumerator` calls 
`context.callAsync` once or periodically. The pros and cons are then reversed:
   * Pro: job submission/creation is fast, and we can add retries internally to 
handle enumeration failure.
   * Con: job may be stuck in a failure loop if split discovery fails. Once 
SplitEnumerator starts, there is no way to fail fast (which might be a critical 
issue for batch jobs).
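
   To make the start()-based option concrete, here is a minimal sketch of the 
once-or-periodically dispatch. It does not use Flink classes: `AsyncContext` is 
a hypothetical stand-in that only mirrors the two `callAsync` overloads, 
`InlineContext` is a test double that runs the callable inline, and treating a 
non-positive interval as "discover once" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.function.BiConsumer;

/** Hypothetical stand-in mirroring the one-shot and periodic callAsync overloads. */
interface AsyncContext {
    <T> void callAsync(Callable<T> callable, BiConsumer<T, Throwable> handler);

    <T> void callAsync(
            Callable<T> callable,
            BiConsumer<T, Throwable> handler,
            long initialDelayMs,
            long periodMs);
}

/** Enumerator sketch that defers split discovery until start() is called. */
class LazyDiscoveryEnumerator {
    private final AsyncContext context;
    private final long discoveryIntervalMs; // assumption: <= 0 means "discover once"
    final List<String> discovered = new ArrayList<>();
    Throwable lastError;

    LazyDiscoveryEnumerator(AsyncContext context, long discoveryIntervalMs) {
        this.context = context;
        this.discoveryIntervalMs = discoveryIntervalMs;
    }

    /** Nothing is discovered at construction time, so creation stays cheap. */
    void start() {
        if (discoveryIntervalMs > 0) {
            context.callAsync(this::discoverSplits, this::handleResult, 0L, discoveryIntervalMs);
        } else {
            context.callAsync(this::discoverSplits, this::handleResult);
        }
    }

    /** Placeholder for the potentially slow scan (e.g. a large table listing). */
    List<String> discoverSplits() {
        return Arrays.asList("split-0", "split-1");
    }

    /** Records a failure, or adds splits that have not been seen before. */
    void handleResult(List<String> splits, Throwable error) {
        if (error != null) {
            lastError = error;
            return;
        }
        for (String split : splits) {
            if (!discovered.contains(split)) {
                discovered.add(split);
            }
        }
    }
}

/** Test double: runs the callable inline once and records how it was scheduled. */
class InlineContext implements AsyncContext {
    boolean periodic;
    long periodMs;
    int scheduledCalls;

    @Override
    public <T> void callAsync(Callable<T> callable, BiConsumer<T, Throwable> handler) {
        periodic = false;
        runOnce(callable, handler);
    }

    @Override
    public <T> void callAsync(
            Callable<T> callable,
            BiConsumer<T, Throwable> handler,
            long initialDelayMs,
            long periodMs) {
        this.periodic = true;
        this.periodMs = periodMs;
        runOnce(callable, handler);
    }

    private <T> void runOnce(Callable<T> callable, BiConsumer<T, Throwable> handler) {
        scheduledCalls++;
        T result;
        try {
            result = callable.call();
        } catch (Exception e) {
            handler.accept(null, e);
            return;
        }
        handler.accept(result, null);
    }
}
```

   Note that a discovery failure only reaches `handleResult` here. Without a way 
to escalate from that handler, the enumerator can record or retry the error but 
cannot fail the job, which is the con listed above.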
   
   If we go with the latter approach, then `SplitEnumeratorContext.failJob()` 
would be very useful, so that we can fail the batch/bounded job once the 
initial enumeration has failed, either on the first attempt or after a few 
retries. If we are going to add `SplitEnumeratorContext.failJob()`, it should 
be done in [PR 
13784](https://github.com/apache/flink/pull/13784), right?
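
   As a rough illustration of the retry-then-fail behavior I have in mind, here 
is a self-contained sketch. `FailableContext` is a hypothetical stand-in for 
the proposed method, and the `failJob(Throwable)` signature, `BoundedDiscovery`, 
and `maxAttempts` are assumptions for illustration only, not existing Flink API.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for the proposed failJob(); the signature is an assumption. */
interface FailableContext {
    void failJob(Throwable cause);
}

/** Retries the initial split discovery a bounded number of times, then fails the job. */
final class BoundedDiscovery {
    private BoundedDiscovery() {}

    /**
     * Returns the discovery result, or null after reporting the last failure
     * through context.failJob() once all attempts are exhausted.
     */
    static <T> T discoverOrFail(
            Callable<T> discovery, int maxAttempts, FailableContext context) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return discovery.call();
            } catch (Exception e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        // Every attempt failed: escalate instead of looping forever.
        context.failJob(lastFailure);
        return null;
    }
}
```

   A real implementation would probably add a backoff between attempts and run 
the discovery through `context.callAsync`; both are left out to keep the 
sketch short.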


