je-ik commented on issue #31313:
URL: https://github.com/apache/beam/issues/31313#issuecomment-2130993220

   > I'm surprised this is a bug, considering restoring from a Flink savepoint is a pretty common use case; is it possible there's some configuration missing somewhere? I haven't been able to find anyone else online experiencing this same issue, but I was able to replicate it using both Kinesis and Kafka. Given how common a use case it is, I'm not 100% sure I believe this is in fact a bug; it is most likely some user error on my part.
   
   Can you please provide a minimal example and setup to reproduce the behavior?
   
   > I can make do without savepoints by utilizing Kafka offset commits and consumer groups to ensure no data is lost, but I can't figure out a way to not lose data that is windowed but not triggered when the Flink application is stopped. Maybe you know of a solution to that problem?
   
   You can drain the pipeline; see https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/cli/#terminating-a-job
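
   As a minimal sketch of what the linked docs describe (the job id and savepoint path below are placeholders):

   ```sh
   # Stop the job gracefully with a final savepoint. --drain advances the
   # watermark to +infinity before the savepoint, which fires all remaining
   # event-time timers (windows), so windowed-but-untriggered data is flushed
   # downstream instead of being dropped.
   ./bin/flink stop --drain --savepointPath /tmp/flink-savepoints <jobId>
   ```

   Per the Flink docs, draining is intended for terminating a job permanently; if you plan to resume the job later, do not drain, as it can lead to incorrect results on resume.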
   
   > it seems like a lot of the subtasks aren't being utilized when stripping IDs with beam_fn_api, despite the number of shards being 20 and the parallelism being 24 (in theory there should only be 4 idle subtasks)
   
   This is related to how Flink assigns target splits to subtasks. The assignment is affected by the maximum parallelism (which is computed automatically if not specified). You can try increasing it via `--maxParallelism=32768` (32768 is the maximum value); this can make the assignment more balanced.
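
   As a rough sketch, assuming a classic Java pipeline submitted to the Flink runner (the jar name is a placeholder; only the two parallelism options matter here):

   ```sh
   # Hypothetical submission command. maxParallelism controls the number of key
   # groups Flink uses to distribute keyed state and work, so a larger value
   # gives the runner more room to spread work evenly across the 24 subtasks.
   java -jar my-pipeline-bundled.jar \
     --runner=FlinkRunner \
     --parallelism=24 \
     --maxParallelism=32768
   ```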
   

