1996fanrui commented on PR #801:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/801#issuecomment-2017164952

   Thanks @mxm for the review and discussion!
   
   > > This issue only affects the standalone autoscaler as the kubernetes 
operator has this logic already in place for setting the RUNNING state. Can we 
somehow deduplicate this logic?
   > 
   > Is that really the case? AFAIK we only check for a RUNNING job state.
   
   `AbstractFlinkService#getEffectiveStatus` adjusts `JobStatus.RUNNING` to
`JobStatus.CREATED` while some tasks are not yet running; thanks @gyfora for
helping find it. I didn't extract it into a common class because @gyfora
mentioned that `autoscaler` may be moved to a separate repo, so it's better to
copy the related logic into the `autoscaler standalone` module.
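   
   To illustrate the adjustment, here is a minimal sketch of the idea, not the
actual operator code. The nested `Status` enum is a local stand-in for Flink's
`JobStatus`, and `numTasks` / `numRunningTasks` are hypothetical inputs that
would come from the job details:

```java
final class EffectiveStatusSketch {
    /** Local stand-in for Flink's JobStatus; only the values needed here. */
    enum Status { CREATED, RUNNING, FAILED }

    private EffectiveStatusSketch() {}

    /**
     * Reports CREATED instead of RUNNING while some tasks are not running yet.
     * Any other reported status is returned unchanged.
     */
    static Status effectiveStatus(Status reported, int numTasks, int numRunningTasks) {
        if (reported == Status.RUNNING && numRunningTasks < numTasks) {
            return Status.CREATED;
        }
        return reported;
    }

    public static void main(String[] args) {
        // 3 of 4 tasks running: the job is not treated as RUNNING yet.
        System.out.println(effectiveStatus(Status.RUNNING, 4, 3)); // prints CREATED
        // All 4 tasks running: the reported status is kept.
        System.out.println(effectiveStatus(Status.RUNNING, 4, 4)); // prints RUNNING
    }
}
```

   With this kind of check, the autoscaler would only treat the job as
`RUNNING` once every task is running.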
   
   > This looks related to #699 which took a different approach by ignoring 
certain exceptions during the stabilization phase and effectively postponing 
metric collection.
   
   The adjustment logic was introduced before #699, which means some metrics
may not be ready even when all tasks are running (I guess some metrics are
generated only after the tasks start running). That's exactly what #699 solved.
   
   Why do we need to adjust the JobStatus?
   
   - If some tasks are not running, the autoscaler doesn't need to call the
metric collection logic.
   - If users set `job.autoscaler.stabilization.interval` to a small value,
it's easy to hit a metric-not-found exception.
   - As I understand it, `job.autoscaler.stabilization.interval` is meant to
filter out unstable metrics right after all tasks start running.
     - For example, the job starts at `09:00:00`, all tasks start running at
`09:03:00`, and `job.autoscaler.stabilization.interval` = 1 min.
     - We want the stabilization period to be `09:03:00` to `09:04:00` instead
of `09:00:00` to `09:01:00`, right?
     - All tasks have only been running since `09:03:00`, so the metrics may
not be stable from `09:03:00` to `09:04:00`.
     - Of course, this issue might need FLINK-34907 as well.
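   
   The timing example above can be sketched as follows. This is only an
illustration under my assumptions: `isStabilizing` is a hypothetical helper,
not an autoscaler API, and `LocalTime` is used just to mirror the times in the
example:

```java
import java.time.Duration;
import java.time.LocalTime;

final class StabilizationWindowSketch {
    private StabilizationWindowSketch() {}

    /** True while {@code now} is still inside the interval that begins at {@code windowStart}. */
    static boolean isStabilizing(LocalTime windowStart, Duration interval, LocalTime now) {
        return now.isBefore(windowStart.plus(interval));
    }

    public static void main(String[] args) {
        LocalTime jobStart = LocalTime.parse("09:00:00");
        LocalTime allTasksRunning = LocalTime.parse("09:03:00");
        Duration interval = Duration.ofMinutes(1);
        LocalTime now = LocalTime.parse("09:03:30");

        // Window anchored at job start: 09:00:00 to 09:01:00, already over at 09:03:30.
        System.out.println(isStabilizing(jobStart, interval, now)); // prints false
        // Window anchored at "all tasks running": 09:03:00 to 09:04:00, still active.
        System.out.println(isStabilizing(allTasksRunning, interval, now)); // prints true
    }
}
```

   Anchoring the window at job start would let metric collection begin at
`09:03:30`, only 30 seconds after the tasks came up.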
   
   Please correct me if anything is wrong, thanks a lot.

