[ https://issues.apache.org/jira/browse/AURORA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343085#comment-15343085 ]

Igor Morozov commented on AURORA-1722:
--------------------------------------

So here is the specific use case we're targeting. Let's say there is a transient
error in one of the job instances that is causing the task to crash. The
scheduler keeps restarting the failing tasks, so we end up accumulating dozens of
task statuses per instance.
We pull the status of all service instances to determine their health and act
accordingly; there are systems being built around that.

How would we get the status of the latest task that ran on every Aurora
instance using the TaskQuery limit approach you are suggesting?
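To make the question concrete, here is a small simulation. It assumes a job-wide `limit` keeps the N most recently updated tasks across all instances; how the scheduler actually orders tasks before applying `limit` is an assumption here. Under that assumption, a crash-looping instance can fill the whole result and hide every other instance. `TaskRecord`, `global_limit`, `covered_instances` and the timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical, simplified stand-in for a scheduled task: the instance it
    # belongs to, the timestamp of its latest event, and its status.
    instance_id: int
    timestamp_ms: int
    status: str


def global_limit(tasks: List[TaskRecord], limit: int) -> List[TaskRecord]:
    """Assumed job-wide limit: keep the `limit` newest tasks across all instances."""
    newest_first = sorted(tasks, key=lambda t: t.timestamp_ms, reverse=True)
    return newest_first[:limit]


def covered_instances(tasks: List[TaskRecord]) -> Set[int]:
    """Return the set of instance ids that appear in a result."""
    return {t.instance_id for t in tasks}


# Instances 1 and 2 have been running steadily; instance 0 is crash-looping
# and has produced 30 newer (failed) tasks since then.
tasks = [TaskRecord(1, 100, "RUNNING"), TaskRecord(2, 110, "RUNNING")]
tasks += [TaskRecord(0, 200 + i, "FAILED") for i in range(30)]

page = global_limit(tasks, limit=3)
print(sorted(covered_instances(page)))  # → [0]
```

Raising `limit` far enough to cover every instance brings back most of the churned tasks as well, which is the transfer cost the issue describes.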


> Add new field to TaskQuery to allow querying latest statuses grouped by 
> instance id
> -----------------------------------------------------------------------------------
>
>                 Key: AURORA-1722
>                 URL: https://issues.apache.org/jira/browse/AURORA-1722
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>    Affects Versions: 0.16.0
>            Reporter: Igor Morozov
>
> Currently, in order to get the status of all job instances, both failed and
> running, one needs to issue a query for all task statuses, then group them by
> instance id and sort by timestamp to get the latest status per instance.
> For tasks with a lot of churn, this may cause unnecessary transfer of huge
> blobs of Thrift data.
> The proposal is to add a new member to the TaskQuery struct:
> struct TaskQuery {
> ...
>   14: i32 limit_per_instance
> }
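A minimal sketch of the per-instance filtering the proposed `limit_per_instance` field describes: group tasks by instance id, order each group newest-first by timestamp, and keep the first N. This is the same work clients do today after fetching every task; the proposal would move it into the scheduler so only N tasks per instance are transferred. `TaskRecord` and `limit_per_instance` are hypothetical names, and a scheduler-side implementation may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical, simplified task record (not Aurora's actual task struct).
    instance_id: int
    timestamp_ms: int
    status: str


def limit_per_instance(tasks: List[TaskRecord], limit: int) -> Dict[int, List[TaskRecord]]:
    """Group tasks by instance id and keep the `limit` newest tasks of each group."""
    by_instance: Dict[int, List[TaskRecord]] = defaultdict(list)
    for task in tasks:
        by_instance[task.instance_id].append(task)
    return {
        instance_id: sorted(group, key=lambda t: t.timestamp_ms, reverse=True)[:limit]
        for instance_id, group in by_instance.items()
    }


# Same scenario as in the comment: instance 0 is crash-looping.
tasks = [TaskRecord(1, 100, "RUNNING"), TaskRecord(2, 110, "RUNNING")]
tasks += [TaskRecord(0, 200 + i, "FAILED") for i in range(30)]

latest = limit_per_instance(tasks, limit=1)
for instance_id in sorted(latest):
    print(instance_id, latest[instance_id][0].status)
# 0 FAILED
# 1 RUNNING
# 2 RUNNING
```

With `limit=1` every instance is represented by exactly its latest task, regardless of how much churn the other instances have produced.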



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)