[ https://issues.apache.org/jira/browse/AURORA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343085#comment-15343085 ]
Igor Morozov commented on AURORA-1722:
--------------------------------------
So here is the specific use case we're targeting. Let's say there is a transient
error in one of the job instances that causes the task to crash. The
scheduler keeps restarting the failing task, so we end up accumulating dozens
of task statuses per instance.
We pull the status of all service instances to determine their health and act
accordingly; there are systems being built around that.
How would we get the status of the latest task that ran on every Aurora instance
using the TaskQuery limit approach that you are suggesting?
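To make the concern concrete, here is a minimal sketch assuming that a job-wide limit returns the N most recent tasks across all instances (an assumption about its semantics, not confirmed here). TaskRecord and apply_global_limit are hypothetical names; the real Aurora task structs are richer. A crash-looping instance can fill the entire page on its own, so the other instances' latest statuses never show up:

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a scheduled task record; the real
# Aurora Thrift structs are richer and named differently.
@dataclass(frozen=True)
class TaskRecord:
    instance_id: int
    timestamp_ms: int  # time of the task's latest state transition
    status: str


def apply_global_limit(tasks, limit):
    """Assumed job-wide limit semantics: the `limit` most recent tasks overall."""
    return sorted(tasks, key=lambda t: t.timestamp_ms, reverse=True)[:limit]


# Instances 1 and 2 each run one long-lived task that started earlier;
# instance 0 is crash-looping and keeps producing new FAILED tasks.
tasks = [TaskRecord(1, 100, "RUNNING"), TaskRecord(2, 110, "RUNNING")]
tasks += [TaskRecord(0, 200 + i, "FAILED") for i in range(20)]

page = apply_global_limit(tasks, limit=3)
# Only instance 0 appears; instances 1 and 2 are missing from the page.
print(sorted({t.instance_id for t in page}))
```

Raising the limit only helps until the churn outgrows it, which is why a per-instance cut is being requested.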
> Add new field to TaskQuery to allow querying latest statuses grouped by
> instance id
> -----------------------------------------------------------------------------------
>
> Key: AURORA-1722
> URL: https://issues.apache.org/jira/browse/AURORA-1722
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Affects Versions: 0.16.0
> Reporter: Igor Morozov
>
> Currently, in order to get the status of all job instances, both failed and
> running, one needs to issue a query for all task statuses, then group them by
> instance id and sort by timestamp to get the latest status per instance.
> For tasks with a lot of churn, this may cause unnecessary transfer of huge
> blobs of Thrift structs.
> The proposal is to add a new member to the TaskQuery struct:
> struct TaskQuery {
> ...
> 14: i32 limit_per_instance
> }
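The grouping described above can be sketched as follows. This is a hypothetical illustration of the client-side work (group by instance id, sort by timestamp, keep the newest N), which is presumably what a server-side limit_per_instance would apply before serializing results, so that only N records per instance are transferred. TaskRecord and limit_per_instance are illustrative names, not the Aurora API.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical, simplified task record (not the real Aurora Thrift struct).
@dataclass(frozen=True)
class TaskRecord:
    instance_id: int
    timestamp_ms: int
    status: str


def limit_per_instance(tasks, limit):
    """Group tasks by instance id and keep at most `limit` newest per instance."""
    by_instance = defaultdict(list)
    for task in tasks:
        by_instance[task.instance_id].append(task)
    # Newest first within each instance, truncated to `limit` entries.
    return {
        instance_id: sorted(group, key=lambda t: t.timestamp_ms, reverse=True)[:limit]
        for instance_id, group in sorted(by_instance.items())
    }


tasks = [
    TaskRecord(0, 200, "FAILED"),
    TaskRecord(0, 230, "FAILED"),
    TaskRecord(0, 260, "RUNNING"),
    TaskRecord(1, 100, "RUNNING"),
    TaskRecord(2, 110, "RUNNING"),
    TaskRecord(2, 150, "FAILED"),
]

# limit=1 yields the latest status for every instance.
latest = limit_per_instance(tasks, limit=1)
for instance_id, (task,) in latest.items():
    print(instance_id, task.status)
```

With limit=1 every instance is represented exactly once, regardless of how many restarts a crash-looping instance has accumulated.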
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)