[ 
https://issues.apache.org/jira/browse/AURORA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343200#comment-15343200
 ] 

Maxim Khutornenko commented on AURORA-1722:
-------------------------------------------

If the amount of transferred data is an issue, there are plenty of ways to 
optimize retrieval by batching the cluster pull. The first idea that comes to 
mind is batching by job keys. However, if that's not an option, there are 
a few ways to do it generically (i.e. without identifying a set of job keys):
- by status (active and terminal statuses can be pulled separately)
- by instance IDs (e.g. batches of instances 0-99, 100-199, ...)
- by role or environment
- if none of the above fits, there is always a way to paginate via the limit 
and offset fields

The external service pulling that data would aggregate batches sequentially (or 
in parallel) to generate the cluster-wide view.
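
As a rough illustration of the instance-ID batching idea, here is a minimal 
Python sketch. It is not Aurora client code: the Task record, the fetch 
callable and the in-memory STORE are all hypothetical stand-ins for whatever 
scheduler query the external service would actually issue per batch.

```python
from typing import Callable, Iterable, List, NamedTuple, Set


class Task(NamedTuple):
    # Hypothetical, simplified stand-in for a task record returned by a query.
    instance_id: int
    status: str


def instance_batches(total_instances: int, batch_size: int) -> Iterable[Set[int]]:
    """Yield instance ID sets: {0..99}, {100..199}, ... for batch_size=100."""
    for start in range(0, total_instances, batch_size):
        yield set(range(start, min(start + batch_size, total_instances)))


def pull_job_view(
    fetch: Callable[[Set[int]], List[Task]],
    total_instances: int,
    batch_size: int = 100,
) -> List[Task]:
    """Pull one batch at a time and concatenate the results into one view."""
    view: List[Task] = []
    for ids in instance_batches(total_instances, batch_size):
        view.extend(fetch(ids))
    return view


# In-memory fake standing in for the scheduler (illustration only).
STORE = [Task(i, "RUNNING" if i % 2 else "FAILED") for i in range(250)]


def fake_fetch(ids: Set[int]) -> List[Task]:
    # A real fetch would issue one query restricted to these instance IDs.
    return [t for t in STORE if t.instance_id in ids]


view = pull_job_view(fake_fetch, total_instances=250, batch_size=100)
```

The same loop would work for the other generic splits (status, role, 
environment, or limit/offset pages); only the batch key changes. The batches 
are independent, so they could also be fetched in parallel.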

I definitely see how it could be tempting to attempt structural (i.e. 
field-based) reduction, but I'd like to explore other, already available ways 
before attempting that. After all, task events account for anywhere from 20% 
(a very simple task) to less than 10% (a task with ports, metadata and other 
TaskConfig/AssignedTask fields populated). It's easy to predict that any savings 
from reducing the event count will be quickly negated as the number of cluster 
tasks grows. As such, batching the cluster pull seems like a better long-term 
solution to me.

> Add new field to TaskQuery to allow querying latest statuses grouped by 
> instance id
> -----------------------------------------------------------------------------------
>
>                 Key: AURORA-1722
>                 URL: https://issues.apache.org/jira/browse/AURORA-1722
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>    Affects Versions: 0.16.0
>            Reporter: Igor Morozov
>
> Currently, in order to get the status of all job instances, both failed and 
> running, one needs to issue a query for all task statuses, then group them by 
> instance id and sort by timestamp to get the latest statuses per instance. 
> For tasks with a lot of churn, that may cause unnecessary transfer of huge 
> Thrift blobs. 
> The proposal is to add a new member to the TaskQuery struct:
> struct TaskQuery {
> ...
>   14: i32 limit_per_instance
> }
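
For reference, the client-side grouping that the issue description says is 
currently required can be sketched in Python as below. This is only an 
illustration: the TaskEvent record is a hypothetical flattened shape, and 
treating limit_per_instance as "keep at most N most recent entries per 
instance" is an assumption, since the issue does not spell out its semantics.

```python
from typing import Dict, Iterable, List, NamedTuple


class TaskEvent(NamedTuple):
    # Hypothetical flattened record: one status observation for one instance.
    instance_id: int
    status: str
    timestamp_ms: int


def latest_per_instance(
    events: Iterable[TaskEvent], limit_per_instance: int = 1
) -> Dict[int, List[TaskEvent]]:
    """Group by instance ID, sort newest first, keep the first N per instance."""
    grouped: Dict[int, List[TaskEvent]] = {}
    for event in events:
        grouped.setdefault(event.instance_id, []).append(event)
    return {
        instance_id: sorted(
            group, key=lambda e: e.timestamp_ms, reverse=True
        )[:limit_per_instance]
        for instance_id, group in grouped.items()
    }


# Made-up sample data: instance 0 failed once and was restarted.
events = [
    TaskEvent(0, "FAILED", 1000),
    TaskEvent(0, "RUNNING", 2000),
    TaskEvent(1, "RUNNING", 1500),
]
latest = latest_per_instance(events)
```

With the sample data, latest maps instance 0 to its RUNNING entry and 
instance 1 to its single RUNNING entry; the older FAILED entry for instance 0 
is what the proposed field would avoid transferring in the first place.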



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
