[
https://issues.apache.org/jira/browse/AURORA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343200#comment-15343200
]
Maxim Khutornenko commented on AURORA-1722:
-------------------------------------------
If the amount of transferred data is an issue, there are plenty of ways to
optimize retrieval by batching the cluster pull. The first idea that comes to
mind is batching by job keys. However, if that's not an option, there are
a few ways to do it generically (i.e. without identifying a set of job keys):
- by status (active and terminal statuses can be pulled separately)
- by instance IDs (e.g.: batches of instances 0-99, 100-199...)
- by role or environment
- if none of the above fits, there is always a way to paginate via the limit
and offset fields
The external service pulling that data would aggregate batches sequentially (or
in parallel) to generate the cluster-wide view.
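
As a rough illustration of the aggregation step, here is a minimal Python sketch
that pulls active and terminal tasks as two separate batches and merges them.
Everything in it is an assumption made for the example: the dict-based query and
task shapes, the fetch callable standing in for the scheduler call, and the
status sets (an illustrative subset). It is not the Aurora client API.

```python
from typing import Callable, Dict, Iterable, List

Task = Dict[str, object]   # hypothetical flattened task record
Query = Dict[str, object]  # hypothetical stand-in for a TaskQuery

# Illustrative subsets of statuses, used only to split the pull in two.
ACTIVE = {"PENDING", "ASSIGNED", "STARTING", "RUNNING"}
TERMINAL = {"FINISHED", "FAILED", "KILLED", "LOST"}


def pull_cluster_view(fetch: Callable[[Query], Iterable[Task]],
                      batches: Iterable[Query]) -> List[Task]:
    """Run one query per batch and merge results, de-duplicating by task id."""
    merged: Dict[str, Task] = {}
    for query in batches:
        for task in fetch(query):
            # A task that changes status between batches may appear twice;
            # the copy from the later batch wins.
            merged[str(task["task_id"])] = task
    return list(merged.values())


# In-memory stand-in for the cluster so the sketch runs without a scheduler.
FAKE_CLUSTER: List[Task] = [
    {"task_id": "t1", "instance_id": 0, "status": "RUNNING"},
    {"task_id": "t2", "instance_id": 0, "status": "FAILED"},
    {"task_id": "t3", "instance_id": 1, "status": "PENDING"},
]


def fake_fetch(query: Query) -> List[Task]:
    """Hypothetical fetch: filters the fake cluster by status only."""
    return [t for t in FAKE_CLUSTER if t["status"] in query["statuses"]]


view = pull_cluster_view(fake_fetch,
                         [{"statuses": ACTIVE}, {"statuses": TERMINAL}])
```

The same loop would work for the other batching schemes (instance ID ranges,
role, environment, or limit/offset pages) by changing only the list of queries.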
I definitely see how it could be tempting to attempt structural (i.e.
field-based) reduction, but I'd like to explore other already available ways
before attempting that. After all, task events account for anywhere from 20%
(very simple task) to less than 10% (a task with ports, metadata and other
TaskConfig/AssignedTask fields populated) of the transferred data. It's easy to
predict that any savings from reducing the event count will be quickly negated
as the number of cluster tasks grows. As such, batching the cluster pull seems
like a better long-term solution to me.
> Add new field to TaskQuery to allow querying latest statuses grouped by
> instance id
> -----------------------------------------------------------------------------------
>
> Key: AURORA-1722
> URL: https://issues.apache.org/jira/browse/AURORA-1722
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Affects Versions: 0.16.0
> Reporter: Igor Morozov
>
> Currently, to get the status of all job instances, both failed and
> running, one needs to issue a query for all task statuses, then group them by
> instance id and sort by timestamp to get the latest statuses per instance.
> For tasks with a lot of churn, that may cause huge Thrift blobs to be
> transferred unnecessarily.
> The proposal is to add a new member to the TaskQuery struct:
> struct TaskQuery {
> ...
> 14: i32 limit_per_instance
> }
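
For reference, the client-side work described in the issue (group by instance
id, sort by timestamp, keep the latest) can be sketched in Python as below. The
record shape is hypothetical, and the limit_per_instance argument assumes the
proposed field would mean "keep at most N most recent entries per instance";
the issue itself does not spell out those semantics.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (instance_id, timestamp_ms, status).
Record = Tuple[int, int, str]


def latest_per_instance(records: Iterable[Record],
                        limit_per_instance: int = 1) -> Dict[int, List[Record]]:
    """Group records by instance id and keep the newest N for each instance."""
    grouped: Dict[int, List[Record]] = defaultdict(list)
    for rec in records:
        grouped[rec[0]].append(rec)
    return {
        instance_id: sorted(recs, key=lambda r: r[1], reverse=True)[:limit_per_instance]
        for instance_id, recs in grouped.items()
    }


records = [
    (0, 100, "FAILED"),
    (0, 200, "RUNNING"),
    (1, 150, "FAILED"),
    (1, 120, "RUNNING"),
]
latest = latest_per_instance(records)
```

Doing this on the client requires every record to be transferred first, which
is the cost the proposed field is meant to avoid.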