[
https://issues.apache.org/jira/browse/TEZ-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737061#comment-14737061
]
Hitesh Shah edited comment on TEZ-2792 at 9/9/15 3:49 PM:
----------------------------------------------------------
Comment on this particular section:
{code}
else if(vertexMinIDs.size() > 0) {
  for (Integer vertexMinID : vertexMinIDs) {
    Vertex vertex = getVertexFromIndex(dag, vertexMinID);
    List<Task> vertexTasks = new ArrayList<>(vertex.getTasks().values());
    tasks.addAll(vertexTasks.subList(0, Math.min(vertexTasks.size(), limit - tasks.size())));

    if(tasks.size() >= limit) {
      break;
    }
  }
}
else {
  Collection<Vertex> vertices = dag.getVertices().values();
  for (Vertex vertex : vertices) {
    List<Task> vertexTasks = new ArrayList<>(vertex.getTasks().values());
    tasks.addAll(vertexTasks.subList(0, Math.min(vertexTasks.size(), limit - tasks.size())));

    if(tasks.size() >= limit) {
      break;
    }
  }
}
}
{code}
Is there a reason why all task objects are first copied into an ArrayList and
then a subset is pulled out?
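For reference, a minimal sketch of applying the same cap without the intermediate copy, assuming only that the `values()` collection of the task map can be iterated directly. It stops as soon as the limit is reached. The helper `addUpTo` and the `LinkedHashMap` stand-ins for `vertex.getTasks()` are hypothetical, not code from the patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CopyFreeLimit {
    // Appends items from source to out until out holds `limit` items or source
    // is exhausted. Iterates lazily, so no full copy of source is ever made.
    static <T> void addUpTo(Collection<T> source, List<T> out, int limit) {
        Iterator<T> it = source.iterator();
        while (out.size() < limit && it.hasNext()) {
            out.add(it.next());
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for vertex.getTasks(): task index -> task name.
        List<Map<Integer, String>> vertices = new ArrayList<>();
        for (int v = 0; v < 3; v++) {
            Map<Integer, String> tasks = new LinkedHashMap<>();
            for (int t = 0; t < 60; t++) {
                tasks.put(t, "v" + v + "_t" + t);
            }
            vertices.add(tasks);
        }

        int limit = 100;
        List<String> result = new ArrayList<>();
        for (Map<Integer, String> tasks : vertices) {
            addUpTo(tasks.values(), result, limit);
            if (result.size() >= limit) {
                break;
            }
        }
        System.out.println(result.size() + " " + result.get(limit - 1)); // prints "100 v1_t39"
    }
}
```

The result order follows the map's iteration order, which is the same order the `new ArrayList<>(...)` copy would have produced.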
Could a different approach be taken? For example, if the ask is minTaskId = 501
and limit/max = 100, just look up each task by ID (i.e. 501 to 600) and put
those into the list, instead of copying all 10000 task objects and then taking
a sublist. This might require some changes to first check vertex::numTasks.
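A minimal sketch of the index-range idea, assuming a vertex that can return a task by its index and report its task count (the vertex::numTasks check mentioned above). `IndexedVertex`, `numTasks()` and `getTask(int)` are hypothetical stand-ins, not the actual Tez `Vertex` API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskRangeLookup {
    // Hypothetical stand-in for a vertex whose tasks can be looked up by index.
    // This is NOT the real org.apache.tez.dag.app.dag.Vertex interface.
    static final class IndexedVertex {
        private final Map<Integer, String> tasksByIndex = new HashMap<>();

        IndexedVertex(int numTasks) {
            for (int i = 0; i < numTasks; i++) {
                tasksByIndex.put(i, "task_" + i);
            }
        }

        int numTasks() {
            return tasksByIndex.size();
        }

        String getTask(int index) {
            return tasksByIndex.get(index);
        }
    }

    // Returns up to `limit` tasks with indices startIndex, startIndex + 1, ...
    // Checks numTasks() first so only the requested entries are touched.
    static List<String> tasksInRange(IndexedVertex vertex, int startIndex, int limit) {
        int end = Math.min(vertex.numTasks(), startIndex + limit);
        List<String> result = new ArrayList<>(Math.max(0, end - startIndex));
        for (int i = startIndex; i < end; i++) {
            String task = vertex.getTask(i);
            if (task != null) {
                result.add(task);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IndexedVertex vertex = new IndexedVertex(10000);
        List<String> page = tasksInRange(vertex, 501, 100);
        // prints "100 task_501 task_600"
        System.out.println(page.size() + " " + page.get(0) + " " + page.get(page.size() - 1));
    }
}
```

Only the requested entries are looked up, and bounding the range by the task count first means a request that runs past the end of the vertex simply returns fewer items.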
> Add AM web service API for tasks.
> ---------------------------------
>
> Key: TEZ-2792
> URL: https://issues.apache.org/jira/browse/TEZ-2792
> Project: Apache Tez
> Issue Type: Sub-task
> Components: UI
> Reporter: Sreenath Somarajapuram
> Assignee: Sreenath Somarajapuram
> Attachments: TEZ-2792.1.patch
>
>
> Add AM API for fetching real-time task info:
> - API endpoint: /ws/v2/tez/tasksInfo
> - Query Params:
> -- dagMinID: dagMinID = dagIndex (mandatory).
> -- vertexMinID: A comma-separated list. vertexMinID = vertexIndex.
> -- taskMinID: A comma-separated list. taskMinID = vertexIndex_taskIndex.
> -- limit: Maximum number of items to be returned (defaults to 100).
> - If taskMinID is passed: all the specified tasks (capped by limit) will be
> returned. vertexMinID, if present, won't be considered.
> - If vertexMinID is passed: all tasks (capped by limit) under the vertices
> will be returned.
> - If just dagMinID is passed: all tasks (capped by limit) under the DAG will be
> returned.
> - Data returned: complete task ID, progress, status.
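A rough sketch of the parameter handling and precedence rules in the description (taskMinID over vertexMinID over the whole DAG, limit defaulting to 100). The class, method and field names are hypothetical and this is not the code in TEZ-2792.1.patch; it only covers parsing and mode selection, not the task lookup itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TasksInfoParams {
    static final int DEFAULT_LIMIT = 100;

    // Hypothetical holder for the parsed query parameters.
    static final class Query {
        final int dagIndex;
        final int limit;
        final List<Integer> vertexIndices = new ArrayList<>();
        final List<int[]> taskIndices = new ArrayList<>(); // {vertexIndex, taskIndex}

        Query(int dagIndex, int limit) {
            this.dagIndex = dagIndex;
            this.limit = limit;
        }
    }

    // Parses the params; malformed numbers surface as NumberFormatException,
    // which is a subclass of IllegalArgumentException.
    static Query parse(Map<String, String> params) {
        String dag = params.get("dagMinID");
        if (dag == null || dag.isEmpty()) {
            throw new IllegalArgumentException("dagMinID is mandatory");
        }
        String limitStr = params.get("limit");
        int limit = (limitStr == null || limitStr.isEmpty())
                ? DEFAULT_LIMIT : Integer.parseInt(limitStr.trim());
        Query q = new Query(Integer.parseInt(dag.trim()), limit);

        String taskIds = params.get("taskMinID");
        if (taskIds != null && !taskIds.isEmpty()) {
            for (String id : taskIds.split(",")) {
                String[] parts = id.trim().split("_"); // vertexIndex_taskIndex
                if (parts.length != 2) {
                    throw new IllegalArgumentException("bad taskMinID: " + id);
                }
                q.taskIndices.add(new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])});
            }
            return q; // taskMinID wins; vertexMinID is not considered
        }
        String vertexIds = params.get("vertexMinID");
        if (vertexIds != null && !vertexIds.isEmpty()) {
            for (String id : vertexIds.split(",")) {
                q.vertexIndices.add(Integer.parseInt(id.trim()));
            }
        }
        return q;
    }

    // Which of the three cases from the description applies.
    static String mode(Query q) {
        if (!q.taskIndices.isEmpty()) return "tasks";
        if (!q.vertexIndices.isEmpty()) return "vertices";
        return "dag";
    }

    public static void main(String[] args) {
        Map<String, String> p = new HashMap<>();
        p.put("dagMinID", "1");
        p.put("vertexMinID", "0,2");
        p.put("taskMinID", "0_501,2_7");
        Query q = parse(p);
        System.out.println(mode(q) + " " + q.taskIndices.size() + " " + q.limit); // prints "tasks 2 100"
    }
}
```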