[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265635#comment-15265635
 ] 

Varun Saxena edited comment on MAPREDUCE-6688 at 5/30/16 5:19 AM:
------------------------------------------------------------------

I actually wanted to raise this point for discussion but forgot to mention it.
Whether a put is sync or async is decided semantically, based on which entities we
want to publish immediately, rather than on whether they have to be merged or not.
Are configs something that has to be published immediately as part of a sync
put?

There can be a fair argument in favor of sending all the entities together in
one shot for a sync put, though. But we can convert the list to an array outside
as well. And to convert into an array I would have to build a list first anyway
(as the array size cannot be predetermined in some cases).
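
For illustration, a minimal sketch of the all-at-once sync path, assuming the
varargs TimelineClient#putEntities(TimelineEntity...) API on the YARN-2928
branch (the class and method names are the branch API as I recall it, not code
from the patch):

{code:java}
// Sketch only: one sync put of all entities, converting the list to an
// array at the end because the number of entities is not known up front.
// Assumes the varargs putEntities API on the YARN-2928 branch TimelineClient.
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

class SyncPutSketch {
  void publishAllAtOnce(TimelineClient client, List<TimelineEntity> entities)
      throws IOException, YarnException {
    // Single blocking put; everything goes out as one TimelineEntities object.
    client.putEntities(entities.toArray(new TimelineEntity[entities.size()]));
  }
}
{code}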

I guess you mean the same, but just to elaborate for others as well.
The reason I am looping through a list and putting entities one by one, instead
of turning it into an array and publishing in a single put call, is that
entities are merged together for async calls.
From what I remember of YARN-3367, we wait for up to 10 TimelineEntities
objects before publishing. The key point is that we wait for 10 TimelineEntities
objects, not 10 TimelineEntity objects; we do not check how many entities are
wrapped inside a single TimelineEntities object. Correct me if I am wrong.
If I pass an array of 10 entities, all of them would be wrapped in a single
TimelineEntities object and would hence count as a single addition to the queue.
If I put them separately, they count as 10 additions to the queue. Hence I went
with looping over the list.
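
To make the contrast concrete, a sketch of the loop-and-put approach used for
the async path (again assuming the branch's varargs putEntitiesAsync API; this
is not the patch code itself):

{code:java}
// Sketch only: putting entities one at a time so each call wraps a single
// entity in its own TimelineEntities object and counts as a separate
// addition to the async dispatch queue.
void publishOneByOne(TimelineClient client, List<TimelineEntity> entities)
    throws IOException, YarnException {
  for (TimelineEntity entity : entities) {
    // Ten entities published this way count as ten queue additions, not one,
    // so the 10-object flush threshold from YARN-3367 applies per entity
    // rather than per batch.
    client.putEntitiesAsync(entity);
  }
}
{code}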
Now, the reason I chose 100k as the limit was the assumption that even if all
10 entities go in a single call, the payload size will be about 1 MB, which IMO
is fine. If 1 MB is not fine, we can change the limit to something like 50k
(say).
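
Just to illustrate the arithmetic, a hypothetical sketch of capping each put at
roughly 100 KB of serialized config so that 10 such puts stay around 1 MB; the
chunking helper and the size accounting below are my own illustration, not what
the patch does:

{code:java}
// Hypothetical illustration: cap each put at ~100 KB of config data so that
// even 10 such puts merged into one async dispatch stay around 1 MB.
import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

class ConfigPublisherSketch {
  private static final int PUT_LIMIT_BYTES = 100 * 1024;

  void publishConfigs(TimelineClient client, Configuration conf,
      String entityId, String entityType) throws IOException, YarnException {
    TimelineEntity chunk = newChunk(entityId, entityType);
    int bytes = 0;
    for (Map.Entry<String, String> e : conf) {
      int size = e.getKey().length() + e.getValue().length();
      if (bytes > 0 && bytes + size > PUT_LIMIT_BYTES) {
        client.putEntitiesAsync(chunk);   // flush the current ~100 KB chunk
        chunk = newChunk(entityId, entityType);
        bytes = 0;
      }
      chunk.addConfig(e.getKey(), e.getValue());
      bytes += size;
    }
    if (bytes > 0) {
      client.putEntitiesAsync(chunk);     // flush the remainder
    }
  }

  private TimelineEntity newChunk(String id, String type) {
    TimelineEntity entity = new TimelineEntity();
    entity.setId(id);
    entity.setType(type);
    return entity;
  }
}
{code}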

Would like to hear views of others on the same.

bq. This solution looks fine as of now but would require changes if we adopt 
different approach for publishing metrics and configurations as per YARN-3401.
Even if we were to route our entities through the RM, we would likely do that
based on entity type (i.e. route entities with a YARN entity type via the RM).
That is one solution for YARN-3401 which comes to my mind.
In that case the current structure of the code should work well.


> Store job configurations in Timeline Service v2
> -----------------------------------------------
>
>                 Key: MAPREDUCE-6688
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6688
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster
>    Affects Versions: YARN-2928
>            Reporter: Junping Du
>            Assignee: Varun Saxena
>              Labels: yarn-2928-1st-milestone
>             Fix For: YARN-2928
>
>         Attachments: MAPREDUCE-6688-YARN-2928.01.patch, 
> MAPREDUCE-6688-YARN-2928.02.patch, MAPREDUCE-6688-YARN-2928.03.patch, 
> MAPREDUCE-6688-YARN-2928.04.patch, MAPREDUCE-6688-YARN-2928.v2.01.patch, 
> MAPREDUCE-6688-YARN-2928.v2.02.patch, YARN-3959-YARN-2928.01.patch
>
>
> We already have configuration field in HBase schema for application entity. 
> We need to make sure AM write it out when it get launched.


