[ 
https://issues.apache.org/jira/browse/TEZ-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095588#comment-17095588
 ] 

László Bodor commented on TEZ-4137:
-----------------------------------

[~mustafaiman]: I've started to digest your patch. As a conversation starter, 
could you please illustrate with a Hive + Tez example how this patch improves 
user payload handling in upstream components (especially if you already have a 
corresponding HIVE- ticket)?
As far as I understand, the most important part of this work is to replace
{code:java}
TezUtils.createConfFromUserPayload(getContext().getUserPayload());
{code}
(which [parses the full 
payload|https://github.com/apache/tez/blob/0eeef27413db97b52242878301788ac5fd8def16/tez-api/src/main/java/org/apache/tez/common/TezUtils.java#L99]
 in order to initialize the input/output/processor)
 with
{code:java}
TezUtils.createConfFromBaseConfAndPayload(getContext())
{code}
which relies on the [context's 
configuration|https://github.com/apache/tez/pull/64/commits/1ef1377679d57b5bd586e8617a90db83f2a78af0#diff-15940b032e9c868635d1ffd095f7daa5R141].
 This conf object, returned by getContainerConfiguration(), comes from 
TezTaskRunner2, which merges the default conf with the taskSpec conf 
[here|https://github.com/apache/tez/blob/master/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezTaskRunner2.java#L142-L149].
 By the way, on the AM side, taskSpec.conf is simply a 
[vertexOnlyConf|https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L1754].
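
To make the difference between the two calls concrete, here is a minimal sketch of the merge as I understand it. This is not the code from the patch: plain maps stand in for Hadoop Configuration objects, and the class name, method name, and property keys are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: plain maps stand in for Configuration objects. */
final class ConfMergeSketch {

    /**
     * Returns a new map containing every entry of the base (local) conf,
     * overlaid with the entries decoded from the user payload.
     * Neither input map is modified.
     */
    static Map<String, String> mergeBaseAndPayload(Map<String, String> base,
                                                   Map<String, String> payload) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(payload); // payload values win on key collisions
        return merged;
    }

    public static void main(String[] args) {
        // Made-up keys, standing in for the container-side configuration.
        Map<String, String> base = new HashMap<>();
        base.put("example.sort.mb", "256");
        base.put("example.compress", "false");

        // Made-up payload carrying only a vertex-specific override.
        Map<String, String> payload = new HashMap<>();
        payload.put("example.sort.mb", "512");

        Map<String, String> merged = mergeBaseAndPayload(base, payload);
        System.out.println(merged.get("example.sort.mb"));
        System.out.println(merged.get("example.compress"));
    }
}
```

In this picture the old call corresponds to using only the payload map, so the payload had to carry everything, while the new call starts from the local conf and lets the payload override individual keys.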

On the Hive side, [here is how a vertex user payload is 
created|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L907].
I'm assuming that one intention of this patch is to reduce the size of this 
payload: looking at createVertexFromReduceWork, it sets some vertex-specific 
properties, but the payload also contains the entire parent Hive configuration.
Could you please provide a use case at the configuration property level, i.e. 
how an upstream component (e.g. Hive) could benefit from the change?
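
To illustrate the kind of property-level use case I have in mind (this is my assumption about how a client might use the change, not something taken from the patch): a client could send only the properties that differ from what the task already has locally. A rough sketch, again with plain maps instead of Configuration objects and with hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of client-side payload reduction. */
final class PayloadDeltaSketch {

    /**
     * Returns only those entries of the full conf whose value is missing
     * from the base conf or differs from it. Keys present in base but
     * absent from full cannot be expressed by this simple delta.
     */
    static Map<String, String> deltaAgainstBase(Map<String, String> base,
                                                Map<String, String> full) {
        Map<String, String> delta = new HashMap<>();
        for (Map.Entry<String, String> entry : full.entrySet()) {
            String baseValue = base.get(entry.getKey());
            if (!Objects.equals(baseValue, entry.getValue())) {
                delta.put(entry.getKey(), entry.getValue());
            }
        }
        return delta;
    }
}
```

If the task side then overlays this delta on its local conf, it ends up with the same effective values for those keys, provided the local conf really matches the base the client assumed. That assumption is exactly what I would like to see spelled out for Hive.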

> Input/Output/Processor should merge payload to local conf
> ---------------------------------------------------------
>
>                 Key: TEZ-4137
>                 URL: https://issues.apache.org/jira/browse/TEZ-4137
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: TEZ-4137.1.patch, TEZ-4137.2.patch, TEZ-4137.3.patch, 
> TEZ-4137.4.patch, TEZ-4137.4.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This patch introduces config merging to various Input and Output processors. 
> As described in https://issues.apache.org/jira/browse/TEZ-4073 , we need to 
> reduce the size of the configuration objects transferred over the wire. There 
> are two improvements we are planning to make in that regard:
>  # Skip sending default configs and configuration coming from xml files in 
> payload
>  # Send dag, vertex and session configurations in layers instead of sending 
> dag + vertex + session configs all together three times.
> In order to achieve these,
>  * We need to expose local config on Task side through TaskContext.
>  * Input/Output/Processors must merge the config from user payload to local 
> config in their TaskContext
> Since runtime components did not have access to local config before, Tez 
> clients sent all config required at runtime in the user payload. After this 
> change, Tez clients can reduce their payload size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
