[ https://issues.apache.org/jira/browse/TEZ-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095588#comment-17095588 ]
László Bodor commented on TEZ-4137:
-----------------------------------

[~mustafaiman]: I've started to digest your patch. As a conversation starter, could you please illustrate, with a Hive + Tez example, how this patch can improve user payload handling in upstream components (especially if you already have a corresponding HIVE- ticket)?

As far as I understand, the most important part of this work is replacing
{code:java}
TezUtils.createConfFromUserPayload(getContext().getUserPayload());
{code}
(which [parses the full payload|https://github.com/apache/tez/blob/0eeef27413db97b52242878301788ac5fd8def16/tez-api/src/main/java/org/apache/tez/common/TezUtils.java#L99] in order to initialize the input/output/processor) with
{code:java}
TezUtils.createConfFromBaseConfAndPayload(getContext());
{code}
which relies on the [context's configuration|https://github.com/apache/tez/pull/64/commits/1ef1377679d57b5bd586e8617a90db83f2a78af0#diff-15940b032e9c868635d1ffd095f7daa5R141].

This conf object, returned by getContainerConfiguration(), comes from TezTaskRunner2, which merges the default conf and the taskSpec conf [here|https://github.com/apache/tez/blob/master/tez-runtime-internals/src/main/java/org/apache/tez/runtime/task/TezTaskRunner2.java#L142-L149]. By the way, on the AM side, taskSpec.conf is simply a [vertexOnlyConf|https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L1754].

On the Hive side, [here is how a vertex user payload is created|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L907]. I'm assuming the intention of this patch is to reduce this payload size: for instance, createVertexFromReduceWork sets some vertex-specific properties, but the payload still contains the entire parent Hive configuration.

Could you please provide a use case at the configuration property level (how could an upstream component, e.g. Hive, benefit from the change)?
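The difference between the two calls can be sketched as follows. This is a minimal model under stated assumptions: plain `Map<String, String>` objects stand in for Hadoop's `Configuration` and for the serialized `UserPayload`, the property keys are illustrative, and the method names (`confFromPayloadOnly`, `confFromBaseAndPayload`, `minimalPayload`) are hypothetical, not Tez API. It assumes overlay semantics (payload entries override the container conf); whether `TezUtils.createConfFromBaseConfAndPayload` behaves exactly this way is an assumption, not something confirmed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical model of payload-only vs. base-conf-plus-payload handling.
 * Plain String maps stand in for Hadoop's Configuration and for the
 * serialized UserPayload; none of these methods are Tez APIs.
 */
class PayloadMergeSketch {

    /** Old style: the runtime component sees only what the payload carries. */
    static Map<String, String> confFromPayloadOnly(Map<String, String> payload) {
        return new HashMap<>(payload);
    }

    /** New style (assumed semantics): copy the container conf, then let payload entries override it. */
    static Map<String, String> confFromBaseAndPayload(Map<String, String> containerConf,
                                                      Map<String, String> payload) {
        Map<String, String> merged = new HashMap<>(containerConf);
        merged.putAll(payload); // payload wins on key conflicts
        return merged;
    }

    /**
     * Hypothetical client-side helper: keep only entries that are absent from,
     * or differ from, the base conf the task side is expected to have.
     */
    static Map<String, String> minimalPayload(Map<String, String> fullConf,
                                              Map<String, String> baseConf) {
        Map<String, String> delta = new HashMap<>();
        for (Map.Entry<String, String> e : fullConf.entrySet()) {
            if (!Objects.equals(e.getValue(), baseConf.get(e.getKey()))) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        // Illustrative stand-in for default conf + taskSpec (vertex) conf on the task side.
        Map<String, String> containerConf = new HashMap<>();
        containerConf.put("example.shared.prop.a", "1");
        containerConf.put("example.shared.prop.b", "2");

        // What a client would previously serialize in full: shared props + one vertex-specific prop.
        Map<String, String> fullConf = new HashMap<>(containerConf);
        fullConf.put("example.vertex.prop", "reducer-2"); // hypothetical key

        Map<String, String> payload = minimalPayload(fullConf, containerConf);
        Map<String, String> merged = confFromBaseAndPayload(containerConf, payload);

        System.out.println("full payload entries:    " + confFromPayloadOnly(fullConf).size()); // 3
        System.out.println("minimal payload entries: " + payload.size());                     // 1
        System.out.println("merged equals full conf: " + merged.equals(fullConf));            // true
    }
}
```

Two caveats of this sketch: the smaller payload only reproduces the full conf if the client's notion of the base matches what the task side actually reconstructs (defaults plus taskSpec conf), and a delta built this way cannot express removing a key that the base conf sets.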
> Input/Output/Processor should merge payload to local conf
> ---------------------------------------------------------
>
>                 Key: TEZ-4137
>                 URL: https://issues.apache.org/jira/browse/TEZ-4137
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: TEZ-4137.1.patch, TEZ-4137.2.patch, TEZ-4137.3.patch, TEZ-4137.4.patch, TEZ-4137.4.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This patch introduces config merging to various Inputs, Outputs and Processors.
> As described in https://issues.apache.org/jira/browse/TEZ-4073, we need to reduce the size of the configuration objects transferred over the wire. There are two improvements we are planning to make in that regard:
> # Skip sending default configs and configuration coming from XML files in the payload.
> # Send DAG, vertex and session configurations in layers instead of sending DAG + vertex + session configs all together three times.
> In order to achieve these:
> * We need to expose the local config on the task side through TaskContext.
> * Inputs/Outputs/Processors must merge the config from the user payload into the local config in their TaskContext.
> Since runtime components did not have access to the local config before, Tez clients sent all config required at runtime in the user payload. After this change, Tez clients can reduce their payload size.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)