[ 
https://issues.apache.org/jira/browse/TEZ-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085915#comment-17085915
 ] 

Jonathan Turner Eagles commented on TEZ-4141:
---------------------------------------------

I don't think this is a good general way to save bytes going over the wire. 
Reading local config files like this weakens the idempotency of jobs, because 
the configuration can differ across attempts. As new software rolls out across 
the cluster, the guarantee of producing the correct result is weakened. In 
addition, there is no historical record of the configuration that was actually 
used, which makes debugging difficult or impossible. Lastly, reading 
configuration off the disk is strictly slower than the process we have now. It 
would be better to shrink the configuration through a filter, or to structure 
it hierarchically so that a task's configuration is the DAG configuration plus 
a delta for the vertex plus a delta for the task.
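
To make the "DAG + delta for vertex + delta for task" idea concrete, here is a 
minimal sketch in Python. It is not Tez code: the function names and the 
configuration keys are hypothetical, and it assumes a configuration is a flat 
map of string keys to string values. Deltas here can only add or override 
keys; removing a key would need an extra marker.

```python
def compute_delta(base, full):
    """Return only the entries of `full` that are new or differ from `base`."""
    return {k: v for k, v in full.items() if k not in base or base[k] != v}


def resolve(dag_conf, vertex_delta, task_delta):
    """Rebuild the effective config: DAG first, then vertex, then task."""
    merged = dict(dag_conf)
    merged.update(vertex_delta)  # vertex-level overrides win over the DAG
    merged.update(task_delta)    # task-level overrides win over the vertex
    return merged


# Hypothetical keys, for illustration only.
dag_conf = {"example.sort.mb": "256", "example.compress": "true"}
vertex_full = {"example.sort.mb": "512", "example.compress": "true"}
task_full = {
    "example.sort.mb": "512",
    "example.compress": "true",
    "example.task.tag": "t1",
}

# Only the differences are shipped at each level.
vertex_delta = compute_delta(dag_conf, vertex_full)
task_delta = compute_delta(resolve(dag_conf, vertex_delta, {}), task_full)

effective = resolve(dag_conf, vertex_delta, task_delta)
```

In this sketch the full DAG configuration is sent once, and each vertex or task 
carries only its own overrides, so the configuration that was used can still 
be rebuilt and recorded exactly.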

One technique may be to sort the configuration keys so that similar keys end 
up next to each other, which should improve locality for the compression codec.
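
A rough sketch of the key-sorting idea, again in Python rather than Tez code. 
It assumes the configuration is a flat string map, uses JSON as a stand-in 
serialization, and uses zlib as a stand-in for whatever codec is actually 
configured; `pack` and `unpack` are hypothetical helper names. Sorting puts 
keys that share a prefix next to each other, which tends to help 
dictionary-based codecs, though the actual saving depends on the data and 
should be measured.

```python
import json
import zlib


def pack(conf):
    """Serialize with keys in sorted order, then compress."""
    text = json.dumps(conf, sort_keys=True, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"))


def unpack(payload):
    """Decompress and parse back into a dict."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical keys; the two dicts differ only in insertion order.
conf_a = {"example.io.sort.mb": "256", "example.am.memory": "1024"}
conf_b = {"example.am.memory": "1024", "example.io.sort.mb": "256"}

payload_a = pack(conf_a)
payload_b = pack(conf_b)
```

A side effect of sorting is that the payload no longer depends on the order in 
which keys were inserted, so equal configurations produce identical bytes.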

> Let Input/Output Processors load local xml configs
> --------------------------------------------------
>
>                 Key: TEZ-4141
>                 URL: https://issues.apache.org/jira/browse/TEZ-4141
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: TEZ-4141.1.patch
>
>
> We would like to reduce the amount of configuration going over the wire from 
> a client to the application master. If Input/Output processors load local 
> config files, we can reduce the configuration overhead when the client and 
> the processors have exactly the same config on both sides. It is up to the 
> user of the client to keep the configs the same on both sides. Currently, 
> clients have to send all config in the payload. Even if we preload config 
> from local xml files, these values should be overridden by the full config 
> object coming in the payload. Therefore, old clients that send all the config 
> anyway would not be affected in terms of correctness by this change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
