[ 
https://issues.apache.org/jira/browse/HIVE-23175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman updated HIVE-23175:
--------------------------------
    Description: 
HiveServer spends a lot of time serializing configuration objects. We can skip 
putting the Hadoop and Tez config XML files in the payload, assuming that the 
configs are the same on both the HS and AM sides. This depends on Tez loading 
its local XML configs when it creates config objects: 
https://issues.apache.org/jira/browse/TEZ-4137 
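
A minimal sketch of the idea, not the code in the attached patch: each config 
entry is modeled as a value plus the file it was loaded from, and entries that 
came from skipped files are left out of the payload. The class and method 
names, the plain maps standing in for config objects, and the sample values 
are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ConfigPayloadSketch {

    /** Copies only the entries whose recorded source is not a skipped file. */
    static Map<String, String> toPayload(Map<String, String> values,
                                         Map<String, String> sources,
                                         Set<String> skippedSources) {
        Map<String, String> payload = new HashMap<>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            // A null source means the entry was set programmatically; keep it.
            String source = sources.get(e.getKey());
            if (source == null || !skippedSources.contains(source)) {
                payload.put(e.getKey(), e.getValue());
            }
        }
        return payload;
    }

    /** Receiver side: start from locally loaded XML values, then overlay the payload. */
    static Map<String, String> rebuildOnReceiver(Map<String, String> localXmlValues,
                                                 Map<String, String> payload) {
        Map<String, String> rebuilt = new HashMap<>(localXmlValues);
        rebuilt.putAll(payload);
        return rebuilt;
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        Map<String, String> sources = new HashMap<>();
        // Illustrative values only.
        values.put("fs.defaultFS", "hdfs://example:8020");
        sources.put("fs.defaultFS", "core-site.xml");
        values.put("tez.am.resource.memory.mb", "2048");
        sources.put("tez.am.resource.memory.mb", "tez-site.xml");
        values.put("hive.execution.engine", "tez");
        sources.put("hive.execution.engine", "hive-site.xml");

        Set<String> skipped = new HashSet<>(Arrays.asList("core-site.xml", "tez-site.xml"));
        Map<String, String> payload = toPayload(values, sources, skipped);
        System.out.println("full entries: " + values.size()
                + ", payload entries: " + payload.size());
    }
}
```

The rebuilt config matches the sender's only if the skipped files have the 
same contents on both sides, which is the assumption stated above.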

Ideally we should be able to skip hive-site.xml too. However, if we skip 
hive-site.xml at that stage, we make wrong choices at the Tez DAG build stage 
because configs are missing.

In the ideal version of this, the DAG and Vertex build phases would not read 
configs from and write new configs to the same config object. Instead, we 
would look up values from HS2's HiveConf object and write to a brand new 
JobConf for each vertex. That way no vertex would carry unnecessary items in 
its JobConf. However, the DAG and Vertex build stages (TezTask#build) and many 
of the components called from there treat a single config object as both the 
source of HS2-side config and the target JobConf that receives vertex-level 
options. It is very hard to separate these concerns now.
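
The separation described above could look roughly like the following sketch. 
It is not how TezTask#build works today. Plain maps stand in for HiveConf and 
JobConf, and the class name, method name, and "example.*" keys are all 
hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class PerVertexConfSketch {

    /**
     * Reads from the HS2-side config and writes only to a fresh per-vertex map,
     * so the result holds nothing but the options this vertex needs.
     */
    static Map<String, String> buildVertexConf(Map<String, String> hs2Conf,
                                               String vertexName) {
        // Read-only view: any accidental write to the source fails fast.
        Map<String, String> source = Collections.unmodifiableMap(hs2Conf);
        Map<String, String> vertexConf = new HashMap<>();

        // Hypothetical keys, used only to show the lookup/write split.
        String memoryMb = source.getOrDefault("example.container.memory.mb", "1024");
        vertexConf.put("example.vertex.name", vertexName);
        vertexConf.put("example.vertex.memory.mb", memoryMb);
        return vertexConf;
    }

    public static void main(String[] args) {
        Map<String, String> hs2Conf = new HashMap<>();
        hs2Conf.put("example.container.memory.mb", "4096");
        hs2Conf.put("example.unrelated.setting", "true");

        Map<String, String> vertexConf = buildVertexConf(hs2Conf, "Map 1");
        System.out.println("vertex conf: " + vertexConf);
    }
}
```

Because the target starts empty, unrelated HS2 settings never reach the 
per-vertex config unless a build step copies them explicitly.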

With this patch, we reduce the size of the JobConf (per vertex) by ~65%. This 
should improve transmit latency. However, the most significant gains are in 
the CPU time spent compressing job configs, since the config objects are now 
much smaller.

  was:HiveServer spends a lot of time serializing configuration objects. We can 
skip putting hadoop and tez config xml files in payload assuming that the 
configs are the same on both HS and AM side. This depends on Tez to load local 
xml configs when creating config objects 
https://issues.apache.org/jira/browse/TEZ-4137 


> Skip serializing hadoop and tez config on HS side
> -------------------------------------------------
>
>                 Key: HIVE-23175
>                 URL: https://issues.apache.org/jira/browse/HIVE-23175
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: HIVE-23175.1.patch
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
