[ 
https://issues.apache.org/jira/browse/HIVE-29419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-29419:
--------------------------------
    Description: 
This ticket is related to the Dockerized Hive and Tez initiative.
While Hive Docker was implemented in HIVE-26400, and a Tez AM image is 
currently under development in TEZ-4682, there is an open question about how to 
seamlessly integrate the Hive and Tez Docker containers, both at build time 
and at runtime.
TEZ-4682 aims to build a generic Tez AM image, which is crucial for making Tez 
a modern execution engine, but Hive has a lot of dependencies on Tez. This 
makes it quite hard to develop the Hive (HiveServer2) and Tez AM Docker 
images independently.
Consider the different classes used in the Tez AM:
!Screenshot 2026-01-27 at 14.51.39.png|width=570,height=280!

Every yellow class raises a separate question about "how Hive jars make their 
way to an independent Tez image", and this is where this Jira could be a 
game-changer. Consider the *HiveSplitGenerator* (in the hive-exec module) and 
*LlapTaskCommunicator* (in the llap-tez module) classes. In the Yarn world, 
their localization was taken care of by Yarn, but once we deploy loosely 
coupled Docker containers, we cannot rely on such a mechanism anymore.
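
To make the localization gap concrete, here is a minimal sketch of a check that could be run inside an image to see whether the Hive classes needed by the Tez AM are present in its jars. The fully qualified class names are written from memory of the Hive code base and should be verified; the function name and the idea of scanning a single jar directory are assumptions made for illustration, not part of either project.

```python
import zipfile
from pathlib import Path

# Assumed fully qualified names of the two classes mentioned above.
REQUIRED_CLASSES = [
    "org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator",
    "org.apache.hadoop.hive.llap.tezplugins.LlapTaskCommunicator",
]


def find_missing_classes(jar_dir, class_names=REQUIRED_CLASSES):
    """Return the class names that no *.jar directly under jar_dir contains."""
    # Map the jar entry path (a/b/C.class) back to the dotted class name.
    wanted = {name.replace(".", "/") + ".class": name for name in class_names}
    found = set()
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            found.update(wanted.keys() & set(zf.namelist()))
    # Preserve the order of class_names in the result.
    return [name for entry, name in wanted.items() if entry not in found]
```

In a generic Tez AM image that ships no Hive jars, a call such as find_missing_classes on the image's jar directory would be expected to return both names, which is exactly the problem that Yarn localization used to hide.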

Hence, the *proposal* is to include the Tez jars in the Hive image (if they are 
not already included since HIVE-26400), and to create a Tez AM-specific 
entrypoint (and a separate Dockerfile if needed) that starts {*}DAGAppMaster{*}.
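
As a rough illustration of what such an entrypoint could do, the sketch below assembles a java command line that puts both the Hive and Tez jars on the classpath and uses DAGAppMaster as the main class. Only the main class name comes from Tez; the HIVE_HOME and TEZ_HOME defaults, the directory layout and the TEZ_AM_JAVA_OPTS variable are hypothetical placeholders, and whatever arguments or environment DAGAppMaster needs outside Yarn are deliberately left out.

```python
import os

# Main class of the Tez Application Master.
TEZ_AM_MAIN_CLASS = "org.apache.tez.dag.app.DAGAppMaster"


def build_tez_am_command(env):
    """Build the java command line an AM-specific entrypoint could exec."""
    hive_home = env.get("HIVE_HOME", "/opt/hive")  # assumed image layout
    tez_home = env.get("TEZ_HOME", "/opt/tez")     # assumed image layout
    # Hive jars come first so HiveSplitGenerator and friends are visible
    # to the AM without any Yarn localization step.
    classpath = os.pathsep.join([
        os.path.join(hive_home, "conf"),
        os.path.join(hive_home, "lib", "*"),
        os.path.join(tez_home, "*"),
        os.path.join(tez_home, "lib", "*"),
    ])
    jvm_opts = env.get("TEZ_AM_JAVA_OPTS", "").split()  # hypothetical variable
    return ["java", *jvm_opts, "-cp", classpath, TEZ_AM_MAIN_CLASS]
```

An actual entrypoint would probably pass the resulting list to os.execvp so that the JVM replaces the wrapper process and receives container signals directly; a plain shell script could do the same job.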

*Motivation:* dockerized, truly distributed testing of Apache Hive upstream, as 
described in 
https://docs.google.com/document/d/1k92cIzRUXy9IIdivklXR877l8C-0wnmZUrszP_KJgs4

  was:
This ticket is related to the Dockerized Hive and Tez initiative.
While Hive Docker was implemented in HIVE-26400, and a Tez AM image is 
currently under development in TEZ-4682, there is an open question about how to 
seamlessly integrate Hive and Tez docker containers (build and runtime also)
TEZ-4682 aims to build a generic Tez AM image, which is crucial for making Tez 
a modern execution engine, while Hive has a lot of dependencies on Tez. This 
makes the independent development of Hive (HiveServer2) and a Tez AM docker 
images quite hard.
Consider the different classes used in TezAM.
!Screenshot 2026-01-27 at 14.51.39.png|width=462,height=227!

Every yellow class makes a separate question about "how Hive jars make their 
way to an independent Tez image", and here is how this Jira could be a 
game-changer. Consider *HiveSplitGenerator* (in hive-exec module) and 
*LlapTaskCommunicator* (in llap-tez module) classes. In the Yarn world, their 
localization was taken care of by Yarn, but from the point we deploy loosely 
coupled Docker containers, we cannot rely on such a mechanism anymore.

Hence, the proposal is to include Tez jars into the Hive image (if they are not 
yet included since HIVE-26400), and make a Tez AM specific entrypoint (and 
separate Dockerfile if needed), that starts {*}DAGAppMaster{*}.


> Provide a Hive-specific docker image for Tez AM
> -----------------------------------------------
>
>                 Key: HIVE-29419
>                 URL: https://issues.apache.org/jira/browse/HIVE-29419
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: László Bodor
>            Priority: Major
>         Attachments: Screenshot 2026-01-27 at 14.51.39.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
