[
https://issues.apache.org/jira/browse/HIVE-29419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
László Bodor updated HIVE-29419:
--------------------------------
Description:
This ticket is related to the Dockerized Hive and Tez initiative.
While Hive Docker was implemented in HIVE-26400, and a Tez AM image is
currently under development in TEZ-4682, it is still an open question how to
seamlessly integrate the Hive and Tez Docker containers, at both build time
and runtime.
TEZ-4682 aims to build a generic Tez AM image, which is crucial for making Tez
a modern execution engine, but Hive has a lot of dependencies on Tez. This
makes it quite hard to develop the Hive (HiveServer2) and Tez AM Docker images
independently.
Consider the different classes used in TezAM:
!Screenshot 2026-01-27 at 14.51.39.png|width=570,height=280!
Every yellow class raises a separate question about "how Hive jars make their
way to an independent Tez image", and this is where this Jira could be a
game-changer. Consider the *HiveSplitGenerator* (in the hive-exec module) and
*LlapTaskCommunicator* (in the llap-tez module) classes. In the Yarn world,
their localization was taken care of by Yarn, but once we deploy loosely
coupled Docker containers, we cannot rely on such a mechanism anymore.
Hence, the *proposal* is to include the Tez jars in the Hive image (if they are
not already included since HIVE-26400), and to create a Tez AM specific
entrypoint (and a separate Dockerfile if needed) that starts {*}DAGAppMaster{*}.
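To make the proposal more concrete, here is a minimal sketch of what such an entrypoint could assemble: a java command line that puts both the Hive and Tez jars on the classpath and starts DAGAppMaster. Only the main class name is real. The /opt/hive and /opt/tez locations, the directory layout, and the build_am_command helper are assumptions for illustration. Whatever arguments or environment the AM needs when running outside Yarn is not covered here.

```python
import os
from pathlib import Path

# Fully qualified name of the Tez AM main class mentioned in the ticket.
TEZ_AM_MAIN_CLASS = "org.apache.tez.dag.app.DAGAppMaster"


def build_am_command(hive_home, tez_home, extra_jvm_opts=()):
    """Return the argv list that would start the Tez AM with Hive jars visible.

    hive_home and tez_home are hypothetical install locations inside the image.
    """
    classpath = os.pathsep.join([
        str(Path(hive_home) / "conf"),       # site configs (assumed layout)
        str(Path(hive_home) / "lib" / "*"),  # Hive jars, e.g. hive-exec, llap-tez
        str(Path(tez_home) / "*"),           # Tez jars (assumed layout)
        str(Path(tez_home) / "lib" / "*"),   # Tez dependencies (assumed layout)
    ])
    return ["java", *extra_jvm_opts, "-cp", classpath, TEZ_AM_MAIN_CLASS]


if __name__ == "__main__":
    cmd = build_am_command(
        os.environ.get("HIVE_HOME", "/opt/hive"),
        os.environ.get("TEZ_HOME", "/opt/tez"),
    )
    # A real entrypoint would likely replace this process with the JVM so that
    # container signals reach it directly, e.g. os.execvp(cmd[0], cmd).
    # Here the command is only printed.
    print(" ".join(cmd))
```

The "dir/*" entries rely on the JVM's own classpath wildcard expansion, so the script does not need to list individual jars.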
*Motivation:* dockerized, truly distributed testing of Apache Hive upstream, as
described in
[https://docs.google.com/document/d/1k92cIzRUXy9IIdivklXR877l8C-0wnmZUrszP_KJgs4]
> Provide a Hive-specific docker image for Tez AM
> -----------------------------------------------
>
> Key: HIVE-29419
> URL: https://issues.apache.org/jira/browse/HIVE-29419
> Project: Hive
> Issue Type: Sub-task
> Reporter: László Bodor
> Priority: Major
> Attachments: Screenshot 2026-01-27 at 14.51.39.png
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)