Aggarwal-Raghav commented on PR #6343: URL: https://github.com/apache/hive/pull/6343#issuecomment-4063673798
Hi @abstractdog, **I have been able to create a fully dockerized env for this**, i.e. docker images for:

1. Minimal hdfs: [HIVE-29493](https://issues.apache.org/jira/browse/HIVE-29493), please check the comments under this JIRA
2. tez-am: TEZ-4682
3. zookeeper
4. Hive 4.3.0-SNAPSHOT, using the tez-1.0.0-SNAPSHOT jars and the hdfs docker image

But the 4th docker image requires a lot of custom changes:

1. The hive docker image is built using `BUILD_ENV=local`. The tez tarball produced by `mvn clean install` has a different naming convention from the one deployed in the dlcdn artifactory. **This requires a tez fix IMO. I already have the assembly file changes ready, let me know if this should be fixed.**
   ```
   docker build -t hive4:4.3.0-SNAPSHOT . \
     --build-arg BUILD_ENV=local \
     --build-arg HADOOP_VERSION=3.4.2 \
     --build-arg HIVE_VERSION=4.3.0-SNAPSHOT \
     --build-arg TEZ_VERSION=1.0.0-SNAPSHOT
   ```
2. The core-site.xml and hdfs-site.xml have to be mounted so that they point to the minimal hdfs docker image. Maybe we can create a new hive docker image using the decoupled hdfs. Currently the tables are created on `file:///`, but after mounting they are created on `hdfs:///`.
3. The tez-api jar has to be manually replaced from 0.10.5 to 1.0.0-SNAPSHOT, as the hive lib jars take precedence over the tez jars. This is only needed until tez-1.0.0 is released.
4. The hive scratch directory is shared across tez am and hs2, so a shared mount dir needs to be created.
5. `hive-site` changes can't be mounted because of the customization in the hive entrypoint.sh, hence I manually updated the configs in the hive code before building the docker image.
6. Created a gist with the docker compose.yml and the logs for hs2 and tez am: https://gist.github.com/Aggarwal-Raghav/c152c8e2be4a40cc23368ad286b28645
7. The hive-exec jar is required by tez-am to execute any query, and for iceberg tables the hive-iceberg jars as well. As per my understanding this will be fixed in future work (how HS2 will propagate resources to the tez am docker).

Questions:

1. Should docker compose go in as part of TEZ-4682? If not, is it OK to keep `host.docker.internal` (i.e. the brew-installed zk) for now?
2. Should minimal hadoop be merged first and then TEZ-4682, i.e. Hadoop + zk + tez am docker image + docker compose.yml (without the hive docker image)?
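
For reference, the manual tez-api jar swap described in item 3 of the custom changes could be scripted roughly as below. This is only a sketch under assumptions: the function name `swap_tez_api` and both directory arguments are hypothetical, and the jar file names are assumed to follow the usual `artifactId-version.jar` Maven naming for the two versions mentioned above. The actual paths inside the hive image may differ.

```shell
#!/bin/sh
# Sketch only: replace the tez-api jar bundled in Hive's lib directory.
# swap_tez_api and its two directory arguments are hypothetical names,
# not taken from the actual hive docker image or entrypoint.sh.
swap_tez_api() {
  hive_lib="$1"   # directory holding Hive's bundled jars (assumed layout)
  tez_dir="$2"    # directory holding the locally built Tez jars (assumed layout)
  old_jar="$hive_lib/tez-api-0.10.5.jar"
  new_jar="$tez_dir/tez-api-1.0.0-SNAPSHOT.jar"

  # Leave the Hive lib untouched if the replacement jar is not there.
  if [ ! -f "$new_jar" ]; then
    echo "missing $new_jar" >&2
    return 1
  fi

  # Remove the older jar so it cannot shadow the new one, then copy the new one in.
  rm -f "$old_jar"
  cp "$new_jar" "$hive_lib/"
}
```

Something like `swap_tez_api /opt/hive/lib /opt/tez` (both paths assumed) could then run as a build step of the image instead of being done by hand, and the step would be dropped once tez-1.0.0 is released.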
