Aggarwal-Raghav commented on PR #456: URL: https://github.com/apache/tez/pull/456#issuecomment-3807049845
Thanks for the pointers @abstractdog.

1. Yes, the implementation is reminiscent of Hive (TBH, pom.xml, build-docker.sh, and some parts of the Dockerfile are taken from Hive to some extent).
2. For basic startup of the Tez AM without the Hadoop jars, I didn't observe any issues. The Tez tarball contains a few Hadoop jars, and I think they and their transitive dependency jars are sufficient for the Tez AM to act as a client of Hadoop services (but I have a commit ready in case we later want to remove the Hadoop tarball).
3. **No update. I believe a code change in DAGAppMaster is required for segregation.**
4. Raised https://github.com/apache/tez/pull/458

**A few additional things:**

DAGAppMaster#serviceInit() => DAGAppMaster#createTaskSchedulerManager tries to connect to the ResourceManager even in ZooKeeper mode. I think we shouldn't use the YARN scheduler and should maybe move to [YuniKorn](https://yunikorn.apache.org/) (we use it with Spark internally). How should I proceed with this? For now, should I raise a PR to skip it when zk mode is enabled?

```
2026-01-27 19:13:06,207 INFO zookeeper.ZkAMRegistry: Added AMRecord to zkpath /tez-external-sessions/tez_am/server/application_1769280834537_0000
2026-01-27 19:13:06,208 INFO app.DAGAppMaster: Added AMRecord: {hostName=2d0733bd53ae, externalId=tez-session-, hostIp=172.17.0.2, port=10001, computeName=default-compute, appId=application_1769280834537_0000} to registry..
2026-01-27 19:13:06,210 INFO rm.TaskSchedulerManager: Creating YARN TaskScheduler: org.apache.tez.dag.app.rm.DagAwareYarnTaskScheduler
2026-01-27 19:13:06,253 INFO conf.Configuration: resource-types.xml not found
2026-01-27 19:13:06,253 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2026-01-27 19:13:06,259 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2026-01-27 19:13:06,259 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2026-01-27 19:13:06,263 INFO rm.DagAwareYarnTaskScheduler: scheduler initialized with maxRMHeartbeatInterval:1000 reuseEnabled:true reuseRack:true reuseAny:false localityDelay:250 preemptPercentage:10 preemptMaxWaitTime:60000 numHeartbeatsBetweenPreemptions:3 idleContainerMinTimeout:5000 idleContainerMaxTimeout:10000 sessionMinHeldContainers:0
2026-01-27 19:13:06,267 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8030
2026-01-27 19:13:07,572 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2026-01-27 19:13:08,580 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2026-01-27 19:13:09,588 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2026-01-27 19:13:10,595 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
