zhangbutao commented on PR #6507: URL: https://github.com/apache/hive/pull/6507#issuecomment-4742498986
> Thanx @zhangbutao for the great insights!!!
>
> You hit the nail on the head regarding the shift from "YARN-thinking" to "Kubernetes-native thinking."
>
> 1. Physical vs. Logical Isolation
> You are completely right about Workload Management (WLM). Trying to carve up a single JVM's heap and CPU cycles among competing tenants is incredibly complex and never gives you 100% true isolation. By shifting to Kubernetes, we get true physical isolation via namespaces, cgroups, and dedicated pod resources.
>
> 2. How this could work technically
> What you are describing is entirely feasible. The LLAP instances register themselves in ZooKeeper under a specific app name (defaulting to @llap0). If we update the Operator to support an array of LLAP profiles (e.g., llap-cluster1, llap-cluster2), the Operator would spin up multiple independent StatefulSets, each registering to a different ZK path.
>
> Then, exactly as you said, a user simply sets hive.llap.daemon.service.hosts=@llap-cluster1 in their JDBC string or session. TezAM would look up that specific ZK path, find those specific pods, and route the fragments exclusively to that tenant's dedicated executors.
>
> 3. The Autoscaling Synergy
> The best part is how it ties into the autoscaling logic in this PR! Because each tenant's LLAP cluster would be its own independent K8s StatefulSet, the autoscaler would scale llap-cluster1 and llap-cluster2 completely independently. If user1 isn't running queries, their dedicated LLAP cluster scales to zero, costing nothing, while user2 can comfortably stay scaled up to 100 pods.
>
> This is a fantastic concept for multi-tenancy. Since the core autoscaling loop and K8s operator primitives are established in this PR, building out "Multi-Tenant LLAP Compute Groups" on top of it feels like a perfect follow-up Jira ticket. I think it is definitely worth exploring! I will definitely give it a shot :-)

Your thoughts align completely with mine: this idea is both feasible and highly valuable.
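To make the routing idea from point 2 of the quoted comment concrete, here is a rough sketch of how a client might pin its session to one compute group through the JDBC URL. It is only an illustration under assumptions: the helper name `tenant_jdbc_url`, the host `hs2.example.com`, and the group name `llap-cluster1` are hypothetical, the compute groups themselves are still just a proposal, and whether `hive.llap.daemon.service.hosts` may be overridden per session depends on how HiveServer2 is configured.

```python
# Property named in the discussion; the '@' prefix means "look the daemons up
# in the ZooKeeper registry under this name" (the default is @llap0).
LLAP_HOSTS_KEY = "hive.llap.daemon.service.hosts"


def tenant_jdbc_url(host: str, port: int, database: str, compute_group: str) -> str:
    """Build a HiveServer2 JDBC URL that asks for one LLAP compute group.

    Hypothetical helper: 'compute_group' is the bare registry name,
    e.g. 'llap-cluster1', without the leading '@'.
    """
    if not compute_group or compute_group.startswith("@"):
        raise ValueError("pass the bare group name, e.g. 'llap-cluster1'")
    # Hive conf overrides go after '?' in a jdbc:hive2:// URL.
    return f"jdbc:hive2://{host}:{port}/{database}?{LLAP_HOSTS_KEY}=@{compute_group}"


if __name__ == "__main__":
    print(tenant_jdbc_url("hs2.example.com", 10000, "default", "llap-cluster1"))
```

The same effect could presumably be had inside an open session with `SET hive.llap.daemon.service.hosts=@llap-cluster1;`, again assuming the property is allowed to change at runtime.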
I came up with this idea because other MPP-architecture OLAP engines, such as StarRocks and Doris, already offer similar compute-group functionality that effectively isolates multi-tenant workloads. That gives me confidence the approach is feasible and has practical value, so it is well worth exploring in depth.

Thanks @ayushtkn

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
