Copilot commented on code in PR #6561:
URL: https://github.com/apache/hive/pull/6561#discussion_r3469004605
##########
packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/reconciler/HiveClusterReconciler.java:
##########
@@ -636,16 +637,18 @@ private void reconcileLlapClusters(HiveCluster resource,
KubernetesClient client
// --- Per-LLAP TezAM resources (one TezAM per LLAP cluster) ---
if (resource.getSpec().tezAm().isEnabled()) {
int tezAmReplicas = resolveTezAmReplicaCount(resource, ns,
clusterName, llapSpec);
+ String tezAmName = LlapResourceBuilder.tezAmResourceName(resource,
llapSpec);
client.configMaps().inNamespace(ns)
.resource(LlapResourceBuilder.buildTezAmConfigMap(resource,
llapSpec))
.serverSideApply();
Review Comment:
TezAM was switched from StatefulSet to Deployment, but this reconcile path
doesn’t remove an existing legacy TezAM StatefulSet with the same name. On
upgrade this can leave both controllers running TezAM pods side-by-side, and
the autoscaler/scale patching will only manage the Deployment. Also,
`tezAmName` is currently unused.
Consider deleting the legacy StatefulSet (if present) before applying the
Deployment to make the migration safe and to use the `tezAmName` variable.
##########
packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/reconciler/HiveClusterReconciler.java:
##########
@@ -743,24 +746,45 @@ private void garbageCollectLlapResources(KubernetesClient
client, String ns,
Labels.APP_INSTANCE, clusterName,
Labels.APP_COMPONENT, ConfigUtils.COMPONENT_TEZAM);
-
client.apps().statefulSets().inNamespace(ns).withLabels(tezamSelector).list().getItems()
+
client.apps().deployments().inNamespace(ns).withLabels(tezamSelector).list().getItems()
.stream()
Review Comment:
Garbage collection for TezAM now only scans Deployments. If an older TezAM
StatefulSet (from before the Deployment migration) exists for a removed LLAP
cluster, it will no longer be garbage-collected by this method and can linger
indefinitely (along with its Service/ConfigMap/PDB), potentially leaving stale
TezAM pods running.
##########
packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapResourceBuilder.java:
##########
@@ -229,9 +238,81 @@ public static PodDisruptionBudget
buildTezAmPdb(HiveCluster hc, LlapSpec llap) {
.build();
}
- /** Builds the TezAM StatefulSet for a specific LLAP cluster. */
- public static StatefulSet buildTezAmStatefulSet(HiveCluster hc, LlapSpec
llap, Integer replicas) {
- return INSTANCE.doBuildTezAmStatefulSet(hc, llap, replicas);
+ /** Builds the TezAM Deployment for a specific LLAP cluster. */
+ public static Deployment buildTezAmDeployment(HiveCluster hc, LlapSpec llap,
Integer replicas) {
+ return INSTANCE.doBuildTezAmDeployment(hc, llap, replicas);
+ }
+
+ /**
+ * Name for the operator-managed EndpointSlice that provides per-pod DNS for
TezAM.
+ * CoreDNS creates {@code <pod-name>.<svc>.<ns>.svc.cluster.local} A-records
using it.
+ */
+ public static String tezAmEndpointSliceName(HiveCluster hc, LlapSpec llap) {
+ return tezAmResourceName(hc, llap) + "-hostnames";
+ }
+
+ /**
+ * Builds a custom EndpointSlice for the TezAM headless Service.
+ * <p>
+ * Kubernetes only creates per-pod DNS records ({@code
<pod>.<svc>.<ns>.svc.cluster.local})
+ * when the Endpoints/EndpointSlice has a {@code hostname} field for each
address. The
+ * default EndpointSlice controller omits {@code hostname} for Deployment
pods (it only
+ * sets it automatically for StatefulSet pods). This operator-managed
EndpointSlice fills
+ * that gap, giving every ready TezAM pod a resolvable FQDN.
+ *
+ * @param pods list of TezAM pods
+ */
+ public static EndpointSlice buildTezAmEndpointSlice(HiveCluster hc, LlapSpec
llap, List<Pod> pods) {
+ String ns = hc.getMetadata().getNamespace();
+ String svcName = tezAmResourceName(hc, llap);
+ Map<String, String> labels = new HashMap<>(Map.of(
+ "kubernetes.io/service-name", svcName,
+ "endpointslice.kubernetes.io/managed-by", "hive-kubernetes-operator",
+ Labels.MANAGED_BY, Labels.MANAGED_BY_VALUE,
+ Labels.APP_INSTANCE, hc.getMetadata().getName(),
+ Labels.APP_COMPONENT, ConfigUtils.COMPONENT_TEZAM));
+
+ List<Endpoint> endpoints = new ArrayList<>();
+ for (var pod : pods) {
+ String ip = pod.getStatus() != null ? pod.getStatus().getPodIP() : null;
+ if (ip == null || ip.isEmpty()) {
+ continue;
+ }
+ boolean ready = isPodReady(pod);
+ endpoints.add(new EndpointBuilder()
+ .withHostname(pod.getMetadata().getName())
+ .withAddresses(ip)
+ .withNewConditions()
+ .withReady(ready)
+ .withServing(ready)
+ .withTerminating(false)
+ .endConditions()
+ .withNewTargetRef()
+ .withKind("Pod")
+ .withNamespace(ns)
+ .withName(pod.getMetadata().getName())
+ .endTargetRef()
+ .build());
+ }
+
+ return new EndpointSliceBuilder()
+ .withNewMetadata()
+ .withName(tezAmEndpointSliceName(hc, llap))
+ .withNamespace(ns)
+ .withLabels(labels)
+ .withOwnerReferences(ownerRef(hc))
+ .endMetadata()
+ .withAddressType("IPv4")
+ .withEndpoints(endpoints)
+ .build();
Review Comment:
`EndpointSlice.addressType` is hardcoded to `IPv4`, but `pod.status.podIP`
can be IPv6 on IPv6 / dual-stack clusters. If an IPv6 Pod IP is added under an
`IPv4` slice, Kubernetes will reject the object.
It would be safer to derive the addressType from the pod IPs you include (or
to omit creating the slice if you can’t determine a consistent address family).
##########
packaging/src/kubernetes/helm/hive-operator/templates/hivecluster.yaml:
##########
@@ -178,7 +181,6 @@ spec:
tezAm:
enabled: {{ .Values.cluster.tezAm.enabled }}
{{- if .Values.cluster.tezAm.enabled }}
- replicas: {{ .Values.cluster.tezAm.replicas }}
scratchStorageSize: {{ .Values.cluster.tezAm.scratchStorageSize | quote }}
{{- if .Values.cluster.tezAm.scratchStorageClassName }}
scratchStorageClassName: {{ .Values.cluster.tezAm.scratchStorageClassName
| quote }}
Review Comment:
The chart no longer renders `spec.tezAm.replicas`, but the values file still
defines (or may restore) `cluster.tezAm.replicas`. If TezAM replica ceilings
are meant to be configurable at top-level, this template should still emit the
field; otherwise users setting it in values will have no effect.
##########
packaging/src/kubernetes/helm/hive-operator/values.yaml:
##########
@@ -207,7 +211,6 @@ cluster:
# ---------------------------------------------------------------------------
tezAm:
enabled: true
- replicas: 2
scratchStorageSize: "1Gi"
scratchStorageClassName: ""
resources: {}
Review Comment:
`tezAm.replicas` was removed from the Helm chart defaults. Even if TezAM
replicas are typically controlled per-LLAP, this changes the rendered
HiveCluster spec (and the default max replica ceiling if it’s used), which
seems inconsistent with the PR description’s “No user-facing change”.
If this was unintentional, consider restoring the field (or explicitly
documenting the new default/behavior).
##########
packaging/src/kubernetes/src/java/org/apache/hive/kubernetes/operator/dependent/LlapResourceBuilder.java:
##########
@@ -229,9 +238,81 @@ public static PodDisruptionBudget
buildTezAmPdb(HiveCluster hc, LlapSpec llap) {
.build();
}
- /** Builds the TezAM StatefulSet for a specific LLAP cluster. */
- public static StatefulSet buildTezAmStatefulSet(HiveCluster hc, LlapSpec
llap, Integer replicas) {
- return INSTANCE.doBuildTezAmStatefulSet(hc, llap, replicas);
+ /** Builds the TezAM Deployment for a specific LLAP cluster. */
+ public static Deployment buildTezAmDeployment(HiveCluster hc, LlapSpec llap,
Integer replicas) {
+ return INSTANCE.doBuildTezAmDeployment(hc, llap, replicas);
+ }
+
+ /**
+ * Name for the operator-managed EndpointSlice that provides per-pod DNS for
TezAM.
+ * CoreDNS creates {@code <pod-name>.<svc>.<ns>.svc.cluster.local} A-records
using it.
+ */
+ public static String tezAmEndpointSliceName(HiveCluster hc, LlapSpec llap) {
+ return tezAmResourceName(hc, llap) + "-hostnames";
+ }
+
+ /**
+ * Builds a custom EndpointSlice for the TezAM headless Service.
+ * <p>
+ * Kubernetes only creates per-pod DNS records ({@code
<pod>.<svc>.<ns>.svc.cluster.local})
+ * when the Endpoints/EndpointSlice has a {@code hostname} field for each
address. The
+ * default EndpointSlice controller omits {@code hostname} for Deployment
pods (it only
+ * sets it automatically for StatefulSet pods). This operator-managed
EndpointSlice fills
+ * that gap, giving every ready TezAM pod a resolvable FQDN.
+ *
+ * @param pods list of TezAM pods
+ */
+ public static EndpointSlice buildTezAmEndpointSlice(HiveCluster hc, LlapSpec
llap, List<Pod> pods) {
+ String ns = hc.getMetadata().getNamespace();
+ String svcName = tezAmResourceName(hc, llap);
+ Map<String, String> labels = new HashMap<>(Map.of(
+ "kubernetes.io/service-name", svcName,
+ "endpointslice.kubernetes.io/managed-by", "hive-kubernetes-operator",
+ Labels.MANAGED_BY, Labels.MANAGED_BY_VALUE,
+ Labels.APP_INSTANCE, hc.getMetadata().getName(),
+ Labels.APP_COMPONENT, ConfigUtils.COMPONENT_TEZAM));
+
+ List<Endpoint> endpoints = new ArrayList<>();
+ for (var pod : pods) {
+ String ip = pod.getStatus() != null ? pod.getStatus().getPodIP() : null;
+ if (ip == null || ip.isEmpty()) {
+ continue;
+ }
+ boolean ready = isPodReady(pod);
+ endpoints.add(new EndpointBuilder()
+ .withHostname(pod.getMetadata().getName())
+ .withAddresses(ip)
+ .withNewConditions()
Review Comment:
EndpointSlice `endpoint.hostname` must be a valid DNS label (max 63 chars).
Using the full Pod `metadata.name` can exceed 63 chars (pod names are DNS
subdomains and may be up to 253), which can cause the EndpointSlice apply to be
rejected and break TezAM per-pod DNS.
Consider truncating/sanitizing the hostname to a DNS label-compatible value
before setting it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]