zhangbutao commented on code in PR #6507:
URL: https://github.com/apache/hive/pull/6507#discussion_r3414105333
##########
packaging/src/kubernetes/README.md:
##########
@@ -505,19 +505,602 @@ kubectl get hiveclusters
kubectl describe hivecluster hive
```
+---
+
+## Autoscaling
+
+The operator supports metric-based autoscaling for all four Hive components using
+an **operator-driven control loop** that scrapes JMX Exporter metrics directly from
+pods. No Prometheus server or external autoscaling tools are needed. Autoscaling is
+opt-in per component and designed for **zero query failures** during scale-down.
+
+### Prerequisites
+
+- No external dependencies: the operator handles all scaling decisions internally
+
+### How It Works
+
+When `autoscaling.enabled: true` is set for a component, the operator:
+1. Attaches the JMX Exporter javaagent (port 9404) to each pod
+2. Polls `/metrics` on each pod at `metricsScrapeIntervalSeconds` intervals
+3. Computes desired replicas using component-specific formulas
+4. Applies HPA-like stabilization windows (scale-up/scale-down)
+5. Patches the workload `spec.replicas` directly
+
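Step 2 above amounts to reading a gauge or counter out of the Prometheus text format that JMX Exporter serves on `/metrics`. A minimal Python sketch of that parsing step follows; `parse_metric` is a hypothetical helper (not the operator's code), and it assumes sample lines carry no trailing timestamp.

```python
def parse_metric(metrics_text: str, name: str) -> float:
    """Sum every sample of `name` found in Prometheus text exposition output."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample looks like "<name> <value>" or "<name>{labels} <value>".
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        series, value = parts
        if series.split("{", 1)[0] == name:
            total += float(value)
    return total


sample = """# TYPE hs2_open_sessions gauge
hs2_open_sessions 3.0
jvm_process_cpu_load 0.42
"""
print(parse_metric(sample, "hs2_open_sessions"))
```

Summing over samples keeps the helper usable for labelled series as well as plain gauges.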
+### Graceful Scale-Down Architecture
+
+```
+ Scale Down Flow
+ 1. Operator reduces desired replicas (metric below threshold,
+ stabilization window elapsed)
+ 2. PodDisruptionBudget ensures minAvailable=1 (at least one pod
+ always running)
+ 3. Kubernetes sends SIGTERM to selected pod
+ 4. preStop hook runs:
+ - HS2: deregisters from ZK, drains open sessions, kills JVM
+ - HMS: kills JVM (stateless HTTP — no drain needed)
+ - LLAP: waits until all executors become idle, kills JVM
+ - TezAM: no drain (DAGAppMaster does not expose JMX metrics)
+ 5. terminationGracePeriodSeconds = gracePeriodSeconds (safety cap)
+ 6. Pod terminates immediately once drain completes (does NOT wait
+ the full grace period — it's only the upper safety bound)
+```
+
+> **Note:** Shell entrypoints (PID 1) in containers don't forward SIGTERM to child
+> processes. The preStop hook explicitly sends SIGTERM to the Hive/Tez Java process
+> after drain completes, ensuring prompt shutdown without waiting for the grace period
+> to expire.
+
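The drain behaviour described above (exit as soon as the drain completes, with the grace period only as an upper bound) can be sketched as a bounded polling loop. This is an illustration in Python under stated assumptions: the real preStop hook is presumably a shell script, and `wait_for_drain`, its parameters, and the 5-second poll interval are hypothetical.

```python
import time
from typing import Callable


def wait_for_drain(open_sessions: Callable[[], int],
                   grace_period_seconds: float,
                   poll_seconds: float = 5.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once open_sessions() reaches 0, False if the safety cap expires first."""
    deadline = clock() + grace_period_seconds
    while clock() < deadline:
        if open_sessions() <= 0:
            return True  # drained: the caller can signal the JVM right away
        sleep(poll_seconds)
    return False  # cap reached: Kubernetes will force-terminate the pod
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.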
+### Scaling Timers
+
+The autoscaling system uses three independent timing controls:
+
+| Timer | Config Field | Default | Purpose |
+|-------|-------------|---------|---------|
+| **Metrics scrape interval** | `metricsScrapeIntervalSeconds` | `10` | How often the operator scrapes JMX Exporter `/metrics` on each pod. This is the **biggest bottleneck** for autoscaling reaction time. |
+| **Scale-up stabilization** | `scaleUpStabilizationSeconds` | `60` | Window: picks the highest recommendation within this period before scaling up. Prevents flapping when metrics oscillate. Set to `0` for LLAP and TezAM (reactive dependents). |
+| **Scale-down stabilization** | `scaleDownStabilizationSeconds` | `300-900` | Window: picks the most conservative (highest) recommendation within this period before scaling down. Also acts as the cooldown between consecutive scale-downs, so no separate cooldown is needed. |
+
+**How they interact:**
+- Load spike detected → operator scrapes metrics within `metricsScrapeIntervalSeconds` → waits `scaleUpStabilizationSeconds`, then scales up
+- Load drops → operator waits `scaleDownStabilizationSeconds` (the stabilization window must show consistently low demand), then scales down
+
+**Tuning reaction time:** With `metricsScrapeIntervalSeconds: 10` and `scaleUpStabilizationSeconds: 0` (the setting suggested above for LLAP/TezAM), scale-up latency is ~10-20s (one to two scrape cycles). For HS2 with `scaleUpStabilizationSeconds: 60`, expect ~70s (one 10s scrape plus the 60s window).
+
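To make the scale-down window concrete, here is a minimal Python sketch that keeps recent recommendations and returns the highest one still inside the window. `ScaleDownWindow` is a hypothetical name, and the sketch simplifies one point: a real controller would presumably also require the window to be fully covered by samples before acting.

```python
from collections import deque


class ScaleDownWindow:
    """Tracks replica recommendations and picks the most conservative one."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended_replicas)

    def record(self, now: float, recommended: int) -> None:
        self._history.append((now, recommended))
        # Forget recommendations that fell out of the window.
        while self._history and self._history[0][0] < now - self.window_seconds:
            self._history.popleft()

    def target(self, current_replicas: int) -> int:
        """Highest recommendation in the window, never above the current count."""
        if not self._history:
            return current_replicas
        return min(current_replicas, max(r for _, r in self._history))
```

Because a single high recommendation blocks scale-down until it ages out, the window doubles as the cooldown described in the table.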
+### Per-Component Scaling Logic
+
+| Component | Scale-Up Formula | Scale-Down | JMX Metric |
+|-----------|-----------------|------------|------------|
+| **HiveServer2** | `max(ceil(sessions / threshold), cpu_desired)` | Sessions drop to 0 AND CPU below threshold → scale to minReplicas | `hs2_open_sessions`, `jvm_process_cpu_load` |
+| **Metastore** | `max(ceil(api_rate / threshold), cpu_desired)` | Rate drops to 0 AND CPU below threshold → scale to minReplicas | `api_*_total`, `jvm_process_cpu_load` |
+| **LLAP** | `ceil(avg(queued + configured - available) / scaleUpThreshold)` | All executors idle + no HS2 sessions | `hadoop_llapdaemon_executor*` |
+| **Tez AM** | `max(sum(hs2_open_sessions), count(HS2_pods) * sessions_per_queue)` | All HS2 sessions closed | `hs2_open_sessions` (from HS2 pods) |
+
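The HiveServer2 and LLAP rows of the table can be read as the following Python sketch. Function and parameter names are hypothetical, `cpu_desired` is taken as an input rather than computed here, and the empty-list behaviour for LLAP is an assumption.

```python
import math


def hs2_desired(total_sessions: int, threshold: int, cpu_desired: int) -> int:
    """max(ceil(sessions / threshold), cpu_desired) from the table."""
    return max(math.ceil(total_sessions / threshold), cpu_desired)


def llap_desired(per_pod: list, scale_up_threshold: int) -> int:
    """per_pod holds one (queued, configured, available) tuple per LLAP pod."""
    if not per_pod:
        return 0  # assumption: no pods means no measured demand
    busy = [queued + configured - available
            for queued, configured, available in per_pod]
    return math.ceil((sum(busy) / len(busy)) / scale_up_threshold)


print(hs2_desired(250, 80, 2))
print(llap_desired([(2, 4, 1), (0, 4, 3)], 1))
```

In both cases the result would still be clamped between `minReplicas` and the `replicas` ceiling before being applied.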
+**TezAM Scaling Model:** TezAM uses demand-driven scaling with two formulas (max wins):
+1. **Session demand**, `sum(hs2_open_sessions)`: scales to match the total number of
+   concurrent sessions across all HS2 pods (each session needs its own exclusive TezAM).
+2. **Pre-warm**, `count(HS2 pods with sessions) × hive.server2.tez.sessions.per.default.queue` (default 1):
+   ensures every active HS2 pod has enough TezAM sessions pre-claimed from ZooKeeper.
+
+The operator takes the maximum across both formulas. This ensures TezAM capacity
+is always sufficient for both current demand and eager session pre-warming.
+TezAM scaling is purely demand-driven from HS2 metrics.
+
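A small Python sketch of the two TezAM formulas, assuming the operator has one `hs2_open_sessions` reading per HS2 pod. `tez_am_desired` is a hypothetical name, and "active" is taken to mean a pod with at least one open session, as in the pre-warm description.

```python
def tez_am_desired(sessions_per_hs2_pod: list, sessions_per_queue: int = 1) -> int:
    """Return the larger of session demand and the pre-warm requirement."""
    session_demand = sum(sessions_per_hs2_pod)
    active_pods = sum(1 for s in sessions_per_hs2_pod if s > 0)
    pre_warm = active_pods * sessions_per_queue
    return max(session_demand, pre_warm)


print(tez_am_desired([3, 0, 1]))          # session demand dominates
print(tez_am_desired([1, 1], 4))          # pre-warm dominates
```

With the default of one session per queue, session demand is always at least as large as pre-warm, so the second formula only matters when `hive.server2.tez.sessions.per.default.queue` is raised.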
+### Scale-to-Zero Architecture
+
+When `minReplicas: 0` is configured (LLAP, TezAM), the cluster scales those
+components down to zero pods when HS2 has no active sessions. HS2 itself always
+maintains at least 1 replica (`minReplicas >= 1`) so it is always available to
+accept connections.
+
+```
+ Scale-to-Zero (Idle Detection)
+
+ 1. HS2 reports hs2_open_sessions = 0 for scaleDownStabilization
+ → operator scales HS2 to minReplicas (>= 1)
+
+ 2. Operator sees hs2_open_sessions = 0 on next LLAP/TezAM eval
+ → activation gate fails
+ → scale LLAP and TezAM to 0 (if minReplicas=0)
+
+ 3. HMS stays at minReplicas=1 (always available)
+
+```
+
+```
+ Wake-from-Zero (LLAP/TezAM)
+
+ 1. Beeline connects to HS2 (always running, at least 1 pod)
+
+ 2. HS2 reports hs2_open_sessions > 0 via JMX Exporter
+
+ 3. Operator detects HS2 sessions on next scrape cycle:
+ - LLAP activation gate passes → scales up from 0
+ - TezAM activation gate passes → scales up from 0
+
+ 4. Query executes once LLAP/TezAM pods are ready
+
+```
+
+**Session protection:** The HS2 Service uses `sessionAffinity: ClientIP` to ensure
+beeline clients always reach the same pod. The preStop hook deregisters the pod from
+ZooKeeper (preventing new sessions) and waits for `hs2_open_sessions` to drain to 0
+before terminating. The `gracePeriodSeconds` (default 3600s) is a safety cap:
+the pod terminates immediately once sessions drain, not after the full grace period.
+
+**Component-specific behavior:**
+
+| Component | minReplicas | Scale-to-Zero Trigger | Wake Trigger |
+|-----------|-------------|----------------------|--------------|
+| **HS2** | 1 | N/A (always running) | N/A |
+| **HMS** | 1 | Never (always running) | N/A |
+| **LLAP** | 0 | No HS2 sessions (activation gate fails) | HS2 has open sessions (next scrape) |
+| **TezAM** | 0 | No HS2 sessions (activation gate fails) | HS2 has open sessions (next scrape) |
+
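The activation gate for LLAP and TezAM can be sketched in Python as below. `gated_replicas` is a hypothetical name, and the floor of one replica when the gate passes is an assumption made so that the sketch actually wakes from zero.

```python
def gated_replicas(hs2_open_sessions: int, formula_desired: int,
                   min_replicas: int, max_replicas: int) -> int:
    """Apply the HS2-session activation gate to a dependent component."""
    if hs2_open_sessions <= 0:
        return min_replicas  # gate fails: fall back to the minimum (0 by default)
    # Gate passes: assume at least one pod, clamped to the configured range.
    wanted = max(1, formula_desired)
    return max(min_replicas, min(max_replicas, wanted))


print(gated_replicas(0, 5, 0, 8))   # idle HS2
print(gated_replicas(2, 0, 0, 8))   # first session arrives
```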
+### Auto-Suspend (Full Cluster Hibernation)
+
+Auto-suspend goes beyond scale-to-zero: it fully hibernates the **entire** cluster
+(including HS2 and HMS) to 0 replicas after a configurable idle timeout. This is
+useful for dev/test clusters that should not consume resources when nobody is using
+them.
+
+**Prerequisites:** Auto-suspend requires autoscaling to be enabled on ALL active
+components (HS2, LLAP if enabled, TezAM if enabled, and HMS if `includeMetastore=true`).
+The operator will not auto-suspend unless it can confirm all components are at their
+minimum state.
+
+**Idle criteria (all must hold simultaneously for `idleTimeoutMinutes`):**
+
+| Component | Idle Condition |
+|-----------|---------------|
+| **HS2** | At `minReplicas` with 0 open sessions |
+| **HMS** | At `minReplicas` (only checked if `includeMetastore=true`) |
+| **LLAP** | At `minReplicas` (default 0) |
+| **TezAM** | At `minReplicas` (default 0) |
+
+**Important:** HS2 can **only** scale to 0 replicas via auto-suspend. Normal
+autoscaling always maintains `minReplicas >= 1` for HS2. Auto-suspend is the
+only mechanism that overrides this to achieve full hibernation.
+
+```
+ Auto-Suspend Flow
+
+ 1. Autoscaling scales all components to their minReplicas
+ (HS2≥1, HMS≥1, LLAP/TezAM to configured min)
+
+ 2. Operator detects idle state:
+ - HS2 has 0 open sessions
+ - HMS at minReplicas (if includeMetastore=true)
+ - LLAP/TezAM at minReplicas
+
+ 3. Idle timer starts (status: clusterPhase=Idle, idleSince=<now>)
+
+ 4. After idleTimeoutMinutes (default 15):
+ - ALL components scaled to 0 (HMS excluded if includeMetastore=false)
+ - spec.suspend set to true (cluster stays suspended until user wakes it)
+ - Status: clusterPhase=Suspended, suspendedSince=<now>
+
+ 5. To wake: kubectl patch hivecluster hive --type=merge -p '{"spec":{"suspend":false}}'
+ All components restored to minReplicas, with a floor of 1
+ (HS2/HMS ≥1, LLAP/TezAM ≥1 for immediate usability)
+
+```
+
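The idle timer in steps 3 and 4 behaves like a small state machine. The Python sketch below shows one possible reading; `next_phase` is a hypothetical function, the phase names mirror the `clusterPhase` values shown in this section, and resetting the timer as soon as any idle condition fails is an assumption.

```python
from datetime import datetime, timedelta, timezone


def next_phase(now, idle_since, all_idle, idle_timeout_minutes=15):
    """Return (cluster_phase, idle_since) for one reconcile pass."""
    if not all_idle:
        return "Running", None  # any activity resets the idle timer
    if idle_since is None:
        return "Idle", now  # idle criteria just became true: start the timer
    if now - idle_since >= timedelta(minutes=idle_timeout_minutes):
        return "Suspended", idle_since
    return "Idle", idle_since


t0 = datetime(2026, 6, 8, 10, 0, tzinfo=timezone.utc)
print(next_phase(t0 + timedelta(minutes=15), t0, True))
```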
+**Configuration:**
+
+```yaml
+cluster:
+  autoSuspend:
+    enabled: true
+    idleTimeoutMinutes: 15    # minutes idle before full hibernation
+    includeMetastore: true    # set false to keep HMS running during suspend
+```
+
+**Manual Suspend/Wake Commands:**
+
+```bash
+# Suspend immediately (bypasses idle timer)
+kubectl patch hivecluster hive --type=merge -p '{"spec":{"suspend":true}}'
+
+# Wake cluster (restores to minReplicas)
+kubectl patch hivecluster hive --type=merge -p '{"spec":{"suspend":false}}'
+```
+
+Manual suspend works regardless of whether `autoSuspend.enabled` is true — it
+immediately scales all components to 0 without waiting for the idle timeout.
+When `includeMetastore: false`, HMS stays running even during manual suspend.
+
+**Observing cluster state:**
+
+```bash
+# Quick view — printer columns show phase and idle time
+kubectl get hivecluster
+```
+```
+NAME   PHASE       IDLE (MIN)   AGE
+hive   Idle        12           2h
+```
+
+```bash
+# After suspend triggers
+kubectl get hivecluster
+```
+```
+NAME   PHASE       IDLE (MIN)   AGE
+hive   Suspended                2h
+```
+
+```bash
+# Full status (kubectl get hivecluster hive -o yaml)
+```
+```yaml
+status:
+  clusterPhase: Suspended
+  idleSince: "2026-06-08T10:00:00Z"
+  idleForMinutes: 15
+  suspendedSince: "2026-06-08T10:15:00Z"
+  conditions:
+    - type: Suspended
+      status: "True"
+      reason: AutoSuspend   # or ManualSuspend
+      message: "Cluster suspended after idle timeout"
+      lastTransitionTime: "2026-06-08T10:15:00Z"
+```
+
+When the cluster is running normally:
+```
+NAME   PHASE       IDLE (MIN)   AGE
+hive   Running                  2h
+```
+
+**Full example (autoscaling + auto-suspend):**
+
+```yaml
+cluster:
+  autoSuspend:
+    enabled: true
+    idleTimeoutMinutes: 15
+    includeMetastore: false   # keep HMS running during suspend
+
+  hiveServer2:
+    replicas: 10
+    autoscaling:
+      enabled: true
+      minReplicas: 1
+
+  metastore:
+    replicas: 6
+    autoscaling:
+      enabled: true
+      minReplicas: 1
+
+  llap:
+    replicas: 8
+    autoscaling:
+      enabled: true
+      minReplicas: 0   # scales to 0 via normal autoscaling when HS2 idle
+
+  tezAm:
+    replicas: 10
+    autoscaling:
+      enabled: true
+      minReplicas: 0   # scales to 0 via normal autoscaling when HS2 idle
+```
+
+With this configuration, the cluster lifecycle is:
+1. Under load → all components scaled up by autoscaler
+2. Load drops → autoscaler scales to minReplicas (HS2=1, HMS=1, LLAP=0, TezAM=0)
+3. HS2 idle (0 sessions) for 15 minutes → auto-suspend kicks in → HS2, LLAP, TezAM to 0 (HMS stays at minReplicas)
+4. `kubectl patch hivecluster hive --type=merge -p '{"spec":{"suspend":false}}'` → wake → HS2=1, LLAP=1, TezAM=1
+5. User connects → autoscaler detects sessions → scales up as needed
+
+### CPU-Based Scaling (HS2 and HMS)
+
+In addition to the primary metrics (sessions for HS2, API request rate for HMS),
+the operator supports a secondary **CPU-based scaling signal** for HiveServer2 and
+Metastore. The final desired replica count is:
+
+```
+final_desired = max(metric_desired, cpu_desired)
+```
+
+Either signal can trigger scale-up; neither can force scale-down below what the
+other recommends. CPU-based scaling uses the same stabilization windows as metric-based
+scaling (no separate CPU stabilization).
+
+**How it works:**
+
+1. The operator scrapes `ProcessCpuLoad` from `java.lang:type=OperatingSystem` via JMX
+   Exporter (exported as `jvm_process_cpu_load`, a 0.0–1.0 fraction)
+2. Averages across all pods, converts to percentage (0–100)
+3. If avg CPU >= `cpuScaleUpThreshold`: scales up proportionally
+   (`ceil(avgCpu * currentReplicas / cpuScaleUpThreshold)`)
+4. If avg CPU < `cpuScaleDownThreshold`: scales down
+   (`ceil(avgCpu * currentReplicas / cpuScaleUpThreshold)`, floored at `minReplicas`)
+5. Between thresholds: holds current replica count
+5. Between thresholds: holds current replica count
+
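Steps 3 to 5 above translate into a short Python function. This is a sketch under stated assumptions: `cpu_desired` and its parameter names are hypothetical, the average is passed in already converted to a 0-100 percentage, and returning 0 when the threshold is `0` is one way to model "CPU-based scaling disabled" so the other signal wins the `max`.

```python
import math


def cpu_desired(avg_cpu_percent: float, current_replicas: int,
                scale_up_threshold: float, scale_down_threshold: float,
                min_replicas: int) -> int:
    """Replica count that the CPU signal alone would recommend."""
    if scale_up_threshold <= 0:
        return 0  # assumption: disabled signal contributes nothing
    proportional = math.ceil(avg_cpu_percent * current_replicas / scale_up_threshold)
    if avg_cpu_percent >= scale_up_threshold:
        return proportional                      # step 3: scale up proportionally
    if avg_cpu_percent < scale_down_threshold:
        return max(min_replicas, proportional)   # step 4: scale down, floored
    return current_replicas                      # step 5: hold between thresholds


print(cpu_desired(95.0, 4, 90, 30, 1))  # 95 * 4 / 90 = 4.22, rounded up
print(cpu_desired(20.0, 4, 90, 30, 1))  # 20 * 4 / 90 = 0.89, floored at minReplicas
```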
+**Configuration:**
+
+| Value | Default | Description |
+|-------|---------|-------------|
+| `cluster.<component>.autoscaling.cpuScaleUpThreshold` | `90` | CPU percentage (0-100) that triggers scale-up. Set to `0` to disable CPU-based scaling. |
+| `cluster.<component>.autoscaling.cpuScaleDownThreshold` | `30` | CPU percentage (0-100) below which scale-down is considered. |
+
+**Example:**
+
+```yaml
+cluster:
+  hiveServer2:
+    replicas: 10
+    resources:
+      limitsCpu: "2"   # Recommended: set CPU limits so ProcessCpuLoad is relative to pod allocation
+    autoscaling:
+      enabled: true
+      cpuScaleUpThreshold: 90
+      cpuScaleDownThreshold: 30
+
+  metastore:
+    replicas: 6
+    resources:
+      limitsCpu: "2"
+    autoscaling:
+      enabled: true
+      cpuScaleUpThreshold: 90
+      cpuScaleDownThreshold: 30
+```
+
+**Important: CPU limits and metric accuracy**
+
+`ProcessCpuLoad` reports CPU usage as a fraction of **available processors**. Without
+CPU limits, the JVM sees all node cores (e.g., 8 cores), so a pod that fully saturates
+one core only shows ~12.5% (1 of 8). With `limitsCpu: "2"`, the JVM sees 2 processors and the
+metric becomes "% of allocated CPU", which makes the thresholds meaningful.
+
+| Pod CPU Limit | JVM sees | 90% threshold means |
+|---------------|----------|---------------------|
+| None (no limit) | All node cores (e.g., 8) | Using 7.2 of 8 cores, which is very hard to reach |
+| `2` | 2 cores | Using 1.8 of 2 allocated cores |
+| `4` | 4 cores | Using 3.6 of 4 allocated cores |
+
+**Recommendation:** Always set `resources.limitsCpu` when using CPU-based autoscaling.
+
+**Status output:**
+
+The operator reports CPU metrics in the HiveCluster status:
+
+```yaml
+status:
+  hiveServer2:
+    autoscaling:
+      currentMetricValue: 5        # total sessions
+      scaleUpThreshold: 100
+      currentCpuPercent: 72.45     # avg ProcessCpuLoad * 100
+      cpuScaleUpThreshold: 90
+      cpuProposedReplicas: 2       # what CPU alone would recommend
+      proposedReplicas: 2
+      lastScaleTime: "2026-05-31T04:23:07Z"
+```
+
+**Applicability:** CPU-based scaling only applies to HS2 and HMS. LLAP and TezAM
+do not use CPU as a scaling signal (LLAP scales on busy executor slots, which already
+correlate with CPU; TezAM is demand-based from HS2 session count).
+
+---
+
+### Enabling Autoscaling
+
+**CLI (with Ozone storage backend):**
+
+Each component ships with sensible defaults (see [Configuration Reference](#configuration-reference)).
+Only `enabled=true` is needed to turn on autoscaling:
+
+```bash
+helm install hive ./helm/hive-operator \
+  --set cluster.database.type=postgres \
+  --set cluster.database.url="jdbc:postgresql://postgres-postgresql:5432/metastore" \
+  --set cluster.database.driver="org.postgresql.Driver" \
+  --set cluster.database.username=hive \
+  --set cluster.database.passwordSecretRef.name=hive-db-secret \
+  --set cluster.database.passwordSecretRef.key=password \
+  --set cluster.database.driverJarUrl="https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.5/postgresql-42.7.5.jar" \
+  --set cluster.zookeeper.quorum="zookeeper:2181" \
+  --set cluster.storage.coreSiteOverrides."fs\.defaultFS"="s3a://hive" \
+  --set cluster.storage.coreSiteOverrides."fs\.s3a\.endpoint"="http://ozone-s3g-rest:9878" \
+  --set-string cluster.storage.coreSiteOverrides."fs\.s3a\.path\.style\.access"=true \
+  --set 'cluster.storage.envVars[0].name=HADOOP_OPTIONAL_TOOLS' \
+  --set 'cluster.storage.envVars[0].value=hadoop-aws' \
+  --set 'cluster.storage.envVars[1].name=AWS_ACCESS_KEY_ID' \
+  --set 'cluster.storage.envVars[1].value=ozone' \
+  --set 'cluster.storage.envVars[2].name=AWS_SECRET_ACCESS_KEY' \
+  --set 'cluster.storage.envVars[2].value=ozone' \
+  --set cluster.hiveServer2.autoscaling.enabled=true \
+  --set cluster.metastore.autoscaling.enabled=true \
+  --set cluster.llap.autoscaling.enabled=true \
+  --set cluster.tezAm.autoscaling.enabled=true
+```
+
+**Values file (for customizing beyond defaults):**
+
+```yaml
+# values-autoscaling.yaml: only override what you need
+cluster:
+  database:
+    type: postgres
+    url: "jdbc:postgresql://postgres-postgresql:5432/metastore"
+    driver: "org.postgresql.Driver"
+    username: hive
+    passwordSecretRef:
+      name: hive-db-secret
+      key: password
+    driverJarUrl: "https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.5/postgresql-42.7.5.jar"
+
+  zookeeper:
+    quorum: "zookeeper:2181"
+
+  storage:
+    coreSiteOverrides:
+      fs.defaultFS: "s3a://hive"
+      fs.s3a.endpoint: "http://ozone-s3g-rest:9878"
+      fs.s3a.path.style.access: "true"
+    envVars:
+      - name: HADOOP_OPTIONAL_TOOLS
+        value: "hadoop-aws"
+      - name: AWS_ACCESS_KEY_ID
+        value: "ozone"
+      - name: AWS_SECRET_ACCESS_KEY
+        value: "ozone"
+
+  hiveServer2:
+    replicas: 10   # Acts as maxReplicas when autoscaling is enabled
+    autoscaling:
+      enabled: true
+      # minReplicas: 1                       # default: always keep at least 1 HS2 running
+      # scaleUpThreshold: 80                 # default: avg open sessions per pod triggering scale-up
+      # scaleUpStabilizationSeconds: 60      # default: scale-up window
+      # scaleDownStabilizationSeconds: 600   # default: scale-down window (also acts as cooldown)
+      # metricsScrapeIntervalSeconds: 10     # default: operator scrape interval (lower = faster reaction)
+
+  metastore:
+    replicas: 6   # Acts as maxReplicas when autoscaling is enabled
+    autoscaling:
+      enabled: true
+      # minReplicas: 1                       # default: always keep at least 1 metastore running
+      # scaleUpThreshold: 75                 # default: API request rate (req/s) triggering scale-up
+      # scaleUpStabilizationSeconds: 60      # default: scale-up window
+      # scaleDownStabilizationSeconds: 300   # default: scale-down window (also acts as cooldown)
+      # gracePeriodSeconds: 60               # default: fast drain (HMS is stateless)
+      # metricsScrapeIntervalSeconds: 10     # default: operator scrape interval
+
+  llap:
+    replicas: 8   # Acts as maxReplicas when autoscaling is enabled
+    autoscaling:
+      enabled: true
+      # minReplicas: 0                       # default: scale to zero when no HS2 sessions
+      # scaleUpThreshold: 1                  # default: total busy slots (queued+running) triggering scale-up
+      # scaleUpStabilizationSeconds: 60      # default: scale-up window
+      # scaleDownStabilizationSeconds: 900   # default: scale-down window (long, because scaling down destroys cache)
+      # gracePeriodSeconds: 600              # default: 10 min drain for in-flight fragments
+      # metricsScrapeIntervalSeconds: 10     # default: operator scrape interval (lower = faster reaction)
+
+  tezAm:
+    replicas: 10   # Acts as maxReplicas when autoscaling is enabled
+    autoscaling:
+      enabled: true
+      # minReplicas: 0                       # default: scale to zero when no HS2 sessions
+      # scaleUpThreshold: 1                  # default: threshold for demand metric (1 = match HS2 pod count)
+      # scaleUpStabilizationSeconds: 60      # default: HPA scale-up window
+      # scaleDownStabilizationSeconds: 300   # default: HPA scale-down window
+      # gracePeriodSeconds: 120              # default: 2 min drain for DAG completion
+      # metricsScrapeIntervalSeconds: 10     # default: operator scrape interval (lower = faster reaction)
+```
+
+```bash
+helm install hive ./helm/hive-operator -f values-autoscaling.yaml
+```
+
+When autoscaling is enabled, the operator automatically:
+- Attaches the JMX Exporter javaagent (port 9404, `/metrics`) to each pod
+- Enables `hive.server2.metrics.enabled` / `metastore.metrics.enabled` (JMX reporter)
+- Creates PodDisruptionBudgets (minAvailable: 1)
+- Configures preStop lifecycle hooks for graceful drain
+- Sets `terminationGracePeriodSeconds` to the configured grace period
+- Gates LLAP/TezAM scaling on HS2 metrics (they only scale up when HS2 has sessions)
+
+**JMX Metrics Scraped by Operator (per component):**
+
+| Component | Key Metrics | Purpose |
+|-----------|---------|---------|
+| **HiveServer2** | `hs2_open_sessions`, `jvm_process_cpu_load` | Session count for primary scaling + CPU for secondary scaling signal |
+| **Metastore** | `api_*_total`, `jvm_process_cpu_load` | API call counters (operator computes request rate from deltas) + CPU for secondary scaling signal |
+| **LLAP** | `hadoop_llapdaemon_executornumqueuedrequests`, `hadoop_llapdaemon_executornumexecutorsconfigured`, `hadoop_llapdaemon_executornumexecutorsavailable` | Total busy slots = queued + configured - available |
+| **Tez AM** | N/A (scales on HS2 metrics) | TezAM scaling is demand-driven from `hs2_open_sessions`, so no TezAM-specific metrics are needed |
+
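For the Metastore row, "request rate from deltas" means differencing a monotonically increasing counter between two scrapes. The Python sketch below illustrates this; the function names are hypothetical, and treating a decrease as a counter reset (for example after a pod restart) is an assumption rather than documented operator behaviour.

```python
import math


def request_rate(prev_total: float, curr_total: float, interval_seconds: float) -> float:
    """Requests per second between two scrapes of an api_*_total counter."""
    delta = curr_total - prev_total
    if delta < 0:
        delta = curr_total  # assumption: the counter restarted from zero
    return delta / interval_seconds


def metastore_desired(api_rate: float, threshold: float, cpu_desired: int) -> int:
    """max(ceil(api_rate / threshold), cpu_desired), as in the scaling table."""
    return max(math.ceil(api_rate / threshold), cpu_desired)


rate = request_rate(1000, 2500, 10)      # 1500 calls over a 10s scrape interval
print(rate, metastore_desired(rate, 75, 1))
```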
+### Enabling Autoscaling — Example
+
+To enable autoscaling for HS2 and Metastore:
+
+```yaml
+cluster:
+  hiveServer2:
+    replicas: 4              # max replicas ceiling
+    autoscaling:
+      enabled: true
+      scaleUpThreshold: 1    # scale up when total sessions > 1
+      minReplicas: 1         # always keep at least 1 HS2 pod running
+
+  metastore:
+    replicas: 3              # max replicas ceiling
+    autoscaling:
+      enabled: true
+      minReplicas: 1         # always keep at least 1 running
+      scaleUpThreshold: 75   # API requests/sec threshold
+```
+
+> **Note:** LLAP scales on total busy slots (queued + running executors).
+> TezAM scales on demand: the larger of the total open HS2 sessions and the number
+> of active HS2 pods multiplied by `hive.server2.tez.sessions.per.default.queue` (default 1).
+
+### Helm Values Reference (Autoscaling)
+
+| Value | Default | Description |
+|-------|---------|-------------|
+| `cluster.<component>.replicas` | `1-2` | Static replica count, or max replicas ceiling when autoscaling is enabled |
+| `cluster.<component>.autoscaling.enabled` | `false` | Enable operator-driven autoscaling |
+| `cluster.<component>.autoscaling.minReplicas` | `1` (HS2/HMS), `0` (LLAP/TezAM) | Minimum replica count. Set to 0 for scale-to-zero (LLAP, TezAM only; HS2 minimum is 1) |
+| `cluster.<component>.autoscaling.scaleUpThreshold` | varies | Metric threshold triggering scale-up |
+| `cluster.<component>.autoscaling.scaleUpStabilizationSeconds` | `60` | Stabilization window for scale-up (picks highest recommendation in window) |
+| `cluster.<component>.autoscaling.scaleDownStabilizationSeconds` | `300-900` | Stabilization window for scale-down (picks most conservative recommendation in window). Also acts as cooldown between consecutive scale-downs. |
+| `cluster.<component>.autoscaling.gracePeriodSeconds` | `3600` (HS2), `60` (HMS), `600` (LLAP), `120` (TezAM) | Safety cap: max drain time before forced termination. Pod exits immediately once drain completes. |
+| `cluster.<component>.autoscaling.metricsScrapeIntervalSeconds` | `10` | How often the operator scrapes JMX metrics from pods. Lower = faster reaction. |
+| `cluster.<component>.autoscaling.cpuScaleUpThreshold` | `90` | CPU percentage (0-100) triggering scale-up. Only HS2/HMS. Set to 0 to disable. |
+| `cluster.<component>.autoscaling.cpuScaleDownThreshold` | `30` | CPU percentage (0-100) below which scale-down is considered. Only HS2/HMS. |
+
+---
+
## Connect to HiveServer2
+HiveServer2 runs in **HTTP transport mode** by default (recommended for Kubernetes
+environments as it works well with load balancers, ingress controllers, and proxies).
+
+### Standard Connection (minReplicas >= 1)
+
+When HS2 always has at least one pod running, connect directly to the service:
+
```bash
-kubectl exec -it deployment/hive-hiveserver2 -- beeline -u "jdbc:hive2://hive-hiveserver2:10000/"
+kubectl exec -it deployment/hive-hiveserver2 -- beeline -u "jdbc:hive2://hive-hiveserver2:10001/;transportMode=http;httpPath=cliservice"
```
Or via port-forward:
```bash
-kubectl port-forward svc/hive-hiveserver2 10000:10000
-beeline -u "jdbc:hive2://localhost:10000/"
+kubectl port-forward svc/hive-hiveserver2 10001:10001
+beeline -u "jdbc:hive2://localhost:10001/;transportMode=http;httpPath=cliservice"
```
+### LLAP/TezAM Scale-to-Zero Behavior
+
+When LLAP and TezAM are configured with `minReplicas: 0` (the default), they start
+with zero pods on fresh install. The operator automatically scales them up when HS2
+reports open sessions, and scales them back to zero when HS2 is idle.
Review Comment:
That’s interesting—LLAP can be spun up on demand just like Tez tasks on
YARN. I’m curious about the speed of spinning up LLAP on Kubernetes
concurrently. For example, if I need to start 100 LLAP instances at the same
time to run tasks, will the concurrent startup take a long time?
If LLAP on K8s can start up very quickly, then we might explore using LLAP
on K8s for certain batch processing tasks that require many LLAP instances to
run concurrently during specific time windows. If that’s feasible, perhaps Tez
tasks on K8s wouldn’t be as important—LLAP on K8s might be sufficient.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]