Re: [PR] HIVE-29492: Add AutoScaling to K8s operator [hive]

via GitHub Mon, 15 Jun 2026 07:10:48 -0700


zhangbutao commented on code in PR #6507:
URL: https://github.com/apache/hive/pull/6507#discussion_r3414105333



##########
packaging/src/kubernetes/README.md:
##########
@@ -505,19 +505,602 @@ kubectl get hiveclusters
 kubectl describe hivecluster hive
 ```
 
+---
+
+## Autoscaling
+
+The operator supports metric-based autoscaling for all four Hive components 
using
+an **operator-driven control loop** that scrapes JMX Exporter metrics directly 
from
+pods. No Prometheus server or external autoscaling tools are needed. 
Autoscaling is
+opt-in per component and designed for **zero query failures** during 
scale-down.
+
+### Prerequisites
+
+- No external dependencies — the operator handles all scaling decisions 
internally
+
+### How It Works
+
+When `autoscaling.enabled: true` is set for a component, the operator:
+1. Attaches the JMX Exporter javaagent (port 9404) to each pod
+2. Polls `/metrics` on each pod at `metricsScrapeIntervalSeconds` intervals
+3. Computes desired replicas using component-specific formulas
+4. Applies HPA-like stabilization windows (scale-up/scale-down)
+5. Patches the workload `spec.replicas` directly
+
+### Graceful Scale-Down Architecture
+
+```
+                        Scale Down Flow                                
+ 1. Operator reduces desired replicas (metric below threshold,      
+    stabilization window elapsed)                  
+ 2. PodDisruptionBudget ensures minAvailable=1 (at least one pod    
+    always running)                                                 
+ 3. Kubernetes sends SIGTERM to selected pod                        
+ 4. preStop hook runs:                                              
+    - HS2: deregisters from ZK, drains open sessions, kills JVM    
+    - HMS: kills JVM (stateless HTTP — no drain needed)             
+    - LLAP: waits until all executors become idle, kills JVM        
+    - TezAM: no drain (DAGAppMaster does not expose JMX metrics)            
+ 5. terminationGracePeriodSeconds = gracePeriodSeconds (safety cap) 
+ 6. Pod terminates immediately once drain completes (does NOT wait  
+    the full grace period — it's only the upper safety bound)
+```
+
+> **Note:** Shell entrypoints (PID 1) in containers don't forward SIGTERM to 
child
+> processes. The preStop hook explicitly sends SIGTERM to the Hive/Tez Java 
process
+> after drain completes, ensuring prompt shutdown without waiting for the 
grace period
+> to expire.
+
+### Scaling Timers
+
+The autoscaling system uses three independent timing controls:
+
+| Timer | Config Field | Default | Purpose |
+|-------|-------------|---------|---------|
+| **Metrics scrape interval** | `metricsScrapeIntervalSeconds` | `10` | How often the operator scrapes JMX Exporter `/metrics` on each pod. This is the **biggest bottleneck** for autoscaling reaction time. |
+| **Scale-up stabilization** | `scaleUpStabilizationSeconds` | `60` | Window: picks the highest recommendation within this period before scaling up. Prevents flapping when metrics oscillate. Set to `0` for LLAP and TezAM (reactive dependents). |
+| **Scale-down stabilization** | `scaleDownStabilizationSeconds` | `300-900` | Window: picks the most conservative (highest) recommendation within this period before scaling down. Also acts as the cooldown between consecutive scale-downs, so no separate cooldown is needed. |
+
+**How they interact:**
+- Load spike detected → operator scrapes metrics within 
`metricsScrapeIntervalSeconds` → waits `scaleUpStabilizationSeconds` then 
scales up
+- Load drops → operator waits `scaleDownStabilizationSeconds` (stabilization 
window must confirm low demand consistently) then scales down
+
+**Tuning reaction time:** With `metricsScrapeIntervalSeconds: 10` and `scaleUpStabilizationSeconds: 0` (the setting suggested above for LLAP/TezAM), scale-up latency is ~10-20s (roughly one scrape cycle). For HS2 with `scaleUpStabilizationSeconds: 60`, expect ~70s.
+
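The scale-down stabilization window described in the table above can be sketched as follows. This is a minimal illustration in Python, not the operator's implementation: the `ScaleDownStabilizer` class and its method names are hypothetical, and the only behavior taken from the text is that the highest recommendation seen within the window wins.

```python
from collections import deque


class ScaleDownStabilizer:
    """Hypothetical helper: remembers recent recommendations and returns the highest."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.history = deque()  # entries are (timestamp_seconds, recommended_replicas)

    def recommend(self, now, desired):
        self.history.append((now, desired))
        # Drop recommendations that have aged out of the stabilization window.
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        # The most conservative (highest) recommendation in the window wins.
        return max(replicas for _, replicas in self.history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
for t, desired in [(0, 5), (100, 2), (200, 1), (310, 1), (520, 1)]:
    print(t, stabilizer.recommend(t, desired))
```

In this sketch a lower recommendation only takes effect once every higher recommendation has aged out of the window, which is also why the window doubles as a cooldown between consecutive scale-downs.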
+### Per-Component Scaling Logic
+
+| Component | Scale-Up Formula | Scale-Down | JMX Metric |
+|-----------|-----------------|------------|------------|
+| **HiveServer2** | `max(ceil(sessions / threshold), cpu_desired)` | Sessions drop to 0 AND CPU below threshold → scale to minReplicas | `hs2_open_sessions`, `jvm_process_cpu_load` |
+| **Metastore** | `max(ceil(api_rate / threshold), cpu_desired)` | Rate drops to 0 AND CPU below threshold → scale to minReplicas | `api_*_total`, `jvm_process_cpu_load` |
+| **LLAP** | `ceil(avg(queued + configured - available) / scaleUpThreshold)` | All executors idle + no HS2 sessions | `hadoop_llapdaemon_executor*` |
+| **Tez AM** | `max(sum(hs2_open_sessions), count(HS2_pods) * sessions_per_queue)` | All HS2 sessions closed | `hs2_open_sessions` (from HS2 pods) |
+
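The HiveServer2 row of the table above can be read as the following minimal sketch. It is an illustration only, not the operator's code: the function name and the `min_replicas`/`max_replicas` clamp are assumptions (the clamp reflects that `replicas` acts as the ceiling when autoscaling is enabled), and `total_sessions` is assumed to be the sum of open sessions across HS2 pods.

```python
import math


def hs2_desired_replicas(total_sessions, threshold, cpu_desired,
                         min_replicas=1, max_replicas=10):
    """Hypothetical sketch of max(ceil(sessions / threshold), cpu_desired)."""
    session_desired = math.ceil(total_sessions / threshold)
    desired = max(session_desired, cpu_desired)
    # Assumption: clamp the result to the configured floor and ceiling.
    return max(min_replicas, min(desired, max_replicas))


print(hs2_desired_replicas(total_sessions=250, threshold=80, cpu_desired=2))
```

The Metastore row has the same shape, with the API request rate in place of the session count.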
+**TezAM Scaling Model:** TezAM uses demand-driven scaling with two formulas 
(max wins):
+1. **Session demand** — `sum(hs2_open_sessions)`: scales to match the total 
number of
+   concurrent sessions across all HS2 pods (each session needs its own 
exclusive TezAM).
+2. **Pre-warm** — `count(HS2 pods with sessions) × 
hive.server2.tez.sessions.per.default.queue` (default 1):
+   ensures every active HS2 pod has enough TezAM sessions pre-claimed from 
ZooKeeper.
+
+The operator takes the maximum across both formulas. This ensures TezAM 
capacity
+is always sufficient for both current demand and eager session pre-warming.
+TezAM scaling is purely demand-driven from HS2 metrics.
+
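The two TezAM formulas above can be sketched as below. This is a hedged illustration, not the operator's code: the function name, the per-pod list input, and the `min_replicas`/`max_replicas` clamp are assumptions; only the two formulas and the "max wins" rule come from the text.

```python
def tez_am_desired_replicas(hs2_open_sessions, sessions_per_queue=1,
                            min_replicas=0, max_replicas=10):
    """hs2_open_sessions: open-session counts, one entry per HS2 pod (hypothetical input)."""
    # Formula 1: one exclusive TezAM per concurrent session.
    session_demand = sum(hs2_open_sessions)
    # Formula 2: pre-warm sessions for every HS2 pod that currently has sessions.
    active_pods = sum(1 for sessions in hs2_open_sessions if sessions > 0)
    pre_warm = active_pods * sessions_per_queue
    desired = max(session_demand, pre_warm)
    # Assumption: clamp the result to the configured floor and ceiling.
    return max(min_replicas, min(desired, max_replicas))


print(tez_am_desired_replicas([3, 0, 2]))
print(tez_am_desired_replicas([3, 0, 2], sessions_per_queue=4))
```

With the default of one session per queue, the session-demand term is never smaller than the pre-warm term, so pre-warm only changes the result when `hive.server2.tez.sessions.per.default.queue` is raised.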
+### Scale-to-Zero Architecture
+
+When `minReplicas: 0` is configured (LLAP, TezAM), the cluster scales those
+components down to zero pods when HS2 has no active sessions. HS2 itself always
+maintains at least 1 replica (`minReplicas >= 1`) so it is always available to
+accept connections.
+
+```
+                   Scale-to-Zero (Idle Detection)                    
+
+  1. HS2 reports hs2_open_sessions = 0 for scaleDownStabilization     
+     → operator scales HS2 to minReplicas (>= 1)                    
+                                                                     
+  2. Operator sees hs2_open_sessions = 0 on next LLAP/TezAM eval     
+     → activation gate fails                                         
+     → scale LLAP and TezAM to 0 (if minReplicas=0)                 
+                                                                     
+  3. HMS stays at minReplicas=1 (always available)                   
+
+```
+
+```
+                   Wake-from-Zero (LLAP/TezAM)                       
+
+  1. Beeline connects to HS2 (always running, at least 1 pod)        
+                                                                     
+  2. HS2 reports hs2_open_sessions > 0 via JMX Exporter              
+                                                                     
+  3. Operator detects HS2 sessions on next scrape cycle:             
+     - LLAP activation gate passes → scales up from 0                
+     - TezAM activation gate passes → scales up from 0              
+                                                                     
+  4. Query executes once LLAP/TezAM pods are ready                   
+
+```
+
+**Session protection:** The HS2 Service uses `sessionAffinity: ClientIP` to 
ensure
+beeline clients always reach the same pod. The preStop hook deregisters the 
pod from
+ZooKeeper (preventing new sessions) and waits for `hs2_open_sessions` to drain 
to 0
+before terminating. The `gracePeriodSeconds` (default 3600s) is a safety cap — 
the pod
+terminates immediately once sessions drain, not after the full grace period.
+
+**Component-specific behavior:**
+
+| Component | minReplicas | Scale-to-Zero Trigger | Wake Trigger |
+|-----------|-------------|----------------------|--------------|
+| **HS2** | 1 | N/A (always running) | N/A |
+| **HMS** | 1 | Never (always running) | N/A |
+| **LLAP** | 0 | No HS2 sessions (activation gate fails) | HS2 has open sessions (next scrape) |
+| **TezAM** | 0 | No HS2 sessions (activation gate fails) | HS2 has open sessions (next scrape) |
+
+### Auto-Suspend (Full Cluster Hibernation)
+
+Auto-suspend goes beyond scale-to-zero — it fully hibernates the **entire** 
cluster
+(including HS2 and HMS) to 0 replicas after a configurable idle timeout. This 
is
+useful for dev/test clusters that should not consume resources when nobody is 
using
+them.
+
+**Prerequisites:** Auto-suspend requires autoscaling to be enabled on ALL 
active
+components (HS2, LLAP if enabled, TezAM if enabled, and HMS if 
`includeMetastore=true`).
+The operator will not auto-suspend unless it can confirm all components are at 
their
+minimum state.
+
+**Idle criteria (all must hold simultaneously for `idleTimeoutMinutes`):**
+
+| Component | Idle Condition |
+|-----------|---------------|
+| **HS2** | At `minReplicas` with 0 open sessions |
+| **HMS** | At `minReplicas` (only checked if `includeMetastore=true`) |
+| **LLAP** | At `minReplicas` (default 0) |
+| **TezAM** | At `minReplicas` (default 0) |
+
+**Important:** HS2 can **only** scale to 0 replicas via auto-suspend. Normal
+autoscaling always maintains `minReplicas >= 1` for HS2. Auto-suspend is the
+only mechanism that overrides this to achieve full hibernation.
+
+```
+                    Auto-Suspend Flow
+
+  1. Autoscaling scales all components to their minReplicas
+     (HS2≥1, HMS≥1, LLAP/TezAM to configured min)
+
+  2. Operator detects idle state:
+     - HS2 has 0 open sessions
+     - HMS at minReplicas (if includeMetastore=true)
+     - LLAP/TezAM at minReplicas
+
+  3. Idle timer starts (status: clusterPhase=Idle, idleSince=<now>)
+
+  4. After idleTimeoutMinutes (default 15):
+     - ALL components scaled to 0 (HMS excluded if includeMetastore=false)
+     - spec.suspend set to true (cluster stays suspended until user wakes it)
+     - Status: clusterPhase=Suspended, suspendedSince=<now>
+
+  5. To wake: kubectl patch hivecluster hive --type=merge -p '{"spec":{"suspend":false}}'
+     All components restored to minReplicas
+     (HS2/HMS ≥1, LLAP/TezAM ≥1 for immediate usability)
+
+```
+
+**Configuration:**
+
+```yaml
+cluster:
+  autoSuspend:
+    enabled: true
+    idleTimeoutMinutes: 15    # minutes idle before full hibernation
+    includeMetastore: true    # set false to keep HMS running during suspend
+```
+
+**Manual Suspend/Wake Commands:**
+
+```bash
+# Suspend immediately (bypasses idle timer)
+kubectl patch hivecluster hive --type=merge -p '{"spec":{"suspend":true}}'
+
+# Wake cluster (restores to minReplicas)
+kubectl patch hivecluster hive --type=merge -p '{"spec":{"suspend":false}}'
+```
+
+Manual suspend works regardless of whether `autoSuspend.enabled` is true — it
+immediately scales all components to 0 without waiting for the idle timeout.
+When `includeMetastore: false`, HMS stays running even during manual suspend.
+
+**Observing cluster state:**
+
+```bash
+# Quick view — printer columns show phase and idle time
+kubectl get hivecluster
+```
+```
+NAME   PHASE   IDLE (MIN)   AGE
+hive   Idle    12           2h
+```
+
+```bash
+# After suspend triggers
+kubectl get hivecluster
+```
+```
+NAME   PHASE       IDLE (MIN)   AGE
+hive   Suspended                2h
+```
+
+```bash
+# Full status (kubectl get hivecluster hive -o yaml)
+```
+```yaml
+status:
+  clusterPhase: Suspended
+  idleSince: "2026-06-08T10:00:00Z"
+  idleForMinutes: 15
+  suspendedSince: "2026-06-08T10:15:00Z"
+  conditions:
+    - type: Suspended
+      status: "True"
+      reason: AutoSuspend        # or ManualSuspend
+      message: "Cluster suspended after idle timeout"
+      lastTransitionTime: "2026-06-08T10:15:00Z"
+```
+
+When the cluster is running normally:
+```
+NAME   PHASE     IDLE (MIN)   AGE
+hive   Running                2h
+```
+
+**Full example (autoscaling + auto-suspend):**
+
+```yaml
+cluster:
+  autoSuspend:
+    enabled: true
+    idleTimeoutMinutes: 15
+    includeMetastore: false   # keep HMS running during suspend
+
+  hiveServer2:
+    replicas: 10
+    autoscaling:
+      enabled: true
+      minReplicas: 1
+
+  metastore:
+    replicas: 6
+    autoscaling:
+      enabled: true
+      minReplicas: 1
+
+  llap:
+    replicas: 8
+    autoscaling:
+      enabled: true
+      minReplicas: 0        # scales to 0 via normal autoscaling when HS2 idle
+
+  tezAm:
+    replicas: 10
+    autoscaling:
+      enabled: true
+      minReplicas: 0        # scales to 0 via normal autoscaling when HS2 idle
+```
+
+With this configuration, the cluster lifecycle is:
+1. Under load → all components scaled up by autoscaler
+2. Load drops → autoscaler scales to minReplicas (HS2=1, HMS=1, LLAP=0, TezAM=0)
+3. HS2 idle (0 sessions) for 15 minutes → auto-suspend kicks in → HS2, LLAP, TezAM to 0 (HMS stays at minReplicas)
+4. `kubectl patch hivecluster hive --type=merge -p '{"spec":{"suspend":false}}'` → wake → HS2=1, LLAP=1, TezAM=1
+5. User connects → autoscaler detects sessions → scales up as needed
+
+### CPU-Based Scaling (HS2 and HMS)
+
+In addition to the primary metrics (sessions for HS2, API request rate for 
HMS),
+the operator supports a secondary **CPU-based scaling signal** for HiveServer2 
and
+Metastore. The final desired replica count is:
+
+```
+final_desired = max(metric_desired, cpu_desired)
+```
+
+Either signal can trigger scale-up; neither can force scale-down below what the
+other recommends. CPU-based scaling uses the same stabilization windows as 
metric-based
+scaling (no separate CPU stabilization).
+
+**How it works:**
+
+1. The operator scrapes `ProcessCpuLoad` from `java.lang:type=OperatingSystem` 
via JMX
+   Exporter (exported as `jvm_process_cpu_load`, a 0.0–1.0 fraction)
+2. Averages across all pods, converts to percentage (0–100)
+3. If avg CPU >= `cpuScaleUpThreshold`: scales up proportionally
+   (`ceil(avgCpu * currentReplicas / cpuScaleUpThreshold)`)
+4. If avg CPU < `cpuScaleDownThreshold`: scales down
+   (`ceil(avgCpu * currentReplicas / cpuScaleUpThreshold)`, floored at 
`minReplicas`)
+5. Between thresholds: holds current replica count
+
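The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the operator's code: the function name, the `max_replicas` clamp, and the input shape (a list of per-pod `jvm_process_cpu_load` fractions) are hypothetical, and the sketch assumes `cpuScaleUpThreshold` is greater than 0 (the value `0` disables CPU-based scaling and is not handled here).

```python
import math


def cpu_desired_replicas(cpu_loads, current_replicas, up_threshold=90,
                         down_threshold=30, min_replicas=1, max_replicas=10):
    """cpu_loads: per-pod ProcessCpuLoad fractions in the range 0.0-1.0."""
    # Step 2: average across pods and convert to a percentage.
    avg_cpu = 100.0 * sum(cpu_loads) / len(cpu_loads)
    if avg_cpu >= up_threshold:
        # Step 3: scale up proportionally.
        desired = math.ceil(avg_cpu * current_replicas / up_threshold)
    elif avg_cpu < down_threshold:
        # Step 4: scale down, floored at min_replicas.
        desired = max(min_replicas,
                      math.ceil(avg_cpu * current_replicas / up_threshold))
    else:
        # Step 5: between thresholds, hold the current count.
        desired = current_replicas
    # Assumption: never exceed the configured ceiling.
    return min(desired, max_replicas)


print(cpu_desired_replicas([1.0, 1.0], current_replicas=2))
```

Both branches divide by `cpuScaleUpThreshold`, as the steps above state, so the scale-down target is the replica count at which average CPU would sit near the scale-up threshold.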
+**Configuration:**
+
+| Value | Default | Description |
+|-------|---------|-------------|
+| `cluster.<component>.autoscaling.cpuScaleUpThreshold` | `90` | CPU percentage (0-100) that triggers scale-up. Set to `0` to disable CPU-based scaling. |
+| `cluster.<component>.autoscaling.cpuScaleDownThreshold` | `30` | CPU percentage (0-100) below which scale-down is considered. |
+
+**Example:**
+
+```yaml
+cluster:
+  hiveServer2:
+    replicas: 10
+    resources:
+      limitsCpu: "2"        # Recommended: set CPU limits so ProcessCpuLoad is relative to pod allocation
+    autoscaling:
+      enabled: true
+      cpuScaleUpThreshold: 90
+      cpuScaleDownThreshold: 30
+
+  metastore:
+    replicas: 6
+    resources:
+      limitsCpu: "2"
+    autoscaling:
+      enabled: true
+      cpuScaleUpThreshold: 90
+      cpuScaleDownThreshold: 30
+```
+
+**Important: CPU limits and metric accuracy**
+
+`ProcessCpuLoad` reports CPU usage as a fraction of **available processors**. 
Without
+CPU limits, the JVM sees all node cores (e.g., 8 cores), so even heavy 
single-pod
+load only shows ~12.5%. With `limitsCpu: "2"`, the JVM sees 2 processors and 
the
+metric becomes "% of allocated CPU" — making thresholds meaningful.
+
+| Pod CPU Limit | JVM sees | 90% threshold means |
+|---------------|----------|---------------------|
+| None (no limit) | All node cores (e.g., 8) | Using 7.2 of 8 cores (very hard to reach) |
+| `2` | 2 cores | Using 1.8 of 2 allocated cores |
+| `4` | 4 cores | Using 3.6 of 4 allocated cores |
+
+**Recommendation:** Always set `resources.limitsCpu` when using CPU-based 
autoscaling.
+
+**Status output:**
+
+The operator reports CPU metrics in the HiveCluster status:
+
+```yaml
+status:
+  hiveServer2:
+    autoscaling:
+      currentMetricValue: 5           # total sessions
+      scaleUpThreshold: 100
+      currentCpuPercent: 72.45        # avg ProcessCpuLoad * 100
+      cpuScaleUpThreshold: 90
+      cpuProposedReplicas: 2          # what CPU alone would recommend
+      proposedReplicas: 2
+      lastScaleTime: "2026-05-31T04:23:07Z"
+```
+
+**Applicability:** CPU-based scaling only applies to HS2 and HMS. LLAP and 
TezAM
+do not use CPU as a scaling signal (LLAP scales on busy executor slots which 
already
+correlates with CPU; TezAM is demand-based from HS2 session count).
+
+---
+
+### Enabling Autoscaling
+
+**CLI (with Ozone storage backend):**
+
+Each component has sensible per-component defaults (see [Configuration 
Reference](#configuration-reference)).
+Only `enabled=true` is needed to turn on autoscaling:
+
+```bash
+helm install hive ./helm/hive-operator \
+  --set cluster.database.type=postgres \
+  --set cluster.database.url="jdbc:postgresql://postgres-postgresql:5432/metastore" \
+  --set cluster.database.driver="org.postgresql.Driver" \
+  --set cluster.database.username=hive \
+  --set cluster.database.passwordSecretRef.name=hive-db-secret \
+  --set cluster.database.passwordSecretRef.key=password \
+  --set cluster.database.driverJarUrl="https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.5/postgresql-42.7.5.jar" \
+  --set cluster.zookeeper.quorum="zookeeper:2181" \
+  --set cluster.storage.coreSiteOverrides."fs\.defaultFS"="s3a://hive" \
+  --set cluster.storage.coreSiteOverrides."fs\.s3a\.endpoint"="http://ozone-s3g-rest:9878" \
+  --set-string cluster.storage.coreSiteOverrides."fs\.s3a\.path\.style\.access"=true \
+  --set 'cluster.storage.envVars[0].name=HADOOP_OPTIONAL_TOOLS' \
+  --set 'cluster.storage.envVars[0].value=hadoop-aws' \
+  --set 'cluster.storage.envVars[1].name=AWS_ACCESS_KEY_ID' \
+  --set 'cluster.storage.envVars[1].value=ozone' \
+  --set 'cluster.storage.envVars[2].name=AWS_SECRET_ACCESS_KEY' \
+  --set 'cluster.storage.envVars[2].value=ozone' \
+  --set cluster.hiveServer2.autoscaling.enabled=true \
+  --set cluster.metastore.autoscaling.enabled=true \
+  --set cluster.llap.autoscaling.enabled=true \
+  --set cluster.tezAm.autoscaling.enabled=true
+```
+
+**Values file (for customizing beyond defaults):**
+
+```yaml
+# values-autoscaling.yaml — only override what you need
+cluster:
+  database:
+    type: postgres
+    url: "jdbc:postgresql://postgres-postgresql:5432/metastore"
+    driver: "org.postgresql.Driver"
+    username: hive
+    passwordSecretRef:
+      name: hive-db-secret
+      key: password
+    driverJarUrl: "https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.5/postgresql-42.7.5.jar"
+
+  zookeeper:
+    quorum: "zookeeper:2181"
+
+  storage:
+    coreSiteOverrides:
+      fs.defaultFS: "s3a://hive"
+      fs.s3a.endpoint: "http://ozone-s3g-rest:9878"
+      fs.s3a.path.style.access: "true"
+    envVars:
+      - name: HADOOP_OPTIONAL_TOOLS
+        value: "hadoop-aws"
+      - name: AWS_ACCESS_KEY_ID
+        value: "ozone"
+      - name: AWS_SECRET_ACCESS_KEY
+        value: "ozone"
+
+  hiveServer2:
+    replicas: 10              # Acts as maxReplicas when autoscaling is enabled
+    autoscaling:
+      enabled: true
+      # minReplicas: 1        # (default) always keep at least 1 HS2 running
+      # scaleUpThreshold: 80  # (default) avg open sessions per pod triggering scale-up
+      # scaleUpStabilizationSeconds: 60   # (default) scale-up window
+      # scaleDownStabilizationSeconds: 600 # (default) scale-down window (also acts as cooldown)
+      # metricsScrapeIntervalSeconds: 10  # (default) operator scrape interval (lower = faster reaction)
+
+  metastore:
+    replicas: 6               # Acts as maxReplicas when autoscaling is enabled
+    autoscaling:
+      enabled: true
+      # minReplicas: 1        # (default) always keep at least 1 metastore running
+      # scaleUpThreshold: 75  # (default) API request rate (req/s) triggering scale-up
+      # scaleUpStabilizationSeconds: 60   # (default) scale-up window
+      # scaleDownStabilizationSeconds: 300 # (default) scale-down window (also acts as cooldown)
+      # gracePeriodSeconds: 60 # (default) fast drain (HMS is stateless)
+      # metricsScrapeIntervalSeconds: 10  # (default) operator scrape interval
+
+  llap:
+    replicas: 8               # Acts as maxReplicas when autoscaling is enabled
+    autoscaling:
+      enabled: true
+      # minReplicas: 0        # (default) scale to zero when no HS2 sessions
+      # scaleUpThreshold: 1   # (default) total busy slots (queued+running) triggering scale-up
+      # scaleUpStabilizationSeconds: 60   # (default) scale-up window
+      # scaleDownStabilizationSeconds: 900 # (default) scale-down window (long, because scaling down destroys cache)
+      # gracePeriodSeconds: 600 # (default) 10 min drain for in-flight fragments
+      # metricsScrapeIntervalSeconds: 10  # (default) operator scrape interval (lower = faster reaction)
+
+  tezAm:
+    replicas: 10              # Acts as maxReplicas when autoscaling is enabled
+    autoscaling:
+      enabled: true
+      # minReplicas: 0        # (default) scale to zero when no HS2 sessions
+      # scaleUpThreshold: 1   # (default) threshold for demand metric (1 = match HS2 pod count)
+      # scaleUpStabilizationSeconds: 60   # (default) HPA scale-up window
+      # scaleDownStabilizationSeconds: 300 # (default) HPA scale-down window
+      # gracePeriodSeconds: 120 # (default) 2 min drain for DAG completion
+      # metricsScrapeIntervalSeconds: 10  # (default) operator scrape interval (lower = faster reaction)
+```
+
+```bash
+helm install hive ./helm/hive-operator -f values-autoscaling.yaml
+```
+
+When autoscaling is enabled, the operator automatically:
+- Attaches the JMX Exporter javaagent (port 9404, `/metrics`) to each pod
+- Enables `hive.server2.metrics.enabled` / `metastore.metrics.enabled` (JMX reporter)
+- Creates PodDisruptionBudgets (minAvailable: 1)
+- Configures preStop lifecycle hooks for graceful drain
+- Sets `terminationGracePeriodSeconds` to the configured grace period
+- Gates LLAP/TezAM scaling on HS2 metrics (they only scale up when HS2 has sessions)
+
+**JMX Metrics Scraped by Operator (per component):**
+
+| Component | Key Metrics | Purpose |
+|-----------|---------|---------|
+| **HiveServer2** | `hs2_open_sessions`, `jvm_process_cpu_load` | Session count for primary scaling + CPU for secondary scaling signal |
+| **Metastore** | `api_*_total`, `jvm_process_cpu_load` | API call counters (operator computes request rate from deltas) + CPU for secondary scaling signal |
+| **LLAP** | `hadoop_llapdaemon_executornumqueuedrequests`, `hadoop_llapdaemon_executornumexecutorsconfigured`, `hadoop_llapdaemon_executornumexecutorsavailable` | Total busy slots = queued + configured - available |
+| **Tez AM** | N/A (scales on HS2 metrics) | TezAM scaling is demand-driven from `hs2_open_sessions`, so no TezAM-specific metrics are needed |
+
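Two of the derived values in the table above can be sketched as follows. This is a hedged illustration, not the operator's code: both function names are hypothetical, and treating a negative counter delta (for example after a pod restart) as zero is an assumption the text does not state.

```python
def llap_busy_slots(queued, configured, available):
    """Busy slots for one LLAP daemon: queued + configured - available."""
    return queued + configured - available


def api_request_rate(prev_total, curr_total, interval_seconds):
    """Request rate (req/s) from two samples of an api_*_total counter."""
    delta = curr_total - prev_total
    if delta < 0:
        # Assumption: the counter was reset, so report no traffic for this interval.
        delta = 0
    return delta / interval_seconds


print(llap_busy_slots(queued=3, configured=8, available=2))
print(api_request_rate(prev_total=1000, curr_total=1750, interval_seconds=10))
```

The interval passed to `api_request_rate` would normally be the time between two scrapes, that is `metricsScrapeIntervalSeconds`.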
+### Enabling Autoscaling — Example
+
+To enable autoscaling for HS2 and Metastore:
+
+```yaml
+cluster:
+  hiveServer2:
+    replicas: 4                 # max replicas ceiling
+    autoscaling:
+      enabled: true
+      scaleUpThreshold: 1       # scale up when total sessions > 1
+      minReplicas: 1            # always keep at least 1 HS2 pod running
+
+  metastore:
+    replicas: 3                 # max replicas ceiling
+    autoscaling:
+      enabled: true
+      minReplicas: 1            # always keep at least 1 running
+      scaleUpThreshold: 75      # API requests/sec threshold
+```
+
+> **Note:** LLAP scales on total busy slots (queued + running executors).
+> TezAM scales on demand — the number of active HS2 pods multiplied by
+> `hive.server2.tez.sessions.per.default.queue` (default 1).
+
+### Helm Values Reference (Autoscaling)
+
+| Value | Default | Description |
+|-------|---------|-------------|
+| `cluster.<component>.replicas` | `1-2` | Static replica count, or max replicas ceiling when autoscaling is enabled |
+| `cluster.<component>.autoscaling.enabled` | `false` | Enable operator-driven autoscaling |
+| `cluster.<component>.autoscaling.minReplicas` | `1` (HS2/HMS), `0` (LLAP/TezAM) | Minimum replica count. Set to 0 for scale-to-zero (LLAP, TezAM only; HS2 minimum is 1) |
+| `cluster.<component>.autoscaling.scaleUpThreshold` | `80` (HS2), `75` (HMS), `1` (LLAP/TezAM) | Metric threshold triggering scale-up |
+| `cluster.<component>.autoscaling.scaleUpStabilizationSeconds` | `60` | Stabilization window for scale-up (picks highest recommendation in window) |
+| `cluster.<component>.autoscaling.scaleDownStabilizationSeconds` | `600` (HS2), `300` (HMS/TezAM), `900` (LLAP) | Stabilization window for scale-down (picks most conservative recommendation in window). Also acts as cooldown between consecutive scale-downs. |
+| `cluster.<component>.autoscaling.gracePeriodSeconds` | `3600` (HS2), `60` (HMS), `600` (LLAP), `120` (TezAM) | Safety cap: max drain time before forced termination. Pod exits immediately once drain completes. |
+| `cluster.<component>.autoscaling.metricsScrapeIntervalSeconds` | `10` | How often the operator scrapes JMX metrics from pods. Lower = faster reaction. |
+| `cluster.<component>.autoscaling.cpuScaleUpThreshold` | `90` | CPU percentage (0-100) triggering scale-up. Only HS2/HMS. Set to 0 to disable. |
+| `cluster.<component>.autoscaling.cpuScaleDownThreshold` | `30` | CPU percentage (0-100) below which scale-down is considered. Only HS2/HMS. |
+
+---
+
 ## Connect to HiveServer2
 
+HiveServer2 runs in **HTTP transport mode** by default (recommended for 
Kubernetes
+environments as it works well with load balancers, ingress controllers, and 
proxies).
+
+### Standard Connection (minReplicas >= 1)
+
+When HS2 always has at least one pod running, connect directly to the service:
+
 ```bash
-kubectl exec -it deployment/hive-hiveserver2 -- beeline -u "jdbc:hive2://hive-hiveserver2:10000/"
+kubectl exec -it deployment/hive-hiveserver2 -- beeline -u "jdbc:hive2://hive-hiveserver2:10001/;transportMode=http;httpPath=cliservice"
 ```
 
 Or via port-forward:
 
 ```bash
-kubectl port-forward svc/hive-hiveserver2 10000:10000
-beeline -u "jdbc:hive2://localhost:10000/"
+kubectl port-forward svc/hive-hiveserver2 10001:10001
+beeline -u "jdbc:hive2://localhost:10001/;transportMode=http;httpPath=cliservice"
 ```
 
+### LLAP/TezAM Scale-to-Zero Behavior
+
+When LLAP and TezAM are configured with `minReplicas: 0` (the default), they 
start
+with zero pods on fresh install. The operator automatically scales them up 
when HS2
+reports open sessions, and scales them back to zero when HS2 is idle.

Review Comment:
   That's interesting: LLAP can be spun up on demand, much like Tez tasks on YARN. I'm curious how fast LLAP daemons start on Kubernetes when many are launched concurrently. For example, if I need to start 100 LLAP instances at the same time to run tasks, will the concurrent startup take a long time?
   
   If LLAP on K8s starts quickly enough, we could explore using it for batch workloads that need many LLAP instances running concurrently during specific time windows. If that is feasible, Tez tasks on K8s might matter less, since LLAP on K8s could be sufficient.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

