This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new 7fdba42e06 [#10096] docs(optimizer): add architecture-first optimizer
guide and improve discoverability (#10203)
7fdba42e06 is described below
commit 7fdba42e06425e432e80887e51bde59cc373368f
Author: FANNG <[email protected]>
AuthorDate: Mon Mar 9 22:50:57 2026 +0900
[#10096] docs(optimizer): add architecture-first optimizer guide and
improve discoverability (#10203)
### What changes were proposed in this pull request?
This PR improves the optimizer documentation end-to-end and makes the optimizer
easier to discover and run.
1. Add a dedicated optimizer guide:
- `docs/optimizer.md`
- Includes architecture overview, execution modes, lifecycle,
configuration model, quick starts, CLI reference, troubleshooting, and
related docs.
2. Add docs entry point:
- `docs/index.md`
- Add a link to the optimizer guide from docs home.
3. Improve local Spark prerequisite clarity:
- `docs/manage-jobs-in-gravitino.md`
- Explicitly document `gravitino.jobExecutor.local.sparkHome` /
`SPARK_HOME` for local Spark template execution.
### Why are the changes needed?
Users currently lack a single, structured guide for optimizer workflows.
This patch is needed to:
1. Explain how the optimizer works (stats/metrics -> policy -> template ->
job -> verification).
2. Provide practical quick-start steps for both built-in workflow and
CLI-based workflow.
3. Reduce user confusion around local Spark job execution setup and
status verification.
Fix: #10096
### Does this PR introduce _any_ user-facing change?
Yes, docs-only user-facing changes:
1. A new optimizer guide page (`/table-maintenance-service`).
2. A new docs home link pointing to the optimizer guide.
3. Clear prerequisite notes for local Spark job execution in job docs.
No runtime behavior or API contract is changed.
### How was this patch tested?
1. Built docs locally:
- `./gradlew :docs:build -x test`
2. Verified docs build succeeded, including OpenAPI lint as part of docs
build.
3. Manually validated the documented quick-start workflow against a
local deployment (job template checks, job submission, and status/log
verification path).
---------
Co-authored-by: Qi Yu <[email protected]>
Co-authored-by: Copilot <[email protected]>
---
docs/index.md | 3 +
docs/manage-jobs-in-gravitino.md | 7 +
.../optimizer-cli-reference.md | 229 ++++++++++++++++++++
.../optimizer-configuration.md | 106 +++++++++
.../optimizer-extension-guide.md | 128 +++++++++++
.../optimizer-quick-start.md | 241 +++++++++++++++++++++
.../optimizer-troubleshooting.md | 69 ++++++
docs/table-maintenance-service/optimizer.md | 105 +++++++++
8 files changed, 888 insertions(+)
diff --git a/docs/index.md b/docs/index.md
index 3e777cba60..d07e892512 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -104,6 +104,9 @@ If you want to operate table and partition statistics, you
can refer to the [doc
* [**Model catalog**](./model-catalog.md)
+If you want to automate table maintenance workflows, see [Table Maintenance Service (Optimizer)](./table-maintenance-service/optimizer.md).
+Start with Gravitino built-in policies and built-in job templates, and extend via optimizer interfaces when needed.
+
Catalogs with an asterisk (\*) aren’t in the standard release tarball and
Docker image since 1.2.0. In 1.2.0, Gravitino introduces
folder `catalogs-contrib` to host the contributed catalogs, which aren’t in
the standard release but can be built and used separately. See [how to build
Gravitino](./how-to-build.md#quick-start) for details.
diff --git a/docs/manage-jobs-in-gravitino.md b/docs/manage-jobs-in-gravitino.md
index 9d31215da7..d72e0fdc87 100644
--- a/docs/manage-jobs-in-gravitino.md
+++ b/docs/manage-jobs-in-gravitino.md
@@ -436,6 +436,13 @@ that the local job executor is only for testing. If you
want to run the job in a
you need to implement your own `JobExecutor` and set the configuration, please
see
[Implement a custom job executor](#implement-a-custom-job-executor) section
below.
+When running a Spark job template with the local executor, configure one of:
+
+- `gravitino.jobExecutor.local.sparkHome`
+- `SPARK_HOME`
+
+If neither is set before the Gravitino server starts, Spark jobs may fail to start.
+
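For example, in `gravitino.conf` (the `/opt/spark` path below is a placeholder for your local Spark installation):

```properties
gravitino.jobExecutor.local.sparkHome = /opt/spark
```

Alternatively, export `SPARK_HOME` in the environment before starting the Gravitino server.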
<Tabs groupId='language' queryString>
<TabItem value="shell" label="Shell">
diff --git a/docs/table-maintenance-service/optimizer-cli-reference.md b/docs/table-maintenance-service/optimizer-cli-reference.md
new file mode 100644
index 0000000000..6df4e44696
--- /dev/null
+++ b/docs/table-maintenance-service/optimizer-cli-reference.md
@@ -0,0 +1,229 @@
+---
+title: "Optimizer CLI Reference"
+slug: /table-maintenance-service/optimizer-cli-reference
+keyword: table maintenance, optimizer, cli, commands, metrics, statistics
+license: This software is licensed under the Apache License version 2.
+---
+
+Use `--help` to list all commands, or `--help --type <command>` for
command-specific help.
+
+By default, the optimizer CLI loads `conf/gravitino-optimizer.conf` from the current working directory.
+Use `--conf-path` only when you need a custom config file.
+
+## Command quick reference
+
+| Command (`--type`) | Required options | Optional options | Purpose |
+| --- | --- | --- | --- |
+| `submit-strategy-jobs` | `--identifiers`, `--strategy-name` | `--dry-run`, `--limit` | Recommend and optionally submit jobs |
+| `update-statistics` | `--calculator-name` | `--identifiers`, `--statistics-payload`, `--file-path` | Calculate and persist statistics |
+| `append-metrics` | `--calculator-name` | `--identifiers`, `--statistics-payload`, `--file-path` | Calculate and append metrics |
+| `monitor-metrics` | `--identifiers`, `--action-time` | `--range-seconds`, `--partition-path` | Evaluate rules with before/after metrics |
+| `list-table-metrics` | `--identifiers` | `--partition-path` | Query stored table or partition metrics |
+| `list-job-metrics` | `--identifiers` | None | Query stored job metrics |
+| `submit-update-stats-job` | `--identifiers` | `--dry-run`, `--update-mode`, `--updater-options`, `--spark-conf` | Submit built-in Iceberg update stats/metrics Spark jobs |
+
+### Option field meanings
+
+| Option | Meaning | Used by |
+| --- | --- | --- |
+| `--identifiers` | Comma-separated identifiers. Table format supports `catalog.schema.table` (or `schema.table` when a default catalog is configured). | Most commands |
+| `--strategy-name` | Policy name to evaluate, for example `iceberg_compaction_default`. | `submit-strategy-jobs` |
+| `--dry-run` | Preview mode. Prints recommendations or job configs without submitting jobs. | `submit-strategy-jobs`, `submit-update-stats-job` |
+| `--limit` | Maximum number of strategy jobs to process. Must be `> 0`. | `submit-strategy-jobs` |
+| `--calculator-name` | Statistics/metrics calculator implementation name (for example `local-stats-calculator`). | `update-statistics`, `append-metrics` |
+| `--statistics-payload` | Inline JSON Lines content as input. Mutually exclusive with `--file-path`. | `update-statistics`, `append-metrics` |
+| `--file-path` | Path to a JSON Lines input file. Mutually exclusive with `--statistics-payload`. | `update-statistics`, `append-metrics` |
+| `--action-time` | Action timestamp in epoch seconds, used as the evaluation anchor. | `monitor-metrics` |
+| `--range-seconds` | Time window (seconds) for monitor evaluation. Default is `86400` (24h). | `monitor-metrics` |
+| `--partition-path` | Partition path JSON array, for example `'[{"dt":"2026-01-01"}]'`. Requires exactly one identifier. | `monitor-metrics`, `list-table-metrics` |
+| `--update-mode` | Controls what the built-in update job updates: `stats`, `metrics`, or `all` (default). | `submit-update-stats-job` |
+| `--updater-options` | Flat JSON map passed to the updater logic. For `stats`/`all`, include `gravitino_uri` and `metalake`. | `submit-update-stats-job` |
+| `--spark-conf` | Flat JSON map of Spark and Iceberg catalog configs used by the job. | `submit-update-stats-job` |
+
+Global option:
+
+- `--conf-path`: Optional custom config file path. If omitted, CLI uses `conf/gravitino-optimizer.conf`.
+
+## Input format for `local-stats-calculator`
+
+`local-stats-calculator` reads JSON Lines (one JSON object per line).
+
+### Reserved fields
+
+- `stats-type`: `table`, `partition`, or `job`
+- `identifier`: object identifier
+- `partition-path`: only for partition data, for example `{"dt":"2026-01-01"}`
+- `timestamp`: optional epoch seconds (record-level default timestamp for metric points)
+
+All other fields are treated as metric or statistic values.
+
+### Supported examples by scope
+
+The following examples cover table, partition, and job scopes, each with multiple metric/statistic fields:
+
+```json
+{"stats-type":"table","identifier":"catalog.db.t1","timestamp":1735689600,"row_count":100}
+{"stats-type":"table","identifier":"catalog.db.t1","row_count":100,"total_file_size":1048576}
+{"stats-type":"table","identifier":"catalog.db.t1","timestamp":1735689660,"row_count":120,"file_count":24,"avg_file_size":10485.76}
+{"stats-type":"partition","identifier":"catalog.db.t1","timestamp":1735689720,"partition-path":{"dt":"2026-01-01"},"row_count":20}
+{"stats-type":"partition","identifier":"catalog.db.t1","partition-path":{"dt":"2026-01-01","region":"us"},"row_count":12,"file_count":3}
+{"stats-type":"job","identifier":"job-1","timestamp":1735689800,"duration_ms":12500,"rewritten_files":18}
+```
+
+### Identifier rules
+
+- Table and partition records: `catalog.schema.table`
+- If `gravitino.optimizer.gravitinoDefaultCatalog` is set, `schema.table` is also accepted
+- Job records: parsed as a regular Gravitino `NameIdentifier`
+
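The batch examples below read from a `./table-stats.jsonl` file. A minimal way to produce one with the reserved fields above (the `catalog.db.t1` identifier and metric values are illustrative):

```shell
# Write a three-record JSON Lines input: two table records and one partition record.
cat > ./table-stats.jsonl <<'EOF'
{"stats-type":"table","identifier":"catalog.db.t1","timestamp":1735689600,"row_count":100}
{"stats-type":"table","identifier":"catalog.db.t1","row_count":120,"file_count":24}
{"stats-type":"partition","identifier":"catalog.db.t1","partition-path":{"dt":"2026-01-01"},"row_count":20}
EOF

# Sanity check: each line must carry the reserved stats-type field.
grep -c '"stats-type"' ./table-stats.jsonl
```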
+## CLI workflow examples
+
+### Update statistics in batch
+
+Calculate and persist table or partition statistics from JSONL input.
+
+```bash
+./bin/gravitino-optimizer.sh \
+ --type update-statistics \
+ --calculator-name local-stats-calculator \
+ --file-path ./table-stats.jsonl
+```
+
+### Append metrics in batch
+
+Calculate and append table or job metrics from JSONL input.
+
+```bash
+./bin/gravitino-optimizer.sh \
+ --type append-metrics \
+ --calculator-name local-stats-calculator \
+ --file-path ./table-stats.jsonl
+```
+
+### Dry-run strategy submission
+
+Preview recommendations without actually submitting jobs.
+
+```bash
+./bin/gravitino-optimizer.sh \
+ --type submit-strategy-jobs \
+ --identifiers rest_catalog.db.t1 \
+ --strategy-name iceberg_compaction_default \
+ --dry-run \
+ --limit 10
+```
+
+### Submit strategy jobs
+
+Submit jobs for identifiers that match the given policy name.
+
+```bash
+./bin/gravitino-optimizer.sh \
+ --type submit-strategy-jobs \
+ --identifiers rest_catalog.db.t1 \
+ --strategy-name iceberg_compaction_default \
+ --limit 10
+```
+
+### Monitor metrics
+
+Evaluate monitor rules around an action time.
+
+```bash
+./bin/gravitino-optimizer.sh \
+ --type monitor-metrics \
+ --identifiers catalog.db.sales \
+ --action-time 1735689600 \
+ --range-seconds 86400
+```
+
+You can configure evaluator rules in `gravitino-optimizer.conf`:
+
+```properties
+gravitino.optimizer.monitor.gravitinoMetricsEvaluator.rules = table:row_count:avg:le,job:duration:latest:le
+```
+
+Rule format is `scope:metricName:aggregation:comparison`:
+
+- `scope`: `table` or `job` (`table` rules also apply to partition scope)
+- `metricName`: the metric to evaluate, for example `row_count`
+- `aggregation`: `max|min|avg|latest`
+- `comparison`: `lt|le|gt|ge|eq|ne`
+
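As a plain-shell illustration of the field layout (not an optimizer command), a rule string splits into the four documented fields:

```shell
rule="table:row_count:avg:le"
# Split the rule on ':' into its four fields.
IFS=':' read -r scope metric aggregation comparison <<< "$rule"
echo "scope=$scope metric=$metric aggregation=$aggregation comparison=$comparison"
```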
+### Submit built-in update stats jobs
+
+Submit built-in Iceberg update stats/metrics Spark jobs directly.
+
+```bash
+./bin/gravitino-optimizer.sh \
+ --type submit-update-stats-job \
+ --identifiers rest_catalog.db.t1 \
+ --update-mode all \
+ --updater-options '{"gravitino_uri":"http://localhost:8090","metalake":"test"}' \
+ --spark-conf '{"spark.sql.catalog.rest_catalog.type":"rest","spark.sql.catalog.rest_catalog.uri":"http://localhost:9001/iceberg","spark.hadoop.fs.defaultFS":"file:///"}'
+```
+
+Notes:
+
+- `--identifiers` supports `catalog.schema.table` or `schema.table` (when default catalog is configured).
+- `--update-mode` supports `stats|metrics|all` (default `all`).
+- For `stats` or `all`, `--updater-options` must include `gravitino_uri` and `metalake`.
+- `--spark-conf` and `--updater-options` are flat JSON maps.
+
+### List table metrics
+
+Query stored metrics at table scope.
+
+```bash
+./bin/gravitino-optimizer.sh \
+ --type list-table-metrics \
+ --identifiers catalog.db.sales
+```
+
+For partition scope, provide a partition path JSON array:
+
+```bash
+./bin/gravitino-optimizer.sh \
+ --type list-table-metrics \
+ --identifiers catalog.db.sales \
+ --partition-path '[{"dt":"2026-01-01"}]'
+```
+
+### List job metrics
+
+Query stored metrics at job scope.
+
+```bash
+./bin/gravitino-optimizer.sh \
+ --type list-job-metrics \
+ --identifiers catalog.db.optimizer_job
+```
+
+## Output guide
+
+- `SUMMARY: ...`: summary for `update-statistics` and `append-metrics`
+- `DRY-RUN: ...`: recommendation preview without job submission
+- `SUBMIT: ...`: strategy job or built-in update-stats job submitted successfully
+- `SUMMARY: submit-update-stats-job ...`: summary for built-in update-stats submission
+- `MetricsResult{...}`: returned by list commands
+- `EvaluationResult{...}`: returned by monitor command
+
+Examples:
+
+```text
+SUMMARY: statistics totalRecords=3 tableRecords=2 partitionRecords=1 jobRecords=0
+DRY-RUN: strategy=iceberg-data-compaction identifier=rest_catalog.db.t1 score=95 jobTemplate=builtin-iceberg-rewrite-data-files jobOptions={catalog_name=rest_catalog, table_identifier=db.t1}
+SUBMIT: strategy=iceberg-data-compaction identifier=rest_catalog.db.t1 score=95 jobTemplate=builtin-iceberg-rewrite-data-files jobOptions={catalog_name=rest_catalog, table_identifier=db.t1} jobId=1f54c6d3-4e27-4cc8-bdfa-b05ecf59a4c2
+DRY-RUN: identifier=rest_catalog.db.t1 jobTemplate=builtin-iceberg-update-stats jobConfig={catalog_name=rest_catalog, table_identifier=db.t1, update_mode=all, updater_options={"gravitino_uri":"http://localhost:8090","metalake":"test"}, spark_conf={"spark.master":"local[2]","spark.hadoop.fs.defaultFS":"file:///"}}
+SUMMARY: submit-update-stats-job total=1 submitted=1 dryRun=false
+MetricsResult{scopeType=TABLE, identifier=rest_catalog.db.t1, partitionPath=<table-or-job-scope>, metrics={row_count=[{timestamp=1735689600, value=100}]}}
+EvaluationResult{scopeType=TABLE, identifier=rest_catalog.db.t1, partitionPath=<table-or-job-scope>, evaluation=true, evaluatorName=gravitino-metrics-evaluator, actionTimeSeconds=1735689600, rangeSeconds=86400, beforeMetrics={row_count=[MetricSample{timestampSeconds=1735686000, value=120}]}, afterMetrics={row_count=[MetricSample{timestampSeconds=1735689600, value=100}]}}
+```
+
+## Related docs
+
+- [Table Maintenance Service (Optimizer)](./optimizer.md)
+- [Optimizer Configuration](./optimizer-configuration.md)
+- [Optimizer Extension Guide](./optimizer-extension-guide.md)
+- [Optimizer Quick Start and Verification](./optimizer-quick-start.md)
+- [Optimizer Troubleshooting](./optimizer-troubleshooting.md)
diff --git a/docs/table-maintenance-service/optimizer-configuration.md b/docs/table-maintenance-service/optimizer-configuration.md
new file mode 100644
index 0000000000..bb67584452
--- /dev/null
+++ b/docs/table-maintenance-service/optimizer-configuration.md
@@ -0,0 +1,106 @@
+---
+title: "Optimizer Configuration"
+slug: /table-maintenance-service/optimizer-configuration
+keyword: table maintenance, optimizer, configuration, job template, spark
+license: This software is licensed under the Apache License version 2.
+---
+
+## Configuration layers
+
+Use these layers together:
+
+| Layer | Scope | Typical keys |
+| --- | --- | --- |
+| Gravitino server config | Runtime for job manager and executor | `gravitino.job.executor`, `gravitino.job.statusPullIntervalInMs`, `gravitino.jobExecutor.local.sparkHome` |
+| Job submission `jobConf` | Per job run | `catalog_name`, `table_identifier`, `spark_*`, template-specific args |
+| Optimizer CLI config | CLI commands | `gravitino.optimizer.*` in `conf/gravitino-optimizer.conf` |
+
+## Server-side configuration
+
+Set server-level runtime behavior in `gravitino.conf`.
+
+```properties
+gravitino.job.executor=local
+gravitino.job.statusPullIntervalInMs=300000
+gravitino.jobExecutor.local.sparkHome=/path/to/spark
+```
+
+For local demo environments, you can reduce
`gravitino.job.statusPullIntervalInMs` to get faster status updates.
+
+## Built-in update stats `jobConf`
+
+Use `builtin-iceberg-update-stats` with at least these keys:
+
+```json
+{
+ "catalog_name": "rest_catalog",
+ "table_identifier": "db.t1",
+ "update_mode": "all",
+ "updater_options": "{\"gravitino_uri\":\"http://localhost:8090\",\"metalake\":\"test\",\"statistics_updater\":\"gravitino-statistics-updater\",\"metrics_updater\":\"gravitino-metrics-updater\"}",
+ "spark_conf": "{\"spark.master\":\"local[2]\",\"spark.hadoop.fs.defaultFS\":\"file:///\"}",
+ "spark_master": "local[2]",
+ "spark_executor_instances": "1",
+ "spark_executor_cores": "1",
+ "spark_executor_memory": "1g",
+ "spark_driver_memory": "1g",
+ "catalog_type": "rest",
+ "catalog_uri": "http://localhost:9001/iceberg",
+ "warehouse_location": ""
+}
+```
+
+`warehouse_location` can be empty for local filesystem testing. Set it to your
warehouse URI
+for HDFS or cloud object storage environments.
+
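Note that `updater_options` and `spark_conf` are JSON maps embedded as JSON string values, so their inner quotes must be escaped. A shell sketch of that escaping (the map content is illustrative):

```shell
# A flat JSON map to embed.
updater_options='{"gravitino_uri":"http://localhost:8090","metalake":"test"}'
# Escape the inner double quotes so the map fits inside a JSON string value.
escaped=${updater_options//\"/\\\"}
printf '"updater_options": "%s"\n' "$escaped"
```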
+## Strategy submission configuration
+
+`submit-strategy-jobs` needs optimizer CLI config. This is a minimal working
example:
+
+```properties
+gravitino.optimizer.gravitinoUri = http://localhost:8090
+gravitino.optimizer.gravitinoMetalake = test
+gravitino.optimizer.gravitinoDefaultCatalog = rest_catalog
+gravitino.optimizer.recommender.statisticsProvider = gravitino-statistics-provider
+gravitino.optimizer.recommender.strategyProvider = gravitino-strategy-provider
+gravitino.optimizer.recommender.tableMetaProvider = gravitino-table-metadata-provider
+gravitino.optimizer.recommender.jobSubmitter = gravitino-job-submitter
+gravitino.optimizer.strategyHandler.iceberg-data-compaction.className = org.apache.gravitino.maintenance.optimizer.recommender.handler.compaction.CompactionStrategyHandler
+gravitino.optimizer.jobSubmitterConfig.catalog_name = rest_catalog
+gravitino.optimizer.jobSubmitterConfig.spark_master = local[2]
+gravitino.optimizer.jobSubmitterConfig.spark_executor_instances = 1
+gravitino.optimizer.jobSubmitterConfig.spark_executor_cores = 1
+gravitino.optimizer.jobSubmitterConfig.spark_executor_memory = 1g
+gravitino.optimizer.jobSubmitterConfig.spark_driver_memory = 1g
+gravitino.optimizer.jobSubmitterConfig.catalog_type = rest
+gravitino.optimizer.jobSubmitterConfig.catalog_uri = http://localhost:9001/iceberg
+# Leave empty for local filesystem; set to your warehouse URI for cloud/HDFS storage.
+gravitino.optimizer.jobSubmitterConfig.warehouse_location =
+gravitino.optimizer.jobSubmitterConfig.spark_conf = {"spark.master":"local[2]","spark.hadoop.fs.defaultFS":"file:///"}
+```
+
+`--strategy-name` must be the policy name, for example
`iceberg_compaction_default`.
+
+## Local filesystem note
+
+If your environment is local and not HDFS-based, set:
+
+```properties
+spark.hadoop.fs.defaultFS=file:///
+```
+
+Without this, Spark jobs may try `hdfs://localhost:9000` and fail.
+
+## Recommended validation checklist
+
+- Job templates exist: `builtin-iceberg-update-stats`, `builtin-iceberg-rewrite-data-files`.
+- Policies are attached to target tables.
+- `submit-strategy-jobs` prints `SUBMIT` lines.
+- Rewrite logs show `Rewritten data files: <N>` where `N > 0` for non-empty tables.
+
+## Related docs
+
+- [Table Maintenance Service (Optimizer)](./optimizer.md)
+- [Optimizer Extension Guide](./optimizer-extension-guide.md)
+- [Optimizer Quick Start and Verification](./optimizer-quick-start.md)
+- [Optimizer CLI Reference](./optimizer-cli-reference.md)
+- [Optimizer Troubleshooting](./optimizer-troubleshooting.md)
diff --git a/docs/table-maintenance-service/optimizer-extension-guide.md b/docs/table-maintenance-service/optimizer-extension-guide.md
new file mode 100644
index 0000000000..25d1c0ee58
--- /dev/null
+++ b/docs/table-maintenance-service/optimizer-extension-guide.md
@@ -0,0 +1,128 @@
+---
+title: "Optimizer Extension Guide"
+slug: /table-maintenance-service/extension-guide
+keyword: table maintenance, optimizer, extension, provider, ServiceLoader
+license: This software is licensed under the Apache License version 2.
+---
+
+Use this guide when built-in optimizer components do not match your
environment and you need custom implementations.
+
+## Extension model
+
+Optimizer supports three loading patterns:
+
+1. `Provider` SPI (`name()` + `initialize()`): loaded by `ServiceLoader` and selected by config value.
+2. Class-name mapping for strategy handlers and job adapters.
+3. Typed SPI for `StatisticsCalculator` and `MetricsEvaluator`.
+
+## Extension points and config keys
+
+| Area | Interface / type | Config key | Loading mode |
+| --- | --- | --- | --- |
+| Recommender statistics | `StatisticsProvider` | `gravitino.optimizer.recommender.statisticsProvider` | `Provider` SPI by `name()` |
+| Recommender strategy source | `StrategyProvider` | `gravitino.optimizer.recommender.strategyProvider` | `Provider` SPI by `name()` |
+| Recommender table metadata | `TableMetadataProvider` | `gravitino.optimizer.recommender.tableMetaProvider` | `Provider` SPI by `name()` |
+| Recommender job submission | `JobSubmitter` | `gravitino.optimizer.recommender.jobSubmitter` | `Provider` SPI by `name()` |
+| Strategy evaluation logic | `StrategyHandler` | `gravitino.optimizer.strategyHandler.<strategyType>.className` | Reflection by class name |
+| Job template adaptation | `GravitinoJobAdapter` | `gravitino.optimizer.jobAdapter.<jobTemplate>.className` | Reflection by class name |
+| Update statistics sink | `StatisticsUpdater` | `gravitino.optimizer.updater.statisticsUpdater` | `Provider` SPI by `name()` |
+| Update metrics sink | `MetricsUpdater` | `gravitino.optimizer.updater.metricsUpdater` | `Provider` SPI by `name()` |
+| Monitor metrics source | `MetricsProvider` | `gravitino.optimizer.monitor.metricsProvider` | `Provider` SPI by `name()` |
+| Monitor table-job relation | `TableJobRelationProvider` | `gravitino.optimizer.monitor.tableJobRelationProvider` | `Provider` SPI by `name()` |
+| Monitor evaluator | `MetricsEvaluator` | `gravitino.optimizer.monitor.metricsEvaluator` | Typed SPI (`ServiceLoader<MetricsEvaluator>`) |
+| Monitor callbacks | `MonitorCallback` | `gravitino.optimizer.monitor.callbacks` | `Provider` SPI by `name()` (comma-separated) |
+| CLI calculator | `StatisticsCalculator` | CLI `--calculator-name` | Typed SPI (`ServiceLoader<StatisticsCalculator>`) |
+
+## Implement a custom provider
+
+Most extension points use `Provider`:
+
+```java
+public class MyStatisticsProvider implements StatisticsProvider {
+ @Override
+ public String name() {
+ return "my-statistics-provider";
+ }
+
+ @Override
+ public void initialize(OptimizerEnv optimizerEnv) {
+ // Initialize clients/resources from optimizer config.
+ }
+
+ @Override
+ public void close() throws Exception {}
+}
+```
+
+Requirements:
+
+- Keep a stable `name()` value; config resolves by this name (case-insensitive).
+- Provide a public no-arg constructor.
+- Implement `initialize(OptimizerEnv)` and `close()` lifecycle correctly.
+
+## Register with ServiceLoader
+
+### For `Provider` implementations
+
+Create file:
+
+`META-INF/services/org.apache.gravitino.maintenance.optimizer.api.common.Provider`
+
+Add your implementation class names, one per line:
+
+```text
+com.example.optimizer.MyStatisticsProvider
+com.example.optimizer.MyJobSubmitter
+```
+
+### For `StatisticsCalculator`
+
+Create file:
+
+`META-INF/services/org.apache.gravitino.maintenance.optimizer.api.updater.StatisticsCalculator`
+
+### For `MetricsEvaluator`
+
+Create file:
+
+`META-INF/services/org.apache.gravitino.maintenance.optimizer.api.monitor.MetricsEvaluator`
+
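As with the `Provider` registration above, each services file lists one implementation class per line (`#` starts a comment). The class names below are placeholders:

```text
# META-INF/services/org.apache.gravitino.maintenance.optimizer.api.updater.StatisticsCalculator
com.example.optimizer.MyStatisticsCalculator

# META-INF/services/org.apache.gravitino.maintenance.optimizer.api.monitor.MetricsEvaluator
com.example.optimizer.MyMetricsEvaluator
```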
+## Configure `gravitino-optimizer.conf`
+
+```properties
+gravitino.optimizer.recommender.statisticsProvider = my-statistics-provider
+gravitino.optimizer.recommender.jobSubmitter = my-job-submitter
+
+gravitino.optimizer.strategyHandler.my-strategy.className = com.example.optimizer.MyStrategyHandler
+gravitino.optimizer.jobAdapter.my-job-template.className = com.example.optimizer.MyJobAdapter
+
+gravitino.optimizer.monitor.metricsEvaluator = my-metrics-evaluator
+```
+
+Notes:
+
+- `strategyHandler.<strategyType>.className` must match `strategy.type` in policy content.
+- `jobAdapter.<jobTemplate>.className` must match the target job template name.
+- `jobSubmitterConfig.*` entries are passed to job submitters as shared runtime options.
+
+## Package and deploy
+
+- Build a JAR containing your classes and `META-INF/services` files.
+- Put the JAR on the optimizer runtime classpath, for example `${GRAVITINO_HOME}/optimizer/libs/`.
+- Restart the optimizer process before testing.
+
+If you also extend Gravitino server job execution, see [Manage jobs in
Gravitino](../manage-jobs-in-gravitino.md).
+
+## Validation checklist
+
+1. `--help` shows no load-time SPI errors.
+2. Commands using your extension run without `No ... found for provider name` errors.
+3. Strategy flow can resolve both handler and job adapter mappings.
+4. Dry-run (`submit-strategy-jobs --dry-run`) prints expected recommendations.
+
+## Related docs
+
+- [Table Maintenance Service (Optimizer)](./optimizer.md)
+- [Optimizer Configuration](./optimizer-configuration.md)
+- [Optimizer CLI Reference](./optimizer-cli-reference.md)
+- [Optimizer Troubleshooting](./optimizer-troubleshooting.md)
diff --git a/docs/table-maintenance-service/optimizer-quick-start.md b/docs/table-maintenance-service/optimizer-quick-start.md
new file mode 100644
index 0000000000..182402dbcd
--- /dev/null
+++ b/docs/table-maintenance-service/optimizer-quick-start.md
@@ -0,0 +1,241 @@
+---
+title: "Optimizer Quick Start and Verification"
+slug: /table-maintenance-service/quick-start
+keyword: table maintenance, optimizer, quick start, compaction, update stats
+license: This software is licensed under the Apache License version 2.
+---
+
+## Before running quick start
+
+- Prepare a running Gravitino server.
+- Ensure target metalake exists (examples use `test`).
+- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- If your Iceberg REST backend is in-memory, avoid restarting it during this quick start, because a restart resets metadata and data files.
+
+For full config details, see [Optimizer
Configuration](./optimizer-configuration.md).
+
+## Success criteria
+
+- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
+- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
+
+## Quick start A: built-in table maintenance workflow
+
+This workflow uses:
+
+- Built-in policy type: `system_iceberg_compaction`
+- Built-in update stats job template: `builtin-iceberg-update-stats`
+- Built-in rewrite data files job template: `builtin-iceberg-rewrite-data-files`
+
+### 1. Preflight checks
+
+```bash
+# Check metalake
+curl -sS "http://localhost:8090/api/metalakes/test" | jq
+
+# Check built-in templates
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/templates?details=true" | jq '.jobTemplates[].name'
+```
+
+Expected names include:
+
+- `builtin-iceberg-update-stats`
+- `builtin-iceberg-rewrite-data-files`
+
+If missing, verify that the `gravitino-jobs` JAR is present in `auxlib`, then restart Gravitino.
+
+### 2. Prepare demo metadata objects
+
+Create a REST Iceberg catalog, schema, and table:
+
+```bash
+# Create catalog (ignore "already exists" errors)
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "rest_catalog",
+ "type": "RELATIONAL",
+ "comment": "Iceberg REST catalog",
+ "provider": "lakehouse-iceberg",
+ "properties": {
+ "catalog-backend": "rest",
+ "uri": "http://localhost:9001/iceberg"
+ }
+ }' \
+ http://localhost:8090/api/metalakes/test/catalogs
+
+# Create schema
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "db",
+ "comment": "optimizer demo schema",
+ "properties": {}
+ }' \
+ http://localhost:8090/api/metalakes/test/catalogs/rest_catalog/schemas
+
+# Create table
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "t1",
+ "comment": "optimizer demo table",
+ "columns": [
+ {"name": "id", "type": "integer", "nullable": true},
+ {"name": "name", "type": "string", "nullable": true}
+ ],
+ "properties": {}
+ }' \
+ http://localhost:8090/api/metalakes/test/catalogs/rest_catalog/schemas/db/tables
+```
+
+### 3. Seed demo data (recommended)
+
+Use Spark SQL to create enough small files so compaction has visible effect:
+
+```bash
+${SPARK_HOME}/bin/spark-sql \
+ --conf spark.hadoop.fs.defaultFS=file:/// \
+ --conf spark.sql.catalog.rest_demo=org.apache.iceberg.spark.SparkCatalog \
+ --conf spark.sql.catalog.rest_demo.type=rest \
+ --conf spark.sql.catalog.rest_demo.uri=http://localhost:9001/iceberg \
+ -e "CREATE NAMESPACE IF NOT EXISTS rest_demo.db; \
+ SET spark.sql.files.maxRecordsPerFile=1000; \
+ INSERT INTO rest_demo.db.t1 \
+ SELECT id, concat('name_', CAST(id AS STRING)) FROM range(0, 100000);"
+```
+
+### 4. Create and attach built-in compaction policy
+
+```bash
+# Create policy
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "iceberg_compaction_default",
+ "comment": "Built-in iceberg compaction policy",
+ "policyType": "system_iceberg_compaction",
+ "enabled": true,
+ "content": {}
+ }' \
+ http://localhost:8090/api/metalakes/test/policies
+
+# Attach policy to table
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "policiesToAdd": ["iceberg_compaction_default"]
+ }' \
+ http://localhost:8090/api/metalakes/test/objects/table/rest_catalog.db.t1/policies
+```
+
+Verify association:
+
+```bash
+curl -sS "http://localhost:8090/api/metalakes/test/objects/table/rest_catalog.db.t1/policies?details=true" | jq
+```
+
+### 5. Submit built-in update stats job
+
+```bash
+update_stats_job_id=$(curl -sS -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "jobTemplateName": "builtin-iceberg-update-stats",
+ "jobConf": {
+ "catalog_name": "rest_catalog",
+ "table_identifier": "db.t1",
+ "update_mode": "all",
+ "updater_options": "{\"gravitino_uri\":\"http://localhost:8090\",\"metalake\":\"test\",\"statistics_updater\":\"gravitino-statistics-updater\",\"metrics_updater\":\"gravitino-metrics-updater\"}",
+ "spark_conf": "{\"spark.master\":\"local[2]\",\"spark.hadoop.fs.defaultFS\":\"file:///\"}",
+ "spark_master": "local[2]",
+ "spark_executor_instances": "1",
+ "spark_executor_cores": "1",
+ "spark_executor_memory": "1g",
+ "spark_driver_memory": "1g",
+ "catalog_type": "rest",
+ "catalog_uri": "http://localhost:9001/iceberg",
+ "warehouse_location": ""
+ }
+ }' \
+ http://localhost:8090/api/metalakes/test/jobs/runs | jq -r '.job.jobId')
+
+echo "update-stats job id: ${update_stats_job_id}"
+```
+
+### 6. Trigger rewrite submission with `submit-strategy-jobs`
+
+```bash
+# Required optimizer CLI config for strategy submission.
+# Note: --strategy-name is policy name, not strategy.type.
+cat > /tmp/gravitino-optimizer-submit.conf <<'EOF_CONF'
+gravitino.optimizer.gravitinoUri = http://localhost:8090
+gravitino.optimizer.gravitinoMetalake = test
+gravitino.optimizer.gravitinoDefaultCatalog = rest_catalog
+gravitino.optimizer.recommender.statisticsProvider = gravitino-statistics-provider
+gravitino.optimizer.recommender.strategyProvider = gravitino-strategy-provider
+gravitino.optimizer.recommender.tableMetaProvider = gravitino-table-metadata-provider
+gravitino.optimizer.recommender.jobSubmitter = gravitino-job-submitter
+gravitino.optimizer.strategyHandler.iceberg-data-compaction.className = org.apache.gravitino.maintenance.optimizer.recommender.handler.compaction.CompactionStrategyHandler
+gravitino.optimizer.jobSubmitterConfig.catalog_name = rest_catalog
+gravitino.optimizer.jobSubmitterConfig.spark_master = local[2]
+gravitino.optimizer.jobSubmitterConfig.spark_executor_instances = 1
+gravitino.optimizer.jobSubmitterConfig.spark_executor_cores = 1
+gravitino.optimizer.jobSubmitterConfig.spark_executor_memory = 1g
+gravitino.optimizer.jobSubmitterConfig.spark_driver_memory = 1g
+gravitino.optimizer.jobSubmitterConfig.catalog_type = rest
+gravitino.optimizer.jobSubmitterConfig.catalog_uri = http://localhost:9001/iceberg
+# Leave empty for local filesystem; set to your warehouse URI for cloud/HDFS storage.
+gravitino.optimizer.jobSubmitterConfig.warehouse_location =
+gravitino.optimizer.jobSubmitterConfig.spark_conf = {"spark.master":"local[2]","spark.hadoop.fs.defaultFS":"file:///"}
+EOF_CONF
+
+# Optional: preview recommendations without submitting jobs.
+./bin/gravitino-optimizer.sh \
+ --type submit-strategy-jobs \
+ --identifiers rest_catalog.db.t1 \
+ --strategy-name iceberg_compaction_default \
+ --dry-run \
+ --limit 10 \
+ --conf-path /tmp/gravitino-optimizer-submit.conf
+
+# Submit rewrite job through strategy evaluation.
+submit_output=$(./bin/gravitino-optimizer.sh \
+ --type submit-strategy-jobs \
+ --identifiers rest_catalog.db.t1 \
+ --strategy-name iceberg_compaction_default \
+ --limit 10 \
+ --conf-path /tmp/gravitino-optimizer-submit.conf)
+echo "${submit_output}"
+
+strategy_job_id=$(echo "${submit_output}" | sed -n 's/.*jobId=\([^[:space:]]*\).*/\1/p')
+[[ -z "${strategy_job_id}" ]] && echo 'ERROR: failed to extract strategy job ID' && exit 1
+echo "strategy rewrite job id: ${strategy_job_id}"
+```
+
+### 7. Track status and verify results
+
+```bash
+# Check job status by id
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/runs/${update_stats_job_id}" | jq
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/runs/${strategy_job_id}" | jq
+
+# Verify table statistics after update-stats
+curl -sS "http://localhost:8090/api/metalakes/test/objects/table/rest_catalog.db.t1/statistics" | jq
+
+# Staging path is controlled by `gravitino.job.stagingDir` (default: `/tmp/gravitino/jobs/staging`).
+# Verify rewrite actually rewrote files (N should be > 0 for non-empty table)
+grep -E "Rewritten data files|Added data files|completed successfully" \
+  "/tmp/gravitino/jobs/staging/test/builtin-iceberg-rewrite-data-files/${strategy_job_id}/error.log"
+```
+
+By default, Gravitino pulls job status every `300000` ms (`gravitino.job.statusPullIntervalInMs`).
+REST status may lag real Spark process state by up to about 5 minutes.
+
+## Next read
+
+- [Optimizer Configuration](./optimizer-configuration.md)
+- [Optimizer CLI Reference](./optimizer-cli-reference.md)
+- [Optimizer Troubleshooting](./optimizer-troubleshooting.md)
diff --git a/docs/table-maintenance-service/optimizer-troubleshooting.md b/docs/table-maintenance-service/optimizer-troubleshooting.md
new file mode 100644
index 0000000000..bb9879da9d
--- /dev/null
+++ b/docs/table-maintenance-service/optimizer-troubleshooting.md
@@ -0,0 +1,69 @@
+---
+title: "Optimizer Troubleshooting"
+slug: /table-maintenance-service/troubleshooting
+keyword: table maintenance, optimizer, troubleshooting, spark, strategy
+license: This software is licensed under the Apache License version 2.
+---
+
+## `Invalid --type`
+
+Use kebab-case values such as `update-statistics`, not `update_statistics`.
+
+## `--statistics-payload and --file-path cannot be used together`
+
+For `local-stats-calculator`, use exactly one of them.
+
+## `requires one of --statistics-payload or --file-path`
+
+When `--calculator-name local-stats-calculator` is used, one input source is required.
+
+## `--partition-path must be a JSON array`
+
+Use a JSON array format, for example:
+
+```text
+[{"dt":"2026-01-01"}]
+```
+
+## Job status appears stale (`queued` or `started` for a long time)
+
+Check `gravitino.job.statusPullIntervalInMs` and local staging logs under:
+
+`/tmp/gravitino/jobs/staging/<metalake>/<job-template-name>/<job-id>/error.log`.
+
+## `No identifiers matched strategy name ...`
+
+`--strategy-name` must be the policy name (for example `iceberg_compaction_default`), not the policy type (`system_iceberg_compaction`) and not the strategy type (`iceberg-data-compaction`).
+
+## Dry-run returns no `DRY-RUN` or `SUBMIT` lines
+
+This usually means the trigger conditions are not met. For compaction, verify that `custom-data-file-mse` and `custom-delete-file-number` in the table statistics are large enough to satisfy the policy rules.
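+
+Those signals can be inspected with the statistics endpoint used in the quick
+start. The metalake and table names below are examples, and the `jq` filter
+assumes the response exposes a top-level `statistics` array:
+
+```bash
+curl -sS "http://localhost:8090/api/metalakes/test/objects/table/rest_catalog.db.t1/statistics" \
+  | jq '.statistics[] | select(.name == "custom-data-file-mse" or .name == "custom-delete-file-number")'
+```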
+
+## `No StrategyHandler class configured for strategy type ...`
+
+Add strategy handler mapping to optimizer config, for example:
+
+```properties
+gravitino.optimizer.strategyHandler.iceberg-data-compaction.className = org.apache.gravitino.maintenance.optimizer.recommender.handler.compaction.CompactionStrategyHandler
+```
+
+If you already use the packaged default optimizer config, this mapping may already exist.
+
+## Spark job fails with `hdfs://localhost:9000` or filesystem errors
+
+Set local filesystem explicitly in Spark config:
+
+```properties
+spark.hadoop.fs.defaultFS=file:///
+```
+
+## `Specified optimizer config file does not exist`
+
+Check your `--conf-path` and file permissions.
+
+## Related docs
+
+- [Table Maintenance Service (Optimizer)](./optimizer.md)
+- [Optimizer Configuration](./optimizer-configuration.md)
+- [Optimizer Quick Start and Verification](./optimizer-quick-start.md)
+- [Optimizer CLI Reference](./optimizer-cli-reference.md)
diff --git a/docs/table-maintenance-service/optimizer.md b/docs/table-maintenance-service/optimizer.md
new file mode 100644
index 0000000000..ed19ead78f
--- /dev/null
+++ b/docs/table-maintenance-service/optimizer.md
@@ -0,0 +1,105 @@
+---
+title: "Table Maintenance Service (Optimizer)"
+slug: /table-maintenance-service/optimizer
+keyword: table maintenance, optimizer, statistics, metrics, monitor
+license: This software is licensed under the Apache License version 2.
+---
+
+## What is this service
+
+The Table Maintenance Service (Optimizer) automates table maintenance by connecting:
+
+- Statistics and metrics collection
+- Rule evaluation and strategy recommendation
+- Job template based execution
+
+The CLI commands and configuration keys use the `optimizer` name.
+
+## Architecture overview
+
+The optimizer workflow is based on six parts:
+
+1. Metadata objects: catalog/schema/table in a metalake.
+2. Statistics and metrics: table/partition signals used for decision making.
+3. Policies: strategy intent, for example `system_iceberg_compaction`.
+4. Job templates: executable contracts, for example built-in Spark templates.
+5. Job executor: local or custom backend that runs submitted jobs.
+6. Status and logs: REST job state plus local staging logs.
+
+Typical data flow:
+
+1. Collect statistics and metrics for target tables.
+2. Evaluate rules and produce candidate actions.
+3. Submit jobs using a concrete template and `jobConf`.
+4. Track status and verify results on table metadata and logs.
+
+## Execution modes
+
+| Mode | Main entry | Best for | Output |
+| --- | --- | --- | --- |
+| Built-in maintenance workflow | Gravitino REST + built-in templates | Server-side operational runs | Submitted Spark jobs and updated metadata |
+| Optimizer CLI local calculator | `gravitino-optimizer.sh` | Local file-driven testing and batch scripts | Statistics/metrics updates and optional submissions |
+
+Use the built-in maintenance workflow when you want policy-driven, server-side execution.
+Use the CLI local calculator when you want to feed JSONL input directly.
+
+## Start here
+
+- Configuration first: read [Optimizer Configuration](./optimizer-configuration.md).
+- Need custom integrations: read [Optimizer Extension Guide](./optimizer-extension-guide.md).
+- First-time enablement: run [Optimizer Quick Start and Verification](./optimizer-quick-start.md).
+- CLI-only usage: read [Optimizer CLI Reference](./optimizer-cli-reference.md).
+- Runtime failures or mismatched results: check [Optimizer Troubleshooting](./optimizer-troubleshooting.md).
+
+## Lifecycle
+
+### 1. Collect
+
+Generate or ingest table and partition statistics/metrics.
+
+### 2. Evaluate
+
+Apply policies and rules to decide whether maintenance should run.
+
+### 3. Submit
+
+Pick a job template and submit a job with a concrete `jobConf`.
+
+### 4. Observe
+
+Check REST job status and validate resulting statistics, metrics, or rewritten data files.
+
+## Configuration model
+
+| Layer | Scope | Typical keys |
+| --- | --- | --- |
+| Gravitino server config | Runtime for job manager and executor | `gravitino.job.executor`, `gravitino.job.statusPullIntervalInMs`, `gravitino.jobExecutor.local.sparkHome` |
+| Job submission `jobConf` | Per job run | `catalog_name`, `table_identifier`, `spark_*`, template-specific args |
+| Optimizer CLI config | CLI commands | `gravitino.optimizer.*` in `conf/gravitino-optimizer.conf` |
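+
+A minimal sketch of the server-config and CLI-config layers side by side
+(values are illustrative examples, not verified defaults):
+
+```properties
+# Gravitino server config: runtime for the job manager and executor
+gravitino.job.executor = local
+gravitino.job.statusPullIntervalInMs = 300000
+gravitino.jobExecutor.local.sparkHome = /opt/spark
+
+# Optimizer CLI config (conf/gravitino-optimizer.conf)
+gravitino.optimizer.gravitinoUri = http://localhost:8090
+gravitino.optimizer.gravitinoMetalake = test
+```
+
+The middle layer, `jobConf`, is not a config file: it is passed per run inside
+the job submission payload.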
+
+## Terminology mapping
+
+| Term | Example value | Used in |
+| --- | --- | --- |
+| Policy name | `iceberg_compaction_default` | Policy identity and CLI `--strategy-name` |
+| Policy type | `system_iceberg_compaction` | REST policy creation field `policyType` |
+| Strategy type | `iceberg-data-compaction` | Policy content field `strategy.type` and strategy handler config key |
+
+For strategy submission, `--strategy-name` must use the policy name, not the policy type or the strategy type.
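+
+Put schematically, the three terms sit in different places of one policy. The
+`content` shape below is abbreviated to show only where `strategy.type` lives:
+
+```json
+{
+  "name": "iceberg_compaction_default",
+  "policyType": "system_iceberg_compaction",
+  "enabled": true,
+  "content": {
+    "strategy": { "type": "iceberg-data-compaction" }
+  }
+}
+```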
+
+## Prerequisites and verification
+
+Quick start prerequisites and success checks are documented in
+[Optimizer Quick Start and Verification](./optimizer-quick-start.md).
+
+## Related docs
+
+- [Optimizer Configuration](./optimizer-configuration.md)
+- [Optimizer Extension Guide](./optimizer-extension-guide.md)
+- [Optimizer Quick Start and Verification](./optimizer-quick-start.md)
+- [Optimizer CLI Reference](./optimizer-cli-reference.md)
+- [Optimizer Troubleshooting](./optimizer-troubleshooting.md)
+- [Manage policies in Gravitino](../manage-policies-in-gravitino.md)
+- [Iceberg compaction policy](../iceberg-compaction-policy.md)
+- [Manage jobs in Gravitino](../manage-jobs-in-gravitino.md)
+- [Manage statistics in Gravitino](../manage-statistics-in-gravitino.md)