FANNG1 commented on code in PR #10203:
URL: https://github.com/apache/gravitino/pull/10203#discussion_r2905223915


##########
docs/table-maintenance-service/optimizer-quick-start.md:
##########
@@ -0,0 +1,237 @@
+---
+title: "Optimizer Quick Start and Verification"
+slug: /table-maintenance-service/quick-start
+keyword: table maintenance, optimizer, quick start, compaction, update stats
+license: This software is licensed under the Apache License version 2.
+---
+
+## Before running quick start
+
+- Prepare a running Gravitino server.
+- Ensure target metalake exists (examples use `test`).
+- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- If your Iceberg REST backend is in-memory, metadata is reset after restart.

Review Comment:
   Good point. I clarified this in quick start prerequisites: if the Iceberg 
REST backend is in-memory, avoid restarting it during the walkthrough because 
restart resets metadata and data files, which can break the following steps.
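
As a concrete aid, the prerequisites could be condensed into a small preflight script. This is a hypothetical sketch, not from the docs: the `GRAVITINO_URL` and `METALAKE` defaults are assumptions matching the quick-start examples, and the curl command is printed rather than executed so the script runs offline.

```bash
#!/bin/sh
# Preflight sketch for the quick-start prerequisites. Assumptions (not from
# the docs): Gravitino listens on http://localhost:8090 and the metalake is
# named "test", matching the examples in this walkthrough.
GRAVITINO_URL="${GRAVITINO_URL:-http://localhost:8090}"
METALAKE="${METALAKE:-test}"

# The built-in Spark templates need a Spark distribution to run against.
if [ -z "${SPARK_HOME:-}" ]; then
  echo "WARN: SPARK_HOME is unset; set it or gravitino.jobExecutor.local.sparkHome"
fi

# Print (rather than execute) the metalake check, so this runs offline.
echo "curl -sS ${GRAVITINO_URL}/api/metalakes/${METALAKE}"
```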



##########
docs/table-maintenance-service/optimizer.md:
##########
@@ -0,0 +1,115 @@
+---
+title: "Table Maintenance Service (Optimizer)"
+slug: /table-maintenance-service
+keyword: table maintenance, optimizer, statistics, metrics, monitor
+license: This software is licensed under the Apache License version 2.
+---
+
+## What is this service
+
+The Table Maintenance Service (Optimizer) automates table maintenance by connecting:
+
+- Statistics and metrics collection
+- Rule evaluation and strategy recommendation
+- Job template based execution
+
+The CLI commands and configuration keys use the `optimizer` name.
+
+## Architecture overview
+
+The optimizer workflow consists of six parts:
+
+1. Metadata objects: catalog/schema/table in a metalake.
+2. Statistics and metrics: table/partition signals used for decision making.
+3. Policies: strategy intent, for example `system_iceberg_compaction`.
+4. Job templates: executable contracts, for example built-in Spark templates.
+5. Job executor: local or custom backend that runs submitted jobs.
+6. Status and logs: REST job state plus local staging logs.

Review Comment:
   Updated. I added a relationship graph in optimizer.md under Architecture 
overview to show the flow among metadata, stats/metrics, policy strategy, 
templates, executor, and job status/logs.
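
To make the flow tangible, the six parts could be annotated as the REST interactions a client sees. This is a sketch under assumptions: only the template-listing path appears in the quick start itself; host and metalake are the quick-start defaults and may differ in your environment.

```bash
#!/bin/sh
# Sketch of the six-part optimizer flow from a client's point of view.
# Assumption: quick-start defaults (localhost:8090, metalake "test").
BASE="http://localhost:8090/api/metalakes/test"

# 1-2. Metadata objects plus collected stats/metrics feed rule evaluation.
# 3.   A policy such as system_iceberg_compaction expresses strategy intent.
# 4.   The strategy resolves to a job template, discoverable via:
echo "GET ${BASE}/jobs/templates?details=true"
# 5.   The job executor (local Spark or a custom backend) runs the job.
# 6.   Job state is polled over REST; detailed logs land in local staging.
```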



##########
docs/table-maintenance-service/optimizer-quick-start.md:
##########
@@ -0,0 +1,237 @@
+---
+title: "Optimizer Quick Start and Verification"
+slug: /table-maintenance-service/quick-start
+keyword: table maintenance, optimizer, quick start, compaction, update stats
+license: This software is licensed under the Apache License version 2.
+---
+
+## Before running quick start
+
+- Prepare a running Gravitino server.
+- Ensure target metalake exists (examples use `test`).
+- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- If your Iceberg REST backend is in-memory, metadata is reset after restart.
+
+For full config details, see [Optimizer Configuration](./optimizer-configuration.md).
+
+## Success criteria
+
+- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
+- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
+
+## Quick start A: built-in table maintenance workflow
+
+This workflow uses:
+
+- Built-in policy type: `system_iceberg_compaction`
+- Built-in update stats job template: `builtin-iceberg-update-stats`
+- Built-in rewrite data files job template: `builtin-iceberg-rewrite-data-files`
+
+### 1. Preflight checks
+
+```bash
+# Check metalake
+curl -sS "http://localhost:8090/api/metalakes/test" | jq
+
+# Check built-in templates
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/templates?details=true" | jq '.jobTemplates[].name'
+```
+
+Expected names include:
+
+- `builtin-iceberg-update-stats`
+- `builtin-iceberg-rewrite-data-files`
+
+If the templates are missing, verify that the `gravitino-jobs` JAR is present in `auxlib`, then restart Gravitino.
+
+### 2. Prepare demo metadata objects
+
+Create a REST Iceberg catalog, schema, and table:
+
+```bash
+# Create catalog (ignore "already exists" errors)
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "rest_catalog",
+    "type": "RELATIONAL",
+    "comment": "Iceberg REST catalog",
+    "provider": "lakehouse-iceberg",
+    "properties": {
+      "catalog-backend": "rest",

Review Comment:
   Yes. This quick start uses a standalone Iceberg REST service endpoint as the 
catalog backend. I added an explicit note in step 2, and you can replace it 
with any reachable Iceberg REST endpoint in your environment.
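
Since the hunk above is truncated, a complete payload could look like the hypothetical sketch below. Only `name`, `type`, `comment`, `provider`, and `catalog-backend` come from the doc; the `uri` value is an assumption and should point at whatever Iceberg REST endpoint is reachable in your environment.

```bash
#!/bin/sh
# Hypothetical complete create-catalog payload for step 2. The "uri" value
# is an assumption (a locally running Iceberg REST service); replace it
# with any reachable Iceberg REST endpoint.
PAYLOAD='{
  "name": "rest_catalog",
  "type": "RELATIONAL",
  "comment": "Iceberg REST catalog",
  "provider": "lakehouse-iceberg",
  "properties": {
    "catalog-backend": "rest",
    "uri": "http://localhost:9001/iceberg/"
  }
}'
printf '%s\n' "$PAYLOAD"
```

Submitting it would use the same `curl -X POST` headers shown in the snippet above.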



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
