roryqi commented on code in PR #10203:
URL: https://github.com/apache/gravitino/pull/10203#discussion_r2905170458
##########
docs/table-maintenance-service/optimizer-quick-start.md:
##########
@@ -0,0 +1,237 @@
+---
+title: "Optimizer Quick Start and Verification"
+slug: /table-maintenance-service/quick-start
+keyword: table maintenance, optimizer, quick start, compaction, update stats
+license: This software is licensed under the Apache License version 2.
+---
+
+## Before running quick start
+
+- Prepare a running Gravitino server.
+- Ensure target metalake exists (examples use `test`).
+- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- If your Iceberg REST backend is in-memory, metadata is reset after restart.
+
+For full config details, see [Optimizer Configuration](./optimizer-configuration.md).
+
+## Success criteria
+
+- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
+- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
+
+## Quick start A: built-in table maintenance workflow
+
+This workflow uses:
+
+- Built-in policy type: `system_iceberg_compaction`
+- Built-in update stats job template: `builtin-iceberg-update-stats`
+- Built-in rewrite data files job template: `builtin-iceberg-rewrite-data-files`
+
+### 1. Preflight checks
+
+```bash
+# Check metalake
+curl -sS "http://localhost:8090/api/metalakes/test" | jq
+
+# Check built-in templates
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/templates?details=true" | jq '.jobTemplates[].name'
+```
+
+Expected names include:
+
+- `builtin-iceberg-update-stats`
+- `builtin-iceberg-rewrite-data-files`
+
+If missing, verify `gravitino-jobs` JAR in `auxlib`, then restart Gravitino.
+
+### 2. Prepare demo metadata objects
+
+Create a REST Iceberg catalog, schema, and table:
+
+```bash
+# Create catalog (ignore "already exists" errors)
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "rest_catalog",
+    "type": "RELATIONAL",
+    "comment": "Iceberg REST catalog",
+    "provider": "lakehouse-iceberg",
+    "properties": {
+      "catalog-backend": "rest",

Review Comment:
   Can table maintenance use the standalone Iceberg REST catalog?
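
   The template preflight check in the quoted hunk could also be scripted so that a missing template fails fast instead of relying on reading the `jq` output by eye. A minimal sketch, assuming bash and that template names arrive one per line on stdin; `check_templates` is a hypothetical helper name, not part of Gravitino:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Gravitino): reads job template names,
# one per line, from stdin and reports any expected built-in template
# that is absent. Returns 0 if both are present, 1 otherwise.
check_templates() {
  local names missing=0 t
  names="$(cat)"
  for t in builtin-iceberg-update-stats builtin-iceberg-rewrite-data-files; do
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$t" <<<"$names"; then
      echo "missing: $t"
      missing=1
    fi
  done
  return "$missing"
}
```

   It could be fed from the same endpoint the doc already queries, for example `curl -sS "http://localhost:8090/api/metalakes/test/jobs/templates?details=true" | jq -r '.jobTemplates[].name' | check_templates`. The `-r` flag is an addition here so that `jq` prints raw names without quotes.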
