FANNG1 opened a new pull request, #9974:
URL: https://github.com/apache/gravitino/pull/9974
## What changes were proposed in this pull request?
### User perspective
- Configure the optimizer to point at your updater providers:
  - Set `OptimizerConfig.STATISTICS_UPDATER_CONFIG` to your `StatisticsUpdater` provider name.
  - Set `OptimizerConfig.METRICS_UPDATER_CONFIG` to your `MetricsUpdater` provider name.
- Implement a `StatisticsCalculator` and register it in
  `META-INF/services/org.apache.gravitino.maintenance.optimizer.api.updater.StatisticsCalculator`.
- Construct an `Updater` with an `OptimizerEnv`.
- Run one of the entry points:
  - `update(calculatorName, identifiers, UpdateType.STATISTICS)` to persist raw statistics for specific tables/jobs.
  - `update(calculatorName, identifiers, UpdateType.METRICS)` to persist derived metrics (and job metrics, if the calculator supports them).
  - `updateAll(calculatorName, updateType)`, with `UpdateType.STATISTICS` or `UpdateType.METRICS`, for a bulk refresh using the calculator’s bulk methods.
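A minimal sketch of how the entry points above could dispatch on `UpdateType`. Everything here is a hypothetical, simplified stand-in (`UpdaterFlowSketch`, its nested `Calculator` interface, and the in-memory lists used in place of real storage); it does not reproduce Gravitino's actual signatures. Per the steps above, the real `Updater` is built from an `OptimizerEnv` and calculators are registered through `META-INF/services`, whereas this sketch simply receives calculators in its constructor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the update/updateAll flow. These types are simplified
 * stand-ins and do not reproduce Gravitino's actual API signatures.
 */
public class UpdaterFlowSketch {

  /** Local stand-in mirroring the two UpdateType constants named in the PR. */
  public enum UpdateType { STATISTICS, METRICS }

  /** Hypothetical calculator contract: per-identifier and bulk calculation. */
  public interface Calculator {
    String name();

    /** Raw statistics for a single table/job identifier. */
    Map<String, Long> calculate(String identifier);

    /** Bulk variant keyed by identifier (stand-in for the Supports*Bulk* mix-ins). */
    Map<String, Map<String, Long>> calculateAll();
  }

  /** In-memory records of what would be written; a real updater persists to a backend. */
  public final List<String> persistedStatistics = new ArrayList<>();
  public final List<String> persistedMetrics = new ArrayList<>();

  private final Map<String, Calculator> calculators = new LinkedHashMap<>();

  public UpdaterFlowSketch(List<Calculator> registered) {
    // Calculators are keyed by name; a later registration with the same name replaces an earlier one.
    for (Calculator c : registered) {
      calculators.put(c.name(), c);
    }
  }

  public void update(String calculatorName, List<String> identifiers, UpdateType type) {
    Calculator calculator = lookup(calculatorName);
    for (String id : identifiers) {
      persist(id, calculator.calculate(id), type);
    }
  }

  public void updateAll(String calculatorName, UpdateType type) {
    // Bulk refresh: one call to the calculator's bulk method, then persist every entry.
    lookup(calculatorName).calculateAll().forEach((id, stats) -> persist(id, stats, type));
  }

  private Calculator lookup(String name) {
    Calculator calculator = calculators.get(name);
    if (calculator == null) {
      throw new IllegalArgumentException("No calculator registered under name: " + name);
    }
    return calculator;
  }

  private void persist(String id, Map<String, Long> stats, UpdateType type) {
    long timestamp = System.currentTimeMillis();
    stats.forEach((statName, value) -> {
      if (type == UpdateType.STATISTICS) {
        // Raw statistic: stored as-is.
        persistedStatistics.add(id + ":" + statName + "=" + value);
      } else {
        // Assumed derivation: the value becomes a timestamped metric sample.
        persistedMetrics.add(id + ":" + statName + "=" + value + "@" + timestamp);
      }
    });
  }
}
```

Keying calculators by name mirrors the `calculatorName` argument of `update` and `updateAll`. The timestamp attached in the `METRICS` branch is only a placeholder for whatever derivation the real metrics path performs.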
### System perspective
- Adds new updater-focused optimizer APIs and implementations for the statistics/metrics flow:
  - New API types: `MetricSample`, `PartitionMetricSample`, `MetricsUpdater`, `StatisticsUpdater`, `StatisticsCalculator`, `SupportsCalculateTableStatistics`, `SupportsCalculateBulkTableStatistics`, `SupportsCalculateJobStatistics`, `SupportsCalculateBulkJobStatistics`, `UpdateType`, `ToStatistic`, and `TableStatisticsBundle`.
  - New implementations: `MetricSampleImpl`, `PartitionMetricSampleImpl`, `StatisticEntryImpl`, `MetricRecordImpl`, and `MetricsRepository`, plus `Updater` as the entry point to calculate and persist statistics/metrics.
- Adds provider wiring:
  - `OptimizerConfig` now includes `STATISTICS_UPDATER_CONFIG` and `METRICS_UPDATER_CONFIG`.
  - `ProviderUtils` creates `StatisticsUpdater` and `MetricsUpdater` instances.
  - `InstanceLoaderUtils` loads `StatisticsCalculator` by provider name.
- Removes unused/legacy updater/metrics components and SPI entries that are out of scope for the new updater flow.
- Adds and updates unit tests for the updater/statistics logic, and keeps existing recommender code unchanged.
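The provider wiring above (a config key yields a provider name, which is then resolved to an instance) can be sketched as a name-based lookup. This is a hedged illustration, not the code of `ProviderUtils` or `InstanceLoaderUtils`: the `Provider` interface and both config key strings are hypothetical, because the actual key values behind `STATISTICS_UPDATER_CONFIG` and `METRICS_UPDATER_CONFIG` are not given in this description. Since `java.util.ServiceLoader` implements `Iterable`, the same lookup works over `META-INF/services` registrations or over a plain list in tests.

```java
import java.util.Map;
import java.util.ServiceLoader;

/**
 * Hypothetical sketch of name-based provider wiring. The config key strings and the
 * Provider interface are illustrative assumptions, not Gravitino's actual definitions.
 */
public final class ProviderWiringSketch {

  /** Hypothetical contract: every pluggable provider exposes a unique name. */
  public interface Provider {
    String name();
  }

  /** Hypothetical config keys; the real OptimizerConfig key strings are not shown here. */
  public static final String STATISTICS_UPDATER_KEY = "example.statistics-updater";
  public static final String METRICS_UPDATER_KEY = "example.metrics-updater";

  private ProviderWiringSketch() {}

  /** Returns the first candidate whose name() matches, or fails with a descriptive error. */
  public static <T extends Provider> T loadByName(Iterable<T> candidates, String name) {
    for (T candidate : candidates) {
      if (candidate.name().equals(name)) {
        return candidate;
      }
    }
    throw new IllegalArgumentException("No provider found with name: " + name);
  }

  /** Reads the provider name from config, then looks it up among the candidates. */
  public static <T extends Provider> T fromConfig(
      Map<String, String> config, String key, Iterable<T> candidates) {
    String name = config.get(key);
    if (name == null || name.trim().isEmpty()) {
      throw new IllegalArgumentException("Missing required config: " + key);
    }
    return loadByName(candidates, name.trim());
  }

  /** Discovery via ServiceLoader, which reads META-INF/services entries on the classpath. */
  public static <T extends Provider> T fromServiceLoader(
      Map<String, String> config, String key, Class<T> type) {
    return fromConfig(config, key, ServiceLoader.load(type));
  }
}
```

In this sketch, failing fast on a missing key or an unknown name surfaces misconfiguration when the updater is created rather than on the first update call.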
## Why are the changes needed?
- This change introduces a clear, provider-based updater API to calculate statistics and persist either raw statistics or derived metrics, with explicit entry points (`update`, `updateAll`) and consistent data models (`MetricSample`, `PartitionMetricSample`, `TableStatisticsBundle`).
- It enables pluggable implementations for both statistics calculation and storage, which is required to support multiple sources/backends and batch refresh workflows.
- It removes outdated or incomplete updater/metrics components so that the new API surface is clean and consistent.
## Does this PR introduce any user-facing change?
- No.
## How was this patch tested?
- `./gradlew :maintenance:optimizer:test -PskipITs`
--
This is an automated message from the Apache Git Service.