suxiaogang223 opened a new pull request, #60937:
URL: https://github.com/apache/doris/pull/60937

   
   ### What problem does this PR solve?
   
   Part of #60686
   
   ## Summary
   
   Introduce a unified external metadata cache framework and migrate Iceberg / 
Paimon / Hudi / MaxCompute / Doris to engine-level cache adapters while keeping 
Common/Hive on the legacy path for incremental rollout.
   
   ### New Meta Cache Framework
   
   The new framework standardizes external metadata caching into a 3-level 
model:
   
   - **Engine level**: `ExternalMetaCacheMgr` routes by engine 
(`hive/iceberg/paimon/hudi/maxcompute/doris`).
   - **Catalog level**: each engine cache keeps isolated cache entries per 
catalog id.
   - **Entry level**: each engine declares explicit `MetaCacheEntryDef` (name, 
key type, value type, loader, default spec).
   
   Structure (simplified):
   
   ```
   ExternalMetaCache (interface)
     ^
     |
   AbstractExternalMetaCache
     |-- metaCacheEntryDefs: Map<String, MetaCacheEntryDef>
     |-- catalogEntries: Map<CatalogId, CatalogEntryGroup>
     |
     +-- MetaCacheEntryDef (definition only)
     |     |-- name / keyType / valueType / loader / defaultCacheSpec
     |
     +-- MetaCacheEntry (runtime cache, per catalog)
           |-- get / put / invalidate* / stats
   ```
   
   Core components:
   
   - `ExternalMetaCache`: unified engine-level contract (`initCatalog`, scoped 
invalidation, `stats`).
   - `AbstractExternalMetaCache`: shared implementation for entry registration, 
per-catalog entry group creation, type-safe entry lookup, lifecycle management.
   - `MetaCacheEntryDef`: immutable declaration of an entry.
   - `MetaCacheEntry`: generic cache runtime (load on miss, invalidate by 
key/predicate/all, per-entry stats).
   - `CacheSpec`: unified cache policy (`enable`, `ttl-second`, `capacity`) and 
compatibility key mapping.
   - `CatalogEntryGroup`: container for all entries in one catalog.
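
   The split between a definition and its runtime entry can be sketched as
   below. This is a minimal illustration, not the code in this PR:
   `EntryDefSketch` and `EntrySketch` are hypothetical names, a plain
   `ConcurrentHashMap` stands in for the real cache backend, and the stat names
   (`hits`, `loads`, `size`) are assumptions. TTL and capacity handling are
   left out.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.atomic.AtomicLong;
   import java.util.function.Function;
   import java.util.function.Predicate;

   /** Immutable declaration of one cache entry (hypothetical shape). */
   final class EntryDefSketch<K, V> {
       final String name;
       final Class<K> keyType;
       final Class<V> valueType;
       final Function<K, V> loader;

       EntryDefSketch(String name, Class<K> keyType, Class<V> valueType,
                      Function<K, V> loader) {
           this.name = name;
           this.keyType = keyType;
           this.valueType = valueType;
           this.loader = loader;
       }
   }

   /** Runtime cache built from a definition; one instance per catalog. */
   final class EntrySketch<K, V> {
       private final EntryDefSketch<K, V> def;
       private final Map<K, V> values = new ConcurrentHashMap<>();
       private final AtomicLong hits = new AtomicLong();
       private final AtomicLong loads = new AtomicLong();

       EntrySketch(EntryDefSketch<K, V> def) {
           this.def = def;
       }

       /** Returns the cached value, invoking the loader on a miss. */
       V get(K key) {
           V cached = values.get(key);
           if (cached != null) {
               hits.incrementAndGet();
               return cached;
           }
           return values.computeIfAbsent(key, k -> {
               loads.incrementAndGet();
               return def.loader.apply(k);
           });
       }

       void invalidateKey(K key) {
           values.remove(key);
       }

       /** Drops every key matching the predicate (e.g. all tables of one db). */
       void invalidateIf(Predicate<K> matches) {
           values.keySet().removeIf(matches);
       }

       void invalidateAll() {
           values.clear();
       }

       /** Per-entry metrics; the metric names here are assumptions. */
       Map<String, Long> stats() {
           return Map.of(
                   "hits", Long.valueOf(hits.get()),
                   "loads", Long.valueOf(loads.get()),
                   "size", Long.valueOf(values.size()));
       }
   }
   ```

   Keeping the definition free of state means one declaration can back a
   separate runtime entry for every catalog, which is what the catalog-level
   isolation above relies on.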
   
   Initialization and lifecycle:
   
   - During `ExternalCatalog.makeSureInitialized()`, 
`ExternalMetaCacheMgr.prepareCatalog(...)` eagerly initializes engine entries 
for that catalog.
   - On property updates or refresh events, manager-level invalidation APIs 
trigger scoped rebuild/invalidation at engine/catalog/db/table/partition level.
   - Stats are exposed as `engine.entry -> metric map`, so each entry can be 
observed independently.
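
   A rough sketch of the catalog isolation and scoped invalidation described
   above. Everything here is illustrative: `CatalogScopedCacheSketch` and its
   methods are hypothetical, values are plain strings, and keys are encoded as
   `db.table`, which assumes database names contain no dots. The real
   framework's key types and invalidation signatures may differ.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   /** Per-catalog cache with catalog/db/table scoped invalidation (sketch). */
   final class CatalogScopedCacheSketch {
       // catalogId -> ("db.table" -> cached value)
       private final Map<Long, Map<String, String>> catalogEntries =
               new ConcurrentHashMap<>();

       /** Eagerly creates the (empty) entry group for a catalog. */
       void prepareCatalog(long catalogId) {
           group(catalogId);
       }

       void put(long catalogId, String db, String table, String value) {
           group(catalogId).put(db + "." + table, value);
       }

       String get(long catalogId, String db, String table) {
           return group(catalogId).get(db + "." + table);
       }

       /** Clears one catalog without touching any other catalog. */
       void invalidateCatalog(long catalogId) {
           group(catalogId).clear();
       }

       /** Removes every table entry that belongs to the given database. */
       void invalidateDb(long catalogId, String db) {
           String prefix = db + ".";
           group(catalogId).keySet().removeIf(k -> k.startsWith(prefix));
       }

       void invalidateTable(long catalogId, String db, String table) {
           group(catalogId).remove(db + "." + table);
       }

       int size(long catalogId) {
           return group(catalogId).size();
       }

       private Map<String, String> group(long catalogId) {
           return catalogEntries.computeIfAbsent(
                   catalogId, id -> new ConcurrentHashMap<>());
       }
   }
   ```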
   
   Configuration model:
   
   - Unified keys: `meta.cache.<engine>.<entry>.enable|ttl-second|capacity`
   - Defaults come from each `MetaCacheEntryDef` default `CacheSpec`.
   - `CacheSpec.applyCompatibilityMap(...)` supports smooth migration from 
legacy keys.
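
   One way the key resolution could work, as a sketch only. `CacheSpecSketch`
   is a hypothetical stand-in for `CacheSpec`; the helper signatures are
   assumptions, and so is the rule that an explicitly set new key wins over a
   mapped legacy key. Only the `meta.cache.<engine>.<entry>.<suffix>` pattern
   comes from this PR.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   /** Resolved cache policy for one engine entry (hypothetical shape). */
   final class CacheSpecSketch {
       final boolean enable;
       final long ttlSecond;
       final long capacity;

       CacheSpecSketch(boolean enable, long ttlSecond, long capacity) {
           this.enable = enable;
           this.ttlSecond = ttlSecond;
           this.capacity = capacity;
       }

       /** Builds a unified key such as meta.cache.iceberg.table.capacity. */
       static String key(String engine, String entry, String suffix) {
           return "meta.cache." + engine + "." + entry + "." + suffix;
       }

       /**
        * Copies legacy values onto their new keys. Assumption: a new key
        * that is already present is left untouched.
        */
       static Map<String, String> applyCompatibility(
               Map<String, String> props, Map<String, String> legacyToNew) {
           Map<String, String> out = new HashMap<>(props);
           for (Map.Entry<String, String> e : legacyToNew.entrySet()) {
               String legacyValue = props.get(e.getKey());
               if (legacyValue != null) {
                   out.putIfAbsent(e.getValue(), legacyValue);
               }
           }
           return out;
       }

       /** Reads the three unified keys, falling back to the entry defaults. */
       static CacheSpecSketch resolve(Map<String, String> props, String engine,
                                      String entry, CacheSpecSketch defaults) {
           String enable = props.get(key(engine, entry, "enable"));
           String ttl = props.get(key(engine, entry, "ttl-second"));
           String capacity = props.get(key(engine, entry, "capacity"));
           return new CacheSpecSketch(
                   enable == null ? defaults.enable : Boolean.parseBoolean(enable),
                   ttl == null ? defaults.ttlSecond : Long.parseLong(ttl),
                   capacity == null ? defaults.capacity : Long.parseLong(capacity));
       }
   }
   ```

   With this shape, a catalog that sets only a legacy TTL property and a new
   `capacity` key would resolve to the legacy TTL, the new capacity, and the
   default `enable` value.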
   
   ### ExternalMetaCacheMgr `engineCaches` Organization
   
   `engineCaches` is a concurrent map: `Map<String, ExternalMetaCache>`.
   
   - `initEngineCaches()` pre-registers built-in engines.
   - `engine(engineName)` normalizes to lowercase and uses 
`computeIfAbsent(...)`.
   - `routeEngine(engine, action)`:
     - routes to all engines when `engine == null`
     - routes to a single engine otherwise.
   - Unknown engines fall back to a no-op legacy adapter path.
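
   The routing rules above could look roughly like this. The sketch uses
   hypothetical names (`EngineRouterSketch`, `RecordingCache`) and models the
   no-op fallback with a simple flag; it does not reproduce the actual
   `ExternalMetaCacheMgr` or `LegacyExternalMetaCache` code.

   ```java
   import java.util.List;
   import java.util.Locale;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.CopyOnWriteArrayList;
   import java.util.function.Consumer;

   /** Stand-in engine cache that records which catalogs were invalidated. */
   final class RecordingCache {
       final String name;
       final boolean noOp;
       final List<Long> invalidated = new CopyOnWriteArrayList<>();

       RecordingCache(String name, boolean noOp) {
           this.name = name;
           this.noOp = noOp;
       }

       void invalidateCatalog(long catalogId) {
           if (!noOp) {
               invalidated.add(catalogId);
           }
       }
   }

   /** Routes cache actions to one engine or to all registered engines. */
   final class EngineRouterSketch {
       private final Map<String, RecordingCache> engineCaches =
               new ConcurrentHashMap<>();

       /** Pre-registers the built-in engines. */
       EngineRouterSketch(String... builtInEngines) {
           for (String engine : builtInEngines) {
               String key = normalize(engine);
               engineCaches.put(key, new RecordingCache(key, false));
           }
       }

       private static String normalize(String engineName) {
           return engineName.toLowerCase(Locale.ROOT);
       }

       /** Lowercases the name; unknown engines get a no-op fallback cache. */
       RecordingCache engine(String engineName) {
           return engineCaches.computeIfAbsent(
                   normalize(engineName), n -> new RecordingCache(n, true));
       }

       /** A null engine means "apply the action to every engine". */
       void routeEngine(String engine, Consumer<RecordingCache> action) {
           if (engine == null) {
               engineCaches.values().forEach(action);
           } else {
               action.accept(engine(engine));
           }
       }
   }
   ```

   Using `computeIfAbsent` keeps lookup and lazy registration atomic, so two
   threads asking for the same unknown engine end up sharing one fallback
   instance.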
   
   ### Migration Status (Engine View)
   
   - Migrated to new framework:
     - `iceberg` (`table`, `view`, `manifest`)
     - `paimon` (`table`)
     - `hudi` (`partition`, `fs_view`, `meta_client`)
     - `maxcompute` (`metadata`)
     - `doris` (`backends`)
   - Kept on legacy bridge intentionally:
     - `common`
     - `hive`
   
   ### Key Changes
   
   - Add framework abstractions under `datasource.metacache`:
     - `ExternalMetaCache`
     - `AbstractExternalMetaCache`
     - `MetaCacheEntryDef`
     - `MetaCacheEntry`
     - `CacheSpec`
     - `CatalogEntryGroup`
   - Refactor `ExternalMetaCacheMgr` to route cache lifecycle by engine 
(`prepareCatalog`, `invalidateCatalog/db/table/partitions`, `stats`).
   - Eagerly initialize engine cache entries during 
`ExternalCatalog.makeSureInitialized()`.
   - Migrate Iceberg to `IcebergExternalMetaCache` entries (`table`, `view`, 
`manifest`) and move call sites to the engine cache path.
   - Migrate Paimon to `PaimonExternalMetaCache` (`table`) and route related 
call sites via engine cache.
   - Migrate Hudi metadata cache to `HudiExternalMetaCache` entries 
(`partition`, `fs_view`, `meta_client`) and route scan/utils through the new 
path.
   - Migrate MaxCompute cache to `MaxComputeExternalMetaCache` and remove 
`MaxComputeMetadataCacheMgr`.
   - Migrate Remote Doris backend cache to `DorisExternalMetaCache` and remove 
`DorisExternalMetaCacheMgr`.
   - Add catalog property compatibility mapping support in `CacheSpec` for 
gradual key migration.
   - Keep `ENGINE_COMMON` and `ENGINE_HIVE` on `LegacyExternalMetaCache` to 
preserve existing behavior.
   - Add/adjust tests:
     - `IcebergExternalMetaCacheTest`
     - `PaimonExternalMetaCacheTest`
     - `MetaCacheDeadlockTest`
   
   ### Compatibility / Behavior
   
   - Existing Hive/Common cache behavior is intentionally unchanged in this PR.
   - New cache keys follow:
     - `meta.cache.<engine>.<entry>.enable`
     - `meta.cache.<engine>.<entry>.ttl-second`
     - `meta.cache.<engine>.<entry>.capacity`
   
   ### Check List (for the reviewer who merges this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]