prashantwason opened a new pull request, #18179:
URL: https://github.com/apache/hudi/pull/18179
### Describe the issue this Pull Request addresses
In multi-tenant scenarios where multiple Hudi tables share the same process
(e.g., in a long-running Spark application), metrics from different tables need
to be isolated. Currently, the `Registry` interface uses a simple string key,
which can lead to metric collisions when multiple tables use the same registry
names.
This PR adds support for table-specific metric registries by introducing a
compound key format (`tableName::registryName`) that allows each table to have
its own isolated metrics.
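A minimal sketch of how a compound key keeps per-table metrics apart, assuming a single process-wide `ConcurrentHashMap` of registries. `SimpleRegistry`, `CompoundKeyRegistrySketch`, and their methods are hypothetical stand-ins for illustration, not Hudi's actual `Registry` API. Only the `tableName::registryName` key format comes from this PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a metrics registry: a named map of counters.
final class SimpleRegistry {
  private final String name;
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  SimpleRegistry(String name) { this.name = name; }

  String getName() { return name; }

  void increment(String metric) {
    counters.computeIfAbsent(metric, k -> new AtomicLong()).incrementAndGet();
  }

  long get(String metric) {
    AtomicLong c = counters.get(metric);
    return c == null ? 0L : c.get();
  }
}

// Hypothetical lookup helpers; not Hudi's Registry API.
public class CompoundKeyRegistrySketch {
  // One map shared by every table in the JVM (assumption).
  private static final Map<String, SimpleRegistry> REGISTRIES = new ConcurrentHashMap<>();
  private static final String SEPARATOR = "::";

  // Builds the compound key, e.g. "table_a::fs".
  static String compoundKey(String tableName, String registryName) {
    return tableName + SEPARATOR + registryName;
  }

  // Plain-name lookup: every caller using this name shares one registry.
  static SimpleRegistry getRegistry(String registryName) {
    return REGISTRIES.computeIfAbsent(registryName, SimpleRegistry::new);
  }

  // Table-scoped lookup: each (table, registry) pair gets its own instance,
  // created atomically on first use by computeIfAbsent.
  static SimpleRegistry getRegistry(String tableName, String registryName) {
    return REGISTRIES.computeIfAbsent(compoundKey(tableName, registryName), SimpleRegistry::new);
  }

  public static void main(String[] args) {
    SimpleRegistry a = getRegistry("table_a", "fs");
    SimpleRegistry b = getRegistry("table_b", "fs");
    a.increment("create");
    a.increment("create");
    b.increment("create");
    System.out.println(a.getName() + " create=" + a.get("create")); // table_a::fs create=2
    System.out.println(b.getName() + " create=" + b.get("create")); // table_b::fs create=1
  }
}
```

Because plain names and compound keys live in the same map, a caller that still looks up a registry by its plain name keeps its own instance, while two tables asking for the same registry name get separate ones. This assumes table names never contain `::` themselves; otherwise two different (table, registry) pairs could produce the same key.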
### Summary and Changelog
This PR ports the table-specific metrics registry functionality to enable
proper metric isolation in multi-tenant deployments.
**Key changes:**
1. **Registry.java**:
   - Added compound key support using the `tableName::registryName` format
   - Added `getRegistryOfClass(tableName, registryName, clazz)` method for table-specific registries
   - Added `getAllMetrics()` overload with a `commonPrefix` parameter
   - Added `setRegistries()` for bulk registry registration on executors
   - Added `getName()` method to the interface
2. **LocalRegistry.java & DistributedRegistry.java**:
   - Added `getName()` method implementation
3. **HoodieEngineContext.java**:
   - Added `getMetricRegistry(tableName, registryName)` method for creating distributed registries
4. **HoodieSparkEngineContext.java**:
   - Overrode `getMetricRegistry()` to return a `DistributedRegistry` registered with the SparkContext
   - Added `DISTRIBUTED_REGISTRY_MAP` for tracking registries and propagating them to executors
   - Added `setRegistries()` propagation to executor operations (`map`, `flatMap`, `mapToPair`, etc.)
5. **HoodieWrapperFileSystem.java**:
   - Added `REGISTRY_NAME` and `REGISTRY_META_NAME` constants
6. **DistributedRegistryUtil.java** (new):
   - Utility class for creating wrapper file system registries with proper table-specific support
7. **BaseHoodieClient.java & SparkRDDWriteClient.java**:
   - Moved wrapper FS metrics initialization to `SparkRDDWriteClient` using `DistributedRegistryUtil`
8. **TestDistributedRegistry.java** (new):
   - Comprehensive tests for distributed registry functionality, including parallel operations
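The executor-side propagation in `HoodieSparkEngineContext` can be modeled without Spark, under the assumption that the driver snapshots its registry map when a task is built, the snapshot travels inside the serialized task function, and the task installs it (the role `setRegistries()` plays) before running the user's code. Everything in this sketch is hypothetical: `RegistryPropagationSketch`, `wrap`, and `EXECUTOR_REGISTRIES` are illustrative names, a plain serializable `Function` stands in for Spark's function interfaces, and the map values are placeholder strings where a real implementation would ship serializable registry objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical model of driver-to-executor registry propagation; not Hudi or Spark API.
public class RegistryPropagationSketch {
  // Serializable function type standing in for Spark's function interfaces (assumption).
  interface SerializableFunction<I, O> extends Function<I, O>, Serializable {}

  // Registries visible to tasks in this "executor" JVM. Values are placeholders.
  static final Map<String, String> EXECUTOR_REGISTRIES = new ConcurrentHashMap<>();

  // Plays the role of setRegistries(): installs the shipped registries. putAll with
  // the same snapshot is idempotent, so calling it on every invocation is harmless.
  static void setRegistries(Map<String, String> registries) {
    EXECUTOR_REGISTRIES.putAll(registries);
  }

  // Wraps a user function so each invocation installs the captured registries first.
  static <I, O> SerializableFunction<I, O> wrap(
      Map<String, String> driverRegistries, SerializableFunction<I, O> fn) {
    // Copy into a HashMap: Serializable, and unaffected by later driver-side changes.
    final HashMap<String, String> snapshot = new HashMap<>(driverRegistries);
    return input -> {
      setRegistries(snapshot);
      return fn.apply(input);
    };
  }

  static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(o);
    }
    return bos.toByteArray();
  }

  static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return ois.readObject();
    }
  }

  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    Map<String, String> driverRegistries = new ConcurrentHashMap<>();
    driverRegistries.put("table_a::fs", "DistributedRegistry");
    SerializableFunction<Integer, Integer> task = wrap(driverRegistries, x -> x * 2);

    // Simulate shipping the task to a fresh executor: serialize, clear state, deserialize.
    byte[] bytes = serialize(task);
    EXECUTOR_REGISTRIES.clear();
    SerializableFunction<Integer, Integer> shipped =
        (SerializableFunction<Integer, Integer>) deserialize(bytes);

    List<Integer> out = List.of(1, 2, 3).stream().map(shipped).collect(Collectors.toList());
    System.out.println(out + " " + EXECUTOR_REGISTRIES.keySet()); // [2, 4, 6] [table_a::fs]
  }
}
```

Wrapping at task-construction time means each operation (`map`, `flatMap`, `mapToPair`, and so on) needs only this one wrapper, and copying the map keeps the shipped snapshot stable even if the driver registers more registries afterwards.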
### Impact
- **Public API**: Adds a `getName()` method to the `Registry` interface (backward
compatible for existing in-tree code: `LocalRegistry` and `DistributedRegistry`
both implement it, so no default implementation is required)
- **User-facing**: No user-facing changes; this is an internal
infrastructure improvement
- **Performance**: No performance impact; compound-key lookups go through a
`ConcurrentHashMap`, so they remain O(1) on average
### Risk Level
Low - The changes are additive and backward compatible. Existing code using
`getRegistry(String)` continues to work unchanged. The new functionality is
opt-in via the new `getRegistryOfClass(tableName, registryName, clazz)` method.
### Documentation Update
None - This is an internal infrastructure change with no new configs or
user-facing features.
### Contributor's checklist
- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
--
This is an automated message from the Apache Git Service.