prashantwason opened a new pull request, #18179:
URL: https://github.com/apache/hudi/pull/18179

   ### Describe the issue this Pull Request addresses
   
   In multi-tenant scenarios where multiple Hudi tables share the same process 
(e.g., in a long-running Spark application), metrics from different tables need 
to be isolated. Currently, the `Registry` interface uses a simple string key, 
which can lead to metric collisions when multiple tables use the same registry 
names.
   
   This PR adds support for table-specific metric registries by introducing a 
compound key format (`tableName::registryName`) that allows each table to have 
its own isolated metrics.
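
As a rough illustration of the compound key idea, the sketch below keeps counters in a map keyed by `tableName::registryName`, so two tables that use the same registry name do not share counters. The class `TableScopedRegistries`, its methods, and the table and registry names are hypothetical stand-ins and are not taken from the Hudi code in this PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for illustration; not Hudi's actual Registry code.
final class TableScopedRegistries {
  // Separator taken from the compound key format described above.
  private static final String SEPARATOR = "::";

  // compound key -> (metric name -> counter)
  private static final Map<String, Map<String, AtomicLong>> REGISTRIES =
      new ConcurrentHashMap<>();

  // Builds the "tableName::registryName" key.
  static String compoundKey(String tableName, String registryName) {
    return tableName + SEPARATOR + registryName;
  }

  // Returns the metrics map for one table's registry, creating it on first use.
  static Map<String, AtomicLong> registryFor(String tableName, String registryName) {
    return REGISTRIES.computeIfAbsent(
        compoundKey(tableName, registryName), k -> new ConcurrentHashMap<>());
  }

  // Increments a counter that belongs to exactly one (table, registry) pair.
  static void increment(String tableName, String registryName, String metric) {
    registryFor(tableName, registryName)
        .computeIfAbsent(metric, k -> new AtomicLong())
        .incrementAndGet();
  }

  // Reads a counter; returns 0 if it was never incremented.
  static long count(String tableName, String registryName, String metric) {
    AtomicLong counter = registryFor(tableName, registryName).get(metric);
    return counter == null ? 0L : counter.get();
  }

  public static void main(String[] args) {
    // Two tables share the registry name "fs" but keep separate counters.
    increment("orders", "fs", "create");
    increment("orders", "fs", "create");
    increment("users", "fs", "create");
    System.out.println(compoundKey("orders", "fs") + " create=" + count("orders", "fs", "create"));
    System.out.println(compoundKey("users", "fs") + " create=" + count("users", "fs", "create"));
  }
}
```

With a plain `registryName` key, both tables in this sketch would increment the same `create` counter; with the compound key, each table's count stays separate.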
   
   ### Summary and Changelog
   
   This PR ports the table-specific metrics registry functionality to enable 
proper metric isolation in multi-tenant deployments.
   
   **Key changes:**
   
   1. **Registry.java**:
      - Added compound key support using `tableName::registryName` format
      - Added `getRegistryOfClass(tableName, registryName, clazz)` method for 
table-specific registries
      - Added `getAllMetrics()` overload with `commonPrefix` parameter
      - Added `setRegistries()` for bulk registry registration on executors
      - Added `getName()` method to the interface
   
   2. **LocalRegistry.java & DistributedRegistry.java**:
      - Added `getName()` method implementation
   
   3. **HoodieEngineContext.java**:
      - Added `getMetricRegistry(tableName, registryName)` method for creating 
distributed registries
   
   4. **HoodieSparkEngineContext.java**:
      - Overrode `getMetricRegistry()` to return a `DistributedRegistry` with 
SparkContext registration
      - Added `DISTRIBUTED_REGISTRY_MAP` for tracking registries and 
propagating them to executors
      - Added `setRegistries()` propagation to executor operations (map, 
flatMap, mapToPair, etc.)
   
   5. **HoodieWrapperFileSystem.java**:
      - Added `REGISTRY_NAME` and `REGISTRY_META_NAME` constants
   
   6. **DistributedRegistryUtil.java** (new):
      - Utility class for creating wrapper file system registries with proper 
table-specific support
   
   7. **BaseHoodieClient.java & SparkRDDWriteClient.java**:
      - Moved wrapper FS metrics initialization to `SparkRDDWriteClient` using 
`DistributedRegistryUtil`
   
   8. **TestDistributedRegistry.java** (new):
      - Comprehensive tests for distributed registry functionality including 
parallel operations
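
The executor propagation described in item 4 can be pictured roughly as follows: the driver tracks the registries it has created, and each function shipped to an executor is wrapped so that it installs those registries before doing its work. This is a single-JVM sketch under stated assumptions. `RegistryPropagationSketch`, `NamedRegistry`, `withRegistries`, and the two maps are hypothetical, and the body of `setRegistries` here is assumed rather than taken from the PR. In Spark the snapshot would travel to the executor as part of the serialized closure.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration only; names echo the PR description, bodies are assumed.
final class RegistryPropagationSketch {
  /** Minimal stand-in for a registry that exposes its name. */
  static final class NamedRegistry {
    private final String name;
    NamedRegistry(String name) { this.name = name; }
    String getName() { return name; }
  }

  // Driver side: registries created so far, keyed by "tableName::registryName".
  static final Map<String, NamedRegistry> DRIVER_REGISTRIES = new ConcurrentHashMap<>();
  // Simulated executor side: registries known to the process running the task.
  static final Map<String, NamedRegistry> EXECUTOR_REGISTRIES = new ConcurrentHashMap<>();

  // Bulk registration: installs every registry that is not already present.
  static void setRegistries(Collection<NamedRegistry> registries) {
    for (NamedRegistry registry : registries) {
      EXECUTOR_REGISTRIES.putIfAbsent(registry.getName(), registry);
    }
  }

  // Wraps a task so that registries are installed before the task body runs.
  static <I, O> Function<I, O> withRegistries(Function<I, O> task) {
    // Snapshot taken on the driver when the task is built.
    List<NamedRegistry> snapshot = new ArrayList<>(DRIVER_REGISTRIES.values());
    return input -> {
      setRegistries(snapshot);
      return task.apply(input);
    };
  }

  public static void main(String[] args) {
    DRIVER_REGISTRIES.put("orders::fs", new NamedRegistry("orders::fs"));
    Function<Integer, Integer> doubled =
        RegistryPropagationSketch.<Integer, Integer>withRegistries(x -> x * 2);
    System.out.println(doubled.apply(21));
    System.out.println(EXECUTOR_REGISTRIES.containsKey("orders::fs"));
  }
}
```

Wrapping each operation (map, flatMap, mapToPair, and so on) in one helper keeps the registration step in a single place, so a new operation cannot skip it by accident.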
   
   ### Impact
   
   - **Public API**: Adds a `getName()` method to the `Registry` interface 
(backward compatible for callers; the in-tree implementations `LocalRegistry` 
and `DistributedRegistry` implement it in this PR)
   - **User-facing**: No user-facing changes; this is an internal 
infrastructure improvement
   - **Performance**: No expected performance impact; the compound key lookup 
is an average-case O(1) `ConcurrentHashMap` lookup
   
   ### Risk Level
   
   Low - The changes are additive and backward compatible. Existing code using 
`getRegistry(String)` continues to work unchanged. The new functionality is 
opt-in via the new `getRegistryOfClass(tableName, registryName, clazz)` method.
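
To make the opt-in claim concrete, here is a minimal sketch in which the plain-name lookup and the table-scoped lookup share one map without interfering. It assumes that `clazz` is a fully qualified class name, that the registry class has a `String` constructor, and that the table-scoped registry is named by its compound key. None of these details are confirmed by this description. `OptInLookupSketch` and `SimpleRegistry` are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; method names follow the PR description, bodies are assumed.
final class OptInLookupSketch {
  /** Minimal stand-in for a registry that only knows its own name. */
  static class SimpleRegistry {
    private final String name;
    public SimpleRegistry(String name) { this.name = name; }
    public String getName() { return name; }
  }

  private static final Map<String, SimpleRegistry> REGISTRY_MAP = new ConcurrentHashMap<>();

  // Existing-style lookup: keyed by the plain registry name, behavior unchanged.
  static SimpleRegistry getRegistry(String registryName) {
    return REGISTRY_MAP.computeIfAbsent(registryName, SimpleRegistry::new);
  }

  // New-style lookup: keyed by "tableName::registryName" (clazz type is an assumption).
  static SimpleRegistry getRegistryOfClass(String tableName, String registryName, String clazz) {
    String key = tableName + "::" + registryName;
    return REGISTRY_MAP.computeIfAbsent(key, k -> instantiate(clazz, k));
  }

  // Reflectively creates a registry through its (String name) constructor.
  private static SimpleRegistry instantiate(String clazz, String name) {
    try {
      return (SimpleRegistry) Class.forName(clazz)
          .getDeclaredConstructor(String.class)
          .newInstance(name);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate registry class " + clazz, e);
    }
  }

  public static void main(String[] args) {
    String clazz = SimpleRegistry.class.getName();
    SimpleRegistry legacy = getRegistry("fs");
    SimpleRegistry scoped = getRegistryOfClass("orders", "fs", clazz);
    System.out.println(legacy.getName());
    System.out.println(scoped.getName());
    System.out.println(legacy != scoped);
  }
}
```

Because the two lookups use different keys (`fs` versus `orders::fs`), callers of the plain-name lookup keep getting the same registry as before, and only callers that pass a table name get an isolated one.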
   
   ### Documentation Update
   
   None - This is an internal infrastructure change with no new configs or 
user-facing features.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable

