suryaprasanna opened a new issue, #18308: URL: https://github.com/apache/hudi/issues/18308
### Feature Description

**What the feature achieves:**

This feature would enable Apache Hudi table format support in Meta's Velox execution engine, allowing Velox-based query engines (Presto, Gluten, etc.) to read and potentially write Hudi tables natively.

Key capabilities:
- Native C++ reader for Hudi Copy-on-Write (COW) tables
- Native C++ reader for Hudi Merge-on-Read (MOR) tables with real-time view support
- Integration with Velox's vectorized execution model for high-performance data processing
- Support for Hudi's time travel and incremental queries
- Compatibility with Hudi's metadata table for efficient file pruning
- Support for Hudi's column statistics and data skipping

**Why this feature is needed:**

Velox is a high-performance C++ vectorized execution engine that powers multiple query engines, including Meta's Presto, Apache Spark (via Gluten), and other data processing systems. Currently:
- **No native Hudi support**: Velox cannot read Hudi tables natively, which limits its ability to process Hudi datasets efficiently
- **Performance opportunity**: Velox's C++ vectorized engine can provide significant performance improvements (10x or more in some cases) compared to JVM-based readers
- **Growing adoption**: Velox is being adopted by major projects (Presto, Gluten for Spark, Meta's infrastructure), and Hudi support would open Hudi tables to these ecosystems
- **Modern lakehouse architecture**: As Hudi becomes a standard lakehouse format, Velox integration is essential for high-performance query engines
- **Reduced dependency on JVM**: A native C++ implementation would enable non-JVM systems to consume Hudi tables efficiently

### User Experience

**How users will use this feature:**
- Configuration changes needed: Velox connector configuration to recognize the Hudi table format
- API changes: Velox Connector API to register the Hudi file format
- Usage examples: Existing read queries remain the same; no changes are required.
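
To make the "column statistics and data skipping" capability listed under Key capabilities more concrete, here is a minimal, self-contained sketch of min/max file pruning. It is only an illustration under stated assumptions: `FileColumnStats` and `pruneFiles` are hypothetical names, not part of the Velox or Hudi APIs, and a real reader would obtain the statistics from Hudi's metadata table and would need to handle multiple columns, nulls, and non-integer types.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-file statistics for a single integer column.
// Assumption: a real reader would load these from Hudi's metadata table;
// this struct is not a Velox or Hudi type.
struct FileColumnStats {
  std::string filePath;
  int64_t minValue;  // smallest value of the column in this file
  int64_t maxValue;  // largest value of the column in this file
};

// Returns the paths of files whose [minValue, maxValue] range overlaps the
// closed predicate range [lo, hi]. Files with no overlap cannot contain
// matching rows, so they can be skipped without being opened.
std::vector<std::string> pruneFiles(
    const std::vector<FileColumnStats>& stats,
    int64_t lo,
    int64_t hi) {
  std::vector<std::string> kept;
  for (const auto& s : stats) {
    const bool overlaps = s.maxValue >= lo && s.minValue <= hi;
    if (overlaps) {
      kept.push_back(s.filePath);
    }
  }
  return kept;
}
```

For example, given three files covering the ranges 0 to 99, 100 to 199, and 200 to 299, a predicate such as `col BETWEEN 150 AND 250` would keep only the second and third files, so the first would never be read.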
### Hudi RFC Requirements

**RFC PR link:** (if applicable)

**Why RFC is/isn't needed:**
- Does this change public interfaces/APIs? (Yes/No)
- Does this change storage format? (Yes/No)
- Justification:

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
