n3nash commented on PR #8679: URL: https://github.com/apache/hudi/pull/8679#issuecomment-1624445753
@vinothchandar Great write-up, with some very cool ideas and suggestions already! It's an impressive roadmap. Just adding some thoughts and personal opinions, based on previous experience, what I've heard from other users, and the gaps I see, on the most interesting (and impactful) aspects as features are being prioritized.

- Table APIs: As already called out in the RFC, these enable faster query engine integrations for Hudi and expand the ecosystem for varied kinds of data users in the community. This is one of the pieces that has been missing for Hudi's unique features to be adopted more widely. Additionally, with these APIs, folks who want (and miss) a higher-level, "simpler" interface can start to play with Hudi more easily.
- Caching Service: One of the powerful and unique parts of Hudi's design is the file group layout. Building an integrated caching service that can tap into this will provide higher performance than a general table format design. With DuckDB et al. taking off, smarter low-latency access to hot data would go a long way.
- Metastore / Catalog service: Looking forward to seeing how writers and readers can deeply integrate with and utilize the power of Hudi's data model without being constrained by Hive serde, query planning, etc. Additionally, this will centralize other table services / operations, making it easier for users to manage them uniformly (across deltastreamer, stand-alone Spark or Flink jobs, etc.).
- Additional indexing techniques: Due to the pluggable architecture of Hudi and the work already done upfront, it's easier to support different forms of indexing. Personally, I've seen geospatial indexing be a requirement for many users.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
