n3nash commented on PR #8679:
URL: https://github.com/apache/hudi/pull/8679#issuecomment-1624445753

   @vinothchandar Great write-up, with some very cool ideas and suggestions 
already! It's an impressive roadmap. Just adding some thoughts and personal 
opinions, based on previous experience, what I've heard from other users, and 
the gaps I see, on the most interesting (and impactful) aspects as features are 
being prioritized. 
   
   - Table APIs : As already called out in the RFC, these enable faster query 
engine integrations for Hudi and expand the ecosystem to more kinds of data 
users in the community. This is one of the pieces that has been missing for 
Hudi's unique features to be adopted more widely. Additionally, with these 
APIs, folks who want (and miss) a higher-level, "simpler" interface can start 
to play with Hudi more easily. 
   - Caching Service : One of the powerful and unique parts of Hudi's design is 
the file group layout. Building an integrated caching service that can tap into 
this will provide higher performance than a general table format design. With 
DuckDB et al. taking off, smarter low-latency access to hot data would go a 
long way. 
   - Metastore / Catalog service : Looking forward to seeing how writers and 
readers can deeply integrate with and utilize the power of Hudi's data model 
without being constrained by Hive serde, query planning, etc. Additionally, 
this will centralize other table services / operations, making it easier for 
users to manage them uniformly (across deltastreamer, standalone Spark or Flink 
jobs, etc.). 
   - Additional Indexing techniques : Due to the pluggable architecture of Hudi 
and the work already done upfront, it's easier to support different forms of 
indexing. Personally, I've seen geospatial indexing be a requirement for many 
users.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.