the-other-tim-brown commented on code in PR #18013:
URL: https://github.com/apache/hudi/pull/18013#discussion_r2790556273
##########
rfc/rfc-100/rfc-100.md:
##########
@@ -222,13 +220,31 @@ HoodieWriteConfig config = HoodieWriteConfig.newBuilder()
 - Efficient BLOB streaming for distributed ML workloads
 - Integration with Ray's object store for large BLOB caching
-### 8. Metadata Table Extensions
+### 6. Metadata Table Extensions
 - Track BLOB references for garbage collection
 - Store maintain indexes for parquet based blob storage
 - Maintain size statistics for storage optimization
 - Support BLOB-based query optimization
+## Development Plan
+
+#### Milestone 1: External Blob Support

Review Comment:
   I don't think a UDF is the right approach here, since it will act on a single row and remove the possibility of batching the lookups when we have these blob container files. See the reader section above for the overview and this [PR](https://github.com/apache/hudi/pull/18098) for the implementation. I'm updating the reader section to add the expectations on the reference returned.
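   A minimal sketch of the batching concern, not the code from the linked PR: `BlobRef`, `ContainerReader`, and `resolveBatch` are hypothetical names used only for illustration, and a blob reference is assumed to be a container file plus a byte range. A row-at-a-time UDF would call the reader once per row, whereas a batch-aware reader can group references by container file and open each file once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchedBlobLookup {
    /** Hypothetical blob reference: a container file plus a byte range inside it. */
    static final class BlobRef {
        final String containerFile;
        final long offset;
        final int length;

        BlobRef(String containerFile, long offset, int length) {
            this.containerFile = containerFile;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical reader: one call stands for one open of a container file. */
    interface ContainerReader {
        /** Returns one byte[] per ref, in the same order as refs. */
        List<byte[]> readRanges(String containerFile, List<BlobRef> refs);
    }

    /**
     * Resolves a batch of references, calling the reader once per distinct
     * container file. Results are returned in the order of the input refs.
     */
    static List<byte[]> resolveBatch(List<BlobRef> refs, ContainerReader reader) {
        // Remember which input positions belong to each container file.
        Map<String, List<Integer>> positionsByFile = new LinkedHashMap<>();
        for (int i = 0; i < refs.size(); i++) {
            positionsByFile
                .computeIfAbsent(refs.get(i).containerFile, k -> new ArrayList<>())
                .add(i);
        }

        byte[][] out = new byte[refs.size()][];
        for (Map.Entry<String, List<Integer>> entry : positionsByFile.entrySet()) {
            List<Integer> positions = entry.getValue();
            List<BlobRef> group = new ArrayList<>();
            for (int pos : positions) {
                group.add(refs.get(pos));
            }
            // One reader call per container file, regardless of row count.
            List<byte[]> blobs = reader.readRanges(entry.getKey(), group);
            for (int j = 0; j < positions.size(); j++) {
                out[positions.get(j)] = blobs.get(j);
            }
        }
        return Arrays.asList(out);
    }
}
```

   With three references spread over two container files, `resolveBatch` makes two reader calls; a per-row UDF would make three, and the gap grows with the number of rows that share a container file.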
