the-other-tim-brown opened a new issue, #18110:
URL: https://github.com/apache/hudi/issues/18110
### Task Description

**What needs to be done:**

1. Establish the core logic for fetching the blobs. This should batch requests for blocks of data within the same file when possible. The logic should handle ranges within a file by leveraging the offset and length fields, and should also read full files when those values are not set.
2. Expose this logic as an easy-to-use Spark function for those using Spark `Dataset<Row>` directly (see the sketch after this list).
3. Expose this functionality as a function that a Spark SQL user can invoke on a blob column.

**Why this task is needed:** Spark users should be able to easily deserialize the blob columns in their dataset.

### Task Type

Code improvement/refactoring

### Related Issues

**Parent feature issue:** (if applicable)

**Related issues:**

NOTE: Use `Relationships` button to add parent/blocking issues after issue is created.
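As a rough illustration of items 1-3, here is a minimal Java sketch of a Spark UDF that resolves a blob reference to its bytes, reading a `(offset, length)` range when one is set and the whole file otherwise. The function name `fetch_blob` and the assumption that the blob column carries a `uri` plus nullable `offset`/`length` fields are hypothetical; the real column schema comes from the parent feature issue. This per-row sketch also does not implement the batching of ranges within the same file called for in item 1.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF3;
import org.apache.spark.sql.types.DataTypes;

/**
 * Hypothetical sketch only: resolves a blob reference (uri, offset, length)
 * to its bytes. A real implementation would reuse the Hadoop Configuration
 * and batch reads for ranges that fall within the same file.
 */
public class BlobFetchUdf implements UDF3<String, Long, Long, byte[]> {

  @Override
  public byte[] call(String uri, Long offset, Long length) throws Exception {
    Path path = new Path(uri);
    // FileSystem instances are cached by Hadoop, so we do not close them here.
    FileSystem fs = path.getFileSystem(new Configuration());
    if (offset == null || length == null) {
      // Offset/length not set: read the entire file.
      long fileLen = fs.getFileStatus(path).getLen();
      return readRange(fs, path, 0L, fileLen);
    }
    return readRange(fs, path, offset, length);
  }

  private static byte[] readRange(FileSystem fs, Path path, long offset, long length)
      throws Exception {
    // Assumes each blob fits in memory (length within int range) for this sketch.
    byte[] buf = new byte[(int) length];
    try (FSDataInputStream in = fs.open(path)) {
      in.readFully(offset, buf);
    }
    return buf;
  }

  public static void register(SparkSession spark) {
    // One registration serves both audiences: Dataset<Row> users via
    // functions.callUDF(...) and Spark SQL users via the function name.
    spark.udf().register("fetch_blob", new BlobFetchUdf(), DataTypes.BinaryType);
  }
}
```

Once registered, a SQL user could invoke something like `SELECT fetch_blob(blob.uri, blob.offset, blob.length) FROM t`, while a `Dataset<Row>` user could reach the same logic through `functions.callUDF("fetch_blob", ...)`, which keeps the two entry points from items 2 and 3 backed by a single fetch implementation.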
