the-other-tim-brown opened a new issue, #18110:
URL: https://github.com/apache/hudi/issues/18110
### Task Description

**What needs to be done:**

1. Establish the core logic for fetching the blobs. This should batch requests for blocks of data within the same file when possible. The logic should handle ranges within a file by leveraging the offset and length fields, and should also read full files when those values are not set.
2. Expose this logic as an easy-to-use Spark function for those using Spark `Dataset<Row>` directly (see the sketch after this list).
3. Expose this functionality as a function that a Spark SQL user can invoke on a blob column.

**Why this task is needed:** Spark users should be able to easily deserialize the blob columns in their dataset.

### Task Type

Code improvement/refactoring

### Related Issues

**Parent feature issue:** (if applicable)

**Related issues:**

NOTE: Use `Relationships` button to add parent/blocking issues after issue is created.
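As a rough illustration of items 1-3, here is a minimal Java sketch of a Spark UDF that resolves a blob reference to its bytes, reading a `(offset, length)` range when one is set and the whole file otherwise. The function name `fetch_blob` and the assumption that the blob column carries a `uri` plus nullable `offset`/`length` fields are hypothetical; the real column schema comes from the parent feature issue. This per-row sketch also does not implement the batching of ranges within the same file called for in item 1.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF3;
import org.apache.spark.sql.types.DataTypes;

/**
 * Hypothetical sketch only: resolves a blob reference (uri, offset, length)
 * to its bytes. A real implementation would reuse the Hadoop Configuration
 * and batch reads for ranges that fall within the same file.
 */
public class BlobFetchUdf implements UDF3<String, Long, Long, byte[]> {

  @Override
  public byte[] call(String uri, Long offset, Long length) throws Exception {
    Path path = new Path(uri);
    // FileSystem instances are cached by Hadoop, so we do not close them here.
    FileSystem fs = path.getFileSystem(new Configuration());
    if (offset == null || length == null) {
      // Offset/length not set: read the entire file.
      long fileLen = fs.getFileStatus(path).getLen();
      return readRange(fs, path, 0L, fileLen);
    }
    return readRange(fs, path, offset, length);
  }

  private static byte[] readRange(FileSystem fs, Path path, long offset, long length)
      throws Exception {
    // Assumes each blob fits in memory (length within int range) for this sketch.
    byte[] buf = new byte[(int) length];
    try (FSDataInputStream in = fs.open(path)) {
      in.readFully(offset, buf);
    }
    return buf;
  }

  public static void register(SparkSession spark) {
    // One registration serves both audiences: Dataset<Row> users via
    // functions.callUDF(...) and Spark SQL users via the function name.
    spark.udf().register("fetch_blob", new BlobFetchUdf(), DataTypes.BinaryType);
  }
}
```

Once registered, a SQL user could invoke something like `SELECT fetch_blob(blob.uri, blob.offset, blob.length) FROM t`, while a `Dataset<Row>` user could reach the same logic through `functions.callUDF("fetch_blob", ...)`, which keeps the two entry points from items 2 and 3 backed by a single fetch implementation.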
