PavithranRick opened a new pull request, #17482:
URL: https://github.com/apache/hudi/pull/17482

   ### Describe the issue this Pull Request addresses
   
   The current write path performs **partition path resolution and 
partition-switch checks for every record**, even when the table is 
**non-partitioned**.
   
   For large datasets (e.g., terabyte scale), this results in significant 
overhead because:
   
   - Fetching the partition path requires **materializing/deserializing the 
full binary record**, which is expensive.
   - The `canWrite()` check is invoked for every record through 
`HoodieRowCreateHandle` → `BulkInsertDataInternalWriterHelper`.
   - For non-partitioned tables, this logic is unnecessary because the write 
handle never needs to switch.
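
The per-record pattern described above can be sketched roughly as follows. This is a minimal illustration, not Hudi code: `LazyRecord`, `PerRecordCheckSketch`, and the `partitionPath|key|value` encoding are hypothetical stand-ins for a binary-encoded row and the writer helper loop. It assumes that reading the partition path forces a full decode of the payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-in for a binary-encoded row; not a Hudi class.
final class LazyRecord {
    static int decodeCount = 0;        // number of full payload decodes so far
    private final byte[] payload;      // assumed encoding: "partitionPath|key|value"

    LazyRecord(String partitionPath, String key, String value) {
        this.payload = (partitionPath + "|" + key + "|" + value)
            .getBytes(StandardCharsets.UTF_8);
    }

    // Reading the partition path decodes the entire payload.
    String partitionPath() {
        decodeCount++;
        String[] fields = new String(payload, StandardCharsets.UTF_8).split("\\|", -1);
        return fields[0];
    }
}

final class PerRecordCheckSketch {
    // Looks up the partition for every record and compares it with the
    // previous one to decide whether a new write handle is needed.
    static int countHandlesOpened(List<LazyRecord> records) {
        String lastPartition = null;
        int handlesOpened = 0;
        for (LazyRecord r : records) {
            String partition = r.partitionPath();      // expensive on every record
            if (!partition.equals(lastPartition)) {    // partition-switch check
                handlesOpened++;
                lastPartition = partition;
            }
        }
        return handlesOpened;
    }

    public static void main(String[] args) {
        // Non-partitioned table: every record has the same (empty) partition path.
        List<LazyRecord> records = List.of(
            new LazyRecord("", "k1", "v1"),
            new LazyRecord("", "k2", "v2"),
            new LazyRecord("", "k3", "v3"));
        int handlesOpened = countHandlesOpened(records);
        System.out.println("decodes=" + LazyRecord.decodeCount
            + ", handlesOpened=" + handlesOpened);
    }
}
```

In this sketch, three records trigger three full decodes even though only one handle is ever opened, which is the wasted work the PR targets.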
   
   This PR optimizes the write path by **completely bypassing partition path 
lookup and write-handle switching for non-partitioned tables**.
   
   ---
   
   ### Summary and Changelog
   
   This PR introduces the following improvements:
   
   - Add early detection using `HoodieTable` / `HoodieTableMetaClient` to 
determine if the table is partitioned.
   - If the table is **non-partitioned**, skip:
     - partition path extraction,
     - partition switch checks in `canWrite()`,
     - any record-level partition materialization.
   - Allow records to be streamed directly without triggering heavy byte-array 
deserialization.
   - Prevent unnecessary overhead in `BaseCreateHandle` and 
`HoodieRowCreateHandle` for non-partitioned tables.
   
   These changes remove per-record partition work from bulk insert workloads 
on non-partitioned tables, which reduces CPU cost.
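
The bypass summarized in this section might look roughly like the sketch below. It is illustrative only: `BypassingWriterSketch`, its fields, and the rule that an empty partition-fields setting means "non-partitioned" are assumptions made for the example, not the actual `HoodieTable` / `HoodieTableMetaClient` API or the code in this PR.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical writer helper; names are illustrative, not Hudi's classes.
final class BypassingWriterSketch<R> {
    private final boolean partitioned;                    // decided once, up front
    private final Function<R, String> partitionExtractor; // assumed to be expensive
    private String currentPartition = null;
    int handlesOpened = 0;
    int recordsWritten = 0;

    BypassingWriterSketch(String partitionFields, Function<R, String> partitionExtractor) {
        // Assumption: an empty partition-fields setting means non-partitioned.
        this.partitioned = partitionFields != null && !partitionFields.trim().isEmpty();
        this.partitionExtractor = partitionExtractor;
    }

    void write(R record) {
        if (!partitioned) {
            // Fast path: a single handle, no per-record partition lookup.
            if (handlesOpened == 0) {
                openHandle("");
            }
        } else {
            // Partitioned tables keep the per-record lookup and switch check.
            String partition = partitionExtractor.apply(record);
            if (!partition.equals(currentPartition)) {
                openHandle(partition);
            }
        }
        recordsWritten++;
    }

    private void openHandle(String partition) {
        currentPartition = partition;
        handlesOpened++;
    }
}

final class BypassDemo {
    public static void main(String[] args) {
        int[] extractorCalls = {0};
        Function<String, String> extractor = r -> {
            extractorCalls[0]++;
            return r.split("/")[0];
        };
        // Empty partition fields: the extractor should never run.
        BypassingWriterSketch<String> writer = new BypassingWriterSketch<>("", extractor);
        for (String r : List.of("r1", "r2", "r3")) {
            writer.write(r);
        }
        System.out.println("extractorCalls=" + extractorCalls[0]
            + ", handlesOpened=" + writer.handlesOpened
            + ", recordsWritten=" + writer.recordsWritten);
    }
}
```

Deciding the flag once in the constructor keeps the hot `write` path down to a single boolean test for non-partitioned tables, while partitioned tables follow the same branch as before.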
   
   ---
   
   ### Impact
   
   - **Performance**: Lower per-record CPU cost for bulk inserts into 
non-partitioned tables, especially with binary-encoded record formats, where 
reading the partition path requires deserializing the record.
   - **Behavior**: No functional change for partitioned tables.
   - **API**: No user-facing API changes; logic is internal.
   
   ---
   
   ### Risk Level
   
   **Low**
   
   - Optimization path is only taken when the table is explicitly detected as 
non-partitioned.
   - Behavior for partitioned tables remains unchanged.
   - Write-path tests mitigate regression risk.
   
   ---
   
   ### Documentation Update
   
   None.  
   This PR does not introduce new configs or change user-facing behavior.
   
   ---
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   

