PavithranRick opened a new pull request, #17482:
URL: https://github.com/apache/hudi/pull/17482
### Describe the issue this Pull Request addresses
The current write path performs **partition path resolution and
partition-switch checks for every record**, even when the table is
**non-partitioned**.
For large datasets (e.g., terabyte scale), this results in significant
overhead because:
- Fetching the partition path requires **materializing/deserializing the
full binary record**, which is expensive.
- The `canWrite()` check is invoked for every record through
`HoodieRowCreateHandle` → `BulkInsertDataInternalWriterHelper`.
- For non-partitioned tables, this logic is unnecessary: every record
belongs to the same partition path, so the write handle never needs to
switch on a partition change.
This PR optimizes the write path by **completely bypassing partition path
lookup and write-handle switching for non-partitioned tables**.
---
### Summary and Changelog
This PR introduces the following improvements:
- Add early detection using `HoodieTable` / `HoodieTableMetaClient` to
determine if the table is partitioned.
- If the table is **unpartitioned**, skip:
  - partition path extraction,
  - partition switch checks in `canWrite()`,
  - any record-level partition materialization.
- Allow records to be streamed directly without triggering heavy byte-array
deserialization.
- Prevent unnecessary overhead in `BaseCreateHandle` and
`HoodieRowCreateHandle` for unpartitioned tables.
These changes significantly reduce CPU cost for bulk insert workloads on
non-partitioned tables.
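
To illustrate the idea, here is a minimal, self-contained sketch of the bypass. It is not the code in this PR: `PartitionBypassSketch`, `Writer`, `Handle`, and the extractor function are hypothetical stand-ins rather than Hudi classes, and representing a non-partitioned table by an empty partition path is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch only: these classes are hypothetical stand-ins, not Hudi APIs.
public class PartitionBypassSketch {

    // Stand-in for a write handle bound to a single partition path.
    static final class Handle {
        final String partitionPath;
        int rowsWritten = 0;

        Handle(String partitionPath) {
            this.partitionPath = partitionPath;
        }
    }

    // Stand-in for the helper that routes each record to a write handle.
    static final class Writer {
        private final boolean partitioned;
        private final Function<byte[], String> partitionPathExtractor;
        private final List<Handle> handles = new ArrayList<>();
        private Handle current;
        int extractorCalls = 0;

        Writer(boolean partitioned, Function<byte[], String> partitionPathExtractor) {
            this.partitioned = partitioned;
            this.partitionPathExtractor = partitionPathExtractor;
        }

        void write(byte[] record) {
            if (!partitioned) {
                // Fast path: the record bytes are never decoded and the handle never switches.
                // Assumption: a non-partitioned table is represented by an empty partition path.
                if (current == null) {
                    current = open("");
                }
                current.rowsWritten++;
                return;
            }
            // Slow path: decode the partition path and switch handles when it changes.
            extractorCalls++;
            String path = partitionPathExtractor.apply(record);
            if (current == null || !current.partitionPath.equals(path)) {
                current = open(path);
            }
            current.rowsWritten++;
        }

        int handleCount() {
            return handles.size();
        }

        private Handle open(String path) {
            Handle handle = new Handle(path);
            handles.add(handle);
            return handle;
        }
    }

    public static void main(String[] args) {
        // Hypothetical extractor: treats the first byte as the partition id.
        Function<byte[], String> extractor = record -> "p" + record[0];

        Writer flat = new Writer(false, extractor);
        Writer partitioned = new Writer(true, extractor);
        for (byte[] record : new byte[][] {{1}, {1}, {2}}) {
            flat.write(record);
            partitioned.write(record);
        }
        System.out.println(flat.extractorCalls + " extractor calls, " + flat.handleCount() + " handle(s)");
        System.out.println(partitioned.extractorCalls + " extractor calls, " + partitioned.handleCount() + " handle(s)");
    }
}
```

In this sketch the non-partitioned writer never invokes the extractor and keeps a single handle, while the partitioned writer invokes it once per record and opens a new handle whenever the path changes. The sketch leaves out any handle switching that happens for other reasons.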
---
### Impact
- **Performance**: Major improvement for non-partitioned tables, especially
for binary-encoded record formats.
- **Behavior**: No functional change for partitioned tables.
- **API**: No user-facing API changes; logic is internal.
---
### Risk Level
**Low**
- Optimization path is only taken when the table is explicitly detected as
non-partitioned.
- Behavior for partitioned tables remains unchanged.
- Write-path tests mitigate regression risk.
---
### Documentation Update
None.
This PR does not introduce new configs or change user-facing behavior.
---
### Contributor's checklist
- [ ] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable