linliu-code opened a new pull request, #13977:
URL: https://github.com/apache/hudi/pull/13977
### Describe the issue this Pull Request addresses
When the HFile writer serializes an entry, e.g., a key-value entry for a
data block or an index entry for an index block, it allocates a ByteBuffer
to hold the serialized data. The capacity of a ByteBuffer cannot be changed
after allocation, and the same buffer is used to serialize all entries of a
block, so the capacity must be large enough for the largest entry.
Previously, the buffer capacity was equal to the HFile data block size, on
the assumption that a key-value pair is never longer than the block size.
However, we found that a key-value pair can in fact be larger than the block
size. When this happened, a `java.nio.BufferOverflowException` was thrown.
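
The failure mode can be reproduced outside Hudi with a plain `ByteBuffer`. The following is a minimal sketch, not the HFile writer code: the class name, the `overflows` helper, and the 64-byte "block size" are hypothetical and chosen only for illustration.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

class FixedBufferOverflowDemo {
    // Hypothetical helper: returns true if writing entrySize bytes into a
    // buffer allocated with a fixed capacity raises BufferOverflowException.
    static boolean overflows(int capacity, int entrySize) {
        ByteBuffer buffer = ByteBuffer.allocate(capacity);
        try {
            // put(byte[]) throws if fewer than entrySize bytes remain.
            buffer.put(new byte[entrySize]);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int blockSize = 64; // illustrative value, far smaller than a real block size
        System.out.println(overflows(blockSize, 32));  // prints false
        System.out.println(overflows(blockSize, 128)); // prints true
    }
}
```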
### Summary and Changelog
To solve the above problem, we calculate the buffer capacity based on the
content of the entries. The calculation can differ between block types
because their storage formats differ. E.g.,
- data block: the capacity is calculated as
`max(key.length + value.length) + 21`,
where the max is taken over the entries and 21 is the per-entry overhead in
bytes: 4 (key length) + 4 (value length) + 2 (length of key length) +
10 (column family/timestamp/key type) + 1 (MVCC).
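
The data block formula above can be sketched as follows. This is an illustrative example under stated assumptions, not the code in this PR: the class `DataBlockBufferCapacity`, the nested `Entry` type, and the method `capacityFor` are hypothetical names, and only the constant 21 and its breakdown come from the description.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

class DataBlockBufferCapacity {
    // Per-entry overhead in bytes, as listed in the description:
    // 4 (key length) + 4 (value length) + 2 (length of key length)
    // + 10 (column family/timestamp/key type) + 1 (MVCC).
    static final int PER_ENTRY_OVERHEAD = 4 + 4 + 2 + 10 + 1;

    // Hypothetical stand-in for a key-value entry of a data block.
    static final class Entry {
        final byte[] key;
        final byte[] value;

        Entry(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Capacity = max(key.length + value.length) over all entries + 21.
    static int capacityFor(List<Entry> entries) {
        int maxPayload = 0;
        for (Entry e : entries) {
            maxPayload = Math.max(maxPayload, e.key.length + e.value.length);
        }
        return maxPayload + PER_ENTRY_OVERHEAD;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry(new byte[3], new byte[5]),
            new Entry(new byte[10], new byte[100]),
            new Entry(new byte[4], new byte[0]));

        int capacity = capacityFor(entries);
        System.out.println(capacity); // prints 131 (10 + 100 + 21)

        // A buffer of this capacity can be reused for every entry in the
        // list, since no single serialized entry exceeds it.
        ByteBuffer buffer = ByteBuffer.allocate(capacity);
        System.out.println(buffer.capacity()); // prints 131
    }
}
```

The capacity depends only on the largest entry, so a block of small entries gets a small buffer, while an entry larger than the block size still fits.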
### Impact
1. When an entry is larger than the block size, no error is thrown.
Effectively, a block can hold an entry of any size.
2. When entries are small, less memory is used, since the block size is
normally much larger than an entry.
### Risk Level
Medium.
### Documentation Update
<!-- Describe any necessary documentation update if there is any new
feature, config, or user-facing change. If not, put "none".
- The config description must be updated if new configs are added or the
default value of the configs are changed.
- Any new feature or user-facing change requires updating the Hudi website.
Please follow the
[instruction](https://hudi.apache.org/contribute/developer-setup#website)
to make changes to the website. -->
### Contributor's checklist
- [ ] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable