linliu-code opened a new pull request, #13977:
URL: https://github.com/apache/hudi/pull/13977

   ### Describe the issue this Pull Request addresses
   
   When the HFile writer serializes an entry (e.g., a key-value entry for a 
data block or an index entry for an index block), it allocates a ByteBuffer to 
hold the serialized data. Since the capacity of a ByteBuffer cannot be changed 
after allocation, and the same buffer is used to serialize every entry in a 
block, the capacity must be large enough for the largest entry.
   
   Previously, the buffer capacity was set to the HFile data block size, on 
the assumption that a key-value pair is never longer than the block size. In 
practice, a key-value pair can be larger than the block size, and when that 
happened a `java.nio.BufferOverflowException` was thrown.
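
   The failure mode can be reproduced outside Hudi with a plain 
`java.nio.ByteBuffer`. The sketch below is only an illustration: the class 
name, the method name, and the 1024-byte block size are hypothetical and are 
not taken from the Hudi code base.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Hypothetical demo class; not part of the Hudi code base.
final class FixedCapacityBufferDemo {

  /** Returns true if writing entrySize bytes into a buffer of the given capacity overflows. */
  static boolean overflows(int capacity, int entrySize) {
    // The capacity is fixed at allocation time and cannot grow afterwards.
    ByteBuffer buffer = ByteBuffer.allocate(capacity);
    try {
      buffer.put(new byte[entrySize]);
      return false;
    } catch (BufferOverflowException e) {
      // Thrown by put() when fewer than entrySize bytes remain in the buffer.
      return true;
    }
  }

  public static void main(String[] args) {
    int blockSize = 1024; // illustrative value only, not a Hudi default
    System.out.println("entry of 512 bytes overflows: " + overflows(blockSize, 512));
    System.out.println("entry of 2048 bytes overflows: " + overflows(blockSize, 2048));
  }
}
```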
   
   ### Summary and Changelog
   
   To solve the above problem, we calculate the buffer capacity from the 
content of the entries. The calculation can differ between block types because 
their storage formats differ. E.g.,
   
   - data block: the capacity is calculated as `max(key.length + value.length) 
+ 21`, where the maximum is taken over the entries in the block and 21 is the 
per-entry overhead in bytes: 4 (key length) + 4 (value length) + 2 (length of 
key length) + 10 (column family/timestamp/key type) + 1 (MVCC).
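
   The data block formula can be sketched as follows. This is a minimal 
illustration of the arithmetic only, not the code in this PR: the class, the 
`Entry` type, the constant names, and `requiredCapacity` are all hypothetical, 
and the overhead breakdown simply mirrors the numbers listed above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; names are illustrative and not taken from the Hudi code base.
final class DataBlockBufferSizing {
  // Per-entry overhead in bytes, following the breakdown in the description.
  static final int KEY_LENGTH_FIELD = 4;           // key length
  static final int VALUE_LENGTH_FIELD = 4;         // value length
  static final int LENGTH_OF_KEY_LENGTH_FIELD = 2; // "length of key length"
  static final int CF_TIMESTAMP_KEY_TYPE = 10;     // column family/timestamp/key type
  static final int MVCC = 1;                       // mvcc
  static final int ENTRY_OVERHEAD = KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD
      + LENGTH_OF_KEY_LENGTH_FIELD + CF_TIMESTAMP_KEY_TYPE + MVCC; // 21

  /** Simple key-value holder used only for this sketch. */
  static final class Entry {
    final byte[] key;
    final byte[] value;

    Entry(byte[] key, byte[] value) {
      this.key = key;
      this.value = value;
    }
  }

  /** Computes max(key.length + value.length) + 21 over the given entries. */
  static int requiredCapacity(List<Entry> entries) {
    int maxPayload = 0;
    for (Entry entry : entries) {
      maxPayload = Math.max(maxPayload, entry.key.length + entry.value.length);
    }
    return maxPayload + ENTRY_OVERHEAD;
  }

  public static void main(String[] args) {
    List<Entry> entries = Arrays.asList(
        new Entry(new byte[2], new byte[3]),    // payload of 5 bytes
        new Entry(new byte[5], new byte[100])); // payload of 105 bytes
    // Largest payload is 105 bytes, so the capacity is 105 + 21 = 126.
    System.out.println(requiredCapacity(entries));
  }
}
```

   With this sizing, the capacity follows the largest entry rather than the 
configured block size, so it grows for oversized entries and shrinks for small 
ones.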
   
   
   ### Impact
   
   1. When an entry is larger than the block size, no error is thrown. In 
effect, a block can hold an entry of any size.
   2. When entries are small, less memory is used, since the block size is 
normally much larger than an entry.
   
   ### Risk Level
   
   Medium.
   
   ### Documentation Update
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


