Huicheng Song created PARQUET-2199:
--------------------------------------
Summary: checkBlockSizeReached zero record size perf issue
Key: PARQUET-2199
URL: https://issues.apache.org/jira/browse/PARQUET-2199
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 2.0.0
Reporter: Huicheng Song
Parquet checks the block size after writing records to decide when to flush.
This check is relatively expensive, so Parquet estimates when to run the next
check from the record size, record count, etc.
For small records (less than 1 byte each after compression), the average record
size is 0 after integer division. This causes an overflow when calculating the
record count for the next block size check, so the block size is checked for
every record.
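
The arithmetic described above can be reproduced with a minimal Java sketch.
This is not the parquet-mr source: the class name, method names, and the two
row-count limits are hypothetical, and the guarded variant is only one possible
mitigation, not necessarily the fix adopted for this issue. It assumes the next
check is scheduled at roughly the midpoint between the current record count and
the estimated record count needed to fill the block.

```java
// Illustrative sketch only; all names and constants here are assumptions.
final class BlockSizeCheckSketch {
    // Hypothetical limits on how many records may pass between two checks.
    static final long MIN_ROWS_BETWEEN_CHECKS = 100;
    static final long MAX_ROWS_BETWEEN_CHECKS = 10_000;

    // Naive estimate: integer division makes the average record size 0
    // whenever records average less than 1 byte.
    static long nextCheckNaive(long recordCount, long bufferedBytes, long blockSize) {
        long recordSize = bufferedBytes / recordCount;          // 0 for tiny records
        // Dividing by 0.0f yields Infinity, which casts to Long.MAX_VALUE.
        long rowsToFillBlock = (long) (blockSize / (float) recordSize);
        // recordCount + Long.MAX_VALUE wraps around to a negative number.
        long midpoint = (recordCount + rowsToFillBlock) / 2;
        return Math.min(Math.max(MIN_ROWS_BETWEEN_CHECKS, midpoint),
                        recordCount + MAX_ROWS_BETWEEN_CHECKS);
    }

    // Guarded estimate: keep the average as a double and clamp before
    // converting back to long, so nothing can wrap around.
    static long nextCheckGuarded(long recordCount, long bufferedBytes, long blockSize) {
        long upper = recordCount + MAX_ROWS_BETWEEN_CHECKS;
        double recordSize = (double) bufferedBytes / recordCount;
        if (recordSize <= 0) {
            return upper;                                       // nothing buffered yet
        }
        double midpoint = (recordCount + blockSize / recordSize) / 2.0;
        long candidate = (long) Math.min(midpoint, (double) upper);
        return Math.min(Math.max(MIN_ROWS_BETWEEN_CHECKS, candidate), upper);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;
        // 1000 records in 500 bytes: average record size is 0.5 bytes.
        System.out.println(nextCheckNaive(1000, 500, blockSize));   // prints 100
        System.out.println(nextCheckGuarded(1000, 500, blockSize)); // prints 11000
    }
}
```

In the naive variant the next check is scheduled at record 100 even though 1000
records have already been written, so the condition "record count has reached
the next check" holds on every subsequent record.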
--
This message was sent by Atlassian Jira
(v8.20.10#820010)