Huicheng Song created PARQUET-2199:
--------------------------------------

             Summary: checkBlockSizeReached zero record size perf issue
                 Key: PARQUET-2199
                 URL: https://issues.apache.org/jira/browse/PARQUET-2199
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 2.0.0
            Reporter: Huicheng Song


Parquet checks the block size after writing records to decide when it should 
flush. This check is relatively expensive, so Parquet estimates when to run the 
next check based on record size, record count, etc.

For small records (less than 1 byte after compression), the average record size 
is 0 after integer division. This causes an overflow when calculating the 
record count for the next block size check, with the result that the block size 
is checked for every record.
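
The failure mode described above can be reproduced with a small, self-contained 
sketch. This is not the parquet-mr source: the class name, method names, the two 
bounds, and the "check again about halfway to the projected fill point" formula 
are assumptions made for illustration. The guarded variant shows one possible 
way to avoid the problem, not necessarily the fix that should be adopted.

```java
// Illustrative sketch only; all names, bounds, and formulas here are hypothetical.
final class BlockSizeCheckSketch {
    // Assumed lower and upper bounds on the number of records between checks.
    static final long MIN_RECORDS_BETWEEN_CHECKS = 100;
    static final long MAX_RECORDS_BETWEEN_CHECKS = 10_000;

    // Naive estimate: the average record size is computed with integer division.
    static long nextCheckNaive(long recordCount, long bufferedBytes, long blockSizeBytes) {
        // Truncates to 0 when records average less than one byte.
        long avgRecordSize = bufferedBytes / recordCount;
        // With avgRecordSize == 0 the float division yields Infinity, the cast
        // to long saturates at Long.MAX_VALUE, and the addition wraps negative.
        long projected = recordCount + (long) (blockSizeBytes / (float) avgRecordSize);
        return Math.min(
                Math.max(MIN_RECORDS_BETWEEN_CHECKS, projected / 2),
                recordCount + MAX_RECORDS_BETWEEN_CHECKS);
    }

    // Guarded estimate: keep the average in floating point and clamp the
    // projection before converting it back to long.
    static long nextCheckGuarded(long recordCount, long bufferedBytes, long blockSizeBytes) {
        long upper = recordCount + MAX_RECORDS_BETWEEN_CHECKS;
        double avgRecordSize = (double) bufferedBytes / recordCount;
        if (avgRecordSize <= 0) {
            return upper; // nothing buffered yet, so wait the maximum interval
        }
        double projected = recordCount + blockSizeBytes / avgRecordSize;
        long halfway = (long) Math.min(projected / 2, (double) upper);
        return Math.min(Math.max(MIN_RECORDS_BETWEEN_CHECKS, halfway), upper);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;
        // 1000 records buffered in 500 bytes: average size below one byte.
        System.out.println(nextCheckNaive(1000, 500, blockSize));
        System.out.println(nextCheckGuarded(1000, 500, blockSize));
    }
}
```

In the naive variant the returned threshold falls back to the minimum, which is 
already below the current record count, so a caller comparing the record count 
against it would run the expensive check on every subsequent record.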



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
