Chao Sun created PARQUET-2052:
---------------------------------
Summary: Integer overflow when writing huge binary using
dictionary encoding
Key: PARQUET-2052
URL: https://issues.apache.org/jira/browse/PARQUET-2052
Project: Parquet
Issue Type: Bug
Reporter: Chao Sun
Assignee: Chao Sun
To check whether it should fall back to plain encoding,
{{DictionaryValuesWriter}} currently uses two variables: {{dictionaryByteSize}}
and {{maxDictionaryByteSize}}, both of which are of type int. This causes a
problem when one first writes a relatively small binary within the threshold
and then writes a huge string, which causes {{dictionaryByteSize}} to overflow
and become negative. A negative size never exceeds {{maxDictionaryByteSize}},
so the fallback check does not trigger.
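
The wraparound can be reproduced outside Parquet with a minimal sketch. The
class name, method names, and the 1 MiB threshold below are hypothetical and
chosen only for illustration; this is not the actual {{DictionaryValuesWriter}}
code. It assumes the size is accumulated by adding each value's length to a
running total, and it shows a long accumulator as one possible way to avoid the
overflow.

```java
// Standalone sketch, not Parquet code: all names here are hypothetical.
class DictionarySizeOverflowSketch {

    // Assumed threshold for illustration only (1 MiB).
    static final int MAX_DICTIONARY_BYTE_SIZE = 1024 * 1024;

    // Accumulates lengths in an int, so the running total can wrap around
    // to a negative number once it passes Integer.MAX_VALUE.
    static boolean exceedsThresholdWithInt(int[] valueLengths) {
        int dictionaryByteSize = 0;
        for (int length : valueLengths) {
            dictionaryByteSize += length; // silent wraparound on overflow
        }
        return dictionaryByteSize > MAX_DICTIONARY_BYTE_SIZE;
    }

    // Same check with a long accumulator, which cannot overflow for any
    // realistic number of int-sized lengths.
    static boolean exceedsThresholdWithLong(int[] valueLengths) {
        long dictionaryByteSize = 0L;
        for (int length : valueLengths) {
            dictionaryByteSize += length;
        }
        return dictionaryByteSize > MAX_DICTIONARY_BYTE_SIZE;
    }

    public static void main(String[] args) {
        // A small value within the threshold, followed by a huge one.
        int[] lengths = {100, Integer.MAX_VALUE};
        System.out.println(exceedsThresholdWithInt(lengths));  // false
        System.out.println(exceedsThresholdWithLong(lengths)); // true
    }
}
```

With the int accumulator, 100 + Integer.MAX_VALUE wraps to a negative total, so
the threshold comparison reports false even though far more than 1 MiB was
written. The long version reports true for the same input.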
--
This message was sent by Atlassian Jira
(v8.3.4#803005)