David Mollitor created PARQUET-2072:
---------------------------------------
Summary: Do Not Determine Both Min/Max for Binary Stats
Key: PARQUET-2072
URL: https://issues.apache.org/jira/browse/PARQUET-2072
Project: Parquet
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor
I'm looking at some benchmarking code comparing Apache ORC vs. Apache Parquet
and see that Parquet is quite a bit slower for writes (reads TBD). Based on my
investigation, a significant amount of time is spent determining min/max for
binary types.
One quick improvement is to skip the "max" comparison when a value has already
been determined to be a new "min": a value below the current minimum cannot
also be above the current maximum.
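
A minimal sketch of that short-circuit, for illustration only. The class
BinaryMinMaxSketch, its methods, and the comparison counter are hypothetical
and are not Parquet's actual statistics code; the unsigned lexicographic byte
ordering from java.util.Arrays (Java 9+) is likewise an assumption standing in
for whatever comparator the column really uses.

```java
import java.util.Arrays;

/** Hypothetical sketch; not Parquet's real statistics class. */
final class BinaryMinMaxSketch {
    private byte[] min;
    private byte[] max;
    private int comparisons; // counts byte-array comparisons, for illustration

    /** Folds one value into the running min/max. */
    void update(byte[] value) {
        if (min == null) {
            // First value seen: it is both the minimum and the maximum.
            min = value;
            max = value;
            return;
        }
        comparisons++;
        // Assumed ordering: unsigned lexicographic comparison of the bytes.
        if (Arrays.compareUnsigned(value, min) < 0) {
            min = value;
            // A new minimum cannot also be a new maximum, so skip that check.
            return;
        }
        comparisons++;
        if (Arrays.compareUnsigned(value, max) > 0) {
            max = value;
        }
    }

    byte[] min() { return min; }

    byte[] max() { return max; }

    int comparisons() { return comparisons; }
}
```

The second comparison is saved only for values that lower the minimum, so the
gain depends on the data: descending input benefits the most, ascending input
not at all.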
While I'm at it, I will also remove calls to deprecated functions.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)