David Mollitor created PARQUET-2072:
---------------------------------------

             Summary: Do Not Determine Both Min/Max for Binary Stats
                 Key: PARQUET-2072
                 URL: https://issues.apache.org/jira/browse/PARQUET-2072
             Project: Parquet
          Issue Type: Improvement
            Reporter: David Mollitor
            Assignee: David Mollitor


I'm looking at some benchmarking code comparing Apache ORC and Apache Parquet, and I see that Parquet is quite a bit slower for writes (reads TBD).

Based on my investigation, a significant amount of the write time is spent determining min/max for binary types.

One quick improvement is to skip the "max" comparison when a value has already been determined to be the new "min": a value smaller than the current min cannot also be larger than the current max.

While I'm at it, remove calls to deprecated functions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
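
For illustration, the shortcut proposed in this issue can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual Parquet code: the class BinaryMinMax, its methods, and the comparison counter are hypothetical, and the unsigned lexicographic byte comparison (Arrays.compareUnsigned, Java 9+) is assumed here as the ordering; the ordering Parquet applies to a given binary column may differ.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for a binary column's min/max statistics.
 * Not Parquet's actual statistics class; names are for illustration only.
 */
class BinaryMinMax {
    private byte[] min;
    private byte[] max;
    private long comparisons = 0; // counts byte[] comparisons, to show the saving

    // Assumed ordering: unsigned lexicographic comparison of the raw bytes.
    private int compare(byte[] a, byte[] b) {
        comparisons++;
        return Arrays.compareUnsigned(a, b);
    }

    public void update(byte[] value) {
        if (min == null) {
            // First value seen: it is both the min and the max.
            min = value.clone();
            max = value.clone();
            return;
        }
        if (compare(value, min) < 0) {
            min = value.clone();
            // min <= max always holds, so a new min cannot also be a new max:
            // skip the second comparison.
            return;
        }
        if (compare(value, max) > 0) {
            max = value.clone();
        }
    }

    public byte[] getMin() { return min; }
    public byte[] getMax() { return max; }
    public long getComparisons() { return comparisons; }
}
```

With this structure, a value that lowers the min costs one comparison instead of two; values that fall between the current min and max still cost two.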