David Mollitor created PARQUET-2072:
---------------------------------------

             Summary: Do Not Determine Both Min/Max for Binary Stats
                 Key: PARQUET-2072
                 URL: https://issues.apache.org/jira/browse/PARQUET-2072
             Project: Parquet
          Issue Type: Improvement
            Reporter: David Mollitor
            Assignee: David Mollitor


I'm looking at some benchmarking code comparing Apache ORC vs. Apache Parquet
and see that Parquet is quite a bit slower for writes (reads TBD).  Based on my
investigation, a significant amount of time is spent determining min/max for
binary types.

One quick improvement is to skip the "max" comparison when the value has
already been determined to be a new "min".
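
As a rough illustration of the idea above, here is a minimal sketch. It is not
the actual Parquet statistics code: the class and method names are hypothetical,
the ordering is assumed to be unsigned lexicographic over raw bytes, and
Arrays.compareUnsigned requires Java 9 or later. Because min and max are
initialized together, min <= max always holds, so a value that is below min
cannot also be above max and the second comparison can be skipped.

```java
import java.util.Arrays;

// Hypothetical stand-in for a binary min/max tracker; not Parquet's real class.
class BinaryMinMaxSketch {
    private byte[] min;
    private byte[] max;

    // Updates the running min/max with one value.
    void update(byte[] value) {
        if (min == null) {
            // First value seen: it is both the min and the max.
            min = value;
            max = value;
            return;
        }
        if (Arrays.compareUnsigned(value, min) < 0) {
            // New min found; since min <= max, it cannot be a new max,
            // so the max comparison is skipped entirely.
            min = value;
        } else if (Arrays.compareUnsigned(value, max) > 0) {
            max = value;
        }
    }

    byte[] getMin() {
        return min;
    }

    byte[] getMax() {
        return max;
    }
}
```

With this shape, each value that sets a new min costs one comparison instead
of two. How much this helps in practice would depend on the data distribution.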

While I'm at it, I will also remove calls to deprecated functions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
