David Zanter created ORC-574:
--------------------------------
Summary: Performance: Statistics getMax/getMin should return const
vals to prevent std::string copies
Key: ORC-574
URL: https://issues.apache.org/jira/browse/ORC-574
Project: ORC
Issue Type: Improvement
Components: C++
Affects Versions: 1.6.2
Reporter: David Zanter
Attachments: callgrind-before-after.JPG
Callgrind profiling of a copy scenario (full read followed by full write) of a
1.9-million-row zlib-compressed ORC table shows that the #4 consumer of CPU is
std::string allocation, called from the
orc::StringColumnStatisticsImpl::update method. The getMax/getMin calls there
cause a std::string alloc/copy/delete on every call.
Changing the getMaximum/getMinimum methods to return const references instead
of std::string copies will prevent these alloc/copy/deletes from occurring.
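To illustrate the difference, here is a minimal sketch contrasting a getter
that returns by value with one that returns a const reference. The class and
function names (StringStatsSketch, exceedsMaxByValue, exceedsMaxByRef) are
hypothetical and used only for illustration; this is not the actual ORC code.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a string statistics holder (not the ORC class).
class StringStatsSketch {
 public:
  explicit StringStatsSketch(std::string max) : maximum_(std::move(max)) {}

  // Returns by value: every call constructs a new std::string, which for
  // strings longer than the small-string buffer means alloc + copy + delete.
  std::string getMaximumByValue() const { return maximum_; }

  // Returns a const reference: no new string is constructed.
  const std::string& getMaximumByRef() const { return maximum_; }

 private:
  std::string maximum_;
};

// Mirrors the kind of comparison an update step performs per value.
bool exceedsMaxByValue(const StringStatsSketch& s, const std::string& value) {
  return value > s.getMaximumByValue();  // temporary copy made, then destroyed
}

bool exceedsMaxByRef(const StringStatsSketch& s, const std::string& value) {
  return value > s.getMaximumByRef();  // compares against the stored string
}
```

Both helpers return the same result; only the by-value variant pays for a
temporary string on each call. The returned reference stays valid only while
the owning object is alive and its stored string is not reassigned.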
Currently, with 1.6.X master, the performance profile of this scenario is:
Instructions executed: 16.6 billion
Real clock time: 3.91 seconds
With the fix, instructions executed drop by about 28% (the unfixed build
executes about 38% more instructions) and clock time drops by about 10%, to:
Instructions executed: 12.0 billion
Real clock time: 3.53 seconds
The attached JPG shows callgrind screenshots before (left) and after (right)
the change.