David Zanter created ORC-574:
--------------------------------

             Summary: Performance: Statistics getMax/getMin should return const 
vals to prevent std::string copies
                 Key: ORC-574
                 URL: https://issues.apache.org/jira/browse/ORC-574
             Project: ORC
          Issue Type: Improvement
          Components: C++
    Affects Versions: 1.6.2
            Reporter: David Zanter
         Attachments: callgrind-before-after.JPG

Callgrind profiling of a copy scenario (full read followed by full write) of a 
1.9 million row zlib-compressed ORC table shows that the #4 consumer of CPU is 
std::string allocation. It is called from the 
orc::StringColumnStatisticsImpl::update method, where each 
getMaximum/getMinimum call causes a std::string alloc/copy/delete.

Changing the getMaximum/getMinimum methods to return const references instead 
of std::string values will prevent these alloc/copy/deletes from occurring.
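
The difference between the two getter styles can be sketched as follows. This 
is a minimal illustration only, not the ORC source: the class 
StringStatsSketch and the method getMinimumByValue are hypothetical names made 
up for this example, and the real orc::StringColumnStatisticsImpl has more 
state and a different update signature.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a string statistics holder (NOT the ORC class).
class StringStatsSketch {
 public:
  StringStatsSketch(std::string minimum, std::string maximum)
      : minimum_(std::move(minimum)), maximum_(std::move(maximum)) {}

  // By-value getter: every call constructs a new std::string, which for
  // strings too long for the small-string buffer means alloc + copy + delete.
  std::string getMinimumByValue() const { return minimum_; }

  // Const-reference getters: return an alias to the member, no copy is made.
  const std::string& getMinimum() const { return minimum_; }
  const std::string& getMaximum() const { return maximum_; }

  // Compares through the const-reference getters; a string is copied only
  // when a bound actually changes.
  void update(const std::string& value) {
    if (value < getMinimum()) {
      minimum_ = value;
    }
    if (getMaximum() < value) {
      maximum_ = value;
    }
  }

 private:
  std::string minimum_;
  std::string maximum_;
};
```

One trade-off of this design: a returned reference is only valid while the 
statistics object is alive, and it observes later updates, so a caller that 
needs a stable snapshot has to make its own copy.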

Currently, with 1.6.x master, the performance profile of this scenario is:

Instructions executed: 16.6 billion

Real clock time: 3.91 seconds

With the fix to use consts, the instruction count drops by about 28% (the 
unfixed build executes about 38% more instructions than the fixed one) and 
the clock time drops by about 10%, to:

Instructions executed: 12.0 billion

Real clock time: 3.53 seconds

The attached JPG shows callgrind screenshots before (left) and after (right) 
the fix.