imay opened a new issue #2040: Shared Pointer has a great impact on performance
URL: https://github.com/apache/incubator-doris/issues/2040

While investigating why beta rowset performs worse than alpha rowset, I ran a stress test with a query like "select sum(cost) from testTbl", where testTbl is an aggregate table and cost is a SUM column. I got the CPU flame graph below.

[CPU flame graph image, before the change]

Surprisingly, a single get version() function consumed almost 80% of the CPU. Looking at the code, I found that we check the version of each row to see whether it has been deleted. AlphaRowset has an optimization for this case, so it never hits this path. That is a separate story, and we should add the same optimization to beta rowset too.

I also found that we call all of these getter functions through a shared pointer, so I guessed that the shared pointer usage was the key reason for the poor performance. I changed some code to call the getter functions through a plain pointer instead, ran the test again, and got the CPU flame graph below, which looks reasonable.

[CPU flame graph image, after the change]

I'm not entirely sure why the shared pointer has such an impact. It may be that the shared pointer's reference counter is updated every time the pointer is copied (for example, when it is passed by value into a function), and these atomic updates become expensive when they happen once per row.

The lesson from this case is that we should avoid shared pointers on our performance-critical paths. Our storage engine currently uses shared pointers everywhere to avoid memory leaks. We should change this and use shared pointers only where shared ownership is actually needed.
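To make the reference-counting guess above concrete, here is a minimal sketch. It assumes the getters received the shared pointer by value, which the issue does not actually show. `Rowset`, `version_by_value`, `version_by_ref`, `version_by_raw`, `RefCountReport` and `measure_ref_counts` are hypothetical names, not Doris code. Copying a `std::shared_ptr` atomically increments the reference count and later decrements it. Passing a const reference or a plain pointer obtained with `get()` leaves the count untouched, as does calling a member through `operator->` on an existing `shared_ptr`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a rowset; this is NOT Doris's real Rowset class.
struct Rowset {
    int64_t version = 42;
};

// Hypothetical getter taking the shared_ptr BY VALUE: every call copies the
// pointer, atomically incrementing the reference count on entry and
// decrementing it when the parameter is destroyed on return.
int64_t version_by_value(std::shared_ptr<Rowset> rs, long* observed_count) {
    *observed_count = rs.use_count();  // caller's copy + this parameter
    return rs->version;
}

// Hypothetical getter taking a const reference: no copy, no count update.
int64_t version_by_ref(const std::shared_ptr<Rowset>& rs, long* observed_count) {
    *observed_count = rs.use_count();  // only the caller's copy
    return rs->version;
}

// Hypothetical getter taking a plain pointer obtained with get():
// the reference count is never touched.
int64_t version_by_raw(const Rowset* rs) {
    assert(rs != nullptr);
    return rs->version;
}

// Collects what each calling convention did to the reference count.
struct RefCountReport {
    long by_value = 0;        // use_count seen inside version_by_value
    long by_ref = 0;          // use_count seen inside version_by_ref
    long after = 0;           // use_count after all calls have returned
    int64_t version_sum = 0;  // sum of the three returned versions
};

RefCountReport measure_ref_counts() {
    RefCountReport report;
    auto rowset = std::make_shared<Rowset>();  // use_count() == 1
    report.version_sum += version_by_value(rowset, &report.by_value);
    report.version_sum += version_by_ref(rowset, &report.by_ref);
    report.version_sum += version_by_raw(rowset.get());
    report.after = rowset.use_count();  // the by-value copy is gone again
    return report;
}
```

Called from a `main` function, `measure_ref_counts()` should report a count of 2 inside the by-value getter and 1 inside the const-reference getter. In a per-row loop, the by-value form therefore pays two atomic operations per call. That is consistent with the profile described in the issue, but this sketch alone does not prove the profile.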
