Super interesting! Nice sleuthing! TW Impala QE On Tue, Jun 6, 2017 at 5:15 PM, Todd Lipcon <[email protected]> wrote:
> FYI Impala folks: you might hit this issue when you upgrade to llvm 4.0 as > well. Something to be aware of > ---------- Forwarded message ---------- > From: "Todd Lipcon (Code Review)" <[email protected]> > Date: Jun 6, 2017 5:06 PM > Subject: [kudu-CR] codegen: fix regression in inlining after LLVM 4.0 > upgrade > To: "Dan Burkert" <[email protected]>, "David Ribeiro Alves" < > [email protected]>, "Binglin Chang" <[email protected]>, < > [email protected]> > Cc: "Todd Lipcon" <[email protected]> > > Hello Dan Burkert, > > I'd like you to do a code review. Please visit > > http://gerrit.cloudera.org:8080/7099 > > to review the following change. > > Change subject: codegen: fix regression in inlining after LLVM 4.0 upgrade > ...................................................................... > > codegen: fix regression in inlining after LLVM 4.0 upgrade > > With the upgrade to LLVM 4.0, the performance of our generated code > regressed significantly. After looking at the generated assembly from > codegen-test, I could see that there were many 'call' instructions that > didn't appear in the earlier output, which led me to suspect the > inliner. > > After adding a 'module->dump()' call and looking at the output, I could > see that the calls were to utility functions like BitmapTest() which > should be inlined due to be very small. However, the function was marked > 'noinline' in precompiled.ll. After a bit of Googling I came across a > thread[1] in which someone else noticed that all of their functions had > unexpected 'noinline' attributes after upgrading to 4.0. > > After following the breadcrumbs, I found some advice to change the way > in which we emit precompiled.ll to disable the built-in LLVM passes > which were responsible for adding this attribute. > > During my investigation, I also found that we should have been passing > the optimization and size levels into the inlining pass, so I threw > that change in for good measure. > > This fixed the perf regression. I benchmarked using: > memrowset-test --roundtrip_num_rows=10000000 \ > --gtest_filter=\*InsertCount\* \ > --gtest_repeat=10 > > Before this fix: > > I0606 16:55:51.448117 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.502s user 0.502s sys 0.000s > I0606 16:55:57.937391 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.500s user 0.499s sys 0.000s > I0606 16:56:04.330037 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.498s user 0.499s sys 0.000s > I0606 16:56:10.706786 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.504s user 0.503s sys 0.001s > I0606 16:56:17.094764 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.510s user 0.510s sys 0.000s > I0606 16:56:23.474200 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.502s user 0.502s sys 0.000s > I0606 16:56:29.845799 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.503s user 0.503s sys 0.000s > I0606 16:56:36.221843 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.502s user 0.503s sys 0.000s > I0606 16:56:42.599604 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.502s user 0.503s sys 0.000s > I0606 16:56:48.962482 48991 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.503s user 0.502s sys 0.000s > > After this fix: > > I0606 16:53:51.573541 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.302s user 0.302s sys 0.000s > I0606 16:53:57.922485 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.301s user 0.301s sys 0.000s > I0606 16:54:04.186511 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.310s user 0.310s sys 0.000s > I0606 16:54:10.425207 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.313s user 0.313s sys 0.000s > I0606 16:54:16.648202 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.316s user 0.316s sys 0.000s > I0606 16:54:22.868655 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.311s user 0.312s sys 0.000s > I0606 16:54:29.081784 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.313s user 0.314s sys 0.000s > I0606 16:54:35.317623 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.314s user 0.312s sys 0.003s > I0606 16:54:41.550542 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.316s user 0.315s sys 0.000s > I0606 16:54:47.761406 48403 memrowset-test.cc:446] Time spent Scanning > rows where all are committed: real 0.314s user 0.314s sys 0.000s > > I also manually inspected the assembly from codegen-test and compared vs > a version compiled with LLVM 3.9 and found that it now has fewer > instructions than before. > > [1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/113175.html > > Change-Id: I7e449df80e5cd02b9ce2dbf4daa5cf8151007dcf > --- > M src/kudu/codegen/CMakeLists.txt > M src/kudu/codegen/module_builder.cc > 2 files changed, 11 insertions(+), 1 deletion(-) > > > git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/7099/1 > -- > To view, visit http://gerrit.cloudera.org:8080/7099 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: newchange > Gerrit-Change-Id: I7e449df80e5cd02b9ce2dbf4daa5cf8151007dcf > Gerrit-PatchSet: 1 > Gerrit-Project: kudu > Gerrit-Branch: master > Gerrit-Owner: Todd Lipcon <[email protected]> > Gerrit-Reviewer: Dan Burkert <[email protected]> >
