Issue 55393
Summary [IROutliner] CTMark/consumer-typeset slow to compile for AArch64 @ -Oz and -O2
Labels llvm:optimizations, llvm:compiletime
Assignees
Reporter ornata
I've observed that compiling with the IR outliner enabled is 59.8% slower than the baseline at -Oz, and 52.6% slower than the baseline at -O2.

This was done by compiling [CTMark](https://github.com/llvm/llvm-test-suite/blob/main/CTMark/README.md) at both optimization levels with and without the outliner. CTMark was compiled using [LNT](https://lnt.readthedocs.io/en/latest/).

I collected the average over 3 samples, compiled with a single thread.
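
For anyone repeating the measurement, the slowdown figures reduce to comparing mean compile times. Here is a minimal sketch of that arithmetic, assuming wall-clock seconds per sample. The function name `percent_slower` and the sample values are hypothetical placeholders, not the measurements from this report.

```python
from statistics import mean


def percent_slower(baseline_samples, outliner_samples):
    """Return how much slower the outliner build is than baseline, in percent."""
    base = mean(baseline_samples)
    with_outliner = mean(outliner_samples)
    if base <= 0:
        raise ValueError("baseline mean must be positive")
    return (with_outliner - base) / base * 100.0


# Placeholder timings in seconds (3 samples each), for illustration only.
baseline = [10.0, 10.0, 10.0]
outliner = [15.0, 15.0, 15.0]
print(f"{percent_slower(baseline, outliner):.1f}% slower")
```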

I used a debug build of Clang, but I suspect the issue should persist with a release build or a release + asserts build.

# Reproducing

Here's IR for one of the files in the benchmark which compiles slowly with the outliner: https://godbolt.org/z/boYe1TPTW

You could also compile consumer-typeset from CTMark for AArch64 to do an end-to-end test.

I think we should be able to take out a big chunk of the slowdown using just the IR above though. :)

# Analysis

Focusing on -O2 because I don't want to think about the MachineOutliner getting in the way...

I collected time traces using `clang -ftime-trace` and looked at all of them. The worst outlier I found was `z12.c`.

![image](https://user-images.githubusercontent.com/4722725/167938830-ee4422a4-a4bb-4bc1-977e-468b83edf655.png)
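
Going through the traces by hand gets tedious, so a script along these lines could rank them instead. This is a sketch. It assumes each `-ftime-trace` output is a Chrome-trace-style JSON file with a top-level `traceEvents` list, that complete events are marked `"ph": "X"` with a `dur` field in microseconds, and that aggregated summary events have names starting with `Total `. The helper names and the name filter passed in are hypothetical.

```python
import json
from pathlib import Path


def pass_time_us(trace_path, name_substr):
    """Sum the durations of complete events whose name contains name_substr."""
    with open(trace_path) as f:
        data = json.load(f)
    # Tolerate JSON files that are not trace objects.
    events = data.get("traceEvents", []) if isinstance(data, dict) else []
    return sum(
        e.get("dur", 0)
        for e in events
        if e.get("ph") == "X"
        and name_substr in e.get("name", "")
        # Skip aggregated summary events so time is not counted twice.
        and not e.get("name", "").startswith("Total ")
    )


def rank_traces(trace_dir, name_substr):
    """Return (file name, microseconds) pairs, slowest file first."""
    rows = [
        (p.name, pass_time_us(p, name_substr))
        for p in sorted(Path(trace_dir).rglob("*.json"))
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

Calling `rank_traces("traces", "Outliner")` would then list the files by time spent in events whose name contains "Outliner". Both the directory and the substring are placeholders to adjust to whatever event names appear in the traces.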

After that I used Instruments to figure out the heaviest stack trace.

```
1.26 s    5.2%    1.26 s           llvm::isa_impl_wrap<llvm::Instruction, llvm::Value const* const, llvm::Value const*>::doit(llvm::Value const* const&)
1.10 s    4.6%    0 s            bool llvm::isa<llvm::Instruction, llvm::Value const*>(llvm::Value const* const&)
595.00 ms    2.4%    0 s             bool llvm::isa<llvm::Instruction, llvm::Value const*>(llvm::Value const* const&)
385.00 ms    1.6%    0 s              llvm::CallInst::classof(llvm::Value const*)
385.00 ms    1.6%    0 s               llvm::isa_impl<llvm::CallInst, llvm::Value, void>::doit(llvm::Value const&)
385.00 ms    1.6%    0 s                llvm::isa_impl_cl<llvm::CallInst, llvm::Value const*>::doit(llvm::Value const*)
385.00 ms    1.6%    0 s                 llvm::isa_impl_wrap<llvm::CallInst, llvm::Value const*, llvm::Value const*>::doit(llvm::Value const* const&)
385.00 ms    1.6%    0 s                  llvm::isa_impl_wrap<llvm::CallInst, llvm::Value const* const, llvm::Value const*>::doit(llvm::Value const* const&)
385.00 ms    1.6%    0 s                   bool llvm::isa<llvm::CallInst, llvm::Value const*>(llvm::Value const* const&)
378.00 ms    1.5%    0 s                    llvm::IntrinsicInst::classof(llvm::Value const*)
364.00 ms    1.5%    0 s                     llvm::isa_impl<llvm::IntrinsicInst, llvm::Value, void>::doit(llvm::Value const&)
364.00 ms    1.5%    0 s                      llvm::isa_impl_cl<llvm::IntrinsicInst, llvm::Value const*>::doit(llvm::Value const*)
364.00 ms    1.5%    0 s                       llvm::isa_impl_wrap<llvm::IntrinsicInst, llvm::Value const*, llvm::Value const*>::doit(llvm::Value const* const&)
363.00 ms    1.5%    0 s                        llvm::isa_impl_wrap<llvm::IntrinsicInst, llvm::Value const* const, llvm::Value const*>::doit(llvm::Value const* const&)
363.00 ms    1.5%    0 s                         bool llvm::isa<llvm::IntrinsicInst, llvm::Value const*>(llvm::Value const* const&)
295.00 ms    1.2%    0 s                          llvm::DbgInfoIntrinsic::classof(llvm::Value const*)
295.00 ms    1.2%    0 s                           llvm::isa_impl<llvm::DbgInfoIntrinsic, llvm::Instruction, void>::doit(llvm::Instruction const&)
295.00 ms    1.2%    0 s                            llvm::isa_impl_cl<llvm::DbgInfoIntrinsic, llvm::Instruction const>::doit(llvm::Instruction const&)
295.00 ms    1.2%    0 s                             llvm::isa_impl_wrap<llvm::DbgInfoIntrinsic, llvm::Instruction const, llvm::Instruction const>::doit(llvm::Instruction const&)
295.00 ms    1.2%    0 s                              bool llvm::isa<llvm::DbgInfoIntrinsic, llvm::Instruction>(llvm::Instruction const&)
295.00 ms    1.2%    0 s                               llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1::operator()(llvm::Instruction&) const
295.00 ms    1.2%    0 s                                decltype(static_cast<llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1&>(fp)(static_cast<llvm::Instruction&>(fp0))) std::__1::__invoke<llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1&, llvm::Instruction&>(llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1&, llvm::Instruction&)
295.00 ms    1.2%    0 s                                 bool std::__1::__invoke_void_return_wrapper<bool, false>::__call<llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1&, llvm::Instruction&>(llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1&, llvm::Instruction&)
295.00 ms    1.2%    0 s                                  std::__1::__function::__alloc_func<llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1, std::__1::allocator<llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1>, bool (llvm::Instruction&)>::operator()(llvm::Instruction&)
295.00 ms    1.2%    0 s                                   std::__1::__function::__func<llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1, std::__1::allocator<llvm::BasicBlock::instructionsWithoutDebug(bool)::$_1>, bool (llvm::Instruction&)>::operator()(llvm::Instruction&)
295.00 ms    1.2%    0 s                                    std::__1::__function::__value_func<bool (llvm::Instruction&)>::operator()(llvm::Instruction&) const
295.00 ms    1.2%    0 s                                     std::__1::function<bool (llvm::Instruction&)>::operator()(llvm::Instruction&) const
295.00 ms    1.2%    0 s                                      llvm::filter_iterator_base<llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, false>, std::__1::function<bool (llvm::Instruction&)>, std::__1::bidirectional_iterator_tag>::findNextValid()
270.00 ms    1.1%    0 s                                       llvm::filter_iterator_base<llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, false>, std::__1::function<bool (llvm::Instruction&)>, std::__1::bidirectional_iterator_tag>::operator++()
261.00 ms    1.0%    0 s                                        llvm::CodeExtractorAnalysisCache::CodeExtractorAnalysisCache(llvm::Function&)
261.00 ms    1.0%    0 s                                         llvm::CodeExtractorAnalysisCache::CodeExtractorAnalysisCache(llvm::Function&)
260.00 ms    1.0%    0 s                                          getCodeExtractorArguments(llvm::OutlinableRegion&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> >&, llvm::DenseSet<unsigned int, llvm::DenseMapInfo<unsigned int> >&, llvm::DenseMap<llvm::Value*, llvm::Value*, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseMapPair<llvm::Value*, llvm::Value*> >&, llvm::SetVector<llvm::Value*, std::__1::vector<llvm::Value*, std::__1::allocator<llvm::Value*> >, llvm::DenseSet<llvm::Value*, llvm::DenseMapInfo<llvm::Value*> > >&, llvm::SetVector<llvm::Value*, std::__1::vector<llvm::Value*, std::__1::allocator<llvm::Value*> >, llvm::DenseSet<llvm::Value*, llvm::DenseMapInfo<llvm::Value*> > >&)
260.00 ms    1.0%    0 s                                           llvm::IROutliner::findAddInputsOutputs(llvm::Module&, llvm::OutlinableRegion&, llvm::DenseSet<unsigned int, llvm::DenseMapInfo<unsigned int> >&)
260.00 ms    1.0%    0 s                                            llvm::IROutliner::doOutline(llvm::Module&)
```

cc @AndrewLitteken 