Thanks for sharing your thoughts.

Let me share some more background. To get close to hand-written kernels such as 
MKLDNN or ACL for compute-heavy ops, we need to perform vector register tiling, 
which sits one level below cache tiling. Here, the TVM schedule has to carefully 
manage data reuse in vector registers, the number of vector registers used, the 
number of vector FMA operations in the innermost loop, the number of vector 
memory accesses, and prefetcher-friendly access patterns. There are many factors 
to consider, and a developer has to craft the loop optimization schedule to find 
a suitable balance. @kevinthesun can back me up here.
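
To make this concrete, here is a minimal register-tiling sketch for a matmul in 
TVM's te schedule language. The 4x16 micro-kernel and all split factors are 
assumptions for illustration, not tuned values:

```python
import tvm
from tvm import te

# Illustrative matmul with a register-resident 4x16 accumulator tile.
# All tile factors are assumptions for this sketch, not tuned numbers.
M, N, K = 1024, 1024, 1024
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)
CC = s.cache_write(C, "global")      # accumulator tile, intended to stay in registers
i, j = s[C].op.axis
io, ii = s[C].split(i, factor=4)     # micro-kernel rows
jo, ji = s[C].split(j, factor=16)    # micro-kernel columns (vector lanes)
s[C].reorder(io, jo, ii, ji)
s[CC].compute_at(s[C], jo)           # the 4x16 tile stays live across the k loop
ic, jc = s[CC].op.axis
(kc,) = s[CC].op.reduce_axis
ko, ki = s[CC].split(kc, factor=4)
s[CC].reorder(ko, ki, ic, jc)
s[CC].vectorize(jc)                  # vector FMAs in the innermost loop
s[CC].unroll(ic)                     # the only unrolling the schedule intends
```

The number of live vector values in the innermost body is chosen deliberately 
here; a compiler-driven unroll on top of this changes that count behind the 
schedule's back.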
 
Now, a simple optimization like loop unrolling can completely upset this 
balance. For example, my TVM schedule might keep the total vector register 
count below 32 (the number of ARM vector registers), but LLVM unrolling by even 
a factor of 2 doubles the vfma operations and live values in the loop body, 
defeating the whole purpose of the loop tiling. I have dabbled in writing x86 
assembly for SGEMM and have run into all of these issues.
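
As a rough, hypothetical count on AArch64 NEON (32 vector registers, 4 fp32 
lanes each): a 4x16 fp32 accumulator tile alone occupies 4*16/4 = 16 vector 
registers, a 16-wide slice of B takes 4 more, plus a couple of registers for 
the A values, so the micro-kernel already sits around 22 of the 32 registers. 
Any extra unrolling by LLVM raises the number of simultaneously live vectors in 
the body, and once that count crosses 32 the accumulators spill to the stack, 
which is exactly what the hand-crafted tile shape was meant to avoid.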

### What about rerolling, unroll-and-jam and strip-mining?
I think loop rerolling is disabled by default in LLVM; I am not sure about 
unroll-and-jam. Strip-mining is TVM's responsibility (it is just tiling in 1D 
for vectorization, which is common in TVM; a minimal sketch is below). But I 
understand your overarching point, and yes, I am suggesting even more strongly 
that we give TVM more control over these loop optimizations. I also believe 
that different loop optimizations have different impact; in my observations, 
LLVM unrolling has a big impact.
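
For reference, strip-mining in TVM is just a split by the vector width followed 
by vectorize on the inner axis; a minimal sketch, with the factor of 8 being an 
arbitrary assumption:

```python
import tvm
from tvm import te

# Strip-mining: split a 1-D loop by the vector width and vectorize the inner
# part. The factor of 8 is an assumption for this sketch.
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=8)
s[B].vectorize(xi)
```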

### Default schedules to use LLVM optimizations?
I was thinking about this as well, and I completely agree. I want more control 
for compute-intensive ops, but I want LLVM to optimize the default schedules. 
Going even further, if I could embed something in the TVM IR to disable a loop 
optimization for a specific section of the LLVM IR, that might be the best design.
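
To sketch the idea (the pragma key below is purely hypothetical and does not 
exist in TVM today): the schedule would annotate one specific loop, and the 
LLVM codegen would lower that annotation to per-loop metadata such as 
llvm.loop.unroll.disable, leaving every other loop under the default LLVM pipeline.

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=64)
# Hypothetical pragma key: codegen would lower it to LLVM loop metadata
# (e.g. !{!"llvm.loop.unroll.disable"}) attached only to this loop.
s[B].pragma(xo, "llvm_loop_unroll_disable", 1)
s[B].vectorize(xi)
```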

### Mix and match of TVM optimization and LLVM optimization
Yes, this is the same as the previous point.


## Summary
### Why should we disable LLVM unrolling?
* TVM schedules perform as expected, so a developer can trust his/her schedule's 
performance.
* It also helps AutoTVM, whose tuning can be painfully long today. By carefully 
analyzing the loop structure, we can reason about how good the register tiling 
is and discard bad configurations quickly.
* Disabling LLVM unrolling does not mean we will miss a configuration. Our 
schedules are templated, so AutoTVM can include configurations where the axis 
that LLVM was unrolling is instead unrolled by TVM (a minimal knob sketch 
follows this list). But I understand we need data.
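
A minimal sketch of what that templated knob could look like; the template 
name, knob name, and factors are assumptions, not an existing TOPI template:

```python
from tvm import autotvm, te

@autotvm.template("example/vecadd_unroll")
def vecadd(n):
    A = te.placeholder((n,), name="A")
    B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
    s = te.create_schedule(B.op)

    xo, xi = s[B].split(B.op.axis[0], factor=16)
    s[B].vectorize(xi)

    # Let AutoTVM, not LLVM, decide how much to unroll the outer axis.
    cfg = autotvm.get_config()
    cfg.define_knob("unroll_factor", [1, 2, 4])
    if cfg["unroll_factor"].val > 1:
        xoo, xoi = s[B].split(xo, factor=cfg["unroll_factor"].val)
        s[B].unroll(xoi)
    return s, [A, B]
```

With this, the unroll factor becomes part of the AutoTVM search space instead 
of an opaque LLVM decision.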

### Why should we keep LLVM unrolling?
* Default schedules might see performance degradation.
* In the short term, the top-hub logs might not be optimal anymore, and we 
might need to re-tune.


If all of us see the theoretical benefits and agree that performance data is 
the only deciding factor, I can start collecting data for both x86 and ARM. 
Data collection will take time, so it is better if we agree on the idea first :)




