Hi, we have identified that CUDA/cuDNN autotuning produces a significant spike in RAM usage while it searches for the best convolution algorithm. As far as we understand, this happens inside the cuDNN library, but on platforms like the TX1, where we only have 4 GB of memory, it is problematic: the spike comes close to 4 GB. Autotuning can be disabled with an environment variable, but on these platforms it might be more interesting to run the search once, save the chosen algorithms, and not repeat the search on every run; disabling autotuning outright means you are probably doing convolutions with slower kernels. A rough sketch of the idea is below.
The second topic I wanted to bring up: would it be a good idea to have configurable kernel launch parameters to optimize SM resource utilization? This could be done either at compile time, based on the target arch (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4640924/), or from a runtime profile; a sketch of the runtime variant follows.
Any thoughts on these topics?

Pedro.