jwfromm opened a new pull request #6748:
URL: https://github.com/apache/incubator-tvm/pull/6748
This collection of new utility functions enables a floating point model to be converted to a datatype and layout that can be run using the efficient HWNC tensorcore schedules introduced in #6121. Although these schedules are the fastest available in TVM, they have a few very specific requirements that make them difficult to apply to models in general. Specifically, compatible operators must have their inputs set to `int4` or `int8`, all compatible layers must be in the `HWNC` layout, and incompatible layers should be left in their original layout and datatype. There are currently no tools to make such changes to an existing model. To address this, I've written the following utilities:
- `count_layers`: A pass that determines the number of layers of the specified operator in a graph. Although generally useful on its own, for tensorcores we use it to enable the `skip_layers` feature.
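As a rough sketch of the idea (not the Relay implementation, which traverses the expression graph), counting layers of a given operator can be pictured over a toy flat list of op names:

```python
# Conceptual sketch only: the real count_layers pass in this PR walks a
# Relay expression graph. Here a model is mocked as a flat list of op names.

def count_layers_sketch(ops, valid_op):
    """Count how many times `valid_op` appears in the toy op list."""
    return sum(1 for name in ops if name == valid_op)

toy_model = ["nn.conv2d", "nn.relu", "nn.conv2d", "nn.dense"]
num_convs = count_layers_sketch(toy_model, "nn.conv2d")  # 2
```

The count is what lets later passes decide which layer indices `skip_layers` should exclude.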
- `recast`: A pass that changes the input and output datatypes of all specified operators in a graph, with the option to skip a set of layers. Although this pass is mainly useful for benchmarking, since it does not apply any intelligent quantization, this type of utility is a common topic on the Discuss forums and can serve as a good example for users interested in similar functionality.
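To illustrate the behavior of such a pass (this is a self-contained toy, not the Relay implementation; the dict-based graph and function name are illustrative), recasting with `skip_layers` might look like:

```python
# Conceptual sketch of a recast pass over a toy graph represented as a
# list of op dicts. The real pass in this PR operates on Relay expressions.

def recast_sketch(ops, dtype, out_dtype, valid_op="conv2d", skip_layers=()):
    """Return a copy of `ops` with matching layers recast to new dtypes.

    Matching ops are counted in order; indices listed in `skip_layers`
    keep their original dtypes, mirroring the skip_layers option above.
    """
    recast_ops = []
    layer_idx = 0
    for op in ops:
        op = dict(op)  # copy so the input graph is left untouched
        if op["name"] == valid_op:
            if layer_idx not in skip_layers:
                op["dtype"] = dtype
                op["out_dtype"] = out_dtype
            layer_idx += 1
        recast_ops.append(op)
    return recast_ops

graph = [
    {"name": "conv2d", "dtype": "float32", "out_dtype": "float32"},
    {"name": "relu", "dtype": "float32", "out_dtype": "float32"},
    {"name": "conv2d", "dtype": "float32", "out_dtype": "float32"},
]
# Skip the first conv2d (often kept in float32); recast the rest.
new_graph = recast_sketch(graph, "int4", "int32", skip_layers=[0])
```

Skipping the first layer is a common choice because input layers are frequently the most sensitive to reduced precision.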
- `LayoutConfig`: An optional scope that can be applied around the `ConvertLayout` pass. In this PR I use it to skip the conversion of specified conv2d layers, but it could be extended for other customization down the line.
- HWNC support for `ConvertLayout`.
The combination of these utilities allows us to target HWNC tensorcores
using a workflow such as this:
```python
mod, params = relay.testing.resnet.get_workload()
layout_config = relay.transform.LayoutConfig(skip_layers=[0])
desired_layouts = {'nn.conv2d': ['HWNC', 'default']}
with layout_config:
    seq = tvm.transform.Sequential([relay.transform.ConvertLayout(desired_layouts)])
    with tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)
mod = recast(mod, 'int4', 'int32', skip_layers=[0])
```
When autotuned, the resulting `mod` will qualify for using the HWNC
tensorcore strategy.