jwfromm opened a new pull request #6748:
URL: https://github.com/apache/incubator-tvm/pull/6748


   This collection of new utility functions enables a starting floating 
point model to be converted to a datatype and format that can be run using the 
efficient HWNC tensorcore schedules introduced in #6121. Although these 
schedules are the fastest available in TVM, they have a few very specific 
requirements that make them difficult to apply to models in general. Specifically, 
compatible operators must have inputs set to `int4` or `int8`, all compatible 
layers must be in the `HWNC` layout, and incompatible layers should be left in 
their original layout and datatype. There are currently no tools to make such 
changes to an existing model. To address this, I've written the following 
utilities:
   
   `count_layers`: A pass that determines the number of layers of the specified 
operator in a graph. Although generally useful, for tensorcores we use this to 
enable the `skip_layers` feature.
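   As a quick illustration, `count_layers` can also be used on its own to inspect a workload before deciding which layers to skip. This is a hedged sketch: it assumes the pass is exposed as `tvm.relay.analysis.count_layers` and takes a list of valid op names; adjust the import to match where this PR actually places it.
   ```python
   # Sketch: counting conv2d layers in a workload.
   # Assumes count_layers lives under tvm.relay.analysis and accepts
   # (expr, valid_ops) -- an assumption, adjust to match this PR.
   from tvm import relay
   from tvm.relay.analysis import count_layers
   import tvm.relay.testing  # registers relay.testing

   mod, params = relay.testing.resnet.get_workload(num_layers=18)
   depth = count_layers(mod, valid_ops=["nn.conv2d"])
   print(depth)  # number of nn.conv2d layers found in the graph
   ```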
   
   `recast`: A pass that changes the input and output datatype of all specified 
operators in a graph, with the option to skip a set of layers. This pass is 
mainly useful for benchmarking, since it does not apply any intelligent 
quantization; however, this type of utility is a common topic on the Discuss 
forums and can serve as a good example for users interested in similar 
functionality.
   
   `LayoutConfig`: An optional scope that can be applied around the 
`ConvertLayout` pass. In this PR I use it to enable skipping the conversion of 
specified conv2d layers, but it could be extended for other customization down 
the line.
   
   HWNC support for `ConvertLayout`.
   
   The combination of these utilities allows us to target HWNC tensorcores 
using a workflow such as this:
   ```python
   import tvm
   from tvm import relay
   import tvm.relay.testing  # registers relay.testing
   from tvm.relay.transform import recast  # introduced in this PR

   mod, params = relay.testing.resnet.get_workload()
   layout_config = relay.transform.LayoutConfig(skip_layers=[0])
   desired_layouts = {'nn.conv2d': ['HWNC', 'default']}
   with layout_config:
       seq = tvm.transform.Sequential([relay.transform.ConvertLayout(desired_layouts)])
       with tvm.transform.PassContext(opt_level=3):
           mod = seq(mod)
   mod = recast(mod, 'int4', 'int32', skip_layers=[0])
   ```
   When autotuned, the resulting `mod` will qualify for the HWNC tensorcore 
strategy.
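   For completeness, here is a hedged sketch of how tuning tasks might then be extracted from the converted module with AutoTVM. The target string and the use of `extract_from_program` are illustrative assumptions and not part of this PR; a CUDA-enabled build of TVM is required.
   ```python
   # Sketch: extracting AutoTVM tuning tasks for the converted module.
   # Assumes an NVIDIA GPU target; "cuda" here is an example target string.
   import tvm
   from tvm import relay, autotvm
   import tvm.relay.testing  # registers relay.testing

   mod, params = relay.testing.resnet.get_workload()
   # ...apply the ConvertLayout/recast workflow described above...
   tasks = autotvm.task.extract_from_program(
       mod["main"], target="cuda", params=params)
   print(len(tasks))  # one task per tunable operator
   ```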

