Aharrypotter opened a new issue, #19534:
URL: https://github.com/apache/tvm/issues/19534

   ## Problem
   
   The Relax TFLite frontend currently has two related blockers for quantized 
TFLite import.
   
   First, quantized tensors are blocked early in `get_tensors()` by the tensor 
quantization metadata guard. After preserving tensor-level quantization 
metadata (`scale`, `zero_point`, and `axis`) and allowing the frontend to 
proceed further, the next blocker appears at the operator conversion stage:
   
   ```text
   NameError: name '_qnn' is not defined
   ```
   
   This happens because the frontend contains quantized operator conversion 
paths that reference non-existent `_qnn.op.*` APIs.
   
   At the same time, Relax already provides `quantize` / `dequantize` operators 
with C++ registration, Python APIs, legalization to TE, and tests. This 
suggests that quantized TFLite operators may initially be imported using QDQ 
decomposition around existing Relax ops, rather than requiring a new set of 
fused QNN operators as the first step.
   
   This issue tracks the work needed to support quantized TFLite operator 
import in the Relax frontend.
   
   ## Affected `_qnn.op.*` calls
   
   The TFLite frontend (`python/tvm/relax/frontend/tflite/tflite_frontend.py`) 
references 7 non-existent QNN ops across 18 call sites:
   
   | Op | Call sites | Typical context |
   |----|-----------|-----------------|
   | `quantize` | 1 | float → int8 in `convert_quantize()` |
   | `dequantize` | 4 | int8 → float in `convert_dequantize()` and `convert_detection_postprocess()` |
   | `requantize` | 9 | post-conv/dense/relu/reshape/reduce scale adjustment |
   | `conv2d` | 1 | quantized 2D convolution in `convert_conv()` |
   | `dense` | 1 | quantized fully connected in `convert_fully_connected()` |
   | `concat` | 1 | quantized concatenation in `convert_concatenation()` |
   | `conv2d_transpose` | 1 | quantized transposed convolution in `convert_transpose_conv()` |
   
   ## Existing Relax quantization infrastructure
   
   Relax already has two QDQ operators with C++ registration, Python APIs, 
legalization, and tests:
   
   - `relax.op.quantize(data, scale, zero_point, axis, out_dtype)`: `clip(round(input / scale) + zp, min, max)`
   - `relax.op.dequantize(data, scale, zero_point, axis, out_dtype)`: `scale * (input - zp)`
   
   These are defined in:
   - C++: `src/relax/op/tensor/qdq.cc`
   - Python API: `python/tvm/relax/op/qdq.py`
   - Legalization: `python/tvm/relax/transform/legalize_ops/qdq.py`
   - Tests: `tests/python/relax/test_op_qdq.py`, 
`tests/python/relax/test_transform_legalize_ops_qdq.py`
   
   Both support per-tensor and per-axis (channel-wise) quantization via the 
`axis` parameter.
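
For reference, the two formulas above can be written out in plain Python. This is only a sketch of the arithmetic, not the Relax implementation: the helper names, the int8 range defaults, and the choice of axis 0 on a 2-D input are assumptions made for illustration, and Python's `round` (ties to even) may not match the rounding mode Relax or TFLite uses on exact ties.

```python
from typing import List, Sequence


def quantize(values: Sequence[Sequence[float]],
             scales: Sequence[float],
             zero_points: Sequence[int],
             qmin: int = -128,
             qmax: int = 127) -> List[List[int]]:
    """clip(round(x / scale) + zp, qmin, qmax), one (scale, zp) pair per row.

    Per-axis quantization along axis 0 of a 2-D input. Per-tensor
    quantization is the special case where every row shares the same pair.
    """
    out = []
    for row, scale, zp in zip(values, scales, zero_points):
        out.append([max(qmin, min(qmax, round(x / scale) + zp)) for x in row])
    return out


def dequantize(qvalues: Sequence[Sequence[int]],
               scales: Sequence[float],
               zero_points: Sequence[int]) -> List[List[float]]:
    """scale * (q - zp), one (scale, zp) pair per row."""
    return [[scale * (q - zp) for q in row]
            for row, scale, zp in zip(qvalues, scales, zero_points)]


# Two "channels" with different scales and zero points.
weights = [[0.5, -1.0, 0.3], [10.0, -20.0, 300.0]]
scales = [0.25, 2.0]
zero_points = [0, 3]

q = quantize(weights, scales, zero_points)
restored = dequantize(q, scales, zero_points)
```

In this example `300.0` saturates at `qmax`, so the round trip is lossy for that element; `0.3` also comes back as `0.25` because of rounding.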
   
   ## Possible implementation directions
   
   There are at least two possible paths:
   
   1. Add explicit fused Relax QNN operators, such as `qnn.conv2d`, 
`qnn.dense`, and `qnn.requantize`.
   2. Reuse existing Relax QDQ operators and import quantized TFLite operators 
as QDQ patterns around existing Relax compute ops.
   
   I propose starting with the second path. The QDQ-based approach has a 
smaller API surface and can reuse existing Relax quantize/dequantize 
infrastructure. Explicit fused QNN operators may still be useful later for 
optimized int8 execution or backend-specific pattern matching, and can be 
discussed as a follow-up if needed.
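
To make the second path concrete, here is a minimal pure-Python sketch of what "QDQ around an existing compute op" means for a fully connected layer: dequantize the inputs, run the ordinary float computation, then quantize the result. The function names and the `(scale, zero_point)` tuples are hypothetical and per-tensor only; the actual frontend would emit Relax ops rather than compute values directly.

```python
from typing import List, Sequence, Tuple

QParams = Tuple[float, int]  # (scale, zero_point), per-tensor; hypothetical type


def dequantize(q: Sequence[int], scale: float, zp: int) -> List[float]:
    """scale * (q - zp) for each element."""
    return [scale * (v - zp) for v in q]


def quantize(x: Sequence[float], scale: float, zp: int,
             qmin: int = -128, qmax: int = 127) -> List[int]:
    """clip(round(x / scale) + zp, qmin, qmax) for each element."""
    return [max(qmin, min(qmax, round(v / scale) + zp)) for v in x]


def qdq_dense(q_in: Sequence[int],
              q_weight: Sequence[Sequence[int]],
              in_params: QParams,
              w_params: QParams,
              out_params: QParams) -> List[int]:
    """Quantized dense expressed as dequantize -> float dense -> quantize."""
    x = dequantize(q_in, *in_params)
    w = [dequantize(row, *w_params) for row in q_weight]
    # Ordinary float matrix-vector product: one output per weight row.
    y = [sum(a * b for a, b in zip(x, row)) for row in w]
    return quantize(y, *out_params)


result = qdq_dense(
    q_in=[4, 8, -4],
    q_weight=[[2, 0, 2], [-2, 4, 0]],
    in_params=(0.5, 0),
    w_params=(0.25, 0),
    out_params=(0.5, 10),
)
```

The float computation in the middle is unchanged from the non-quantized case, which is why this path needs no new compute operators. The trade-off noted above still applies: it does not by itself give int8 kernels.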
   
   ## Task list
   
   - [ ] Preserve tensor quantization metadata in `get_tensors()` (`scale`, 
`zero_point`, and `axis`) and remove the global quantization guard
   - [ ] Replace quantize/dequantize helpers with Relax QDQ ops
   - [ ] Support quantized Conv2D via QDQ decomposition
   - [ ] Add per-channel Conv2D weight support
   - [ ] Support quantized FullyConnected / Dense via QDQ
   - [ ] Support remaining quantized ops (`concat`, `conv2d_transpose`, 
`requantize` paths)
   
   ## Out of scope
   
   - ONNX `QLinearConv` / `QLinearMatMul` — may benefit from similar 
infrastructure but tracked separately
   - End-to-end int8 kernel optimization — may require explicit fused QNN ops 
or backend-specific QDQ pattern matching, and is not the first milestone
   - Per-channel axis remap for arbitrary ops — only addressed for conv2d and 
dense where weight layout transpose occurs
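
For the conv2d and dense cases that are in scope, the axis remap under a weight transpose can be sketched as below. The helper name is hypothetical, and the sketch assumes a NumPy-style permutation where output dimension `i` takes input dimension `perm[i]`; the OHWI-to-HWIO example layout is an illustration, not a statement about what the frontend currently does.

```python
from typing import Sequence


def remap_quant_axis(axis: int, perm: Sequence[int]) -> int:
    """Return where dimension `axis` ends up after transposing by `perm`.

    Assumes output dimension i is taken from input dimension perm[i],
    so the old axis lands at the position in `perm` that names it.
    """
    ndim = len(perm)
    if sorted(perm) != list(range(ndim)):
        raise ValueError(f"not a valid permutation: {perm!r}")
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} out of range for rank {ndim}")
    if axis < 0:
        axis += ndim  # normalize negative axes
    return list(perm).index(axis)


# Example: a weight quantized per output channel in OHWI layout (axis 0),
# transposed to HWIO with perm = (1, 2, 3, 0).
new_axis = remap_quant_axis(0, (1, 2, 3, 0))
```

Here the per-channel scales would need to follow the output-channel dimension to its new position, which is the last axis in this example.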
   
   ## References
   
   - TFLite frontend: `python/tvm/relax/frontend/tflite/tflite_frontend.py`
   - TFLite quantization spec: 
https://www.tensorflow.org/lite/performance/quantization_spec
   - Existing TFLite tracking issues: #19412, #19519
   - Related: tensor quantization metadata parsing
   
   cc @tlopex for visibility



