Aharrypotter opened a new issue, #19534: URL: https://github.com/apache/tvm/issues/19534
## Problem

The Relax TFLite frontend currently has two related blockers for quantized TFLite import.

First, quantized tensors are blocked early in `get_tensors()` by the tensor quantization metadata guard. After preserving tensor-level quantization metadata (`scale`, `zero_point`, and `axis`) and allowing the frontend to proceed further, the next blocker appears at the operator conversion stage:

```text
NameError: name '_qnn' is not defined
```

This happens because the frontend contains quantized operator conversion paths that reference non-existent `_qnn.op.*` APIs.

At the same time, Relax already provides `quantize` / `dequantize` operators with C++ registration, Python APIs, legalization to TE, and tests. This suggests that quantized TFLite operators may initially be imported using QDQ decomposition around existing Relax ops, rather than requiring a new set of fused QNN operators as the first step.

This issue tracks the work needed to support quantized TFLite operator import in the Relax frontend.

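To make the QDQ decomposition idea concrete, here is a minimal pure-Python sketch of a quantized fully connected layer expressed as dequantize, float compute, then quantize. It is not the frontend implementation: `QParams`, `qdq_dense`, and the helper functions are hypothetical names invented for illustration, per-tensor parameters are assumed, and Python's built-in `round` stands in for whatever rounding mode the real operators use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QParams:
    """Hypothetical per-tensor quantization parameters (not a TVM type)."""
    scale: float
    zero_point: int


def dequantize(q: List[int], p: QParams) -> List[float]:
    # scale * (input - zp)
    return [p.scale * (v - p.zero_point) for v in q]


def quantize(x: List[float], p: QParams, qmin: int = -128, qmax: int = 127) -> List[int]:
    # clip(round(input / scale) + zp, min, max)
    # Note: Python's round() rounds halves to even; the real ops may differ.
    return [max(qmin, min(qmax, round(v / p.scale) + p.zero_point)) for v in x]


def qdq_dense(q_x: List[int], q_w: List[List[int]],
              x_p: QParams, w_p: QParams, out_p: QParams) -> List[int]:
    """Quantized dense as a QDQ pattern around a float matrix-vector product."""
    x = dequantize(q_x, x_p)                      # int8 -> float activations
    w = [dequantize(row, w_p) for row in q_w]     # int8 -> float weights
    y = [sum(a * b for a, b in zip(x, row)) for row in w]  # float compute
    return quantize(y, out_p)                     # float -> int8 output


result = qdq_dense(
    q_x=[10, 20],
    q_w=[[2, 4], [-2, 0]],
    x_p=QParams(scale=0.5, zero_point=0),
    w_p=QParams(scale=0.25, zero_point=0),
    out_p=QParams(scale=0.5, zero_point=3),
)
print(result)
```

In this form the compute step is an ordinary float op, so only the quantize and dequantize steps need quantization-aware support. The trade-off is that the arithmetic runs in float until a later pass or backend recognizes the pattern.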
## Affected `_qnn.op.*` calls

The TFLite frontend (`python/tvm/relax/frontend/tflite/tflite_frontend.py`) references 7 non-existent QNN ops across 18 call sites:

| Op | Call sites | Typical context |
|----|-----------|-----------------|
| `quantize` | 1 | float → int8 in `convert_quantize()` |
| `dequantize` | 4 | int8 → float in `convert_dequantize()` and `convert_detection_postprocess()` |
| `requantize` | 9 | post-conv/dense/relu/reshape/reduce scale adjustment |
| `conv2d` | 1 | quantized 2D convolution in `convert_conv()` |
| `dense` | 1 | quantized fully connected in `convert_fully_connected()` |
| `concat` | 1 | quantized concatenation in `convert_concatenation()` |
| `conv2d_transpose` | 1 | quantized transposed convolution in `convert_transpose_conv()` |

## Existing Relax quantization infrastructure

Relax already has two QDQ operators with C++ registration, Python APIs, legalization, and tests:

- `relax.op.quantize(data, scale, zero_point, axis, out_dtype)`: `clip(round(input / scale) + zp, min, max)`
- `relax.op.dequantize(data, scale, zero_point, axis, out_dtype)`: `scale * (input - zp)`

These are defined in:

- C++: `src/relax/op/tensor/qdq.cc`
- Python API: `python/tvm/relax/op/qdq.py`
- Legalization: `python/tvm/relax/transform/legalize_ops/qdq.py`
- Tests: `tests/python/relax/test_op_qdq.py`, `tests/python/relax/test_transform_legalize_ops_qdq.py`

Both support per-tensor and per-axis (channel-wise) quantization via the `axis` parameter.

## Possible implementation directions

There are at least two possible paths:

1. Add explicit fused Relax QNN operators, such as `qnn.conv2d`, `qnn.dense`, and `qnn.requantize`.
2. Reuse existing Relax QDQ operators and import quantized TFLite operators as QDQ patterns around existing Relax compute ops.

I propose starting with the second path. The QDQ-based approach has a smaller API surface and can reuse existing Relax quantize/dequantize infrastructure.
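
As a rough illustration of the per-axis (channel-wise) case mentioned for the `axis` parameter, the sketch below applies the two quoted formulas to a 2D weight with one `(scale, zero_point)` pair per row, which corresponds to `axis=0`. It is plain Python, not the Relax API: `quantize_axis0` and `dequantize_axis0` are hypothetical helpers, the int8 range is assumed, and the rounding mode is Python's `round`, which may not match the actual legalization.

```python
from typing import List


def quantize_axis0(data: List[List[float]], scales: List[float],
                   zero_points: List[int],
                   qmin: int = -128, qmax: int = 127) -> List[List[int]]:
    """clip(round(x / scale) + zp, qmin, qmax) with one (scale, zp) per row."""
    return [
        [max(qmin, min(qmax, round(x / s) + zp)) for x in row]
        for row, s, zp in zip(data, scales, zero_points)
    ]


def dequantize_axis0(q: List[List[int]], scales: List[float],
                     zero_points: List[int]) -> List[List[float]]:
    """scale * (q - zp) with one (scale, zp) per row."""
    return [
        [s * (v - zp) for v in row]
        for row, s, zp in zip(q, scales, zero_points)
    ]


# Two output channels with very different value ranges, so each row
# gets its own scale instead of sharing a single per-tensor scale.
weights = [[0.5, -1.0], [4.0, 8.0]]
scales = [0.25, 2.0]
zero_points = [0, 0]

q_weights = quantize_axis0(weights, scales, zero_points)
restored = dequantize_axis0(q_weights, scales, zero_points)
print(q_weights, restored)
```

Per-tensor quantization is the special case where every row shares the same scale and zero point. The values here were chosen to be exactly representable, so the round trip is lossless; in general it is not.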
Explicit fused QNN operators may still be useful later for optimized int8 execution or backend-specific pattern matching, and can be discussed as a follow-up if needed.

## Task list

- [ ] Preserve tensor quantization metadata in `get_tensors()` (`scale`, `zero_point`, and `axis`) and remove the global quantization guard
- [ ] Replace quantize/dequantize helpers with Relax QDQ ops
- [ ] Support quantized Conv2D via QDQ decomposition
- [ ] Add per-channel Conv2D weight support
- [ ] Support quantized FullyConnected / Dense via QDQ
- [ ] Support remaining quantized ops (`concat`, `conv2d_transpose`, `requantize` paths)

## Out of scope

- ONNX `QLinearConv` / `QLinearMatMul`: may benefit from similar infrastructure but tracked separately
- End-to-end int8 kernel optimization: may require explicit fused QNN ops or backend-specific QDQ pattern matching, and is not the first milestone
- Per-channel axis remap for arbitrary ops: only addressed for conv2d and dense where weight layout transpose occurs

## References

- TFLite frontend: `python/tvm/relax/frontend/tflite/tflite_frontend.py`
- TFLite quantization spec: https://www.tensorflow.org/lite/performance/quantization_spec
- Existing TFLite tracking issues: #19412, #19519
- Related: tensor quantization metadata parsing

cc @tlopex for visibility
