mbs-octoml commented on a change in pull request #22: URL: https://github.com/apache/tvm-rfcs/pull/22#discussion_r717880405
########## File path: rfcs/0022-tir-non-scalar-constants.md ########## @@ -0,0 +1,148 @@ + +- Feature Name: tir_non_scalar_constants +- Start Date: 2021-06-01 +- RFC PR: https://github.com/apache/tvm-rfcs/pull/22 +- GitHub Issue: TBD + +# 1. Summary + +This RFC proposes how non-scalar constants could be represented in TIR and used by passes in the lowering process. + +# 2. Motivation + +Currently, the non-scalar constants are represented in Relay (relay.Constant) to be used by relay passes but not in TIR. Therefore, when performing lowering using TIR passes, we have to maintain a side-channel of tir::Var to constant non-scalar data mapping to perform transformations that could use the knowledge where some of the data are constants. + +Few example scenarios as further motivation : + +## Weight compression + +When lowering for accelerators (See [Arm(R) Ethos(TM)-U NPU](https://github.com/apache/tvm-rfcs/pull/11)), certain operations will need to get tiled to co-optimize performance and memory utilization. Such tiling patterns create slices of weights that need compressing that will end up with varying sizes. Therefore, the knowledge of some tir::Vars refer to constants are critical in the level of TIR to perform this. + +## Memory Planning + +The TIR program has the ability to express both inter and intra operator memory requirement, post-scheduling as explained further by [Unified Static Memory Planning RFC](https://github.com/apache/tvm-rfcs/pull/9). It would be better if the constants could be embedded to the TIR PrimFunc because the memory for constants becomes visible for the memory planner. Moreover, this allows various [target-dependent lowerings](https://github.com/apache/tvm-rfcs/pull/10), to produce TIR PrimFuncs with target-specific constants in it. + +## Winograd Constants + +The Winograd transformation (used for fast GEMMs) involves multiplication by a hard-coded constant tensor. This is currently accomplished in TE using a complicated TE compute expression with many nested selects. Being able to directly express a constant tensor here would significantly simplify this code. See https://github.com/apache/tvm/blob/9df2ae8eaa8b394013182a7ad09ac57fe401f80e/python/tvm/topi/utils.py#L320-L350. + + +# 3. Guide-level explanation + +This is not particularly a user-facing feature and this will allow constants to be 'linked' to TIR. Intially, tir.allocate_const nodes will only be created during scheduling when -link-params is included in the Target (e.g. to relay.build and to TVMC). + +# 4. Reference-level explanation + +The proposal is quite simple and it could be explained as follows : + +``` [email protected] +def myfunc(): + param = tir.allocate_const([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "int32", [10]) +``` + +This follows closely the semantics of tir.allocate and the difference being it represent a buffer filled with constants. + +There are mainly two ways of constants being created in the lowering : + +A1. Linking the params of the model (relay.Constants -- currently, the model params would be in Relay as relay.Constant nodes) + +A2. Creation/Mutation of constants in the lowering -- these maybe different to the original constants prior to scheduling the Relay into TIR. + +For A1, this should only be done if the target support codegeneration of the constant data (i.e. support --link-params) as part of the operator runtime.Module. Therefore, this is executor independent. + +For A2, the lowering for targets that support constant as part of the operators, there can be new (differently sized) constants could be created due to optimizations such as weight compression as required by the target. + + +### IRNode Definition + +``` +class AllocateConstNode : public StmtNode { + public: + /*! \brief The buffer variable. */ + Var buffer_var; + /*! \brief The data associated to the constant. */ + NDArray data; + /*! \brief If the PrimFunc containing the Stmt is added to IRModule, + this is an optional index to indicate the index within + "Constants" attribute, that is a Array<NDArray> of IRModule. + */ + Optional<Integer> irmod_storage_idx; Review comment: LGTM nit: Suggest we make data and irmod_storage_idx mutually exclusive. I assume you'll want a Pass to hoist (and share?) the data into IRModule "Constants" attribute, in which case you'll rewrite from data to irmod_storage_idx. This is almost identical to how the Relay parser uses the 'meta' syntax and MetaTable to refer to and resolve relay.Constants, except: - The array is keyed by 'relay.Constant'. - The array holds Relay Constants, not NDArrays. - The MetaTable is just to help parsing and is discarded. This suggest we should similarly move the contents of the MataTable into IRModule attributes, and allow Relay Constant to similarly represent the NDArray immediately or via index. If that were already in place then I'd suggest we replace data and irmod_storage_idx with just a Relay Constant so that constants can easily make the hop from Relay to TIR. Could you capture that under the alts considered so I don't forget it? ########## File path: rfcs/0022-tir-non-scalar-constants.md ########## @@ -0,0 +1,148 @@ + +- Feature Name: tir_non_scalar_constants +- Start Date: 2021-06-01 +- RFC PR: https://github.com/apache/tvm-rfcs/pull/22 +- GitHub Issue: TBD + +# 1. Summary + +This RFC proposes how non-scalar constants could be represented in TIR and used by passes in the lowering process. + +# 2. Motivation + +Currently, the non-scalar constants are represented in Relay (relay.Constant) to be used by relay passes but not in TIR. Therefore, when performing lowering using TIR passes, we have to maintain a side-channel of tir::Var to constant non-scalar data mapping to perform transformations that could use the knowledge where some of the data are constants. + +Few example scenarios as further motivation : + +## Weight compression + +When lowering for accelerators (See [Arm(R) Ethos(TM)-U NPU](https://github.com/apache/tvm-rfcs/pull/11)), certain operations will need to get tiled to co-optimize performance and memory utilization. Such tiling patterns create slices of weights that need compressing that will end up with varying sizes. Therefore, the knowledge of some tir::Vars refer to constants are critical in the level of TIR to perform this. + +## Memory Planning + +The TIR program has the ability to express both inter and intra operator memory requirement, post-scheduling as explained further by [Unified Static Memory Planning RFC](https://github.com/apache/tvm-rfcs/pull/9). It would be better if the constants could be embedded to the TIR PrimFunc because the memory for constants becomes visible for the memory planner. Moreover, this allows various [target-dependent lowerings](https://github.com/apache/tvm-rfcs/pull/10), to produce TIR PrimFuncs with target-specific constants in it. + +## Winograd Constants + +The Winograd transformation (used for fast GEMMs) involves multiplication by a hard-coded constant tensor. This is currently accomplished in TE using a complicated TE compute expression with many nested selects. Being able to directly express a constant tensor here would significantly simplify this code. See https://github.com/apache/tvm/blob/9df2ae8eaa8b394013182a7ad09ac57fe401f80e/python/tvm/topi/utils.py#L320-L350. + + +# 3. Guide-level explanation + +This is not particularly a user-facing feature and this will allow constants to be 'linked' to TIR. Intially, tir.allocate_const nodes will only be created during scheduling when -link-params is included in the Target (e.g. to relay.build and to TVMC). + +# 4. Reference-level explanation + +The proposal is quite simple and it could be explained as follows : + +``` [email protected] +def myfunc(): + param = tir.allocate_const([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "int32", [10]) +``` + +This follows closely the semantics of tir.allocate and the difference being it represent a buffer filled with constants. + +There are mainly two ways of constants being created in the lowering : + +A1. Linking the params of the model (relay.Constants -- currently, the model params would be in Relay as relay.Constant nodes) + +A2. Creation/Mutation of constants in the lowering -- these maybe different to the original constants prior to scheduling the Relay into TIR. + +For A1, this should only be done if the target support codegeneration of the constant data (i.e. support --link-params) as part of the operator runtime.Module. Therefore, this is executor independent. + +For A2, the lowering for targets that support constant as part of the operators, there can be new (differently sized) constants could be created due to optimizations such as weight compression as required by the target. + + +### IRNode Definition + +``` +class AllocateConstNode : public StmtNode { + public: + /*! \brief The buffer variable. */ + Var buffer_var; + /*! \brief The data associated to the constant. */ + NDArray data; + /*! \brief If the PrimFunc containing the Stmt is added to IRModule, + this is an optional index to indicate the index within + "Constants" attribute, that is a Array<NDArray> of IRModule. + */ + Optional<Integer> irmod_storage_idx; Review comment: nit: Suggest we make data and irmod_storage_idx mutually exclusive. I assume you'll want a Pass to hoist (and share?) the data into IRModule "Constants" attribute, in which case you'll rewrite from data to irmod_storage_idx. This is almost identical to how the Relay parser uses the 'meta' syntax and MetaTable to refer to and resolve relay.Constants, except: - The array is keyed by 'relay.Constant'. - The array holds Relay Constants, not NDArrays. - The MetaTable is just to help parsing and is discarded. This suggest we should similarly move the contents of the MataTable into IRModule attributes, and allow Relay Constant to similarly represent the NDArray immediately or via index. If that were already in place then I'd suggest we replace data and irmod_storage_idx with just a Relay Constant so that constants can easily make the hop from Relay to TIR. Could you capture that under the alts considered so I don't forget it? ########## File path: rfcs/0022-tir-non-scalar-constants.md ########## @@ -0,0 +1,148 @@ + +- Feature Name: tir_non_scalar_constants +- Start Date: 2021-06-01 +- RFC PR: https://github.com/apache/tvm-rfcs/pull/22 +- GitHub Issue: TBD + +# 1. Summary + +This RFC proposes how non-scalar constants could be represented in TIR and used by passes in the lowering process. + +# 2. Motivation + +Currently, the non-scalar constants are represented in Relay (relay.Constant) to be used by relay passes but not in TIR. Therefore, when performing lowering using TIR passes, we have to maintain a side-channel of tir::Var to constant non-scalar data mapping to perform transformations that could use the knowledge where some of the data are constants. + +Few example scenarios as further motivation : + +## Weight compression + +When lowering for accelerators (See [Arm(R) Ethos(TM)-U NPU](https://github.com/apache/tvm-rfcs/pull/11)), certain operations will need to get tiled to co-optimize performance and memory utilization. Such tiling patterns create slices of weights that need compressing that will end up with varying sizes. Therefore, the knowledge of some tir::Vars refer to constants are critical in the level of TIR to perform this. + +## Memory Planning + +The TIR program has the ability to express both inter and intra operator memory requirement, post-scheduling as explained further by [Unified Static Memory Planning RFC](https://github.com/apache/tvm-rfcs/pull/9). It would be better if the constants could be embedded to the TIR PrimFunc because the memory for constants becomes visible for the memory planner. Moreover, this allows various [target-dependent lowerings](https://github.com/apache/tvm-rfcs/pull/10), to produce TIR PrimFuncs with target-specific constants in it. + +## Winograd Constants + +The Winograd transformation (used for fast GEMMs) involves multiplication by a hard-coded constant tensor. This is currently accomplished in TE using a complicated TE compute expression with many nested selects. Being able to directly express a constant tensor here would significantly simplify this code. See https://github.com/apache/tvm/blob/9df2ae8eaa8b394013182a7ad09ac57fe401f80e/python/tvm/topi/utils.py#L320-L350. + + +# 3. Guide-level explanation + +This is not particularly a user-facing feature and this will allow constants to be 'linked' to TIR. Intially, tir.allocate_const nodes will only be created during scheduling when -link-params is included in the Target (e.g. to relay.build and to TVMC). + +# 4. Reference-level explanation + +The proposal is quite simple and it could be explained as follows : + +``` [email protected] +def myfunc(): + param = tir.allocate_const([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "int32", [10]) +``` + +This follows closely the semantics of tir.allocate and the difference being it represent a buffer filled with constants. + +There are mainly two ways of constants being created in the lowering : + +A1. Linking the params of the model (relay.Constants -- currently, the model params would be in Relay as relay.Constant nodes) + +A2. Creation/Mutation of constants in the lowering -- these maybe different to the original constants prior to scheduling the Relay into TIR. + +For A1, this should only be done if the target support codegeneration of the constant data (i.e. support --link-params) as part of the operator runtime.Module. Therefore, this is executor independent. + +For A2, the lowering for targets that support constant as part of the operators, there can be new (differently sized) constants could be created due to optimizations such as weight compression as required by the target. + + +### IRNode Definition + +``` +class AllocateConstNode : public StmtNode { + public: + /*! \brief The buffer variable. */ + Var buffer_var; + /*! \brief The data associated to the constant. */ + NDArray data; + /*! \brief If the PrimFunc containing the Stmt is added to IRModule, + this is an optional index to indicate the index within + "Constants" attribute, that is a Array<NDArray> of IRModule. + */ + Optional<Integer> irmod_storage_idx; Review comment: nit: Suggest we make data and irmod_storage_idx mutually exclusive. I assume you'll want a Pass to hoist (and share?) the data into the IRModule "Constants" attribute, in which case you'll rewrite from data to irmod_storage_idx. This is almost identical to how the Relay parser uses the 'meta' syntax and MetaTable to refer to and resolve relay.Constants, except: - The array is keyed by 'relay.Constant'. - The array holds Relay Constants, not NDArrays. - The MetaTable is just to help parsing and is discarded. This suggest we should similarly move the contents of the MataTable into IRModule attributes, and allow Relay Constant to similarly represent the NDArray immediately or via index. If that were already in place then I'd suggest we replace data and irmod_storage_idx with just a Relay Constant so that constants can easily make the hop from Relay to TIR. Could you capture that under the alts considered so I don't forget it? ########## File path: rfcs/0022-tir-non-scalar-constants.md ########## @@ -0,0 +1,148 @@ + +- Feature Name: tir_non_scalar_constants +- Start Date: 2021-06-01 +- RFC PR: https://github.com/apache/tvm-rfcs/pull/22 +- GitHub Issue: TBD + +# 1. Summary + +This RFC proposes how non-scalar constants could be represented in TIR and used by passes in the lowering process. + +# 2. Motivation + +Currently, the non-scalar constants are represented in Relay (relay.Constant) to be used by relay passes but not in TIR. Therefore, when performing lowering using TIR passes, we have to maintain a side-channel of tir::Var to constant non-scalar data mapping to perform transformations that could use the knowledge where some of the data are constants. + +Few example scenarios as further motivation : + +## Weight compression + +When lowering for accelerators (See [Arm(R) Ethos(TM)-U NPU](https://github.com/apache/tvm-rfcs/pull/11)), certain operations will need to get tiled to co-optimize performance and memory utilization. Such tiling patterns create slices of weights that need compressing that will end up with varying sizes. Therefore, the knowledge of some tir::Vars refer to constants are critical in the level of TIR to perform this. + +## Memory Planning + +The TIR program has the ability to express both inter and intra operator memory requirement, post-scheduling as explained further by [Unified Static Memory Planning RFC](https://github.com/apache/tvm-rfcs/pull/9). It would be better if the constants could be embedded to the TIR PrimFunc because the memory for constants becomes visible for the memory planner. Moreover, this allows various [target-dependent lowerings](https://github.com/apache/tvm-rfcs/pull/10), to produce TIR PrimFuncs with target-specific constants in it. + +## Winograd Constants + +The Winograd transformation (used for fast GEMMs) involves multiplication by a hard-coded constant tensor. This is currently accomplished in TE using a complicated TE compute expression with many nested selects. Being able to directly express a constant tensor here would significantly simplify this code. See https://github.com/apache/tvm/blob/9df2ae8eaa8b394013182a7ad09ac57fe401f80e/python/tvm/topi/utils.py#L320-L350. + + +# 3. Guide-level explanation + +This is not particularly a user-facing feature and this will allow constants to be 'linked' to TIR. Intially, tir.allocate_const nodes will only be created during scheduling when -link-params is included in the Target (e.g. to relay.build and to TVMC). + +# 4. Reference-level explanation + +The proposal is quite simple and it could be explained as follows : + +``` [email protected] +def myfunc(): + param = tir.allocate_const([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], "int32", [10]) +``` + +This follows closely the semantics of tir.allocate and the difference being it represent a buffer filled with constants. + +There are mainly two ways of constants being created in the lowering : + +A1. Linking the params of the model (relay.Constants -- currently, the model params would be in Relay as relay.Constant nodes) + +A2. Creation/Mutation of constants in the lowering -- these maybe different to the original constants prior to scheduling the Relay into TIR. + +For A1, this should only be done if the target support codegeneration of the constant data (i.e. support --link-params) as part of the operator runtime.Module. Therefore, this is executor independent. + +For A2, the lowering for targets that support constant as part of the operators, there can be new (differently sized) constants could be created due to optimizations such as weight compression as required by the target. + + +### IRNode Definition + +``` +class AllocateConstNode : public StmtNode { + public: + /*! \brief The buffer variable. */ + Var buffer_var; + /*! \brief The data associated to the constant. */ + NDArray data; + /*! \brief If the PrimFunc containing the Stmt is added to IRModule, + this is an optional index to indicate the index within + "Constants" attribute, that is a Array<NDArray> of IRModule. + */ + Optional<Integer> irmod_storage_idx; Review comment: nit: Suggest we make data and irmod_storage_idx mutually exclusive. I assume you'll want a Pass to hoist (and share?) the data into the IRModule "Constants" attribute, in which case you'll rewrite from data to irmod_storage_idx. This is almost identical to how the Relay parser uses the 'meta' syntax and MetaTable to refer to and resolve relay.Constants, except: - The array is keyed by 'relay.Constant'. - The array holds Relay Constants, not NDArrays. - The MetaTable is just to help parsing and is discarded. This suggest we should similarly move the contents of the MataTable into IRModule attributes (keyed by "Constants"), and allow a Relay Constant to similarly represent the NDArray immediately or via an index. If that were already in place then I'd suggest we replace data and irmod_storage_idx with just a Relay Constant so that constants can easily make the hop from Relay to TIR. Could you capture that under the alts considered so I don't forget it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
