tqchen commented on a change in pull request #9313: URL: https://github.com/apache/tvm/pull/9313#discussion_r739827358
########## File path: include/tvm/target/se_scope.h ########## @@ -0,0 +1,333 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +/*! + * \file tvm/target/se_scope.h + * \brief A compile time representation for a Storage or Execution Scope. + */ + +#ifndef TVM_TARGET_SE_SCOPE_H_ +#define TVM_TARGET_SE_SCOPE_H_ + +#include <tvm/ir/transform.h> +#include <tvm/target/target.h> + +#include <string> +#include <unordered_map> +#include <utility> + +namespace tvm { + +/*! + * Abstract label for an area of memory. + * + * Currently uninterpreted and arbitrary. Likely to be replaced by a structured representation + * of a memory pool in the future. Please try to use this alias instead of String to aid future + * code migration. + */ +using MemoryScope = String; + +/*! + * \brief Describes at compile time where data is to be stored down to the device and memory + * scope level, or where execution is to take place, down to the device level. It is a quadruple of: + * - A \p device_type (\p DLDeviceType). May be kInvalidDeviceType if unconstrained. + * - A \p virtual_device_id (\p int). This allows us to distinguish distinct devices + * with the same \p Target, for example in a multi-GPU system. 
May be -1 if unconstrained. + * See "Virtual Devices" below. + * - A \p target (\p Target) describing how to compile code for the intended device. May be null + * if unconstrained. + * - A \p memory_scope (\p MemoryScope, which is currently just \p String) describing which memory + * area is to be used to hold data. May be "" if unconstrained. See "Memory Scopes and Devices" + * below. + * + * Some or all of these fields may be unconstrained, signaling that device planning is free to + * choose a value consistent with the whole program. However if a \p target is given then the \p + * device_type must equal \p target->kind->device_type. + * + * Note that currently we assume if a function returns its result on a particular device + * then the function body is also executed on that device. See the overview comment in + * src/relay/transforms/device_planner.cc for more details. + * + * By 'data' we include both tensors and additional supporting datastructures such as shapes, + * Relay AST items, Relay tuples, and Relay references. Typically non-tensor data must reside + * on a 'CPU'-like device with good support for scalars. + * + * By 'execution' we include both (fused) primitive operators, and all the Relay expressions + * surrounding them which coordinates data and control flow. Again, typically non-primitive + * operators must be executed on a 'CPU'-like device with good support for control flow. + * + * Targets vs Devices + * ------------------ + * Generally \p Targets (a compile-time only datastructue) describe compiler options for a specific + * microarchitecture and toolchain, while \p Devices (a runtime datastructure also available at + * compile time) describe a physical device on the target system. Obviously the target must agree + * with the device's microarchitecture, but we otherwise don't impose any constraints between them: + * - It's ok to use different \p Targets for the same \p Device, eg to squeeze some extra perf + * out of a particular primitive. 
+ * - It's ok to use the same \p Target for multiple \p Devices, eg if we have multiple CPUs. + * + * Traditionally TVM assumes at most one \p Target per \p DLDeviceType. We are moving away from that + * assumption. + * + * Virtual vs Physical Devices + * --------------------------- + * The \p virtual_device_id may be used by downstream passes or the runtime to help decide which + * \p device_id to use for a particular physical runtime \p Device. For example: + * - Some runtimes may support passing in an array of actual `device` specifications, and the + * \p virtual_device_id can be used at runtime as an index into that array. + * - Some runtimes may support dynamically allocating computations to physical devices. On these + * systems a large space of \p virtual_device_ids could be used at compile time, even though + * at runtime only a few physical devices will be present. + * + * The \p virtual_device_id may also be left unconstrained if not needed. + * + * Memory Scopes and Devices + * ------------------------- + * Multi-device systems can have complex memory hierarchies. For example + * \code + * (kDLCPU, 0, "llvm", "global") + * \endcode + * and + * \code + * (kDLCPU, 1, "llvm", "global") + * \endcode + * could denote: + * - The same memory area accessible from two separate CPUs without any CPU affinity; + * - Distinct memory areas in a NUMA architecture for which cross-device access is handled + * by the memory system; + * - Outright distinct memory areas, where one device cannot directly address the memory of + * another. + * + * Similarly: + * \code + * (kDLCPU, 0, "llvm", "global") + * \endcode + * and + * \code + * (kDLCUDA, 0, "cuda", "host") + * \endcode + * could denote the same memory area, but with very different access costs. + * + * Furthermore, not all memory scopes are accessible to all devices, and it is possible for + * a memory scope to only be accessible to a device when code is compiled with particular + * \p Target options. 
+ * + * \p SEScopes themselves have no system-level understanding. Currently device planning will + * simply insert "device_copy" operators wherever \p SEScopes are not exactly pointwise equal. + * We may revisit this in the future as the work on memory pools matures. + * + * Joining and Defaulting + * ---------------------- + * It is possible to 'join' two \p SEScopes to yield the most constrained \p SEScope which agrees + * with both join arguments. Eg: + * \code + * Join((kDLCPU, -1, "llvm", ""), (kInvalidDeviceType, 3, null, "global)) + * => (kDLCPU, 3, "llvm", "global") + * Join((kDLCPU, -1, "llvm", ""), (kInvalidDeviceType, 3, null, "local)) + * => null (no join possible) + * \endcode + * + * Related to 'join' is 'default', which only takes constrained fields from the rhs when the + * lhs is unconstrained: + * \code + * Default(kDLCPU, -1, "llvm", "local"), (kDLCPU, 3, null, "global")) + * => (kDLCPU, 3, "llvm", "local") + * \endcode + * + * These operations are needed during device planning. + * + */ +class SEScopeNode : public Object { + public: + /*! + * \brief The \p DLDeviceType (represtented as an int) of the device. If \p target is known then + * this will be equal to \p target->kind->device_type. If \p target is null then the target is to + * be determined by a later pass. + * + * This is needed to support the legacy "on_device" and "device_copy" calls which only allow + * a \p DLDeviceTypes (as an integer) to be given. + * + * kInvalidDeviceType denotes unconstrained. + */ + int device_type_int; + + DLDeviceType device_type() const { return static_cast<DLDeviceType>(device_type_int); } + + /*! + * \brief The 'virtual' device identifier for the device. This must be resolved to a physical + * device identifier either during compilation or at runtime. + * + * -1 denotes unconstrained. + */ + int virtual_device_id; + + /*! + * \brief The \p Target describing how to compile for the device. + * + * Null denotes unconstrained. 
Note that if a target later becomes known for this \p SEScope + * then it must be consistent with the \p device_type if that is already known. This is + * enforced by the Join and Default methods. + */ + Target target; + + /*! + * \brief The scope of memory within the device. + * + * Empty denotes unconstrained. + */ + MemoryScope memory_scope; + + /*! + * \brief Returns true if scope is fully unconstrained, ie no target/device type, virtual device + * id or memory scope is specified. + */ + bool is_fully_unconstrained() const {
Review comment: IsFullyUnconstrained
########## File path: include/tvm/target/compilation_config.h ##########
@@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +/*! + * \file tvm/target/compilation_config.h + * \brief A helper class to collect all the targets in canonical form necessary for compilation.
Review comment: This does not need to be addressed in this PR, given that this is still experimental, but it would be useful to have a follow-up discussion on:
- A0: rely on most configs in PassContext and IRModule attachments (the target constraints of each function).
- A1: centralize options in a single structure.
We will need to think about strategies A0 and A1 and how they interact with each other. If we are building a fixed-function, closed-box toolkit, then A1 is usually sufficient. In our case, to enable an open-box philosophy, we need to consider cases where constraints are pre-populated by passes not written by us (e.g. BYOC to CUDA that only works for cuda), as well as iterative refinement over the process. In that case, we want the IRModule to be self-sufficient for constraints that are already populated, and follow-up build functions should respect them.
########## File path: src/target/se_scope.cc ##########
@@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +/*! + * \file tvm/target/se_scope.cc + * \brief Implementation of \p SEScope for representing a Storage or Execution scope.
+ */ +#include <tvm/node/reflection.h> +#include <tvm/runtime/device_api.h> +#include <tvm/target/se_scope.h> + +namespace tvm { + +TVM_REGISTER_NODE_TYPE(SEScopeNode); + +void SEScopeNode::VisitAttrs(AttrVisitor* v) { + v->Visit("device_type_int", &device_type_int); + v->Visit("virtual_device_id", &virtual_device_id); + v->Visit("target", &target); + v->Visit("memory_scope", &memory_scope); +} + +bool SEScopeNode::SEqualReduce(const SEScopeNode* other, SEqualReducer equal) const { + return device_type_int == other->device_type_int && + virtual_device_id == other->virtual_device_id && + // NOTE: Comparing targets by their str representations + target->str() == other->target->str() && memory_scope == other->memory_scope; +} + +void SEScopeNode::SHashReduce(SHashReducer hash_reduce) const { + hash_reduce(device_type_int); + hash_reduce(virtual_device_id); + // NOTE: Reducing target to its str representation + hash_reduce(target->str());
Review comment: Should this use a structural hash on the target? cc @zxybazh. Adding a TODO is OK; please confirm whether str is a legacy property that can be removed.
########## File path: include/tvm/target/se_scope.h ##########
@@ -0,0 +1,333 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +/*!
+ * \brief Returns true if scope is fully unconstrained, ie no target/device type, virtual device + * id or memory scope is specified. + */ + bool is_fully_unconstrained() const {
Review comment: Mainly because it does not correspond to a member flag named fully_unconstrained.
########## File path: src/target/se_scope.cc ##########
@@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +/*! + * \file tvm/target/se_scope.cc + * \brief Implementation of \p SEScope for representing a Storage or Execution scope.
+ */ +#include <tvm/node/reflection.h> +#include <tvm/runtime/device_api.h> +#include <tvm/target/se_scope.h> + +namespace tvm { + +TVM_REGISTER_NODE_TYPE(SEScopeNode); + +void SEScopeNode::VisitAttrs(AttrVisitor* v) { + v->Visit("device_type_int", &device_type_int); + v->Visit("virtual_device_id", &virtual_device_id); + v->Visit("target", &target); + v->Visit("memory_scope", &memory_scope); +} + +bool SEScopeNode::SEqualReduce(const SEScopeNode* other, SEqualReducer equal) const { + return device_type_int == other->device_type_int && + virtual_device_id == other->virtual_device_id && + // NOTE: Comparing targets by their str representations + target->str() == other->target->str() && memory_scope == other->memory_scope;
Review comment: Do we have structural equality on targets? str is a legacy property that can be removed. Leaving this as a TODO is OK; consider using the JSON representation instead of str if a temporary solution is needed here. cc @zxybazh, who authored the target part and could help follow up.
########## File path: include/tvm/target/se_scope.h ##########
@@ -0,0 +1,333 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +/*!
+ * \brief Returns true if scope is fully unconstrained, ie no target/device type, virtual device + * id or memory scope is specified. + */ + bool is_fully_unconstrained() const { + return !target.defined() && device_type() == kInvalidDeviceType && virtual_device_id == -1 && + memory_scope.empty(); + } + + /*! + * \brief Returns true if scope is fully constrained, ie target, virtual device id and + * memory scope are all specified. + */ + bool is_fully_constrained() const {
Review comment: CamelCase
########## File path: src/ir/attr_functor.h ##########
@@ -105,6 +106,7 @@ class AttrFunctor<R(const ObjectRef& n, Args...)> { virtual R VisitAttr_(const tir::CastNode* op, Args... args) ATTR_FUNCTOR_DEFAULT; virtual R VisitAttr_(const tir::CallNode* op, Args... args) ATTR_FUNCTOR_DEFAULT; virtual R VisitAttr_(const tir::SelectNode* op, Args... args) ATTR_FUNCTOR_DEFAULT; + virtual R VisitAttr_(const SEScopeNode* op, Args... args) ATTR_FUNCTOR_DEFAULT;
Review comment: Is it possible to make SEScopeNode a subclass of Attrs? In that case we would not need special handling here.
########## File path: include/tvm/target/se_scope.h ##########
@@ -0,0 +1,333 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +/*!
+ * \file tvm/target/se_scope.h + * \brief A compile time representation for a Storage or Execution Scope. + */ + +#ifndef TVM_TARGET_SE_SCOPE_H_ +#define TVM_TARGET_SE_SCOPE_H_ + +#include <tvm/ir/transform.h> +#include <tvm/target/target.h> + +#include <string> +#include <unordered_map> +#include <utility> + +namespace tvm { + +/*! + * Abstract label for an area of memory. + * + * Currently uninterpreted and arbitrary. Likely to be replaced by a structured representation + * of a memory pool in the future. Please try to use this alias instead of String to aid future + * code migration. + */ +using MemoryScope = String; + +/*! + * \brief Describes at compile time where data is to be stored down to the device and memory + * scope level, or where execution is to take place, down to the device level. It is a quadruple of: + * - A \p device_type (\p DLDeviceType). May be kInvalidDeviceType if unconstrained. + * - A \p virtual_device_id (\p int). This allows us to distinguish distinct devices + * with the same \p Target, for example in a multi-GPU system. May be -1 if unconstrained. + * See "Virtual Devices" below. + * - A \p target (\p Target) describing how to compile code for the intended device. May be null + * if unconstrained. + * - A \p memory_scope (\p MemoryScope, which is currently just \p String) describing which memory + * area is to be used to hold data. May be "" if unconstrained. See "Memory Scopes and Devices" + * below. + * + * Some or all of these fields may be unconstrained, signaling that device planning is free to + * choose a value consistent with the whole program. However if a \p target is given then the \p + * device_type must equal \p target->kind->device_type. + * + * Note that currently we assume if a function returns its result on a particular device + * then the function body is also executed on that device. See the overview comment in + * src/relay/transforms/device_planner.cc for more details. 
+ * + * By 'data' we include both tensors and additional supporting datastructures such as shapes, + * Relay AST items, Relay tuples, and Relay references. Typically non-tensor data must reside + * on a 'CPU'-like device with good support for scalars. + * + * By 'execution' we include both (fused) primitive operators, and all the Relay expressions + * surrounding them which coordinate data and control flow. Again, typically non-primitive + * operators must be executed on a 'CPU'-like device with good support for control flow. + * + * Targets vs Devices + * ------------------ + * Generally \p Targets (a compile-time only datastructure) describe compiler options for a specific + * microarchitecture and toolchain, while \p Devices (a runtime datastructure also available at + * compile time) describe a physical device on the target system. Obviously the target must agree + * with the device's microarchitecture, but we otherwise don't impose any constraints between them: + * - It's ok to use different \p Targets for the same \p Device, eg to squeeze some extra perf + * out of a particular primitive. + * - It's ok to use the same \p Target for multiple \p Devices, eg if we have multiple CPUs. + * + * Traditionally TVM assumes at most one \p Target per \p DLDeviceType. We are moving away from that + * assumption. + * + * Virtual vs Physical Devices + * --------------------------- + * The \p virtual_device_id may be used by downstream passes or the runtime to help decide which + * \p device_id to use for a particular physical runtime \p Device. For example: + * - Some runtimes may support passing in an array of actual `device` specifications, and the + * \p virtual_device_id can be used at runtime as an index into that array. + * - Some runtimes may support dynamically allocating computations to physical devices. On these + * systems a large space of \p virtual_device_ids could be used at compile time, even though + * at runtime only a few physical devices will be present.
+ * + * The \p virtual_device_id may also be left unconstrained if not needed. + * + * Memory Scopes and Devices + * ------------------------- + * Multi-device systems can have complex memory hierarchies. For example + * \code + * (kDLCPU, 0, "llvm", "global") + * \endcode + * and + * \code + * (kDLCPU, 1, "llvm", "global") + * \endcode + * could denote: + * - The same memory area accessible from two separate CPUs without any CPU affinity; + * - Distinct memory areas in a NUMA architecture for which cross-device access is handled + * by the memory system; + * - Outright distinct memory areas, where one device cannot directly address the memory of + * another. + * + * Similarly: + * \code + * (kDLCPU, 0, "llvm", "global") + * \endcode + * and + * \code + * (kDLCUDA, 0, "cuda", "host") + * \endcode + * could denote the same memory area, but with very different access costs. + * + * Furthermore, not all memory scopes are accessible to all devices, and it is possible for + * a memory scope to only be accessible to a device when code is compiled with particular + * \p Target options. + * + * \p SEScopes themselves have no system-level understanding. Currently device planning will + * simply insert "device_copy" operators wherever \p SEScopes are not exactly pointwise equal. + * We may revisit this in the future as the work on memory pools matures. + * + * Joining and Defaulting + * ---------------------- + * It is possible to 'join' two \p SEScopes to yield the most constrained \p SEScope which agrees + * with both join arguments. 
Eg: + * \code + * Join((kDLCPU, -1, "llvm", ""), (kInvalidDeviceType, 3, null, "global")) + * => (kDLCPU, 3, "llvm", "global") + * Join((kDLCPU, -1, "llvm", ""), (kInvalidDeviceType, 3, null, "local")) + * => null (no join possible) + * \endcode + * + * Related to 'join' is 'default', which only takes constrained fields from the rhs when the + * lhs is unconstrained: + * \code + * Default((kDLCPU, -1, "llvm", "local"), (kDLCPU, 3, null, "global")) + * => (kDLCPU, 3, "llvm", "local") + * \endcode + * + * These operations are needed during device planning. + * + */ +class SEScopeNode : public Object { + public: + /*! + * \brief The \p DLDeviceType (represented as an int) of the device. If \p target is known then + * this will be equal to \p target->kind->device_type. If \p target is null then the target is to + * be determined by a later pass. + * + * This is needed to support the legacy "on_device" and "device_copy" calls which only allow + * a \p DLDeviceType (as an integer) to be given. + * + * kInvalidDeviceType denotes unconstrained. + */ + int device_type_int; + + DLDeviceType device_type() const { return static_cast<DLDeviceType>(device_type_int); } + + /*! + * \brief The 'virtual' device identifier for the device. This must be resolved to a physical + * device identifier either during compilation or at runtime. + * + * -1 denotes unconstrained. + */ + int virtual_device_id; + + /*! + * \brief The \p Target describing how to compile for the device. + * + * Null denotes unconstrained. Note that if a target later becomes known for this \p SEScope + * then it must be consistent with the \p device_type if that is already known. This is + * enforced by the Join and Default methods. + */ + Target target; + + /*! + * \brief The scope of memory within the device. + * + * Empty denotes unconstrained. + */ + MemoryScope memory_scope; + + /*!
+ * \brief Returns true if scope is fully unconstrained, ie no target/device type, virtual device + * id or memory scope is specified. + */ + bool is_fully_unconstrained() const { + return !target.defined() && device_type() == kInvalidDeviceType && virtual_device_id == -1 && + memory_scope.empty(); + } + + /*! + * \brief Returns true if scope is fully constrained, ie target, virtual device id and + * memory scope are all specified. + */ + bool is_fully_constrained() const { + return target.defined() && virtual_device_id != -1 && !memory_scope.empty(); + } + + Device ToDevice() const { Review comment: Document the behavior. Note that the device id is a virtual device id and may not correspond to the real mapping (if any).
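For illustration only, here is a minimal standalone sketch of what addressing the CamelCase and ToDevice comments might look like. It does not use TVM: `SEScopeSketch`, `has_target`, the local `Device` struct and the `-1` value of `kInvalidDeviceType` are stand-ins assumed for this sketch, and the two precondition asserts in `ToDevice` are also an assumption, since the body of the real method is not shown in the quoted hunk.

```cpp
#include <cassert>
#include <string>

// Stand-ins so the sketch compiles without TVM; these are NOT the real TVM types.
constexpr int kInvalidDeviceType = -1;  // assumed sentinel meaning "unconstrained"

struct Device {
  int device_type;
  int device_id;
};

struct SEScopeSketch {
  int device_type_int = kInvalidDeviceType;  // kInvalidDeviceType denotes unconstrained
  int virtual_device_id = -1;                // -1 denotes unconstrained
  bool has_target = false;                   // stands in for target.defined()
  std::string memory_scope;                  // empty denotes unconstrained

  // CamelCase spelling of is_fully_unconstrained() from the quoted header.
  bool IsFullyUnconstrained() const {
    return !has_target && device_type_int == kInvalidDeviceType && virtual_device_id == -1 &&
           memory_scope.empty();
  }

  // CamelCase spelling of is_fully_constrained() from the quoted header.
  bool IsFullyConstrained() const {
    return has_target && virtual_device_id != -1 && !memory_scope.empty();
  }

  /*!
   * \brief Returns the (virtual) Device implied by this scope.
   *
   * Both the device type and the virtual device id must be constrained
   * (an assumed precondition for this sketch).
   *
   * The returned device_id is the virtual device id. It may not correspond to
   * any physical device and must still be resolved to a physical device
   * identifier during compilation or at runtime.
   */
  Device ToDevice() const {
    assert(device_type_int != kInvalidDeviceType);
    assert(virtual_device_id != -1);
    return Device{device_type_int, virtual_device_id};
  }
};
```

With this shape, a scope such as (1, 3, target, "global") yields a `Device` whose `device_id` is 3, and the doc comment makes explicit that 3 is only a virtual id.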
