[GitHub] [tvm] tqchen commented on a change in pull request #9313: Adds SEScope (Storage/Execution Scope) for use as new unit of planning in 'device' planning.

GitBox Sun, 31 Oct 2021 08:43:21 -0700


tqchen commented on a change in pull request #9313:
URL: https://github.com/apache/tvm/pull/9313#discussion_r739827358




##########
File path: include/tvm/target/se_scope.h
##########
@@ -0,0 +1,333 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file tvm/target/se_scope.h
+ * \brief A compile time representation for a Storage or Execution Scope.
+ */
+
+#ifndef TVM_TARGET_SE_SCOPE_H_
+#define TVM_TARGET_SE_SCOPE_H_
+
+#include <tvm/ir/transform.h>
+#include <tvm/target/target.h>
+
+#include <string>
+#include <unordered_map>
+#include <utility>
+
+namespace tvm {
+
+/*!
+ * Abstract label for an area of memory.
+ *
+ * Currently uninterpreted and arbitrary. Likely to be replaced by a structured representation
+ * of a memory pool in the future. Please try to use this alias instead of String to aid future
+ * code migration.
+ */
+using MemoryScope = String;
+
+/*!
+ * \brief Describes at compile time where data is to be stored down to the device and memory
+ * scope level, or where execution is to take place, down to the device level. It is a quadruple of:
+ * - A \p device_type (\p DLDeviceType). May be kInvalidDeviceType if unconstrained.
+ * - A \p virtual_device_id (\p int). This allows us to distinguish distinct devices
+ *   with the same \p Target, for example in a multi-GPU system. May be -1 if unconstrained.
+ *   See "Virtual vs Physical Devices" below.
+ * - A \p target (\p Target) describing how to compile code for the intended device. May be null
+ *   if unconstrained.
+ * - A \p memory_scope (\p MemoryScope, which is currently just \p String) describing which memory
+ *   area is to be used to hold data. May be "" if unconstrained. See "Memory Scopes and Devices"
+ *   below.
+ *
+ * Some or all of these fields may be unconstrained, signaling that device planning is free to
+ * choose a value consistent with the whole program. However if a \p target is given then the \p
+ * device_type must equal \p target->kind->device_type.
+ *
+ * Note that currently we assume if a function returns its result on a particular device
+ * then the function body is also executed on that device. See the overview comment in
+ * src/relay/transforms/device_planner.cc for more details.
+ *
+ * By 'data' we include both tensors and additional supporting datastructures such as shapes,
+ * Relay AST items, Relay tuples, and Relay references. Typically non-tensor data must reside
+ * on a 'CPU'-like device with good support for scalars.
+ *
+ * By 'execution' we include both (fused) primitive operators, and all the Relay expressions
+ * surrounding them which coordinate data and control flow. Again, typically non-primitive
+ * operators must be executed on a 'CPU'-like device with good support for control flow.
+ *
+ * Targets vs Devices
+ * ------------------
+ * Generally \p Targets (a compile-time only datastructure) describe compiler options for a specific
+ * microarchitecture and toolchain, while \p Devices (a runtime datastructure also available at
+ * compile time) describe a physical device on the target system. Obviously the target must agree
+ * with the device's microarchitecture, but we otherwise don't impose any constraints between them:
+ *  - It's ok to use different \p Targets for the same \p Device, eg to squeeze some extra perf
+ *    out of a particular primitive.
+ *  - It's ok to use the same \p Target for multiple \p Devices, eg if we have multiple CPUs.
+ *
+ * Traditionally TVM assumes at most one \p Target per \p DLDeviceType. We are moving away from that
+ * assumption.
+ *
+ * Virtual vs Physical Devices
+ * ---------------------------
+ * The \p virtual_device_id may be used by downstream passes or the runtime to help decide which
+ * \p device_id to use for a particular physical runtime \p Device. For example:
+ *  - Some runtimes may support passing in an array of actual `device` specifications, and the
+ *    \p virtual_device_id can be used at runtime as an index into that array.
+ *  - Some runtimes may support dynamically allocating computations to physical devices. On these
+ *    systems a large space of \p virtual_device_ids could be used at compile time, even though
+ *    at runtime only a few physical devices will be present.
+ *
+ * The \p virtual_device_id may also be left unconstrained if not needed.
+ *
+ * Memory Scopes and Devices
+ * -------------------------
+ * Multi-device systems can have complex memory hierarchies. For example
+ * \code
+ * (kDLCPU, 0, "llvm", "global")
+ * \endcode
+ * and
+ * \code
+ * (kDLCPU, 1, "llvm", "global")
+ * \endcode
+ * could denote:
+ * - The same memory area accessible from two separate CPUs without any CPU affinity;
+ * - Distinct memory areas in a NUMA architecture for which cross-device access is handled
+ *   by the memory system;
+ * - Outright distinct memory areas, where one device cannot directly address the memory of
+ *   another.
+ *
+ * Similarly:
+ * \code
+ * (kDLCPU, 0, "llvm", "global")
+ * \endcode
+ * and
+ * \code
+ * (kDLCUDA, 0, "cuda", "host")
+ * \endcode
+ * could denote the same memory area, but with very different access costs.
+ *
+ * Furthermore, not all memory scopes are accessible to all devices, and it is possible for
+ * a memory scope to only be accessible to a device when code is compiled with particular
+ * \p Target options.
+ *
+ * \p SEScopes themselves have no system-level understanding. Currently device planning will
+ * simply insert "device_copy" operators wherever \p SEScopes are not exactly pointwise equal.
+ * We may revisit this in the future as the work on memory pools matures.
+ *
+ * Joining and Defaulting
+ * ----------------------
+ * It is possible to 'join' two \p SEScopes to yield the most constrained \p SEScope which agrees
+ * with both join arguments. Eg:
+ * \code
+ * Join((kDLCPU, -1, "llvm", ""), (kInvalidDeviceType, 3, null, "global"))
+ *   => (kDLCPU, 3, "llvm", "global")
+ * Join((kDLCPU, -1, "llvm", "global"), (kInvalidDeviceType, 3, null, "local"))
+ *   => null (no join possible)
+ * \endcode
+ *
+ * Related to 'join' is 'default', which only takes constrained fields from the rhs when the
+ * lhs is unconstrained:
+ * \code
+ * Default((kDLCPU, -1, "llvm", "local"), (kDLCPU, 3, null, "global"))
+ *   => (kDLCPU, 3, "llvm", "local")
+ * \endcode
+ * \endcode
+ *
+ * These operations are needed during device planning.
+ *
+ */
+class SEScopeNode : public Object {
+ public:
+  /*!
+   * \brief The \p DLDeviceType (represented as an int) of the device. If \p target is known then
+   * this will be equal to \p target->kind->device_type. If \p target is null then the target is to
+   * be determined by a later pass.
+   *
+   * This is needed to support the legacy "on_device" and "device_copy" calls which only allow
+   * a \p DLDeviceType (as an integer) to be given.
+   *
+   * kInvalidDeviceType denotes unconstrained.
+   */
+  int device_type_int;
+
+  DLDeviceType device_type() const { return static_cast<DLDeviceType>(device_type_int); }
+
+  /*!
+   * \brief The 'virtual' device identifier for the device. This must be resolved to a physical
+   * device identifier either during compilation or at runtime.
+   *
+   * -1 denotes unconstrained.
+   */
+  int virtual_device_id;
+
+  /*!
+   * \brief The \p Target describing how to compile for the device.
+   *
+   * Null denotes unconstrained. Note that if a target later becomes known for this \p SEScope
+   * then it must be consistent with the \p device_type if that is already known. This is
+   * enforced by the Join and Default methods.
+   */
+  Target target;
+
+  /*!
+   * \brief The scope of memory within the device.
+   *
+   * Empty denotes unconstrained.
+   */
+  MemoryScope memory_scope;
+
+  /*!
+   * \brief Returns true if scope is fully unconstrained, ie no target/device type, virtual device
+   * id or memory scope is specified.
+   */
+  bool is_fully_unconstrained() const {

Review comment:
       Please rename this to `IsFullyUnconstrained`.
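   
   To make the 'Joining and Defaulting' semantics described in the header comment quoted above concrete, here is a minimal standalone sketch. It is not the code in this PR: `ScopeSketch`, `JoinField`, `Join` and `Default` are illustrative names, targets are modelled as plain strings with "" standing in for a null `Target`, and `kUnconstrainedDeviceType` / `kCpuDeviceType` are stand-in constants rather than the real `kInvalidDeviceType` / `kDLCPU`. It also skips the check that `device_type` agrees with `target->kind->device_type`.

```cpp
#include <optional>
#include <string>

// Stand-in constants (assumptions for this sketch, not TVM's definitions).
constexpr int kUnconstrainedDeviceType = -1;
constexpr int kCpuDeviceType = 1;

// Standalone model of the quadruple; "" models a null Target or an empty memory scope.
struct ScopeSketch {
  int device_type = kUnconstrainedDeviceType;
  int virtual_device_id = -1;
  std::string target;
  std::string memory_scope;
};

// Joins one field: an unconstrained side yields the other side, equal sides agree,
// and two different constrained values conflict (returns false).
template <typename T>
bool JoinField(const T& lhs, const T& rhs, const T& unconstrained, T* out) {
  if (lhs == unconstrained) {
    *out = rhs;
    return true;
  }
  if (rhs == unconstrained || lhs == rhs) {
    *out = lhs;
    return true;
  }
  return false;
}

// Returns the most constrained scope agreeing with both arguments, or nullopt on conflict.
std::optional<ScopeSketch> Join(const ScopeSketch& lhs, const ScopeSketch& rhs) {
  const std::string none;  // "" marks an unconstrained string field
  ScopeSketch out;
  bool ok = JoinField(lhs.device_type, rhs.device_type, kUnconstrainedDeviceType,
                      &out.device_type) &&
            JoinField(lhs.virtual_device_id, rhs.virtual_device_id, -1,
                      &out.virtual_device_id) &&
            JoinField(lhs.target, rhs.target, none, &out.target) &&
            JoinField(lhs.memory_scope, rhs.memory_scope, none, &out.memory_scope);
  if (!ok) return std::nullopt;
  return out;
}

// Keeps every constrained field of lhs and fills the unconstrained ones from rhs.
ScopeSketch Default(const ScopeSketch& lhs, const ScopeSketch& rhs) {
  ScopeSketch out = lhs;
  if (out.device_type == kUnconstrainedDeviceType) out.device_type = rhs.device_type;
  if (out.virtual_device_id == -1) out.virtual_device_id = rhs.virtual_device_id;
  if (out.target.empty()) out.target = rhs.target;
  if (out.memory_scope.empty()) out.memory_scope = rhs.memory_scope;
  return out;
}
```

   Unlike `Join`, `Default` can never fail: when both sides constrain a field differently, the lhs value simply wins.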

##########
File path: include/tvm/target/compilation_config.h
##########
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file tvm/target/compilation_config.h
+ * \brief A helper class to collect all the targets in canonical form necessary for compilation.

Review comment:
       No need to address this in this PR, given this is still experimental, but 
it would be useful to have a follow-up discussion on:
   
   - A0: rely on most configs in PassContext and IRModule attachments (the target 
constraint of each function).
   - A1: centralize options in a single structure.
   
   We will need to think about the strategies in A0 and A1 and how they interact 
with each other. 
   
   If we are building a fixed-function, closed-box toolkit, then A1 is usually 
sufficient. 
   
   In our case, to enable an open-box philosophy, we need to consider cases where 
constraints are pre-populated by passes not written by us (e.g. BYOC to CUDA 
that only works for CUDA), and iterative refinement over the process. In that 
case, we want the IRModule to be self-sufficient for constraints that are already 
populated, and to make the follow-up build function respect them.
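   
   As a rough illustration of the last point (a hypothetical standalone sketch, not TVM code): `FunctionSketch`, `ModuleSketch` and `FillMissingTargets` are made-up names, and targets are modelled as plain strings. The idea is only that a later build step fills in constraints that are still missing and leaves pre-populated ones alone.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a function carrying an optional target constraint.
struct FunctionSketch {
  std::optional<std::string> target;  // nullopt: not yet constrained
};

// Hypothetical stand-in for a module: function name -> function.
using ModuleSketch = std::map<std::string, FunctionSketch>;

// Fills in only the constraints that are still missing, so that anything an
// earlier pass already attached (e.g. a BYOC pass pinning "cuda") is respected.
inline void FillMissingTargets(ModuleSketch* mod, const std::string& default_target) {
  for (auto& name_and_func : *mod) {
    if (!name_and_func.second.target.has_value()) {
      name_and_func.second.target = default_target;
    }
  }
}
```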
   
   

##########
File path: src/target/se_scope.cc
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+/*!
+ * \file tvm/target/se_scope.cc
+ * \brief Implementation of \p SEScope for representing a Storage or Execution scope.
+ */
+#include <tvm/node/reflection.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/target/se_scope.h>
+
+namespace tvm {
+
+TVM_REGISTER_NODE_TYPE(SEScopeNode);
+
+void SEScopeNode::VisitAttrs(AttrVisitor* v) {
+  v->Visit("device_type_int", &device_type_int);
+  v->Visit("virtual_device_id", &virtual_device_id);
+  v->Visit("target", &target);
+  v->Visit("memory_scope", &memory_scope);
+}
+
+bool SEScopeNode::SEqualReduce(const SEScopeNode* other, SEqualReducer equal) const {
+  return device_type_int == other->device_type_int &&
+         virtual_device_id == other->virtual_device_id &&
+         // NOTE: Comparing targets by their str representations
+         target->str() == other->target->str() && memory_scope == other->memory_scope;
+}
+
+void SEScopeNode::SHashReduce(SHashReducer hash_reduce) const {
+  hash_reduce(device_type_int);
+  hash_reduce(virtual_device_id);
+  // NOTE: Reducing target to its str representation
+  hash_reduce(target->str());

Review comment:
       Should this use a structural hash on the target? cc @zxybazh. Adding a TODO 
is OK; please confirm whether str is a legacy property that can be removed.
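   
   One more thing that may be worth checking: the quoted lines call `target->str()` unconditionally, while the header comment says `target` may be null when unconstrained. A minimal standalone sketch of a hash that tolerates a missing target is below. It is an assumption-laden illustration, not TVM's `SHashReducer` API: `HashScopeSketch`, `HashCombine` and `HashScope` are made-up names, and the target is modelled as an optional string holding whatever canonical representation ends up being used.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Standalone model of the scope; nullopt models a null (unconstrained) Target.
struct HashScopeSketch {
  int device_type_int = -1;
  int virtual_device_id = -1;
  std::optional<std::string> target_repr;
  std::string memory_scope;
};

// Conventional hash mixing step (boost-style hash_combine).
inline std::size_t HashCombine(std::size_t seed, std::size_t value) {
  return seed ^ (value + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Hashes every field; the target contributes its representation only when present,
// so a null target is never dereferenced.
inline std::size_t HashScope(const HashScopeSketch& scope) {
  std::size_t h = std::hash<int>()(scope.device_type_int);
  h = HashCombine(h, std::hash<int>()(scope.virtual_device_id));
  h = HashCombine(h, std::hash<bool>()(scope.target_repr.has_value()));
  if (scope.target_repr.has_value()) {
    h = HashCombine(h, std::hash<std::string>()(*scope.target_repr));
  }
  h = HashCombine(h, std::hash<std::string>()(scope.memory_scope));
  return h;
}
```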

##########
File path: include/tvm/target/se_scope.h
##########
@@ -0,0 +1,333 @@
+  bool is_fully_unconstrained() const {

Review comment:
       The rename to CamelCase is suggested mainly because the method does not 
correspond to a member flag named fully_unconstrained.
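   
   A small standalone sketch of the naming distinction, under the usual convention that only thin accessors over a stored member take variable-style names. `NamingSketch` is an illustrative stand-in, not the class in this PR, with strings standing in for `Target` and -1 for `kInvalidDeviceType`.

```cpp
#include <string>

// Illustrative stand-in for the scope node; not the PR's SEScopeNode.
struct NamingSketch {
  int device_type_int = -1;  // -1 stands in for kInvalidDeviceType
  int virtual_device_id = -1;
  std::string target;        // "" stands in for a null Target
  std::string memory_scope;

  // Thin accessor over a stored member: variable-style naming is conventional.
  int device_type() const { return device_type_int; }

  // Computed predicates with no backing member flag: CamelCase like other methods.
  bool IsFullyUnconstrained() const {
    return target.empty() && device_type_int == -1 && virtual_device_id == -1 &&
           memory_scope.empty();
  }

  bool IsFullyConstrained() const {
    return !target.empty() && virtual_device_id != -1 && !memory_scope.empty();
  }
};
```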

##########
File path: src/target/se_scope.cc
##########
@@ -0,0 +1,224 @@
+bool SEScopeNode::SEqualReduce(const SEScopeNode* other, SEqualReducer equal) const {
+  return device_type_int == other->device_type_int &&
+         virtual_device_id == other->virtual_device_id &&
+         // NOTE: Comparing targets by their str representations
+         target->str() == other->target->str() && memory_scope == other->memory_scope;

Review comment:
       Do we have structural equality on targets? str is a legacy property that 
can be removed. Leaving this as a TODO is OK; consider using the JSON repr instead 
of str if a temporary solution is needed here.
   
   cc @zxybazh, who authored the target part and could be helpful for follow-up.
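   
   For the temporary solution, a minimal standalone sketch of a null-tolerant comparison might look like the following. This is an illustration under stated assumptions, not TVM's `SEqualReducer` API: `EqScopeSketch` and `ScopesEqual` are made-up names, and `target_repr` is assumed to hold some canonical textual form of the target (for example a JSON dump), with `nullopt` modelling a null `Target`.

```cpp
#include <optional>
#include <string>

// Standalone model of the scope; nullopt models a null (unconstrained) Target.
struct EqScopeSketch {
  int device_type_int = -1;
  int virtual_device_id = -1;
  std::optional<std::string> target_repr;
  std::string memory_scope;
};

// Field-wise equality. std::optional's operator== treats two empty optionals as
// equal and an empty one as unequal to any value, so a null target is never
// dereferenced.
inline bool ScopesEqual(const EqScopeSketch& a, const EqScopeSketch& b) {
  return a.device_type_int == b.device_type_int &&
         a.virtual_device_id == b.virtual_device_id &&
         a.target_repr == b.target_repr &&
         a.memory_scope == b.memory_scope;
}
```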

##########
File path: include/tvm/target/se_scope.h
##########
@@ -0,0 +1,333 @@
+  /*!
+   * \brief Returns true if scope is fully unconstrained, ie no target/device type, virtual device
+   * id or memory scope is specified.
+   */
+  bool is_fully_unconstrained() const {
+    return !target.defined() && device_type() == kInvalidDeviceType && virtual_device_id == -1 &&
+           memory_scope.empty();
+  }
+
+  /*!
+   * \brief Returns true if scope is fully constrained, ie target, virtual device id and
+   * memory scope are all specified.
+   */
+  bool is_fully_constrained() const {

Review comment:
       Please use CamelCase for method names here (e.g. `IsFullyConstrained`).
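
       A minimal sketch of what the CamelCase rename could look like, using a simplified stand-in rather than the real SEScopeNode. The names `Scope`, `DeviceType` and `JoinField`, and the use of an empty string in place of a null `Target`, are assumptions made for illustration only. `Join` and `Default` follow the semantics described in the header comment and are included just to exercise the renamed predicates. The target/device_type consistency check mentioned in the header is left out.

```cpp
#include <optional>
#include <string>

// Stand-in for DLDeviceType (hypothetical; not the DLPack header).
enum DeviceType { kInvalidDeviceType = -1, kDLCPU = 1, kDLCUDA = 2 };

// Simplified stand-in for SEScopeNode. An empty `target` string plays the
// role of a null Target; an empty `memory_scope` means unconstrained.
struct Scope {
  DeviceType device_type = kInvalidDeviceType;
  int virtual_device_id = -1;
  std::string target;
  std::string memory_scope;

  // CamelCase replacement for is_fully_unconstrained().
  bool IsFullyUnconstrained() const {
    return target.empty() && device_type == kInvalidDeviceType &&
           virtual_device_id == -1 && memory_scope.empty();
  }

  // CamelCase replacement for is_fully_constrained().
  bool IsFullyConstrained() const {
    return !target.empty() && virtual_device_id != -1 && !memory_scope.empty();
  }
};

// Joins a single field: an unconstrained side yields to the other side, and
// two constrained sides must be equal. Returns false on a conflict.
template <typename T>
bool JoinField(const T& lhs, const T& rhs, const T& unconstrained, T* out) {
  if (lhs == unconstrained) {
    *out = rhs;
    return true;
  }
  if (rhs == unconstrained || lhs == rhs) {
    *out = lhs;
    return true;
  }
  return false;
}

// Most constrained scope agreeing with both arguments, or nullopt if any
// field is constrained on both sides to different values.
std::optional<Scope> Join(const Scope& lhs, const Scope& rhs) {
  Scope out;
  if (!JoinField(lhs.device_type, rhs.device_type, kInvalidDeviceType, &out.device_type) ||
      !JoinField(lhs.virtual_device_id, rhs.virtual_device_id, -1, &out.virtual_device_id) ||
      !JoinField(lhs.target, rhs.target, std::string(), &out.target) ||
      !JoinField(lhs.memory_scope, rhs.memory_scope, std::string(), &out.memory_scope)) {
    return std::nullopt;
  }
  return out;
}

// Keeps every constrained field of lhs and fills the rest from rhs.
Scope Default(const Scope& lhs, const Scope& rhs) {
  Scope out = lhs;
  if (out.device_type == kInvalidDeviceType) out.device_type = rhs.device_type;
  if (out.virtual_device_id == -1) out.virtual_device_id = rhs.virtual_device_id;
  if (out.target.empty()) out.target = rhs.target;
  if (out.memory_scope.empty()) out.memory_scope = rhs.memory_scope;
  return out;
}
```

       In this sketch a join fails only when both sides constrain the same field to different values, for example memory scopes "global" and "local".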

##########
File path: src/ir/attr_functor.h
##########
@@ -105,6 +106,7 @@ class AttrFunctor<R(const ObjectRef& n, Args...)> {
  virtual R VisitAttr_(const tir::CastNode* op, Args... args) ATTR_FUNCTOR_DEFAULT;
  virtual R VisitAttr_(const tir::CallNode* op, Args... args) ATTR_FUNCTOR_DEFAULT;
  virtual R VisitAttr_(const tir::SelectNode* op, Args... args) ATTR_FUNCTOR_DEFAULT;
+  virtual R VisitAttr_(const SEScopeNode* op, Args... args) ATTR_FUNCTOR_DEFAULT;

Review comment:
       Is it possible to make SEScopeNode a subclass of Attrs? In that case we
would not need special handling here.

##########
File path: include/tvm/target/se_scope.h
##########
@@ -0,0 +1,333 @@
+  /*!
+   * \brief Returns true if scope is fully constrained, ie target, virtual device id and
+   * memory scope are all specified.
+   */
+  bool is_fully_constrained() const {
+    return target.defined() && virtual_device_id != -1 && !memory_scope.empty();
+  }
+
+  Device ToDevice() const {

Review comment:
       Document the behavior. Note that the device id here is the virtual device id
and may not correspond to the real mapping (if any).
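
       One possible shape for that documentation, sketched against stand-in types rather than the real classes. The body of `ToDevice` is not visible in the quoted hunk, so the precondition check and the `std::logic_error` below are assumptions, as are the `Scope`, `Device` and `DeviceType` definitions.

```cpp
#include <stdexcept>

// Stand-ins for DLDeviceType and the runtime Device struct (hypothetical).
enum DeviceType { kInvalidDeviceType = -1, kDLCPU = 1, kDLCUDA = 2 };

struct Device {
  DeviceType device_type;
  int device_id;
};

// Simplified stand-in for SEScopeNode, keeping only the fields ToDevice needs.
struct Scope {
  DeviceType device_type = kInvalidDeviceType;
  int virtual_device_id = -1;

  /*!
   * \brief Returns the Device implied by this scope.
   *
   * The returned \p device_id is the \p virtual_device_id, not a physical id.
   * It may not correspond to any real device mapping and must still be
   * resolved to a physical device during compilation or at runtime.
   *
   * Both the device type and the virtual device id must be constrained.
   */
  Device ToDevice() const {
    if (device_type == kInvalidDeviceType || virtual_device_id == -1) {
      // Assumed behavior: refuse to build a Device from an unconstrained scope.
      throw std::logic_error("ToDevice requires device type and virtual device id");
    }
    return Device{device_type, virtual_device_id};
  }
};
```

       Callers that need a physical device would still have to map the returned id themselves, for instance by indexing into an array of devices supplied at runtime.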



