merrymercy commented on a change in pull request #6073:
URL: https://github.com/apache/incubator-tvm/pull/6073#discussion_r456552575
##########
File path: tests/python/unittest/test_auto_scheduler_loop_state.py
##########
@@ -61,5 +61,79 @@ def test_split_fuse_reorder():
assert s1[C].iters[4].range.extent == 8
assert s1[C].iters[5].range.extent == 2
+ s1.parallel(C, j1)
+ s1.unroll(C, j2)
+ s1.vectorize(C, j3)
+ s1.bind(C, i1, "blockIdx.x")
+ s1.bind(C, i2, "vthread")
+ s1.bind(C, i3, "threadIdx.y")
+
+
+def test_compute_at_root_inline():
+ dag = auto_scheduler.ComputeDAG(conv2d_nchw_bn_relu(1, 224, 224, 3, 64, 7,
2, 3))
+ s0 = dag.get_init_state()
+
+ # data, padding, kernel = 0, 1, 2
+ conv = s0.stage_ops[3]
+ # bias = 4
+ bias_add = s0.stage_ops[5]
+ # bn_scale = 6
+ bn_mul = s0.stage_ops[7]
+ # bn_offset = 8
+ bn_add = s0.stage_ops[9]
+ relu = s0.stage_ops[10]
+
+ s0.compute_inline(bn_add)
+ s0.compute_inline(bn_mul)
+ s0.compute_inline(bias_add)
+ s0.compute_at(conv, relu, s0[relu].iters[2])
+ print(s0)
+ assert str(s0) == \
Review comment:
I have no idea either
##########
File path: python/tvm/auto_scheduler/loop_state.py
##########
@@ -119,9 +137,61 @@ def reorder(self, stage, order):
order : List[Iterator]
Iterators in the expected order.
"""
- stage_id = self._resolve_stage_id(stage)
+ self.state_object = _ffi_api.StateReorder(self.state_object,
self._resolve_stage_id(stage),
+ order)
+
+ def compute_at(self, stage, target_stage, target_iter):
+ """ Schedule primitive corresponds to te.compute_at.
+
+ Parameters
+ ----------
+ stage : Union[int, Operation, Tensor]
+ The Stage to be compute at, can be a Stage order index, Stage
operation or stage
+ output tensor.
+ target_stage : Union[int, Operation, Tensor]
+ The target stage of compute_at, can be a Stage order index, Stage
operation or stage
+ output tensor.
+ target_iter : Iterator
+ The target Iterator of compute_at.
+
+ Notes
+ -----
+ After compute_at, the extent of each iterator may not be accurate any
more, so the bound
+ information will be removed from this state. Run
ComputeDAG::InferBound to recover.
Review comment:
```suggestion
After compute_at, we need careful dependency analysis to compute the
accurate bound information.
However, it is relatively expensive and complicated. So in
LoopState, we just fill "None" as bound
for the newly created iterators. If you do need the bound, you can
call ComputeDAG::InferBound on
the returned state to get all bound information.
```
Please propagate this change.
##########
File path: python/tvm/auto_scheduler/loop_state.py
##########
@@ -161,16 +235,116 @@ def fuse(self, stage, iters):
The Stage to be fused, can be a Stage order index, Stage operation
or stage
output tensor.
iters : List[Iterator]
- The iterators to be fused
+ The iterators to be fused.
+
+ Returns
+ -------
+ res_it : Iterator
+ The fused Iterator.
+
+ Notes
+ -----
+ If the iterators to be fused have stages attached at them(by
compute_at), the fused
+ result will become the new attach point.
+ """
+ self.state_object, res = _ffi_api.StateFuse(self.state_object,
+
self._resolve_stage_id(stage), iters)
+ return res
+
+ def vectorize(self, stage, iterator):
+ """ Schedule primitive corresponds to te.vectorize.
+
+ Parameters
+ ----------
+ stage : Union[int, Operation, Tensor]
+ The Stage to be vectorized, can be a Stage order index, Stage
operation or stage
Review comment:
```suggestion
The Stage to be vectorized, which can be specified by
the integer index, Operation, or output tensor of the stage.
```
Propagate this modification to other comments as well.
##########
File path: tests/python/unittest/test_auto_scheduler_measure.py
##########
@@ -18,17 +18,55 @@
""" Test measurement and log serialization. """
import tvm
-from tvm import auto_scheduler
+import topi
+from tvm import te, auto_scheduler
import tempfile
from test_auto_scheduler_common import get_tiled_matmul
def test_record():
- dag, s = get_tiled_matmul()
-
if not tvm.runtime.enabled("llvm"):
return
+
+ A = te.placeholder((512, 512), name='A')
+ B = te.placeholder((512, 512), name='B')
+ k = te.reduce_axis((0, 512), name='k')
+ C = te.compute((512, 512), lambda i, j: te.sum(A[i][k] * B[k][j],
axis=[k]), name='C')
+ D = topi.nn.relu(C)
+ k = te.reduce_axis((0, 512), name='k')
+ E = te.compute((512, 512), lambda i, j: te.sum(A[i][k] * D[k][j],
axis=[k]), name='C')
+ F = topi.nn.relu(E)
+
+ dag = auto_scheduler.ComputeDAG([A, B, F])
+ s = dag.get_init_state()
+
+ # Split
+ its0 = s.split(C, s[C].iters[0], [4, 8, 8])
+ its1 = s.split(C, s[C].iters[4], [8, 4, 4])
+ # Reorder
+ s.reorder(C, [its0[0], its1[0], its0[1], its1[1], its0[2], its1[2],
its0[3], s[C].iters[8],
+ its1[3]])
+ # Fuse
+ s.fuse(C, [s[C].iters[0], s[C].iters[1], s[C].iters[2]])
+ # Compute at
+ s.split(F, s[F].iters[0], [2])
+ s.compute_at(E, F, s[F].iters[0])
+ # Compute inline
+ s.compute_inline(D)
+ # Compute root
+ s.compute_root(D)
+ # Parallel
+ s.parallel(C, s[C].iters[0])
+ # Thread bind
+ s.bind(C, s[C].iters[1], "blockIdx.x")
Review comment:
Using thread binding on an LLVM target is confusing to me, although the
check itself is correct.
##########
File path: src/auto_scheduler/compute_dag.cc
##########
@@ -276,10 +276,18 @@ std::pair<te::Schedule, Array<te::Tensor>>
ComputeDAG::ApplySteps(
// return value, so the ApplyToSchedule is not able to be merged to single
interface
if (auto ps = step.as<ReorderStepNode>()) {
ps->ApplyToSchedule(stages, stage_to_axes);
+ } else if (auto ps = step.as<ComputeAtStepNode>()) {
Review comment:
Why do you use a different order from the order in our internal repo?
I propose we pick one of two orders:
1. Alphabetical order
2. Logical order from easy to hard.
- primitives working on one stage: reorder, split, fuse, annotation
- primitives working on multiple stages: compute_at, compute_root,
compute_inline
- primitive adding new stages: cache_read, cache_write, ...
And we should keep the same order in all places.
##########
File path: src/auto_scheduler/measure_record.cc
##########
@@ -169,6 +206,18 @@ struct Handler<::tvm::Array<::tvm::auto_scheduler::Step>> {
fused_ids.push_back(i);
}
data->push_back(::tvm::auto_scheduler::FuseStep(stage_id, fused_ids));
+ } else if (name == "AN") {
Review comment:
I agree with your points.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]