ah-cheng opened a new pull request, #11724:
URL: https://github.com/apache/tvm/pull/11724

   When the `split` op's attribute `num_splits == 1`, the op does nothing, so I think it can be optimized away. The Relay expression before optimization looks like this:
   ```
     %0 = layout_transform(%input, src_layout="NHWC", dst_layout="NCHW") /* 
ty=Tensor[(1, 3, 224, 224), float32] */;
     %1 = layout_transform(%v_param_1, src_layout="HWIO", dst_layout="OIHW") /* 
ty=Tensor[(32, 3, 3, 3), float32] */;
     %2 = expand_dims(%v_param_2, axis=0, num_newaxis=3) /* ty=Tensor[(1, 1, 1, 
32), float32] */;
     %3 = nn.conv2d(%0, %1, strides=[2, 2], padding=[0, 0, 0, 0], channels=32, 
kernel_size=[3, 3]) /* ty=Tensor[(1, 32, 111, 111), float32] */;
     %4 = layout_transform(%2, src_layout="NHWC", dst_layout="NCHW") /* 
ty=Tensor[(1, 32, 1, 1), float32] */;
     %5 = add(%3, %4) /* ty=Tensor[(1, 32, 111, 111), float32] */;
     %6 = nn.relu(%5) /* ty=Tensor[(1, 32, 111, 111), float32] */;
     %7 = layout_transform(%v_param_3, src_layout="HWIO", dst_layout="OIHW") /* 
ty=Tensor[(11, 32, 1, 1), float32] */;
     %8 = expand_dims(%v_param_4, axis=0, num_newaxis=3) /* ty=Tensor[(1, 1, 1, 
11), float32] */;
     %9 = nn.conv2d(%6, %7, padding=[0, 0, 0, 0], channels=11, kernel_size=[1, 
1]) /* ty=Tensor[(1, 11, 111, 111), float32] */;
     %10 = layout_transform(%8, src_layout="NHWC", dst_layout="NCHW") /* 
ty=Tensor[(1, 11, 1, 1), float32] */;
     %11 = add(%9, %10) /* ty=Tensor[(1, 11, 111, 111), float32] */;
     %12 = layout_transform(%11, src_layout="NCHW", dst_layout="NHWC") /* 
ty=Tensor[(1, 111, 111, 11), float32] */;
     %13 = split(%12, indices_or_sections=1, axis=3) /* ty=(Tensor[(1, 111, 
111, 11), float32],) */;
     %14 = %13.0;
     %15 = layout_transform(%14, src_layout="NHWC", dst_layout="NCHW") /* 
ty=Tensor[(1, 11, 111, 111), float32] */;
     %16 = nn.relu(%5) /* ty=Tensor[(1, 32, 111, 111), float32] */;
     %17 = layout_transform(%v_param_21, src_layout="HWOI", dst_layout="OIHW") 
/* ty=Tensor[(32, 1, 7, 7), float32] */;
     %18 = expand_dims(%v_param_22, axis=0, num_newaxis=3) /* ty=Tensor[(1, 1, 
1, 32), float32] */;
     %19 = nn.conv2d(%16, %17, strides=[2, 2], padding=[3, 3, 3, 3], groups=32, 
channels=32, kernel_size=[7, 7]) /* ty=Tensor[(1, 32, 56, 56), float32] */;
     %20 = layout_transform(%18, src_layout="NHWC", dst_layout="NCHW") /* 
ty=Tensor[(1, 32, 1, 1), float32] */;
     %21 = add(%19, %20) /* ty=Tensor[(1, 32, 56, 56), float32] */;
   ```
   After optimization:
   ```
     %0 = layout_transform(%input, src_layout="NHWC", dst_layout="NCHW") /* 
ty=Tensor[(1, 3, 224, 224), float32] */;
     %1 = layout_transform(%v_param_1, src_layout="HWIO", dst_layout="OIHW") /* 
ty=Tensor[(32, 3, 3, 3), float32] */;
     %2 = expand_dims(%v_param_2, axis=0, num_newaxis=3) /* ty=Tensor[(1, 1, 1, 
32), float32] */;
     %3 = nn.conv2d(%0, %1, strides=[2, 2], padding=[0, 0, 0, 0], channels=32, 
kernel_size=[3, 3]) /* ty=Tensor[(1, 32, 111, 111), float32] */;
     %4 = layout_transform(%2, src_layout="NHWC", dst_layout="NCHW") /* 
ty=Tensor[(1, 32, 1, 1), float32] */;
     %5 = add(%3, %4) /* ty=Tensor[(1, 32, 111, 111), float32] */;
     %6 = nn.relu(%5) /* ty=Tensor[(1, 32, 111, 111), float32] */;
     %7 = layout_transform(%v_param_3, src_layout="HWIO", dst_layout="OIHW") /* 
ty=Tensor[(11, 32, 1, 1), float32] */;
     %8 = expand_dims(%v_param_4, axis=0, num_newaxis=3) /* ty=Tensor[(1, 1, 1, 
11), float32] */;
     %9 = nn.conv2d(%6, %7, padding=[0, 0, 0, 0], channels=11, kernel_size=[1, 
1]) /* ty=Tensor[(1, 11, 111, 111), float32] */;
     %10 = layout_transform(%8, src_layout="NHWC", dst_layout="NCHW") /* 
ty=Tensor[(1, 11, 1, 1), float32] */;
     %11 = add(%9, %10) /* ty=Tensor[(1, 11, 111, 111), float32] */;
     %12 = nn.relu(%5) /* ty=Tensor[(1, 32, 111, 111), float32] */;
     %13 = layout_transform(%v_param_21, src_layout="HWOI", dst_layout="OIHW") 
/* ty=Tensor[(32, 1, 7, 7), float32] */;
     %14 = expand_dims(%v_param_22, axis=0, num_newaxis=3) /* ty=Tensor[(1, 1, 
1, 32), float32] */;
     %15 = nn.conv2d(%12, %13, strides=[2, 2], padding=[3, 3, 3, 3], groups=32, 
channels=32, kernel_size=[7, 7]) /* ty=Tensor[(1, 32, 56, 56), float32] */;
     %16 = layout_transform(%14, src_layout="NHWC", dst_layout="NCHW") /* 
ty=Tensor[(1, 32, 1, 1), float32] */;
     %17 = add(%15, %16) /* ty=Tensor[(1, 32, 56, 56), float32] */;
   ```
   You can see that the `split` has been eliminated. For a large model, this may remove hundreds of redundant op calls.
   Is this optimization possible?
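   As a quick sanity check outside TVM, NumPy's `split` shows the same identity property: splitting into a single section just wraps the input in a one-element sequence, so taking element 0 gives back the original tensor unchanged. This is only an illustration of why the `split` + tuple-get-item pair above (`%13`/`%14`) is a no-op, not TVM code:

   ```python
   import numpy as np

   # A tensor shaped like %12 above: NHWC with 11 channels on axis 3.
   x = np.random.rand(1, 111, 111, 11).astype("float32")

   # split with a single section produces exactly one chunk...
   parts = np.split(x, 1, axis=3)
   assert len(parts) == 1

   # ...and that chunk is identical to the input, so the split plus
   # the following tuple-get-item can be folded away.
   assert np.array_equal(parts[0], x)
   ```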
   CC: @AndrewZhaoLuo 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
