masahi commented on a change in pull request #7562:
URL: https://github.com/apache/tvm/pull/7562#discussion_r586916951



##########
File path: python/tvm/relay/frontend/tensorflow.py
##########
@@ -1166,6 +1166,125 @@ def _impl(inputs, attr, params, mod):
     return _impl
 
 
+def _math_segment_sum():
+    def _impl(inputs, attr, params, mod):
+        assert len(inputs) == 2, "There should be 2 input tensors"
+        return get_relay_op("segment_sum")(inputs[0], inputs[1])
+
+    return _impl
+
+
+def _sparse_segment_sum():
+    def _impl(inputs, attr, params, mod):
+        assert len(inputs) == 3, "There should be 3 input tensors"
+        data = _op.take(inputs[0], inputs[1], axis=0)
+        return _op.segment_sum(data, inputs[2])
+

Review comment:
I'm not aware of any existing op that I would count as a "custom fused op". An efficient implementation of `sparse_segment_sum` is, I think, a new challenge for us. Take a look at the caffe2 repository, for example: they have a LOT of code to better optimize this op. They even have a codegen tool for it, see https://github.com/pytorch/pytorch/blob/master/caffe2/perfkernels/hp_emblookup_codegen.py and the generated code https://github.com/pytorch/pytorch/blob/master/caffe2/perfkernels/embedding_lookup_fused_8bit_rowwise_avx2.cc. That shows how seriously they take this op.
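To make the "fused" idea concrete, here is a minimal NumPy reference sketch. It is not TVM code and not the caffe2 kernel, and the function names are hypothetical. The unfused version mirrors the `take` + `segment_sum` composition in the diff, so it materializes the full gathered tensor. The fused version accumulates each selected row straight into its output segment, so that intermediate is never allocated. It assumes TF semantics, where the output has `max(segment_ids) + 1` rows.

```python
import numpy as np


def _num_segments(segment_ids):
    # Assumes TF semantics: output has max(segment_ids) + 1 rows.
    return int(np.max(segment_ids)) + 1 if len(segment_ids) else 0


def sparse_segment_sum_unfused(data, indices, segment_ids):
    # Same data flow as the diff: gather first (allocates a
    # (len(indices), ...) intermediate), then sum rows per segment.
    gathered = np.take(data, indices, axis=0)
    out = np.zeros((_num_segments(segment_ids),) + data.shape[1:], dtype=data.dtype)
    np.add.at(out, segment_ids, gathered)
    return out


def sparse_segment_sum_fused(data, indices, segment_ids):
    # Fused data flow: read each selected row once and add it directly
    # into its output segment; no gathered intermediate is created.
    out = np.zeros((_num_segments(segment_ids),) + data.shape[1:], dtype=data.dtype)
    for idx, seg in zip(indices, segment_ids):
        out[seg] += data[idx]
    return out
```

The Python loop is of course not fast. It only shows the data flow a fused kernel would implement. The hard part, which is what the caffe2 codegen addresses, is making that inner accumulation vectorized and cache-friendly.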




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.