lsy643 commented on pull request #6076:
URL: https://github.com/apache/incubator-tvm/pull/6076#issuecomment-659854233


   @zhiics @kevinthesun I am not trying to support heterogeneous execution in 
this PR. What I want is for a model whose operators all run on the GPU, and 
which has a dynamic-shape input, to compile and execute as expected. For 
example, this network:
   ```python
   import tensorflow as tf
   import tvm

   # A minimal graph with a dynamic-shape input ([None]) whose only
   # operator (relu) runs on the GPU.
   debug_graph = tf.Graph()
   with debug_graph.as_default():
       input_1 = tf.placeholder(dtype=tf.int32, shape=[None], name='input_1')
       result = tf.nn.relu(input_1, name='result')
   target = "cuda"
   context = tvm.gpu()
   ```
   
   After compiling the above network, the shape function for `relu` is placed 
on the CPU, but the output tensor of that shape function is allocated on the 
GPU, which causes the `Argument arg1.device_type has an unsatisfied constraint` 
error. By creating a temporary CPU tensor for the outputs of the shape function 
and then copying it back to the GPU tensor, the compilation and execution of 
the above network currently work.
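   The copy-through-CPU workaround can be sketched conceptually as below. 
This is a toy illustration, not TVM code: the `Tensor` class and helper 
functions are hypothetical stand-ins that only model the constraint "shape 
functions run on the CPU, but the result must end up in a GPU tensor".

   ```python
   # Hypothetical stand-ins for device-resident tensors; not TVM APIs.
   class Tensor:
       def __init__(self, device, data=None):
           self.device = device          # "cpu" or "gpu"
           self.data = data

       def copy_to(self, device):
           # Models a cross-device copy (e.g. a cudaMemcpy).
           return Tensor(device, list(self.data))

   def run_shape_func(shape_func, inputs, out):
       # The shape function itself always runs on the CPU, so it can
       # only write into a CPU-resident tensor.
       assert out.device == "cpu", "shape func output must live on CPU"
       out.data = shape_func([t.data for t in inputs])

   def run_shape_func_with_gpu_output(shape_func, inputs, gpu_out):
       # Workaround: allocate a temporary CPU tensor, run the shape
       # function into it, then copy the result into the GPU output.
       tmp = Tensor("cpu", [])
       run_shape_func(shape_func, inputs, tmp)
       gpu_out.data = tmp.copy_to("gpu").data

   # The shape function of an elementwise op like relu just forwards
   # the input shape.
   relu_shape_func = lambda shapes: list(shapes[0])

   x = Tensor("gpu", [5])   # input_1 has runtime shape [5]
   out = Tensor("gpu")
   run_shape_func_with_gpu_output(relu_shape_func, [x], out)
   ```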
   
   Besides, is it reasonable to handle shape functions separately, for example 
by adding an `Opcode::InvokeShapeFunc` or something similar? The shape function 
always seems to reside on the CPU.
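   To make the suggestion concrete, here is a toy sketch of what a dedicated 
opcode could look like in a VM dispatch loop. Everything below is hypothetical 
except the opcode name `InvokeShapeFunc` proposed above; the real Relay VM is 
structured differently.

   ```python
   # Toy dispatch loop with a dedicated shape-function opcode.
   INVOKE_PACKED = "InvokePacked"
   INVOKE_SHAPE_FUNC = "InvokeShapeFunc"

   def dispatch(instr, registers):
       op, func, in_regs, out_reg = instr
       args = [registers[r] for r in in_regs]
       if op == INVOKE_SHAPE_FUNC:
           # Shape functions are always pinned to the CPU, so the VM can
           # allocate their outputs on the CPU without consulting the
           # target device of the surrounding kernels.
           registers[out_reg] = ("cpu", func(args))
       elif op == INVOKE_PACKED:
           # Ordinary kernels run on the device the module was built for.
           registers[out_reg] = ("gpu", func(args))
       else:
           raise ValueError(f"unknown opcode: {op}")

   regs = {0: [3]}  # register 0 holds an input shape
   dispatch((INVOKE_SHAPE_FUNC, lambda a: list(a[0]), [0], 1), regs)
   ```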
   
   I am highly looking forward to heterogeneous compilation and execution in 
the virtual machine, which will make my current work much easier. If there is 
anything I can help with, just let me know. Thanks a lot.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
