FrozenGene edited a comment on pull request #5913:
URL: https://github.com/apache/incubator-tvm/pull/5913#issuecomment-671789877


   > > @FrozenGene Please followup. It is okay to do the path 
`CPU@remote_device -> GPU@remote_device` for now, as long as there is no RPC 
communication cost (i.e. no `local_device` -> `remote device`)
   > > I remember that we tried to do this in our internal repo but failed. 
What was the problem at that time?
   > 
   > @merrymercy Our current method is to introduce a dummy `cpu` context on 
the remote side and pass the data to the remote target (e.g. OpenCL, CUDA). 
What we tried previously was to generate non-empty data directly on the remote 
target, but that failed.
   > 
   > @tqchen's suggestion is that we could leverage the `empty` interface and 
fill random data into the allocated tensor directly on the remote device, 
which avoids introducing a new `non_empty` API in the C / ndarray interface. 
My previous comment was to point out that we may still have to introduce a CPU 
context, as in our current approach.
   > 
   > I will follow up in my PR by moving our implementation to 
`contrib/random/random.cc` and enabling it unconditionally, since the auto 
scheduler's local builder / local runner also rely on it (not just RPC).
   
   @merrymercy @tqchen I have updated the code and verified it on a remote 
CPU and a remote Mali GPU. We can now do `CPU@remote_device -> 
GPU@remote_device` directly, rather than `CPU@host -> CPU@remote_device -> 
GPU@remote_device`.
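   For readers following along, the flow described above could be sketched 
roughly as below. This is only an illustration, not the PR's exact code: the 
host/port, tensor shape, and the choice of an OpenCL context for the Mali GPU 
are hypothetical, and it assumes a TVM build with the contrib random module 
enabled plus an RPC server already running on the board.

```python
# Sketch (assumptions labeled above): allocate via the existing `empty`
# interface on the remote CPU context, fill it with random data on the
# remote device, then copy CPU@remote -> GPU@remote with no host round trip.
import tvm
from tvm import rpc

# Hypothetical RPC endpoint for the remote board.
remote = rpc.connect("192.168.1.100", 9090)
cpu_ctx = remote.cpu(0)
gpu_ctx = remote.cl(0)  # e.g. Mali GPU exposed via OpenCL

# Reuse the existing `empty` interface instead of adding a `non_empty` API.
arr = tvm.nd.empty((1024,), "float32", cpu_ctx)

# Fill the allocated tensor with random data on the remote device, using the
# packed function this PR moves into contrib/random/random.cc.
random_fill = remote.get_function("tvm.contrib.random.random_fill")
random_fill(arr)

# Copy directly between remote contexts: CPU@remote_device -> GPU@remote_device.
gpu_arr = tvm.nd.empty((1024,), "float32", gpu_ctx)
arr.copyto(gpu_arr)
```

   The point of the design is visible in the last two lines: the random data 
never leaves the remote machine, so there is no `local_device -> 
remote_device` RPC transfer cost during measurement.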


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
