FrozenGene commented on pull request #5913: URL: https://github.com/apache/incubator-tvm/pull/5913#issuecomment-669731115
> @FrozenGene Please follow up. It is okay to do the path `CPU@remote_device -> GPU@remote_device` for now, as long as there is no RPC communication cost (i.e. no `local_device` -> `remote_device`).
>
> I remember that we tried to do this in our internal repo but failed. What was the problem at that time?

@merrymercy Our current method introduces a dummy CPU context on the remote side and passes the data to the remote target (e.g. OpenCL, CUDA). What we tried previously was to generate non-empty data directly on the remote target, and that failed. Per @tqchen's suggestion, we could leverage the existing `empty` interface and fill the allocated tensor with random data directly on the remote device, which avoids introducing a new `non_empty` API in the C / NDArray interface. My previous comment was to point out that we may still have to introduce a CPU context, as in our current approach.

I will follow up on my PR: move our implementation to `contrib/random/random.cc` and enable it unconditionally, since our auto scheduler's local builder / local runner also rely on it (not just RPC).
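The "allocate empty, then fill" pattern under discussion can be sketched as follows. This is a minimal illustration using NumPy as a stand-in for the TVM NDArray API (the helper name `random_fill` is hypothetical here, not the PR's actual registered function): instead of adding a separate `non_empty` allocator, the existing uninitialized-allocation path is reused and the buffer is filled with random data in place.

```python
import numpy as np

def random_fill(arr, seed=0):
    """Fill an already-allocated buffer with uniform random data in place,
    mirroring the idea of reusing `empty` + fill instead of a new
    `non_empty` allocation API."""
    rng = np.random.default_rng(seed)
    arr[...] = rng.uniform(size=arr.shape).astype(arr.dtype)
    return arr

# Allocate an uninitialized buffer (analogous to tvm.nd.empty on the
# remote device), then fill it where it lives.
buf = np.empty((2, 3), dtype="float32")
random_fill(buf)
print(buf.shape)  # (2, 3)
```

The point of the design is that no data crosses the RPC boundary: allocation and filling both happen on the device side, so the `local_device -> remote_device` transfer cost is avoided.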
