FrozenGene commented on pull request #5913:
URL: https://github.com/apache/incubator-tvm/pull/5913#issuecomment-660027353


   > see comments, we don't need to expose the (non_empty) function as a
   > primitive function, given that it is not a numpy function and is only
   > used in AutoTVM. Instead, we can achieve the same goal by
   > 
   > ```python
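   > # allocate an uninitialized NDArray
   > x = nd.empty(..)
   > # look up the packed function that fills an array with random values
   > random_fill = get_packed_func("contrib.random.random_fill")
   > random_fill(x)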
   > ```
   > 
   > Notably, the above solution is better because:
   > 
   > * Minimum interface exposure (non-AutoTVM devs don't need to be aware
   > of the change)
   > * Random initialization is performed on the device; the current impl
   > actually results in random numbers being generated on the host and then
   > transferred to the device.
   
   @tqchen I have resumed this work. This way we can perform the random 
initialization on the device (i.e. `AllocaWorkSpace` uses the correct device 
API). But as far as I know, we still cannot avoid generating the random numbers 
on the host unless we call device-specific APIs like `cuRAND` (for NVIDIA 
GPUs). That is to say, we still have to generate the random numbers on a CPU 
and copy them to the device. However, the goal we should accomplish is this: 
in RPC mode, generate the random numbers on the remote CPU and copy the data 
from the remote CPU (like ARM) to the remote GPU (like Mali) directly, instead 
of taking the path `x86 cpu -> arm cpu -> mali gpu`. 
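
To make the difference between the two paths concrete, here is a toy model (plain Python, not TVM code) that counts how many payload bytes cross each link for one randomly filled tensor. The function name `link_traffic`, the link labels, and the dtype table are hypothetical and exist only for this illustration; it assumes the random data is generated on a CPU in both cases, as described above.

```python
from math import prod

# Bytes per element for a few common dtypes (illustrative subset only).
DTYPE_BYTES = {"float32": 4, "float16": 2, "int8": 1}


def link_traffic(shape, dtype, generate_on):
    """Return the payload bytes moved over each link for one tensor.

    generate_on: "local"  -> random data is created on the x86 host
                 "remote" -> random data is created on the remote ARM CPU
    """
    nbytes = prod(shape) * DTYPE_BYTES[dtype]
    if generate_on == "local":
        # Path: x86 cpu -> arm cpu -> mali gpu
        return {"x86->arm (network)": nbytes, "arm->gpu (on board)": nbytes}
    if generate_on == "remote":
        # Path: arm cpu -> mali gpu; nothing is sent over the network
        return {"x86->arm (network)": 0, "arm->gpu (on board)": nbytes}
    raise ValueError("generate_on must be 'local' or 'remote'")


if __name__ == "__main__":
    shape = (1, 3, 224, 224)
    for where in ("local", "remote"):
        print(where, link_traffic(shape, "float32", where))
```

In both cases the CPU-to-GPU copy on the board remains; what generating on the remote CPU removes is the transfer over the RPC connection, which is usually the slowest link.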


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

