FrozenGene commented on pull request #5913:
URL: https://github.com/apache/incubator-tvm/pull/5913#issuecomment-660027353
> see comments, we don't need to expose the (non_empty) function as a
> primitive function, given that it is not a numpy function, and is only used in
> AutoTVM, instead we can achieve the same goal by
>
> ```python
> x = nd.empty(..)
> random_fill = get_packed_func("contrib.random.random_fill")
> random_fill(x)
> ```
>
> Notably, the above solution is better because:
>
> * Minimum interface exposure (non autotvm devs don't need to be aware of
>   the change)
> * Random initialization is performed on the device, the current impl
>   actually will result in random numbers generated on the host then transferred
>   to the device.

@tqchen I have resumed work on this. With this approach we can perform the
random initialization on the device (i.e. `AllocaWorkSpace` uses the correct
device API). But as far as I know, we still cannot avoid generating the random
numbers on the host unless we call device-specific APIs such as `cuRAND` (for
NVIDIA GPUs). That is to say, we still have to generate the random numbers and
copy them to the device. However, we should still achieve one goal: in RPC mode
we can generate the random numbers on the remote CPU and copy the data from the
remote CPU (e.g. ARM) directly to the remote GPU (e.g. Mali), rather than
taking the path `x86 cpu -> arm cpu -> mali gpu`.
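
To make the difference between the two data paths concrete, here is a toy model in plain Python. It does not use any TVM API: `Node`, `Link`, `fill_on`, `host_generated`, and `remote_generated` are hypothetical names introduced only for illustration, and the "copy" is just a list copy that logs which hop was taken.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a machine or device that holds buffers."""
    name: str
    buffers: dict = field(default_factory=dict)


@dataclass
class Link:
    """Records every transfer so the two paths can be compared."""
    log: list = field(default_factory=list)

    def copy(self, src, dst, key):
        # Copy the buffer and log (source, destination, element count).
        dst.buffers[key] = list(src.buffers[key])
        self.log.append((src.name, dst.name, len(dst.buffers[key])))


def fill_on(node, key, n, seed=0):
    """Generate n random floats on the given node (no transfer involved)."""
    rng = random.Random(seed)
    node.buffers[key] = [rng.random() for _ in range(n)]


def host_generated(n):
    """Current behaviour: generate on the local x86 host, then ship twice."""
    host, remote_cpu, remote_gpu = Node("x86 cpu"), Node("arm cpu"), Node("mali gpu")
    link = Link()
    fill_on(host, "x", n)
    link.copy(host, remote_cpu, "x")        # goes over the RPC connection
    link.copy(remote_cpu, remote_gpu, "x")  # local copy on the remote board
    return link.log


def remote_generated(n):
    """Proposed behaviour: generate on the remote CPU, copy once to the GPU."""
    remote_cpu, remote_gpu = Node("arm cpu"), Node("mali gpu")
    link = Link()
    fill_on(remote_cpu, "x", n)
    link.copy(remote_cpu, remote_gpu, "x")  # only hop; nothing crosses RPC
    return link.log
```

In this model `host_generated` logs two hops, one of which crosses the RPC connection, while `remote_generated` logs only the `arm cpu -> mali gpu` hop. The random numbers are still produced on a CPU in both cases, which matches the caveat above about not using something like `cuRAND`.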