sxjscience edited a comment on issue #16355: Embedding gradient performance optimization on GPU
URL: https://github.com/apache/incubator-mxnet/pull/16355#issuecomment-538595793

Nice! LGTM. So the BinarySearch version of FindBounds has complexity O(|V| log |N|), where |V| is the vocabulary size and |N| is the number of indices. I guess our initial version (https://github.com/dmlc/mshadow/blob/master/mshadow/cuda/tensor_gpu-inl.cuh#L619-L672) has complexity O(|N|) for finding the boundaries. Thus, in some workloads (those where |N| is small compared with |V|), the O(|N|) version might be faster.
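
To make the comparison concrete, here is a rough single-threaded Python sketch of the two strategies. It is not the CUDA code from this PR or from mshadow; the function names `find_bounds_binary_search` and `find_bounds_linear_scan` are made up for illustration, and it assumes the indices have already been sorted. It only shows where the O(|V| log |N|) and O(|N|) costs come from.

```python
from bisect import bisect_left, bisect_right


def find_bounds_binary_search(sorted_indices, vocab_size):
    """One pair of binary searches per vocabulary id: O(|V| log |N|)."""
    bounds = []
    for v in range(vocab_size):
        lo = bisect_left(sorted_indices, v)   # first position holding v
        hi = bisect_right(sorted_indices, v)  # one past the last position holding v
        bounds.append((lo, hi))               # lo == hi means v does not occur
    return bounds


def find_bounds_linear_scan(sorted_indices):
    """Single pass over the sorted indices: O(|N|).

    Only ids that actually occur get an entry.
    """
    n = len(sorted_indices)
    bounds = {}
    start = 0
    for i in range(1, n + 1):
        # Close the current run when the value changes or the input ends.
        if i == n or sorted_indices[i] != sorted_indices[start]:
            bounds[sorted_indices[start]] = (start, i)
            start = i
    return bounds


if __name__ == "__main__":
    idx = [0, 0, 2, 5, 5, 5]  # |N| = 6 sorted lookups
    print(find_bounds_binary_search(idx, vocab_size=6))
    print(find_bounds_linear_scan(idx))
```

With a large vocabulary and only a handful of lookups, the first function still visits every vocabulary id, while the second touches only the |N| indices, which is the case where the linear version could win.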

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
