Tcc0403 commented on issue #714:
URL: https://github.com/apache/mahout/issues/714#issuecomment-3657470162

   For zero/NaN detection after the L2 norm calculation, instead of copying the 
tensor to host memory for validation, we can run the check in device code and call 
[`__trap()`](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#trap-function)
 to abort the kernel early and skip the subsequent encoding step. 
   
   Some real-world use cases:
   PyTorch: [NaN check](https://github.com/pytorch/pytorch/blob/main/torch/csrc/distributed/c10d/NanCheck.cu)
   DeepEP: [timeout mechanism](https://github.com/deepseek-ai/DeepEP/blob/a2d2354e1d0afd46942cd6e59aa51a37fb22b2ff/csrc/kernels/utils.cuh#L463)
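
   A minimal sketch of the device-side check, for illustration only. The kernel name `validate_norms` and the layout (one precomputed L2 norm per sample in a `float` array) are assumptions, not taken from the Mahout code base; only `__trap()` and the CUDA runtime calls are real APIs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread inspects one precomputed L2 norm.
// If the norm is zero, NaN, or infinite, the thread calls __trap(),
// which aborts the kernel and surfaces an error to the host.
__global__ void validate_norms(const float* norms, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = norms[i];
    if (v == 0.0f || isnan(v) || isinf(v)) {
        __trap();
    }
}

int main() {
    const int n = 4;
    // Illustrative input: the third norm is zero and should trigger the trap.
    float h_norms[n] = {1.0f, 2.5f, 0.0f, 3.0f};

    float* d_norms = nullptr;
    cudaMalloc(&d_norms, n * sizeof(float));
    cudaMemcpy(d_norms, h_norms, n * sizeof(float), cudaMemcpyHostToDevice);

    validate_norms<<<1, 32>>>(d_norms, n);

    // The trap is reported asynchronously; it shows up at the next sync point.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "norm validation failed: %s\n",
                     cudaGetErrorString(err));
        return 1;  // skip the encoding step entirely
    }

    // The encoding kernels would be launched here (omitted in this sketch).
    cudaFree(d_norms);
    return 0;
}
```

   One trade-off to keep in mind: after a trap the CUDA context is left in an error state, so this approach suits fail-fast paths where invalid input should stop the run, rather than cases where the caller wants to recover and continue on the same context.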
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
