leezu commented on a change in pull request #11573: Add stable nrm2 Reducer
URL: https://github.com/apache/incubator-mxnet/pull/11573#discussion_r201567681
 
 

 ##########
 File path: src/operator/mshadow_op.h
 ##########
 @@ -815,6 +868,76 @@ struct nanprod {
   }
 };
 
+/*! \brief compute l2 norm */
+struct nrm2 {
+  /*! \brief do reduction into dst */
+  template<typename DType>
+  MSHADOW_XINLINE static void Reduce(volatile DType& sum_of_squares, volatile DType src) { // NOLINT(*)
+    sum_of_squares += src * src;
+  }
+  /*! \brief do stable reduction into dst */
+  template<typename DType>
+  MSHADOW_XINLINE static void Reduce(volatile DType& sum_of_squares,  volatile DType src, volatile DType& scale) { // NOLINT(*)
+    if (src != 0) {
+      DType abs = mshadow_op::abs::Map(src);
+      if (scale < abs) {
+        sum_of_squares = 1 + sum_of_squares * (scale / abs) * (scale / abs);
+        scale = abs;
+      } else {
+        sum_of_squares = sum_of_squares + (abs / scale) * (abs / scale);
+      }
+    }
+  }
+  /*! \brief combine the results of two reducers */
+  template<typename DType>
+  MSHADOW_XINLINE static void Merge(volatile DType& dst_val, volatile DType& src_val) { // NOLINT(*)
+    dst_val += src_val;
+  }
+  /*! \brief combine the results of two reducers */
+  template<typename DType>
+  MSHADOW_XINLINE static void Merge(volatile DType& dst_ssq, volatile DType& dst_scale, volatile DType& src_ssq, volatile DType& src_scale) { // NOLINT(*)
+    if (dst_scale != 0 && dst_scale >= src_scale) {
+      dst_ssq = dst_ssq + src_ssq * (src_scale / dst_scale) * (src_scale / dst_scale);
 
 Review comment:
   Sure. Remember that we use a scaled sum of squares to compute the L2 norm, 
to avoid the numeric instability caused by squaring very small or very large 
numbers and then taking the square root.
   For an efficient reduction on GPU, multiple reducers each reduce a part of 
the vector. Each reducer's result is a scaled sum of squares. To combine the 
reducers, we must find a common scale for all of them. Following the 
implementation of Reduce, I chose the largest scale.
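
   To illustrate why the scaling matters, here is a minimal standalone sketch 
(not the PR code). `ScaledSsq`, `reduce_scaled`, `norm_scaled` and `norm_naive` 
are hypothetical names introduced only for this example; the update rule 
mirrors the three-argument `Reduce` in the diff above, with `float` assumed as 
the data type.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper, not part of the PR: running state of the reduction.
struct ScaledSsq {
  float ssq = 0.0f;    // sum of squares, expressed relative to `scale`
  float scale = 0.0f;  // largest absolute value seen so far
};

// Same update rule as the three-argument nrm2::Reduce in the diff above.
inline void reduce_scaled(ScaledSsq& s, float src) {
  if (src != 0.0f) {
    const float a = std::fabs(src);
    if (s.scale < a) {
      // New largest element: re-express the old sum relative to it.
      s.ssq = 1.0f + s.ssq * (s.scale / a) * (s.scale / a);
      s.scale = a;
    } else {
      s.ssq += (a / s.scale) * (a / s.scale);
    }
  }
}

// L2 norm via the scaled sum of squares: norm = scale * sqrt(ssq).
inline float norm_scaled(const float* x, std::size_t n) {
  ScaledSsq s;
  for (std::size_t i = 0; i < n; ++i) reduce_scaled(s, x[i]);
  return s.scale * std::sqrt(s.ssq);
}

// Naive L2 norm: squares each element directly, so the sum can overflow.
inline float norm_naive(const float* x, std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum += x[i] * x[i];
  return std::sqrt(sum);
}
```

   For the input `{3e20f, 4e20f}` the squares exceed the largest finite 
`float`, so the naive version is expected to overflow, while the scaled 
version only ever squares ratios that are at most 1 and should return roughly 
`5e20`.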
   
   The equation above simply rescales the sum of squares of the reducer that 
currently uses the smaller scale value, so that in the end `norm(x) = scale * 
sqrt(ssq) = dst_scale * sqrt(dst_ssq + src_ssq * (src_scale / dst_scale) * 
(src_scale / dst_scale)) = sqrt(dst_scale * dst_scale * dst_ssq + src_scale * 
src_scale * src_ssq)`. We want to avoid evaluating the right-most expression 
directly because it is numerically unstable. Here `scale` and `ssq` denote the 
values written to `dst_scale` and `dst_ssq` in the code above.
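
   The identity can be checked with a small standalone sketch. `Partial`, 
`merge_partials` and `norm_of` are hypothetical names used only here. The 
first branch follows the `Merge` line quoted above; the second branch (where 
`src` has the larger scale) is not visible in the quoted hunk and is my 
assumption of the symmetric case.

```cpp
#include <cmath>

// Hypothetical container for one reducer's partial result.
struct Partial {
  double ssq;    // sum of squares, expressed relative to `scale`
  double scale;  // largest absolute value seen by this reducer
};

// Combine two partial results on the larger of the two scales.
inline Partial merge_partials(Partial dst, Partial src) {
  if (dst.scale != 0 && dst.scale >= src.scale) {
    // dst already has the larger scale: rescale src's sum onto it.
    const double r = src.scale / dst.scale;
    dst.ssq += src.ssq * r * r;
  } else if (src.scale != 0) {
    // Assumed symmetric case: adopt src's scale and rescale dst's sum.
    const double r = dst.scale / src.scale;
    dst.ssq = src.ssq + dst.ssq * r * r;
    dst.scale = src.scale;
  }
  // If both scales are zero, both partials are empty and dst is unchanged.
  return dst;
}

// norm(x) = scale * sqrt(ssq)
inline double norm_of(Partial p) { return p.scale * std::sqrt(p.ssq); }
```

   For example, merging the partial results of the one-element vectors `[3]` 
(`ssq = 1, scale = 3`) and `[4]` (`ssq = 1, scale = 4`) in either order gives 
`scale = 4, ssq = 1.5625`, and `4 * sqrt(1.5625) = 5`, which matches 
`sqrt(3*3 + 4*4)`.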

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
