bartekkuncer commented on code in PR #21034:
URL: https://github.com/apache/incubator-mxnet/pull/21034#discussion_r902744238


##########
src/operator/leaky_relu.cc:
##########
@@ -167,6 +179,9 @@ The following modified ReLU Activation functions are supported:
 
 - *elu*: Exponential Linear Unit. `y = x > 0 ? x : slope * (exp(x)-1)`
 - *gelu*: Gaussian Error Linear Unit. `y = 0.5 * x * (1 + erf(x / sqrt(2)))`
+- *gelu_erf*: Same as gelu

Review Comment:
   ```suggestion
   - *gelu_erf*: Same as gelu.
   ```
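
For context, the two GELU variants the PR distinguishes can be sketched in plain NumPy. The erf form matches the docstring in the hunk above; the tanh form uses the standard Hendrycks & Gimpel approximation, whose constant 0.044715 is assumed here rather than taken from the diff.

```python
import math
import numpy as np

np_erf = np.vectorize(math.erf)

def gelu_erf(x):
    # exact form, as documented in leaky_relu.cc:
    # y = 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + np_erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # common tanh approximation (Hendrycks & Gimpel, 2016); the constant
    # 0.044715 is the published value, assumed here, not quoted from the PR
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.linspace(-3.0, 3.0, 13)
print(np.max(np.abs(gelu_erf(x) - gelu_tanh(x))))  # small but nonzero gap
```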



##########
python/mxnet/gluon/nn/activations.py:
##########
@@ -200,18 +200,26 @@ class GELU(HybridBlock):
         "Gaussian Error Linear Units (GELUs)", Hendrycks et al, 2016
         https://arxiv.org/abs/1606.08415
 
+    Parameters
+    ----------
+    approximation : string
+        Which approximation of GELU calculation to use (erf or tanh) 
 
     Inputs:
         - **data**: input tensor with arbitrary shape.
 
     Outputs:
         - **out**: output tensor with the same shape as `data`.
     """
-    def __init__(self, **kwargs):
+    def __init__(self, approximation='erf', **kwargs):
+        if approximation not in ['erf', 'tanh']:
+            raise ValueError("Unsupported approximation! Support values are 'erf' and 'tanh', "

Review Comment:
   ```suggestion
            raise ValueError("Unsupported approximation! Supported values are 'erf' and 'tanh', "
   ```
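
A minimal usage sketch of the new constructor argument, assuming the signature shown in the diff (this only runs with the PR applied); the last call should hit the ValueError branch the suggestion above touches.

```python
from mxnet.gluon import nn

# assumed API from the diff: GELU(approximation='erf') is the default
act_exact = nn.GELU(approximation='erf')
act_approx = nn.GELU(approximation='tanh')

try:
    nn.GELU(approximation='sigmoid')  # not a supported value
except ValueError as err:
    print(err)
```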



##########
src/operator/leaky_relu.cc:
##########
@@ -157,6 +157,18 @@ static bool LRChangeLayout(nnvm::NodeAttrs* attrs,
   return false;
 }
 
+static void LeakyReLUParamParser(nnvm::NodeAttrs* attrs) {
+  // For backward compatible, replace gelu to gelu_erf

Review Comment:
   ```suggestion
      // For backward compatibility, replace gelu with gelu_erf
   ```
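
The intent of the parser change is just a legacy-name rewrite; a hypothetical Python sketch of the same idea (illustrative only, not the actual C++ parser) would be:

```python
def rewrite_legacy_act_type(attrs):
    """Illustrative only: map the legacy act_type 'gelu' onto 'gelu_erf'
    before parsing, so models saved with the old name keep loading."""
    if attrs.get('act_type') == 'gelu':
        attrs = dict(attrs, act_type='gelu_erf')
    return attrs

print(rewrite_legacy_act_type({'act_type': 'gelu', 'slope': '0.25'}))
# -> {'act_type': 'gelu_erf', 'slope': '0.25'}
```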



##########
tests/python/unittest/test_operator.py:
##########
@@ -634,6 +634,29 @@ def fselu_grad(grad, x, y):
 
 
 def test_gelu():
+    np_erf = np.vectorize(math.erf)
+    def fgelu(x):
+        return 0.5 * x * (1.0 + np_erf(x/np.sqrt(2)))
+
+    def fgelu_grad(grad, x, y):
+        return grad * (y / x + x / np.sqrt(2 * math.pi) * np.exp(-0.5*(x**2)))
+
+    shape = (3, 4)
+    x = mx.sym.Variable("x")
+    y = mx.sym.LeakyReLU(data=x, act_type="gelu")
+    for dtype in [np.float16, np.float32, np.float64]:
+        xa = np.random.uniform(low=-0.1,high=0.1,size=shape).astype(dtype)

Review Comment:
   ```suggestion
           xa = np.random.uniform(low=-0.1, high=0.1, size=shape).astype(dtype)
   ```
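
As a sanity check on the reference gradient in this test, a standalone finite-difference comparison (my own sketch, not part of the PR) agrees with fgelu_grad:

```python
import math
import numpy as np

np_erf = np.vectorize(math.erf)

def fgelu(x):
    return 0.5 * x * (1.0 + np_erf(x / np.sqrt(2)))

def fgelu_grad(grad, x, y):
    return grad * (y / x + x / np.sqrt(2 * math.pi) * np.exp(-0.5 * (x ** 2)))

x = np.random.uniform(low=-0.1, high=0.1, size=(3, 4))
y = fgelu(x)
eps = 1e-6
numeric = (fgelu(x + eps) - fgelu(x - eps)) / (2.0 * eps)  # central difference
analytic = fgelu_grad(np.ones_like(x), x, y)
assert np.allclose(numeric, analytic, atol=1e-6)
```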



##########
tests/python/unittest/test_operator.py:
##########
@@ -634,6 +634,29 @@ def fselu_grad(grad, x, y):
 
 
 def test_gelu():
+    np_erf = np.vectorize(math.erf)
+    def fgelu(x):
+        return 0.5 * x * (1.0 + np_erf(x/np.sqrt(2)))
+
+    def fgelu_grad(grad, x, y):
+        return grad * (y / x + x / np.sqrt(2 * math.pi) * np.exp(-0.5*(x**2)))

Review Comment:
   Inconsistent whitespace around operators, e.g. `x/np.sqrt(2)` and `-0.5*(x**2)` vs. `2 * math.pi`.


