anko-intel commented on a change in pull request #20753:
URL: https://github.com/apache/incubator-mxnet/pull/20753#discussion_r826156404



##########
File path: python/mxnet/amp/amp.py
##########
@@ -700,79 +670,60 @@ def convert_hybrid_block(block, target_dtype="float16", target_dtype_ops=None,
         from being quantized
     device : Context
         Context on which model parameters should live
-    cast_optional_params : bool, default False
+    cast_params_offline : bool, default False
         Whether to cast the arg_params and aux_params that don't require to be in LP16
         because of a cast layer following it, but will reduce the computation and memory
         overhead of the model if casted.
     """
     from ..gluon import HybridBlock, SymbolBlock
+    from ..ndarray import NDArray as ND_NDArray
+    from ..numpy import ndarray as NP_NDArray
+
     assert isinstance(block, HybridBlock), "block input should be a HybridBlock"
+    if not isinstance(data_example, (list, tuple)):

Review comment:
       If data_example is a list or tuple, should we check whether its elements are NP/ND NDArrays?
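   A minimal sketch of such an element check, purely for illustration (the helper name `_check_data_example` is made up; the `ND_NDArray`/`NP_NDArray` aliases come from the diff above):

```python
# Illustrative sketch only - not code from this PR.
from mxnet.ndarray import NDArray as ND_NDArray
from mxnet.numpy import ndarray as NP_NDArray

def _check_data_example(data_example):
    """Normalize data_example to a list and verify every element is an NDArray."""
    if not isinstance(data_example, (list, tuple)):
        data_example = [data_example]
    for item in data_example:
        assert isinstance(item, (ND_NDArray, NP_NDArray)), \
            "every element of data_example must be an ND or NP NDArray"
    return list(data_example)
```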

##########
File path: .github/workflows/os_x_mklbuild.yml
##########
@@ -47,4 +47,4 @@ jobs:
           python -m pytest -n 4 --durations=50 --verbose tests/python/unittest/ -k 'not test_operator and not (test_subgraph or test_custom_op or test_external_op or test_recordimage_dataset_with_data_loader_multiworker or test_multi_worker or test_multi_worker_shape or test_multi_worker_forked_data_loader or test_multi_worker_dataloader_release_pool)' -m 'not serial'
           MXNET_ENGINE_TYPE=NaiveEngine python -m pytest -n 4 --durations=50 --verbose tests/python/unittest/ -k 'test_operator and not (test_subgraph or test_custom_op or test_external_op or test_recordimage_dataset_with_data_loader_multiworker or test_multi_worker or test_multi_worker_shape or test_multi_worker_forked_data_loader or test_multi_worker_dataloader_release_pool)' -m 'not serial'
           python -m pytest --durations=50 --verbose tests/python/unittest/ -k 'not (test_subgraph or test_custom_op or test_external_op or test_recordimage_dataset_with_data_loader_multiworker or test_multi_worker or test_multi_worker_shape or test_multi_worker_forked_data_loader or test_multi_worker_dataloader_release_pool)' -m 'serial'
-          python -m pytest -n 4 --durations=50 --verbose tests/python/dnnl -k 'not test_bf16_operator'
+          python -m pytest -n 4 --durations=50 --verbose tests/python/dnnl -k 'not (test_bf16_operator or test_amp or test_amp_subgraph)'

Review comment:
       Why do you disable the bf16/AMP tests? Is there no support on macOS?

##########
File path: python/mxnet/gluon/nn/basic_layers.py
##########
@@ -377,7 +377,7 @@ def __init__(self, axis=1, momentum=0.9, epsilon=1e-5, center=True, scale=True,
                                      differentiable=False)
 
     def cast(self, dtype):
-        if _np.dtype(dtype).name == 'float16':
+        if get_dtype_name(dtype) == 'float16':

Review comment:
       Is there nothing for bfloat16 in BatchNorm's cast?
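   For illustration only, a sketch of how the guard could also cover bfloat16 (not code from the PR; `_dtype_name` is a stand-in for the `get_dtype_name` helper used in the diff, whose exact behaviour for bf16 is assumed here):

```python
# Illustrative sketch only - not code from this PR.
import numpy as _np
from mxnet.gluon import nn

def _dtype_name(dtype):
    # Stand-in for get_dtype_name(): plain numpy has no bfloat16 dtype,
    # so resolve that case by name before falling back to numpy.
    return 'bfloat16' if dtype == 'bfloat16' else _np.dtype(dtype).name

class BatchNormKeepFP32(nn.BatchNorm):
    def cast(self, dtype):
        # Keep BatchNorm parameters in float32 for both reduced-precision types.
        if _dtype_name(dtype) in ('float16', 'bfloat16'):
            dtype = 'float32'
        super().cast(dtype)
```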

##########
File path: python/mxnet/amp/amp.py
##########
@@ -700,79 +670,60 @@ def convert_hybrid_block(block, target_dtype="float16", target_dtype_ops=None,
         from being quantized
     device : Context
         Context on which model parameters should live
-    cast_optional_params : bool, default False
+    cast_params_offline : bool, default False
         Whether to cast the arg_params and aux_params that don't require to be in LP16
         because of a cast layer following it, but will reduce the computation and memory
         overhead of the model if casted.
     """
     from ..gluon import HybridBlock, SymbolBlock
+    from ..ndarray import NDArray as ND_NDArray
+    from ..numpy import ndarray as NP_NDArray
+
     assert isinstance(block, HybridBlock), "block input should be a HybridBlock"
+    if not isinstance(data_example, (list, tuple)):
+        assert isinstance(data_example, (ND_NDArray, NP_NDArray))
+        data_example = [data_example]
+
     if not block._cached_graph:
-        raise RuntimeError(
-            "Please first call block.hybridize() and then run forward with "
-            "this block at least once before calling export.")
-
-    # Prepare inputs to pass to the convert_symbol API
-    inputs, sym = block._cached_graph
-    input_names = []
-    for inp in inputs:
-        input_names.append(inp.name)
-    converted_sym = convert_symbol(sym, target_dtype, target_dtype_ops,
-                                   fp32_ops, conditional_fp32_ops,
-                                   excluded_sym_names, data_names=input_names,
-                                   cast_optional_params=cast_optional_params)
-
-    arg_names = set(converted_sym.list_arguments())
-    aux_names = set(converted_sym.list_auxiliary_states())
-    arg_dict = {}
-
-    # If dtype for the param was set in the json, cast the
-    # param to this dtype
-    attr_dict = converted_sym.attr_dict()
-    for param in block.collect_params().values():
-        name = param.name
-        if name in arg_names:
-            arg_dict['arg:%s'%name] = param._reduce()
-            if name in attr_dict and "__dtype__" in attr_dict[name]:
-                if attr_dict[name]["__dtype__"] != "-1":
-                    typ = _DTYPE_MX_TO_NP[int(attr_dict[name]["__dtype__"])]
-                    if typ == bfloat16:
-                        arg_dict['arg:%s' % name] = _cast_symbol_NDArray(arg_dict['arg:%s' % name], bfloat16)
-                    else:
-                        arg_dict['arg:%s'%name] = arg_dict['arg:%s'%name].astype(typ)
+        block.hybridize()
+    block(*data_example)

Review comment:
       Should we wait for the result (even if we don't require it)?
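   For what it's worth, a hedged sketch of what synchronizing the warm-up forward pass could look like (`block` and `data_example` are the names from the diff; the helper itself is made up):

```python
# Illustrative sketch only - run the warm-up forward and wait for its outputs.
def _forward_and_wait(block, data_example):
    out = block(*data_example)                     # traces the cached graph
    outs = out if isinstance(out, (list, tuple)) else [out]
    for o in outs:
        o.wait_to_read()                           # block until each output is computed
```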

##########
File path: python/mxnet/amp/amp.py
##########
@@ -459,73 +463,73 @@ def convert_symbol(sym, target_dtype="float16", target_dtype_ops=None,
         from being casted to LP16 or FP32.
     data_names : list of strs, optional
         A list of strings that represent input data tensor names to the model
-    cast_optional_params : bool, default False
+    cast_params_offline : bool, default False
         Whether to cast the arg_params and aux_params that don't require to be in LP16

Review comment:
       Is LP16 defined anywhere? Does it mean FP16 or BFloat16? Or is it a separate case, and should BF16 be added to the descriptions?

##########
File path: tests/python/dnnl/subgraphs/test_amp_subgraph.py
##########
@@ -0,0 +1,243 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import json
+import mxnet as mx
+import mxnet.gluon.nn as nn
+from mxnet import amp
+from mxnet.amp.amp import bfloat16
+from mxnet.test_utils import assert_almost_equal
+from subgraph_common import SG_PASS_NAME, QUANTIZE_SG_PASS_NAME

Review comment:
       Could we reuse more from the previous subgraph tests, for example the FC+something tests?

##########
File path: python/mxnet/amp/amp.py
##########
@@ -700,79 +670,60 @@ def convert_hybrid_block(block, target_dtype="float16", target_dtype_ops=None,
         from being quantized
     device : Context
         Context on which model parameters should live
-    cast_optional_params : bool, default False
+    cast_params_offline : bool, default False
         Whether to cast the arg_params and aux_params that don't require to be in LP16
         because of a cast layer following it, but will reduce the computation and memory
         overhead of the model if casted.
     """
     from ..gluon import HybridBlock, SymbolBlock
+    from ..ndarray import NDArray as ND_NDArray
+    from ..numpy import ndarray as NP_NDArray
+
     assert isinstance(block, HybridBlock), "block input should be a HybridBlock"
+    if not isinstance(data_example, (list, tuple)):
+        assert isinstance(data_example, (ND_NDArray, NP_NDArray))
+        data_example = [data_example]
+
     if not block._cached_graph:
-        raise RuntimeError(
-            "Please first call block.hybridize() and then run forward with "
-            "this block at least once before calling export.")
-
-    # Prepare inputs to pass to the convert_symbol API
-    inputs, sym = block._cached_graph
-    input_names = []
-    for inp in inputs:
-        input_names.append(inp.name)
-    converted_sym = convert_symbol(sym, target_dtype, target_dtype_ops,
-                                   fp32_ops, conditional_fp32_ops,
-                                   excluded_sym_names, data_names=input_names,
-                                   cast_optional_params=cast_optional_params)
-
-    arg_names = set(converted_sym.list_arguments())
-    aux_names = set(converted_sym.list_auxiliary_states())
-    arg_dict = {}
-
-    # If dtype for the param was set in the json, cast the
-    # param to this dtype
-    attr_dict = converted_sym.attr_dict()
-    for param in block.collect_params().values():
-        name = param.name
-        if name in arg_names:
-            arg_dict['arg:%s'%name] = param._reduce()
-            if name in attr_dict and "__dtype__" in attr_dict[name]:
-                if attr_dict[name]["__dtype__"] != "-1":
-                    typ = _DTYPE_MX_TO_NP[int(attr_dict[name]["__dtype__"])]
-                    if typ == bfloat16:
-                        arg_dict['arg:%s' % name] = _cast_symbol_NDArray(arg_dict['arg:%s' % name], bfloat16)
-                    else:
-                        arg_dict['arg:%s'%name] = arg_dict['arg:%s'%name].astype(typ)
+        block.hybridize()

Review comment:
       What about the hybridize parameters? What if the user intends to use the model later with settings different from the defaults?
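   One hypothetical way to preserve them, sketched only as an illustration (the `**hybridize_kwargs` forwarding is not part of the PR):

```python
# Hypothetical sketch - not the PR's API. It only shows how user-chosen hybridize
# settings (e.g. static_alloc, static_shape) could survive the conversion warm-up.
def _warm_up(block, data_example, **hybridize_kwargs):
    if not block._cached_graph:
        block.hybridize(**hybridize_kwargs)
    block(*data_example)
```

   convert_hybrid_block could then accept and forward such kwargs, or document that users should re-hybridize with their own settings after conversion.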

##########
File path: src/operator/subgraph/dnnl/dnnl_post_amp_property.h
##########
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+#ifndef MXNET_OPERATOR_SUBGRAPH_DNNL_DNNL_POST_AMP_PROPERTY_H_
+#define MXNET_OPERATOR_SUBGRAPH_DNNL_DNNL_POST_AMP_PROPERTY_H_
+#if MXNET_USE_ONEDNN == 1
+
+#include <set>
+#include <string>
+#include <vector>
+
+#include "../../tensor/amp_cast.h"
+#include "../common.h"
+#include "dnnl_subgraph_base-inl.h"
+
+namespace mxnet {
+namespace op {
+
+inline bool IsSupportedAMPFuseOp(const nnvm::Node& node) {
+  const auto& op = node.op();
+  return (op != nullptr &&
+          (op == Op::Get("_sg_onednn_conv") || op == Op::Get("_sg_onednn_fully_connected") ||

Review comment:
       Could you make a static const std::set<const Op*> for the list of supported operators? See https://github.com/apache/incubator-mxnet/pull/20929/files#diff-ad94944d261e72514ae44cef841ea130d44737b5ebdcfae76efd7dd8b15fb997R40-R52

##########
File path: src/operator/subgraph/dnnl/dnnl_subgraph_property.cc
##########
@@ -60,6 +61,9 @@ MXNET_REGISTER_SUBGRAPH_PROPERTY(ONEDNN_QUANTIZE, SgDNNLPostQuantizeAlignScalePr
 MXNET_REGISTER_SUBGRAPH_PROPERTY(ONEDNN_QUANTIZE, SgDNNLFCSumFuseProperty)
     .set_attr("quantize", true);
 
+MXNET_REGISTER_SUBGRAPH_BACKEND(ONEDNN_AMP).set_attr("context", Context::CPU());

Review comment:
       what about mixing quantization with AMP? should it also work for 
ONEDNN_QUANTIZE?
   If we put it to ONEDNN what should be an order of passes? Could AMP work on 
already fused operators ?
   How we can run only AMP pass if we wish?
   So rather it seems to be a not good idea.
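   For reference, a separate ONEDNN_AMP backend would presumably let the AMP pass be run on its own through the usual partitioning entry point; a minimal sketch, assuming the backend name registered in this diff is selectable via HybridBlock.optimize_for:

```python
# Illustrative sketch only - requires an oneDNN-enabled MXNet build and assumes
# the ONEDNN_AMP backend registered above is exposed to optimize_for().
import mxnet as mx
from mxnet.gluon import nn

net = nn.Dense(16)
net.initialize()
data = mx.np.ones((1, 8))
net.optimize_for(data, backend="ONEDNN_AMP")   # run only the AMP subgraph pass
```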

##########
File path: src/operator/subgraph/dnnl/dnnl_fc.cc
##########
@@ -662,7 +662,11 @@ static bool SgDNNLFCInferType(const nnvm::NodeAttrs& attrs,
     }
     return true;
   } else {
-    return DefaultSubgraphOpType(attrs, in_types, out_types);
+    bool result = DefaultSubgraphOpType(attrs, in_types, out_types);
+    if (full_param.dnnl_param.amp_out_dtype.has_value()) {

Review comment:
       Is it means that we assume float32 type?
   Also quantized version could produce float if enable_float_output is set. 
Could it be writen as bfloat ?

##########
File path: src/operator/subgraph/dnnl/dnnl_transformer-inl.h
##########
@@ -32,6 +32,8 @@ struct DNNLSelfAttParam : public dmlc::Parameter<DNNLSelfAttParam> {
   bool enable_float_output;
   dmlc::optional<float> min_calib_range;  // min float value calculated from calibration dataset
   dmlc::optional<float> max_calib_range;  // max float value calculated from calibration dataset
+  dmlc::optional<int> amp_out_dtype;

Review comment:
       Is it an mshadow dtype? Please add a comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

