FrozenGene commented on a change in pull request #5915: URL: https://github.com/apache/incubator-tvm/pull/5915#discussion_r454941618
########## File path: python/tvm/relay/op/contrib/arm_compute_lib.py ########## @@ -0,0 +1,119 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# pylint: disable=invalid-name, unused-argument +"""ACL library supported operators.""" Review comment: `ACL` -> `ARM Compute Library` ########## File path: docs/deploy/arm_compute_lib.rst ########## @@ -0,0 +1,135 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
+ +Relay Arm|reg| Compute Library Integration +========================================== + +Introduction +------------ + +Arm Compute Library (ACL) is an open source project that provides accelerated kernels for Arm CPU's +and GPU's. Currently the integration offloads operators to ACL to use hand-crafted assembler +routines in the library. By offloading select operators from a relay graph to ACL we can achieve +a performance boost on such devices. + +Building with ACL support +------------------------- + +The current implementation has two separate build options in cmake. The reason for this split is +because ACL cannot be used on an x86 machine. However, we still want to be able compile an ACL +runtime module on an x86 machine. + +* USE_ARM_COMPUTE_LIB=ON/OFF - Enabling this flag will add support for compiling an ACL runtime module. +* USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME=ON/OFF/path-to-acl - Enabling this flag will allow the graph runtime to + compute the ACL offloaded functions. + +These flags can be used in different scenarios depending on your setup. For example, if you want +to compile ACL on an x86 machine and then run the module on a remote Arm device via RPC, you will +need to use USE_ACL=ON on the x86 machine and USE_GRAPH_RUNTIME_ACL=ON on the remote AArch64 +device. + +Usage +----- + +*Note:* this section may not stay up-to-date with changes to the API. + +Create a relay graph. This may be a single operator or a whole graph. The intention is that any +relay graph can be input. The ACL integration will only pick supported operators to be offloaded +whilst the rest will be computed via TVM. (For this example we will use a single +max_pool2d operator). + +.. 
code:: python + + import tvm + from tvm import relay + + data_type = "float32" + data_shape = (1, 14, 14, 512) + strides = (2, 2) + padding = (0, 0, 0, 0) + pool_size = (2, 2) + layout = "NHWC" + output_shape = (1, 7, 7, 512) + + data = relay.var('data', shape=data_shape, dtype=data_type) + out = relay.nn.max_pool2d(data, pool_size=pool_size, strides=strides, layout=layout, padding=padding) + module = tvm.IRModule.from_expr(out) + + +Annotate and partition the graph for ACL. + +..code:: python + + from tvm.relay.op.contrib.arm_compute_lib import partition_for_arm_compute_lib + partition_for_arm_compute_lib(module) + + +Build the Relay graph. + +.. code:: python + + target = "llvm -mtriple=aarch64-linux-gnu -mattr=+neon" + with tvm.transform.PassContext(opt_level=3, disabled_pass=["AlterOpLayout"]): + json, lib, params = relay.build(module, target=target) Review comment: As we have supported model based runtime, `relay.build` now returns `lib`, which contains `json` / `params`. Here, we could use the new behavior, just return `lib = relay.build(module, target=target)` ########## File path: tests/python/contrib/test_arm_compute_lib/infrastructure.py ########## @@ -0,0 +1,167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+from itertools import zip_longest, combinations +import json + +import tvm +from tvm import relay +from tvm import rpc +from tvm.contrib import graph_runtime +from tvm.relay.op.contrib import arm_compute_lib +from tvm.contrib import util + + +class Device: + """Adjust the following settings to connect to and use a remote device for tests.""" + use_remote = False + target = "llvm -mtriple=aarch64-linux-gnu -mattr=+neon" + # Enable cross compilation when connecting a remote device from a non-arm platform. + cross_compile = None + # cross_compile = "aarch64-linux-gnu-g++" + + def __init__(self): + """Keep remote device for lifetime of object.""" + self.device = self._get_remote() + + @classmethod + def _get_remote(cls): + """Get a remote (or local) device to use for testing.""" + if cls.use_remote: + # Here you may adjust settings to run the ACL unit tests via a remote + # device using the RPC mechanism. Use this in the case you want to compile + # an ACL module on a different machine to what you run the module on i.e. + # x86 -> AArch64. + # + # Use the following to connect directly to a remote device: + # device = rpc.connect( + # hostname="0.0.0.0", + # port=9090) + # + # Or connect via a tracker: + # device = tvm.autotvm.measure.request_remote( + # host="0.0.0.0", + # port=9090, + # device_key="device_key", + # timeout=1000) + # + # return device + raise NotImplementedError( + "Please adjust these settings to connect to your remote device.") + else: + device = rpc.LocalSession() + return device + + +def skip_runtime_test(): + """Skip test if it requires the runtime and it's not present.""" + # ACL codegen not present. 
+ if not tvm.get_global_func("relay.ext.arm_compute_lib", True): + print("Skip because Arm Compute Library codegen is not available.") + return True + + # Remote device is in use or ACL runtime not present + if not Device.use_remote and not arm_compute_lib.is_arm_compute_runtime_enabled(): + print("Skip because runtime isn't present or a remote device isn't being used.") + return True + + +def skip_codegen_test(): + """Skip test if it requires the ACL codegen and it's not present.""" + if not tvm.get_global_func("relay.ext.arm_compute_lib", True): + print("Skip because Arm Compute Library codegen is not available.") + return True + + +def build_module(mod, target, params=None, enable_acl=True): + """Build module with option to build for ACL.""" + if isinstance(mod, tvm.relay.expr.Call): + mod = tvm.IRModule.from_expr(mod) + with tvm.transform.PassContext(opt_level=3, disabled_pass=["AlterOpLayout"]): + if enable_acl: + mod = arm_compute_lib.partition_for_arm_compute_lib(mod, params) + relay.backend.compile_engine.get().clear() + return relay.build(mod, target=target, params=params) + + +def build_and_run(mod, inputs, outputs, params, device, enable_acl=True, no_runs=1): + """Build and run the relay module.""" + graph, lib, params = build_module(mod, device.target, params, enable_acl) + lib = update_lib(lib, device.device, device.cross_compile) + gen_module = graph_runtime.create(graph, lib, ctx=device.device.cpu(0)) + gen_module.set_input(**inputs) + gen_module.set_input(**params) + for _ in range(no_runs): + gen_module.run() + out = [gen_module.get_output(i) for i in range(outputs)] + return out + + +def update_lib(lib, device, cross_compile): + """Export the library to the remote/local device.""" + lib_name = "mod.so" + temp = util.tempdir() + lib_path = temp.relpath(lib_name) + if cross_compile: + lib.export_library(lib_path, cc=cross_compile) + else: + lib.export_library(lib_path) + device.upload(lib_path) + lib = device.load_module(lib_name) + return lib + + 
+def verify(answers, atol, rtol): + """Compare the array of answers. Each entry is a list of outputs.""" + if len(answers) < 2: + raise RuntimeError( + f"No results to compare: expected at least two, found {len(answers)}") + for answer in zip_longest(*answers): + for outs in combinations(answer, 2): + tvm.testing.assert_allclose( + outs[0].asnumpy(), outs[1].asnumpy(), rtol=rtol, atol=atol) + + +def extract_acl_modules(module): + """Get the ACL module(s) from llvm module.""" + return list(filter(lambda mod: mod.type_key == "arm_compute_lib", + module.imported_modules)) + + +def verify_codegen(module, known_good_codegen, num_acl_modules, + target="llvm -mtriple=aarch64-linux-gnu -mattr=+neon"): + """Check acl codegen against a known good output.""" + _, module, _ = build_module(module, target) Review comment: ```python module = build_module(module, target) ``` ########## File path: python/tvm/relay/op/contrib/arm_compute_lib.py ########## @@ -0,0 +1,119 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+# pylint: disable=invalid-name, unused-argument +"""ACL library supported operators.""" +import tvm +from tvm.relay import transform +from tvm.relay.build_module import bind_params_by_name + +from ...dataflow_pattern import wildcard, is_op, is_constant +from .register import register_pattern_table + + +def is_arm_compute_runtime_enabled(): + """Check if the ACL graph runtime is present. + + Returns + ------- + ret: bool + True if present, False if not. + """ + return tvm.get_global_func("relay.op.is_arm_compute_runtime_enabled", True) + + +def partition_for_arm_compute_lib(mod, params=None): + """Partition the graph greedily offloading supported + operators to Arm Compute Library. + + Parameters + ---------- + mod : Module + The module to run passes on. + params : Optional[Dict[str, NDArray]] + Constant input parameters. + + Returns + ------- + ret : annotated and partitioned module. + """ + if params: + mod['main'] = bind_params_by_name(mod['main'], params) + + seq = tvm.transform.Sequential([transform.MergeComposite(arm_compute_lib_pattern_table()), + transform.AnnotateTarget('arm_compute_lib'), + transform.PartitionGraph()]) + + return seq(mod) + + +@register_pattern_table("arm_compute_lib") +def arm_compute_lib_pattern_table(): + """Get the ACL pattern table.""" + + def conv_pattern(): + """Create a convolution pattern. + + Returns + ------- + pattern : dataflow_pattern.AltPattern + Denotes the convolution pattern. + """ + pattern = is_op('nn.pad')(wildcard()) | wildcard() + pattern = is_op('nn.conv2d')(pattern, is_constant()) + pattern = pattern.optional(lambda x: is_op('nn.bias_add')(x, is_constant())) + pattern = pattern.optional(is_op('nn.relu')) Review comment: Curious question, does ACL support to fuse `relu6`? which is implemented by op `clip` in TVM. 
########## File path: tests/python/contrib/test_arm_compute_lib/infrastructure.py ########## @@ -0,0 +1,167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +from itertools import zip_longest, combinations +import json + +import tvm +from tvm import relay +from tvm import rpc +from tvm.contrib import graph_runtime +from tvm.relay.op.contrib import arm_compute_lib +from tvm.contrib import util + + +class Device: + """Adjust the following settings to connect to and use a remote device for tests.""" + use_remote = False + target = "llvm -mtriple=aarch64-linux-gnu -mattr=+neon" + # Enable cross compilation when connecting a remote device from a non-arm platform. + cross_compile = None + # cross_compile = "aarch64-linux-gnu-g++" + + def __init__(self): + """Keep remote device for lifetime of object.""" + self.device = self._get_remote() + + @classmethod + def _get_remote(cls): + """Get a remote (or local) device to use for testing.""" + if cls.use_remote: + # Here you may adjust settings to run the ACL unit tests via a remote + # device using the RPC mechanism. Use this in the case you want to compile + # an ACL module on a different machine to what you run the module on i.e. + # x86 -> AArch64. 
+ # + # Use the following to connect directly to a remote device: + # device = rpc.connect( + # hostname="0.0.0.0", + # port=9090) + # + # Or connect via a tracker: + # device = tvm.autotvm.measure.request_remote( + # host="0.0.0.0", + # port=9090, + # device_key="device_key", + # timeout=1000) + # + # return device + raise NotImplementedError( + "Please adjust these settings to connect to your remote device.") + else: + device = rpc.LocalSession() + return device + + +def skip_runtime_test(): + """Skip test if it requires the runtime and it's not present.""" + # ACL codegen not present. + if not tvm.get_global_func("relay.ext.arm_compute_lib", True): + print("Skip because Arm Compute Library codegen is not available.") + return True + + # Remote device is in use or ACL runtime not present + if not Device.use_remote and not arm_compute_lib.is_arm_compute_runtime_enabled(): + print("Skip because runtime isn't present or a remote device isn't being used.") + return True + + +def skip_codegen_test(): + """Skip test if it requires the ACL codegen and it's not present.""" + if not tvm.get_global_func("relay.ext.arm_compute_lib", True): + print("Skip because Arm Compute Library codegen is not available.") + return True + + +def build_module(mod, target, params=None, enable_acl=True): + """Build module with option to build for ACL.""" + if isinstance(mod, tvm.relay.expr.Call): + mod = tvm.IRModule.from_expr(mod) + with tvm.transform.PassContext(opt_level=3, disabled_pass=["AlterOpLayout"]): + if enable_acl: + mod = arm_compute_lib.partition_for_arm_compute_lib(mod, params) + relay.backend.compile_engine.get().clear() + return relay.build(mod, target=target, params=params) + + +def build_and_run(mod, inputs, outputs, params, device, enable_acl=True, no_runs=1): + """Build and run the relay module.""" + graph, lib, params = build_module(mod, device.target, params, enable_acl) + lib = update_lib(lib, device.device, device.cross_compile) + gen_module = 
graph_runtime.create(graph, lib, ctx=device.device.cpu(0)) Review comment: ```python gen_module = graph_runtime.GraphModule(lib['default'](device.device.cpu(0))) ``` ########## File path: tests/python/contrib/test_arm_compute_lib/infrastructure.py ########## @@ -0,0 +1,167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +from itertools import zip_longest, combinations +import json + +import tvm +from tvm import relay +from tvm import rpc +from tvm.contrib import graph_runtime +from tvm.relay.op.contrib import arm_compute_lib +from tvm.contrib import util + + +class Device: + """Adjust the following settings to connect to and use a remote device for tests.""" + use_remote = False + target = "llvm -mtriple=aarch64-linux-gnu -mattr=+neon" + # Enable cross compilation when connecting a remote device from a non-arm platform. + cross_compile = None + # cross_compile = "aarch64-linux-gnu-g++" + + def __init__(self): + """Keep remote device for lifetime of object.""" + self.device = self._get_remote() + + @classmethod + def _get_remote(cls): + """Get a remote (or local) device to use for testing.""" + if cls.use_remote: + # Here you may adjust settings to run the ACL unit tests via a remote + # device using the RPC mechanism. 
Use this in the case you want to compile + # an ACL module on a different machine to what you run the module on i.e. + # x86 -> AArch64. + # + # Use the following to connect directly to a remote device: + # device = rpc.connect( + # hostname="0.0.0.0", + # port=9090) + # + # Or connect via a tracker: + # device = tvm.autotvm.measure.request_remote( + # host="0.0.0.0", + # port=9090, + # device_key="device_key", + # timeout=1000) + # + # return device + raise NotImplementedError( + "Please adjust these settings to connect to your remote device.") + else: + device = rpc.LocalSession() + return device + + +def skip_runtime_test(): + """Skip test if it requires the runtime and it's not present.""" + # ACL codegen not present. + if not tvm.get_global_func("relay.ext.arm_compute_lib", True): + print("Skip because Arm Compute Library codegen is not available.") + return True + + # Remote device is in use or ACL runtime not present + if not Device.use_remote and not arm_compute_lib.is_arm_compute_runtime_enabled(): + print("Skip because runtime isn't present or a remote device isn't being used.") + return True + + +def skip_codegen_test(): + """Skip test if it requires the ACL codegen and it's not present.""" + if not tvm.get_global_func("relay.ext.arm_compute_lib", True): + print("Skip because Arm Compute Library codegen is not available.") + return True + + +def build_module(mod, target, params=None, enable_acl=True): + """Build module with option to build for ACL.""" + if isinstance(mod, tvm.relay.expr.Call): + mod = tvm.IRModule.from_expr(mod) + with tvm.transform.PassContext(opt_level=3, disabled_pass=["AlterOpLayout"]): + if enable_acl: + mod = arm_compute_lib.partition_for_arm_compute_lib(mod, params) + relay.backend.compile_engine.get().clear() + return relay.build(mod, target=target, params=params) + + +def build_and_run(mod, inputs, outputs, params, device, enable_acl=True, no_runs=1): + """Build and run the relay module.""" + graph, lib, params = 
build_module(mod, device.target, params, enable_acl) + lib = update_lib(lib, device.device, device.cross_compile) + gen_module = graph_runtime.create(graph, lib, ctx=device.device.cpu(0)) + gen_module.set_input(**inputs) + gen_module.set_input(**params) Review comment: Could remove ```gen_module.set_input(**params)``` in new behavior ########## File path: docs/deploy/arm_compute_lib.rst ########## @@ -0,0 +1,135 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Relay Arm|reg| Compute Library Integration +========================================== + +Introduction +------------ + +Arm Compute Library (ACL) is an open source project that provides accelerated kernels for Arm CPU's +and GPU's. Currently the integration offloads operators to ACL to use hand-crafted assembler +routines in the library. By offloading select operators from a relay graph to ACL we can achieve +a performance boost on such devices. + +Building with ACL support +------------------------- + +The current implementation has two separate build options in cmake. The reason for this split is +because ACL cannot be used on an x86 machine. However, we still want to be able compile an ACL +runtime module on an x86 machine. 
+ +* USE_ARM_COMPUTE_LIB=ON/OFF - Enabling this flag will add support for compiling an ACL runtime module. +* USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME=ON/OFF/path-to-acl - Enabling this flag will allow the graph runtime to + compute the ACL offloaded functions. + +These flags can be used in different scenarios depending on your setup. For example, if you want +to compile ACL on an x86 machine and then run the module on a remote Arm device via RPC, you will +need to use USE_ACL=ON on the x86 machine and USE_GRAPH_RUNTIME_ACL=ON on the remote AArch64 +device. + +Usage +----- + +*Note:* this section may not stay up-to-date with changes to the API. + +Create a relay graph. This may be a single operator or a whole graph. The intention is that any +relay graph can be input. The ACL integration will only pick supported operators to be offloaded +whilst the rest will be computed via TVM. (For this example we will use a single +max_pool2d operator). + +.. code:: python + + import tvm + from tvm import relay + + data_type = "float32" + data_shape = (1, 14, 14, 512) + strides = (2, 2) + padding = (0, 0, 0, 0) + pool_size = (2, 2) + layout = "NHWC" + output_shape = (1, 7, 7, 512) + + data = relay.var('data', shape=data_shape, dtype=data_type) + out = relay.nn.max_pool2d(data, pool_size=pool_size, strides=strides, layout=layout, padding=padding) + module = tvm.IRModule.from_expr(out) + + +Annotate and partition the graph for ACL. + +..code:: python + + from tvm.relay.op.contrib.arm_compute_lib import partition_for_arm_compute_lib + partition_for_arm_compute_lib(module) + + +Build the Relay graph. + +.. code:: python + + target = "llvm -mtriple=aarch64-linux-gnu -mattr=+neon" + with tvm.transform.PassContext(opt_level=3, disabled_pass=["AlterOpLayout"]): + json, lib, params = relay.build(module, target=target) + + +Export the module. + +.. 
code:: python + + lib_path = '~/lib_acl.so' + cross_compile = 'aarch64-linux-gnu-c++' + lib.export_library(lib_path, cc=cross_compile) + + +Run Inference. This must be on an Arm device. If compiling on x86 device and running on aarch64 +consider using the RPC mechanism. + +.. code:: python + + tvm.runtime.load_module('lib_acl.so') + gen_module = tvm.contrib.graph_runtime.create(json, lib, ctx) Review comment: ```python loaded_lib = tvm.runtime.load_module('lib_acl.so') gen_module = tvm.contrib.graph_runtime.GraphModule(loaded_lib['default'](ctx)) ``` P.S. Where is `ctx` defined? ########## File path: tests/python/contrib/test_arm_compute_lib/infrastructure.py ########## @@ -0,0 +1,167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License.
+from itertools import zip_longest, combinations +import json + +import tvm +from tvm import relay +from tvm import rpc +from tvm.contrib import graph_runtime +from tvm.relay.op.contrib import arm_compute_lib +from tvm.contrib import util + + +class Device: + """Adjust the following settings to connect to and use a remote device for tests.""" + use_remote = False + target = "llvm -mtriple=aarch64-linux-gnu -mattr=+neon" + # Enable cross compilation when connecting a remote device from a non-arm platform. + cross_compile = None + # cross_compile = "aarch64-linux-gnu-g++" + + def __init__(self): + """Keep remote device for lifetime of object.""" + self.device = self._get_remote() + + @classmethod + def _get_remote(cls): + """Get a remote (or local) device to use for testing.""" + if cls.use_remote: + # Here you may adjust settings to run the ACL unit tests via a remote + # device using the RPC mechanism. Use this in the case you want to compile + # an ACL module on a different machine to what you run the module on i.e. + # x86 -> AArch64. + # + # Use the following to connect directly to a remote device: + # device = rpc.connect( + # hostname="0.0.0.0", + # port=9090) + # + # Or connect via a tracker: + # device = tvm.autotvm.measure.request_remote( + # host="0.0.0.0", + # port=9090, + # device_key="device_key", + # timeout=1000) + # + # return device + raise NotImplementedError( + "Please adjust these settings to connect to your remote device.") + else: + device = rpc.LocalSession() + return device + + +def skip_runtime_test(): + """Skip test if it requires the runtime and it's not present.""" + # ACL codegen not present. 
+ if not tvm.get_global_func("relay.ext.arm_compute_lib", True): + print("Skip because Arm Compute Library codegen is not available.") + return True + + # Remote device is in use or ACL runtime not present + if not Device.use_remote and not arm_compute_lib.is_arm_compute_runtime_enabled(): + print("Skip because runtime isn't present or a remote device isn't being used.") + return True + + +def skip_codegen_test(): + """Skip test if it requires the ACL codegen and it's not present.""" + if not tvm.get_global_func("relay.ext.arm_compute_lib", True): + print("Skip because Arm Compute Library codegen is not available.") + return True + + +def build_module(mod, target, params=None, enable_acl=True): + """Build module with option to build for ACL.""" + if isinstance(mod, tvm.relay.expr.Call): + mod = tvm.IRModule.from_expr(mod) + with tvm.transform.PassContext(opt_level=3, disabled_pass=["AlterOpLayout"]): + if enable_acl: + mod = arm_compute_lib.partition_for_arm_compute_lib(mod, params) + relay.backend.compile_engine.get().clear() + return relay.build(mod, target=target, params=params) + + +def build_and_run(mod, inputs, outputs, params, device, enable_acl=True, no_runs=1): + """Build and run the relay module.""" + graph, lib, params = build_module(mod, device.target, params, enable_acl) Review comment: ```python lib = build_module(mod, device.target, params, enable_acl) ``` ########## File path: src/runtime/contrib/arm_compute_lib/acl_allocator.cc ########## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +/*! + * \file src/runtime/contrib/arm_compute_lib/acl_allocator.cc + * \brief ACL Allocator implementation that requests memory from TVM. + */ + +#include "acl_allocator.h" + +namespace tvm { +namespace runtime { +namespace contrib { +namespace arm_compute_lib { + +void* ACLAllocator::allocate(size_t size, size_t alignment) { + CHECK_GT(size, 0) << "Cannot allocate size less than or equal to zero"; + return this->device_api_->AllocWorkspace(this->ctx_, size, {}); +} + +void ACLAllocator::free(void* ptr) { this->device_api_->FreeWorkspace(this->ctx_, ptr); } + +std::unique_ptr<arm_compute::IMemoryRegion> ACLAllocator::make_region(size_t size, + size_t alignment) { + return arm_compute::support::cpp14::make_unique<ACLMemoryRegion>(size, alignment); +} + +ACLMemoryRegion::ACLMemoryRegion(size_t size, size_t alignment) + : IMemoryRegion(size), ptr_(nullptr) { + if (size != 0) { + this->ptr_ = this->device_api_->AllocDataSpace(this->ctx_, size, alignment, {}); + } +} + +ACLMemoryRegion::ACLMemoryRegion(void* ptr, size_t size) + : IMemoryRegion(size), ptr_(nullptr), is_subregion_(true) { + if (size != 0) { + this->ptr_ = ptr; + } +} + +ACLMemoryRegion::~ACLMemoryRegion() { + if (this->ptr_ != nullptr && !is_subregion_) { + this->device_api_->FreeDataSpace(this->ctx_, this->ptr_); + } +} + +std::unique_ptr<arm_compute::IMemoryRegion> ACLMemoryRegion::extract_subregion(size_t offset, + size_t size) { + if (this->ptr_ != nullptr && (offset < _size) && (_size - offset >= size)) { + return arm_compute::support::cpp14::make_unique<ACLMemoryRegion>( 
Review comment: TVM has used C++14. We could use `std::make_unique` here. ########## File path: docs/deploy/arm_compute_lib.rst ########## @@ -0,0 +1,135 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Relay Arm|reg| Compute Library Integration +========================================== + +Introduction +------------ + +Arm Compute Library (ACL) is an open source project that provides accelerated kernels for Arm CPU's +and GPU's. Currently the integration offloads operators to ACL to use hand-crafted assembler +routines in the library. By offloading select operators from a relay graph to ACL we can achieve +a performance boost on such devices. + +Building with ACL support +------------------------- + +The current implementation has two separate build options in cmake. The reason for this split is +because ACL cannot be used on an x86 machine. However, we still want to be able compile an ACL +runtime module on an x86 machine. + +* USE_ARM_COMPUTE_LIB=ON/OFF - Enabling this flag will add support for compiling an ACL runtime module. +* USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME=ON/OFF/path-to-acl - Enabling this flag will allow the graph runtime to + compute the ACL offloaded functions. 
+
+These flags can be used in different scenarios depending on your setup. For example, if you want
+to compile ACL on an x86 machine and then run the module on a remote Arm device via RPC, you will
+need to use USE_ARM_COMPUTE_LIB=ON on the x86 machine and USE_ARM_COMPUTE_LIB_GRAPH_RUNTIME=ON on
+the remote AArch64 device.
+
+Usage
+-----
+
+*Note:* this section may not stay up-to-date with changes to the API.
+
+Create a relay graph. This may be a single operator or a whole graph. The intention is that any
+relay graph can be input. The ACL integration will only pick supported operators to be offloaded
+whilst the rest will be computed via TVM. (For this example we will use a single
+max_pool2d operator.)
+
+.. code:: python
+
+    import tvm
+    from tvm import relay
+
+    data_type = "float32"
+    data_shape = (1, 14, 14, 512)
+    strides = (2, 2)
+    padding = (0, 0, 0, 0)
+    pool_size = (2, 2)
+    layout = "NHWC"
+    output_shape = (1, 7, 7, 512)
+
+    data = relay.var('data', shape=data_shape, dtype=data_type)
+    out = relay.nn.max_pool2d(data, pool_size=pool_size, strides=strides, layout=layout, padding=padding)
+    module = tvm.IRModule.from_expr(out)
+
+
+Annotate and partition the graph for ACL.
+
+.. code:: python
+
+    from tvm.relay.op.contrib.arm_compute_lib import partition_for_arm_compute_lib
+    partition_for_arm_compute_lib(module)
+
+
+Build the Relay graph.
+
+.. code:: python
+
+    target = "llvm -mtriple=aarch64-linux-gnu -mattr=+neon"
+    with tvm.transform.PassContext(opt_level=3, disabled_pass=["AlterOpLayout"]):
+        json, lib, params = relay.build(module, target=target)
+
+
+Export the module.
+
+.. code:: python
+
+    lib_path = '~/lib_acl.so'
+    cross_compile = 'aarch64-linux-gnu-c++'
+    lib.export_library(lib_path, cc=cross_compile)
+
+
+Run Inference. This must be on an Arm device. If compiling on an x86 device and running on
+AArch64, consider using the RPC mechanism.
+
+.. code:: python
+
+    tvm.runtime.load_module('lib_acl.so')
+    gen_module = tvm.contrib.graph_runtime.create(json, lib, ctx)
+    d_data = np.random.uniform(0, 1, data_shape).astype(data_type)
+    map_inputs = {'data': d_data}
+    gen_module.map_inputs(**map_inputs)

Review comment: Where is definition of `map_inputs`? `set_input`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
