yelite commented on code in PR #12914:
URL: https://github.com/apache/tvm/pull/12914#discussion_r982850617


##########
python/tvm/meta_schedule/testing/torchbench/run.py:
##########
@@ -0,0 +1,591 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+This script is for benchmarking TVM performance on models from TorchBench.
+It uses TorchDynamo as the frontend to ingest models into TVM, and it also
+leverages the benchmark util from TorchDynamo.
+
+TorchDynamo (https://github.com/pytorch/torchdynamo) and TorchBench
+(https://github.com/pytorch/benchmark) need to be in the parent directory of TVM.
+We need local clones of these repos because TorchBench and the benchmark runner
+in TorchDynamo aren't designed to be used as Python packages.
+
+To set up the environment, run the following commands in the parent directory of TVM and with
+the appropriate Python environment:
+```bash
+# torchdynamo requires nightly pytorch. If it fails to find the specified version, try
+# installing the latest nightly pytorch.
+pip3 install --pre \
+    --extra-index-url https://download.pytorch.org/whl/nightly/cu116 \
+    torch==1.13.0.dev20220926 \
+    torchvision==0.14.0.dev20220926 \
+    torchtext==0.14.0.dev20220926
+
+git clone https://github.com/pytorch/torchdynamo
+pushd torchdynamo
+git checkout c537639f9712621dc04ca09908796dbbe86c354b
+pip install -e .
+popd
+
+sudo apt install git-lfs  # git lfs is used for TorchBench
+git clone https://github.com/pytorch/benchmark
+pushd benchmark
+python install.py --continue_on_fail  # fambench_xlmr might fail to install
+popd
+```
+
+To run a benchmark, the script can be run in 'tune' mode by
+```bash
+python python/tvm/meta_schedule/testing/torchbench/run.py \
+    --mode tune \
+    --model resnet50 \
+    --target "nvidia/geforce-rtx-3070" \
+    --work-dir ../workdir \
+    --num-trials 20000 \
+    --rpc-host <rpc tracker host for tuning> \
+    --rpc-port <rpc tracker port for tuning> \
+    --rpc-key <rpc key> \
+```
+
+All available target tags (like nvidia/geforce-rtx-3070) can be found at
+https://github.com/apache/tvm/blob/main/src/target/tag.cc
+
+Then the script can be run in 'eval' mode to actually benchmark the performance,
+using the tuning database under the work directory. This can be executed on a
+different machine than the one that executed the tuning (the database JSON files
+need to be inside the work directory).
+```bash
+python python/tvm/meta_schedule/testing/torchbench/run.py \
+    --mode eval \
+    --model resnet50 \
+    --target "nvidia/geforce-rtx-3070" \
+    --work-dir ../workdir \
+    --num-trials 0 
+```
+
+Alternatively, both tuning and evaluation can be done in a single run on the same
+machine, by
+```bash
+python python/tvm/meta_schedule/testing/torchbench/run.py \
+    --mode all \
+    --model resnet50 \
+    --target "llvm -num-cores 6" \
+    --work-dir ../workdir \
+    --num-trials 0
+```
+"""
+
+import argparse
+import functools
+import logging
+import warnings
+from enum import Enum
+from typing import Callable, List, Tuple
+
+import numpy as np  # type: ignore
+import torch  # type: ignore
+from scipy.stats import ttest_ind  # type: ignore
+
+import tvm
+import tvm.relay
+from tvm import meta_schedule as ms
+from tvm.contrib.graph_executor import GraphModule
+from tvm.meta_schedule.testing.torchbench.utils import (
+    load_torchdynamo_benchmark_runner,
+    same,
+    timed,
+)
+from tvm.runtime.vm import VirtualMachine
+from tvm.support import describe
+
+
+class RunMode(Enum):
+    """
+    The running mode of this script. Available values are:
+    - tune: Only tune the model and create the tuning database.
+    - eval: Only benchmark model using pre-existing tuning database.
+    - all: Run both tuning and benchmark
+    """
+
+    ALL = "all"
+    TUNE = "tune"
+    EVAL = "eval"
+
+    @property
+    def should_tune(self):
+        """
+        Returns whether it should tune the model.
+        """
+        return self != RunMode.EVAL
+
+    @property
+    def should_eval(self):
+        """
+        Returns whether it should actually benchmark the model.
+        """
+        return self != RunMode.TUNE
+
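# Illustrative sketch (not from the PR): how the two mode properties combine
# for each RunMode value.
assert RunMode("all").should_tune and RunMode("all").should_eval
assert RunMode("tune").should_tune and not RunMode("tune").should_eval
assert RunMode("eval").should_eval and not RunMode("eval").should_tune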
+
+class ResultComparisonMetric(Enum):
+    """
+    This controls how the result is compared with the expected value during the
+    accuracy check.
+    - cosine: Use the cosine similarity. It should be greater than 0.99.
+    - allclose-1e-4: Use the max element-wise absolute difference. It should be less than 1e-4.
+    """
+
+    COSINE = "cosine"
+    ALLCLOSE = "allclose-1e-4"
+
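# Illustrative sketch (not from the PR) of roughly what each metric checks.
# The actual comparison is done by the `same` util imported from the
# TorchDynamo runner; the thresholds below simply mirror the docstring, and
# the helper name is hypothetical.
def _sketch_check_result(expected, actual, metric):
    if metric == ResultComparisonMetric.COSINE:
        similarity = torch.nn.functional.cosine_similarity(
            expected.flatten().float(), actual.flatten().float(), dim=0
        )
        return bool(similarity > 0.99)
    # allclose-1e-4: max element-wise absolute difference under 1e-4
    return torch.allclose(expected, actual, rtol=0, atol=1e-4)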
+
+def parse_args():
+    args = argparse.ArgumentParser()
+
+    args.add_argument(
+        "--mode",
+        type=RunMode,
+        default=RunMode.ALL,
+        help=RunMode.__doc__,
+    )
+    args.add_argument(
+        "--batch-size",
+        type=int,
+        default=None,
+        help="The batch size of model input. Use TorchBench's default value if 
not specified.",
+    )
+    args.add_argument(
+        "--result-metric",
+        type=ResultComparisonMetric,
+        default=ResultComparisonMetric.ALLCLOSE,
+        help=ResultComparisonMetric.__doc__,
+    )
+    args.add_argument(
+        "--benchmark-repeat",
+        type=int,
+        default=10,
+        help="The number of times to repeat the benchmark measurement.",
+    )
+    args.add_argument(
+        "--benchmark-warmup-rounds",
+        type=int,
+        default=5,
+        help="The number of rounds to warmup before starting to measure the 
performance.",
+    )
+
+    # Model selection
+    args.add_argument(
+        "--model",
+        type=str,
+        required=True,
+        help="""
+        The name of the model to run. It should be a directory name under
+        https://github.com/pytorch/benchmark/tree/main/torchbenchmark/models.
+        """,
+    )
+
+    # Tuning-related config
+    args.add_argument(
+        "--target",
+        type=tvm.target.Target,
+        required=True,
+        help="The target to tune and run benchmark for.",
+    )
+    args.add_argument(
+        "--work-dir",
+        type=str,
+        required=True,
+        help="The working directory to save intermediate results and store 
databases for compilation.",
+    )
+    args.add_argument(
+        "--cache-dir",
+        type=str,
+        default=None,
+        help="""
+        The directory to cache the generated network.
+        If not specified, the cache will be disabled.
+        """,
+    )
+    args.add_argument(
+        "--num-trials",
+        type=int,
+        required=True,
+        help="The max number of trials to run MetaSchedule.",
+    )
+    args.add_argument(
+        "--max-trials-per-task",
+        type=int,
+        default=None,
+        help="""
+        The max number of trials to run per task extracted in MetaSchedule. 
+        By default it's the same as --num-trials.
+        """,
+    )
+    args.add_argument(
+        "--backend",
+        type=str,
+        choices=["graph", "vm"],
+        default="graph",
+        help="The backend to use for relay compilation(graph / vm).",
+    )
+    # TODO(@yelite): Add a layout arg to transform the network after
+    # ingesting into Relay and before feeding into MetaSchedule.
+
+    # Evaluator-related config
+    args.add_argument(
+        "--number",
+        type=int,
+        default=3,
+        help="The number of times to run the model for taking average in a 
single measurement.",
+    )
+    args.add_argument(
+        "--repeat",
+        type=int,
+        default=1,
+        help="The number of times to repeat the measurement.",
+    )
+    args.add_argument(
+        "--min-repeat-ms",
+        type=int,
+        default=100,
+        help="""
+        Minimum repeat time in ms. The number of runs will be increased if the actual
+        repeat time is lower than this.
+        """,
+    )
+    args.add_argument(
+        "--adaptive-training",
+        action="store_true",
+        help="Whether to use adpative training for cost model.",
+    )
+    args.add_argument(
+        "--cpu-flush",
+        action="store_true",
+        help="Whether to perform CPU cache flush.",
+    )
+
+    # RPC-related args
+    args.add_argument(
+        "--rpc-host",
+        type=str,
+        help="Host of the RPC Tracker for tuning. Use LocalRunner if not 
provided",
+    )
+    args.add_argument(
+        "--rpc-port",
+        type=int,
+        help="Port of the RPC Tracker for tuning",
+    )
+    args.add_argument(
+        "--rpc-key",
+        type=str,
+        help="Key of the RPC Tracker for tuning",
+    )
+
+    parsed = args.parse_args()
+    return parsed
+
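# Illustrative sketch (not from the PR): the evaluator flags above match the
# fields and defaults of ms.runner.EvaluatorConfig, so presumably they are
# forwarded along these lines later in the file:
#
#     evaluator_config = ms.runner.EvaluatorConfig(
#         number=ARGS.number,
#         repeat=ARGS.repeat,
#         min_repeat_ms=ARGS.min_repeat_ms,
#         enable_cpu_cache_flush=ARGS.cpu_flush,
#     )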
+
+logging.basicConfig(
+    format="%(asctime)s.%(msecs)03d %(levelname)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S"
+)
+logging.getLogger("tvm.meta_schedule").setLevel(logging.DEBUG)
+ARGS = parse_args()
+IS_CUDA = ARGS.target.kind.name == "cuda"
+
+logger = logging.getLogger(__name__)
+logger.setLevel(logging.INFO)
+
+
+runner = load_torchdynamo_benchmark_runner(
+    IS_CUDA, cosine_similarity=ARGS.result_metric == ResultComparisonMetric.COSINE
+)
+import torchdynamo  # type: ignore

Review Comment:
   It shouldn't be here. I moved it up. Thanks for catching this.
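
   For context, a generic sketch of the deferred-import pattern that motivates keeping `import torchdynamo` next to the runner-loading call rather than at the top of the file (all names below are hypothetical, not the actual utils code):

   ```python
   import sys


   def _setup_local_clone(path: str) -> None:
       # Make a plain git clone importable by putting it on sys.path first.
       sys.path.insert(0, path)


   _setup_local_clone("../torchdynamo")  # hypothetical setup step
   import torchdynamo  # noqa: E402  pylint: disable=wrong-import-position
   ```

   If the import ran before the setup, Python could fail to find the module or pick up a different installation, which is why such imports stay below the setup call.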


