mdw-octoml commented on a change in pull request #8807:
URL: https://github.com/apache/tvm/pull/8807#discussion_r694123503



##########
File path: python/tvm/runtime/module.py
##########
@@ -30,8 +31,65 @@
 from . import _ffi_api
 
 
-# profile result of time evaluator
-ProfileResult = namedtuple("ProfileResult", ["mean", "results"])
+class BenchmarkResult:
+    """Runtimes from benchmarking"""
+
+    def __init__(self, results: Sequence[float]):
+        """Construct a new BenchmarkResult from a sequence of runtimes.
+
+        Parameters
+        ----------
+        results : Sequence[float]
+            Raw times from benchmarking
+
+        Attributes
+        ----------
+        min : float
+            Minimum runtime in seconds of all results.
+        mean : float
+            Mean runtime in seconds of all results. Note that this mean is not
+            necessarily statistically correct as it is the mean of mean
+            runtimes.
+        median : float
+            Median runtime in seconds of all results. Note that this is not necessarily
+            statistically correct as it is the median of mean runtimes.
+        max : float
+            Maximum runtime in seconds of all results.
+        std : float
+            Standard deviation in seconds of runtimes. Note that this is not necessarily
+            correct as it is the std of mean runtimes.
+        results : Sequence[float]
+            The collected runtimes (in seconds). This may be a series of mean runtimes if
+            the benchmark was run with `number` > 1.
+        """
+        self.results = results
+        self.mean = np.mean(self.results)
+        self.std = np.std(self.results)
+        self.median = np.median(self.results)
+        self.min = np.min(self.results)
+        self.max = np.max(self.results)
+
+    def __repr__(self):
+        return "BenchmarkResult(min={}, mean={}, median={}, max={}, std={}, 
results={})".format(
+            self.min, self.mean, self.median, self.max, self.std, self.results
+        )
+
+    def __str__(self):
+        return """Execution time summary:
+{:^12} {:^12} {:^12} {:^12} {:^12}
+{:^12.2f} {:^12.2f} {:^12.2f} {:^12.2f} {:^12.2f}

Review comment:
       Two decimal places are not really enough for models that run quite fast. I would recommend using .4 instead.
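       For example, a sketch of the suggested change (the 12-character column width is taken from the format string above; the numbers are purely illustrative):

           # Four decimal places instead of two; the width of 12 is kept from the existing code.
           row = "{:^12.4f} {:^12.4f} {:^12.4f} {:^12.4f} {:^12.4f}"
           print(row.format(0.1293, 0.1431, 0.1431, 0.1616, 0.0100))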

##########
File path: python/tvm/driver/tvmc/model.py
##########
@@ -390,29 +391,15 @@ def format_times(self):
         This has the effect of producing a small table that looks like:
         .. code-block::
             Execution time summary:
-            mean (ms)   max (ms)    min (ms)    std (ms)
-            0.14310    0.16161    0.12933    0.01004
+            mean (ms)  median (ms) max (ms)    min (ms)    std (ms)
+            0.14310      0.14310   0.16161     0.12933    0.01004
 
         Returns
         -------
         str
             A formatted string containing the statistics.
         """
-
-        # timestamps
-        mean_ts = np.mean(self.times) * 1000
-        std_ts = np.std(self.times) * 1000
-        max_ts = np.max(self.times) * 1000
-        min_ts = np.min(self.times) * 1000
-
-        header = "Execution time summary:\n{0:^10} {1:^10} {2:^10} 
{3:^10}".format(
-            "mean (ms)", "max (ms)", "min (ms)", "std (ms)"
-        )
-        stats = "{0:^10.2f} {1:^10.2f} {2:^10.2f} {3:^10.2f}".format(
-            mean_ts, max_ts, min_ts, std_ts
-        )
-
-        return "%s\n%s\n" % (header, stats)
+        return str(times)

Review comment:
       I think it's probably a bad idea to change the semantics of time_evaluator, which is no doubt used in many places where it's expected to return the raw list of times. I would recommend moving the BenchmarkResult to the benchmark() functions and not changing time_evaluator itself.
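       Roughly what I have in mind (a sketch only, not the exact code: it assumes time_evaluator keeps returning its current result object with a `.results` field, that BenchmarkResult is importable from tvm.runtime.module, and that the wrapping happens inside the new `benchmark()` helpers):

           def benchmark(self, device, func_name="run", repeat=5, number=5, **kwargs):
               # Sketch: cache inputs first so data transfer is not timed, then wrap
               # only the raw per-repeat times; time_evaluator itself is untouched.
               if kwargs:
                   self.set_input(**kwargs)
               feval = self.module.time_evaluator(func_name, device, repeat=repeat, number=number)
               return BenchmarkResult(feval().results)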
   

##########
File path: python/tvm/runtime/module.py
##########
@@ -230,8 +288,7 @@ def evaluator(*args):
                 blob = feval(*args)
                 fmt = "@" + ("d" * repeat)
                 results = struct.unpack(fmt, blob)
-                mean = sum(results) / float(repeat)
-                return ProfileResult(mean=mean, results=results)
+                return BenchmarkResult(results)

Review comment:
       Does time_evaluator itself have a unit test that needs to be updated? If not, would it make sense to add a quick one?
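       Something quick along these lines would do (a sketch, assuming a trivial TE kernel is acceptable in the test; names and the workload are illustrative):

           import numpy as np
           import tvm
           from tvm import te

           def test_time_evaluator_results():
               # Build a trivial elementwise kernel so the evaluator has something to time.
               n = 128
               A = te.placeholder((n,), name="A")
               B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
               s = te.create_schedule(B.op)
               mod = tvm.build(s, [A, B], target="llvm", name="add_one")
               dev = tvm.cpu()
               a = tvm.nd.array(np.random.rand(n).astype("float32"), dev)
               b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
               feval = mod.time_evaluator("add_one", dev, number=1, repeat=3)
               res = feval(a, b)
               # One datapoint per `repeat`, all positive.
               assert len(res.results) == 3
               assert all(t > 0 for t in res.results)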

##########
File path: tests/python/relay/test_vm.py
##########
@@ -955,5 +956,17 @@ def test_get_input_index():
     assert vm_factory.get_input_index("invalid") == -1
 
 
+@tvm.testing.requires_llvm
+def test_benchmark():

Review comment:
       See the comment above on using unittest.mock.patch() so you can test the benchmark() function directly.

##########
File path: tests/python/relay/test_backend_graph_executor.py
##########
@@ -322,5 +323,17 @@ def test_graph_executor_api():
     assert mod.get_input_index("Invalid") == -1
 
 
+@tvm.testing.requires_llvm
+def test_benchmark():
+    mod, params = mlp.get_workload(1)
+    lib = relay.build(mod, target="llvm", params=params)
+    exe = graph_executor.create(lib.get_graph_json(), lib.lib, tvm.cpu())
+    data = tvm.nd.array(np.random.rand(1, 1, 28, 28).astype("float32"))
+    result = exe.benchmark(tvm.cpu(), data=data, func_name="run", repeat=2, number=1)

Review comment:
       A better test here would be to use unittest.mock.patch() on the time_evaluator function itself so that it always returns deterministic times; you can then fully test the behavior of the benchmark() function by itself, independent of the actual module or time_evaluator behavior.
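       Sketch of what I mean, reusing `exe` and `data` from the test above (hedged: the exact patch target depends on how benchmark() looks up its evaluator; patching `tvm.runtime.Module.time_evaluator` is an assumption):

           from unittest import mock

           from tvm.runtime.module import BenchmarkResult

           fixed = BenchmarkResult([1.0, 2.0, 2.0, 5.0])
           with mock.patch.object(
               tvm.runtime.Module, "time_evaluator",
               return_value=lambda *args, **kw: fixed,  # evaluator always reports fixed times
           ):
               result = exe.benchmark(tvm.cpu(), data=data, repeat=4, number=1)
           # With deterministic times the statistics can be checked exactly.
           assert result.min == 1.0
           assert result.median == 2.0
           assert result.mean == 2.5
           assert result.max == 5.0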

##########
File path: python/tvm/contrib/graph_executor.py
##########
@@ -320,3 +320,62 @@ def __getitem__(self, key):
             The key to the module.
         """
         return self.module[key]
+
+    def benchmark(self, device, func_name="run", repeat=5, number=5, min_repeat_ms=None, **kwargs):
+        """Calculate runtime of a function by repeatedly calling it.
+
+        Use this function to get an accurate measurement of the runtime of a function. The function
+        is run multiple times in order to account for variability in measurements, processor speed
+        or other external factors.  Mean, median, standard deviation, min and max runtime are all
+        reported.  On GPUs, CUDA and ROCm specifically, special on-device timers are used so that
+        synchronization and data transfer operations are not counted towards the runtime. This allows
+        for fair comparison of runtimes across different functions and models.
+
+        The benchmarking loop looks approximately like so:
+
+        .. code-block:: python
+
+            for r in range(repeat):
+                time_start = now()
+                for n in range(number):
+                    func_name()
+                time_end = now()
+                total_times.append((time_end - time_start)/number)
+
+
+        Parameters
+        ----------
+        func_name : str
+            The function to benchmark
+
+        repeat : int
+            Number of times to run the outer loop of the timing code (see above). The output will
+            contain `repeat` number of datapoints.
+
+        number : int
+            Number of times to run the inner loop of the timing code. This 
inner loop is run in
+            between the timer starting and stopping. In order to amortize any 
timing overhead,
+            `number` should be increased when the runtime of the function is 
small (less than a 1/10
+            of a millisecond).
+
+        min_repeat_ms : Optional[float]
+            If set, the inner loop will be run until it takes longer than `min_repeat_ms`
+            milliseconds. This can be used to ensure that the function is run enough to get an
+            accurate measurement.
+
+        kwargs : Dict[str, Object]
+            Named arguments to the function. These are cached before running timing code, so that
+            data transfer costs are not counted in the runtime.
+
+        Returns
+        -------
+        timing_results : ProfileResult

Review comment:
       I think this should be BenchmarkResult.
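       For reference, a minimal usage sketch of the API documented above, mirroring the new graph-executor test in this PR (identifiers are taken from that test; not meant as authoritative):

           import numpy as np
           import tvm
           from tvm import relay
           from tvm.contrib import graph_executor
           from tvm.relay.testing import mlp

           mod, params = mlp.get_workload(1)
           lib = relay.build(mod, target="llvm", params=params)
           exe = graph_executor.create(lib.get_graph_json(), lib.lib, tvm.cpu())
           data = tvm.nd.array(np.random.rand(1, 1, 28, 28).astype("float32"))
           # `data` is passed as a kwarg, so it is cached before the timing loop runs.
           result = exe.benchmark(tvm.cpu(), data=data, func_name="run", repeat=5, number=5)
           print(result)          # "Execution time summary: ..." table
           print(result.median)   # individual statistics are also available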
   

##########
File path: python/tvm/runtime/module.py
##########
@@ -30,8 +31,65 @@
 from . import _ffi_api
 
 
-# profile result of time evaluator
-ProfileResult = namedtuple("ProfileResult", ["mean", "results"])
+class BenchmarkResult:
+    """Runtimes from benchmarking"""
+
+    def __init__(self, results: Sequence[float]):
+        """Construct a new BenchmarkResult from a sequence of runtimes.
+
+        Parameters
+        ----------
+        results : Sequence[float]
+            Raw times from benchmarking
+
+        Attributes
+        ----------
+        min : float
+            Minimum runtime in seconds of all results.
+        mean : float
+            Mean runtime in seconds of all results. Note that this mean is not
+            necessarily statistically correct as it is the mean of mean
+            runtimes.
+        median : float
+            Median runtime in seconds of all results. Note that this is not necessarily
+            statistically correct as it is the median of mean runtimes.
+        max : float
+            Maximum runtime in seconds of all results.
+        std : float
+            Standard deviation in seconds of runtimes. Note that this is not necessarily
+            correct as it is the std of mean runtimes.
+        results : Sequence[float]
+            The collected runtimes (in seconds). This may be a series of mean runtimes if

Review comment:
       I think this needs more explanation. Currently a BenchmarkResult object contains no information about the benchmark parameters that were used, and it would be best (IMHO) to avoid surprises in how the results are interpreted depending on how the object was created. My recommendation would be to either fully document what it means to benchmark with `number > 1`, or to ensure that the BenchmarkResult object itself carries the benchmark parameters that were used.
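       For the second option, something like this would work (a sketch only; the attribute names `number` and `repeat` are suggestions, not part of the PR):

           from typing import Sequence

           import numpy as np

           class BenchmarkResult:
               def __init__(self, results: Sequence[float], number: int = 1, repeat: int = 1):
                   # Record how the results were produced: each entry in `results` is the
                   # mean over `number` calls, and there should be `repeat` entries.
                   self.results = results
                   self.number = number
                   self.repeat = repeat
                   self.mean = np.mean(results)
                   self.median = np.median(results)
                   self.std = np.std(results)
                   self.min = np.min(results)
                   self.max = np.max(results)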




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

