mdw-octoml commented on a change in pull request #8807:
URL: https://github.com/apache/tvm/pull/8807#discussion_r694123503
##########
File path: python/tvm/runtime/module.py
##########
@@ -30,8 +31,65 @@
from . import _ffi_api
-# profile result of time evaluator
-ProfileResult = namedtuple("ProfileResult", ["mean", "results"])
+class BenchmarkResult:
+ """Runtimes from benchmarking"""
+
+ def __init__(self, results: Sequence[float]):
+ """Construct a new BenchmarkResult from a sequence of runtimes.
+
+ Parameters
+ ----------
+ results : Sequence[float]
+ Raw times from benchmarking
+
+ Attributes
+ ----------
+ min : float
+ Minimum runtime in seconds of all results.
+ mean : float
+ Mean runtime in seconds of all results. Note that this mean is not
+ necessarily statistically correct as it is the mean of mean
+ runtimes.
+ median : float
+ Median runtime in seconds of all results. Note that this is not necessarily
+ statistically correct as it is the median of mean runtimes.
+ max : float
+ Maximum runtime in seconds of all results.
+ std : float
+ Standard deviation in seconds of runtimes. Note that this is not necessarily
+ correct as it is the std of mean runtimes.
+ results : Sequence[float]
+ The collected runtimes (in seconds). This may be a series of mean runtimes if
+ the benchmark was run with `number` > 1.
+ """
+ self.results = results
+ self.mean = np.mean(self.results)
+ self.std = np.std(self.results)
+ self.median = np.median(self.results)
+ self.min = np.min(self.results)
+ self.max = np.max(self.results)
+
+ def __repr__(self):
+ return "BenchmarkResult(min={}, mean={}, median={}, max={}, std={},
results={})".format(
+ self.min, self.mean, self.median, self.max, self.std, self.results
+ )
+
+ def __str__(self):
+ return """Execution time summary:
+{:^12} {:^12} {:^12} {:^12} {:^12}
+{:^12.2f} {:^12.2f} {:^12.2f} {:^12.2f} {:^12.2f}
Review comment:
Two decimal places are not really enough for models that run quite fast. Would
recommend using `.4f` instead.
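For example, with two decimal places a pair of fast runs collapses to the same printed value, while four decimal places keep them apart (the timings below are made up purely for illustration):

```python
# Made-up millisecond values, only to show the loss of precision at .2f.
times_ms = [0.0312, 0.0297]

print("{:^12.2f} {:^12.2f}".format(*times_ms))  # both columns render as 0.03
print("{:^12.4f} {:^12.4f}".format(*times_ms))  # 0.0312 and 0.0297 stay distinguishable
```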
##########
File path: python/tvm/driver/tvmc/model.py
##########
@@ -390,29 +391,15 @@ def format_times(self):
This has the effect of producing a small table that looks like:
.. code-block::
Execution time summary:
- mean (ms) max (ms) min (ms) std (ms)
- 0.14310 0.16161 0.12933 0.01004
+ mean (ms) median (ms) max (ms) min (ms) std (ms)
+ 0.14310 0.14310 0.16161 0.12933 0.01004
Returns
-------
str
A formatted string containing the statistics.
"""
-
- # timestamps
- mean_ts = np.mean(self.times) * 1000
- std_ts = np.std(self.times) * 1000
- max_ts = np.max(self.times) * 1000
- min_ts = np.min(self.times) * 1000
-
- header = "Execution time summary:\n{0:^10} {1:^10} {2:^10}
{3:^10}".format(
- "mean (ms)", "max (ms)", "min (ms)", "std (ms)"
- )
- stats = "{0:^10.2f} {1:^10.2f} {2:^10.2f} {3:^10.2f}".format(
- mean_ts, max_ts, min_ts, std_ts
- )
-
- return "%s\n%s\n" % (header, stats)
+ return str(times)
Review comment:
I think it's probably a bad idea to change the semantics of
time_evaluator, which is no doubt used in many places where it's expected to
return the raw list of times. I would recommend moving the BenchmarkResult to
the benchmark() functions rather than changing time_evaluator itself.
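Roughly what I have in mind, as a sketch only (it assumes benchmark() can reach the underlying runtime Module via something like GraphModule.module, and that BenchmarkResult lives in tvm.runtime.module as added in this PR):

```python
from tvm.runtime.module import BenchmarkResult  # location assumed per this PR


def benchmark(gmod, device, func_name="run", repeat=5, number=5, **kwargs):
    """Sketch: construct the BenchmarkResult at the benchmark() layer,
    leaving time_evaluator's return value untouched for existing callers."""
    for key, val in kwargs.items():
        gmod.set_input(key, val)  # cache inputs so data transfer is not timed
    feval = gmod.module.time_evaluator(func_name, device, repeat=repeat, number=number)
    prof = feval()  # time_evaluator still returns its usual result with .results
    return BenchmarkResult(prof.results)  # wrap the raw times at this layer instead
```

That way anything already consuming time_evaluator keeps seeing the same object it always has.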
##########
File path: python/tvm/runtime/module.py
##########
@@ -230,8 +288,7 @@ def evaluator(*args):
blob = feval(*args)
fmt = "@" + ("d" * repeat)
results = struct.unpack(fmt, blob)
- mean = sum(results) / float(repeat)
- return ProfileResult(mean=mean, results=results)
+ return BenchmarkResult(results)
Review comment:
Does time_evaluator itself have a unit test that needs to be updated? If
not, would it make sense to add a quick one?
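If one is missing, a quick one along these lines would do (a sketch; it builds a trivial te kernel just to have something to time, and only checks the shape of the result):

```python
import numpy as np

import tvm
import tvm.testing
from tvm import te


@tvm.testing.requires_llvm
def test_time_evaluator_repeat_count():
    # Trivial elementwise kernel, used only as a timing target.
    n = 16
    A = te.placeholder((n,), name="A")
    B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
    mod = tvm.build(te.create_schedule(B.op), [A, B], target="llvm", name="add_one")

    dev = tvm.cpu()
    a = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
    b = tvm.nd.array(np.zeros(n, dtype="float32"), dev)

    # One datapoint per `repeat`, and every reported time is non-negative.
    res = mod.time_evaluator("add_one", dev, number=1, repeat=3)(a, b)
    assert len(res.results) == 3
    assert all(t >= 0 for t in res.results)
```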
##########
File path: tests/python/relay/test_vm.py
##########
@@ -955,5 +956,17 @@ def test_get_input_index():
assert vm_factory.get_input_index("invalid") == -1
+@tvm.testing.requires_llvm
+def test_benchmark():
Review comment:
See comment above on using unittest.mock.patch() so you can test the
benchmark() function directly.
##########
File path: tests/python/relay/test_backend_graph_executor.py
##########
@@ -322,5 +323,17 @@ def test_graph_executor_api():
assert mod.get_input_index("Invalid") == -1
+@tvm.testing.requires_llvm
+def test_benchmark():
+ mod, params = mlp.get_workload(1)
+ lib = relay.build(mod, target="llvm", params=params)
+ exe = graph_executor.create(lib.get_graph_json(), lib.lib, tvm.cpu())
+ data = tvm.nd.array(np.random.rand(1, 1, 28, 28).astype("float32"))
+ result = exe.benchmark(tvm.cpu(), data=data, func_name="run", repeat=2, number=1)
Review comment:
A better test here would be to use unittest.mock.patch() on the
time_evaluator function itself so it always returns deterministic times; then
you can fully test the behavior of the benchmark() function on its own,
independent of the actual module or time_evaluator behavior.
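Concretely, something like the following (a sketch only; it assumes benchmark() delegates to Module.time_evaluator internally and that the underlying module is reachable as exe.module):

```python
from unittest import mock

import numpy as np

import tvm
import tvm.testing
from tvm import relay
from tvm.contrib import graph_executor
from tvm.relay.testing import mlp
from tvm.runtime.module import BenchmarkResult


@tvm.testing.requires_llvm
def test_benchmark_with_fake_times():
    # Same setup as the test in this diff...
    mod, params = mlp.get_workload(1)
    lib = relay.build(mod, target="llvm", params=params)
    exe = graph_executor.create(lib.get_graph_json(), lib.lib, tvm.cpu())
    data = tvm.nd.array(np.random.rand(1, 1, 28, 28).astype("float32"))

    # ...but time_evaluator hands back deterministic timings, so the
    # assertions exercise benchmark() itself rather than the hardware.
    fake = BenchmarkResult([1.0, 2.0, 2.0, 5.0])
    with mock.patch.object(exe.module, "time_evaluator", return_value=lambda *args: fake):
        result = exe.benchmark(tvm.cpu(), data=data, func_name="run", repeat=2, number=1)

    assert result.mean == 2.5
    assert result.median == 2.0
    assert result.min == 1.0
    assert result.max == 5.0
    assert result.std == 1.5
```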
##########
File path: python/tvm/contrib/graph_executor.py
##########
@@ -320,3 +320,62 @@ def __getitem__(self, key):
The key to the module.
"""
return self.module[key]
+
+ def benchmark(self, device, func_name="run", repeat=5, number=5, min_repeat_ms=None, **kwargs):
+ """Calculate runtime of a function by repeatedly calling it.
+
+ Use this function to get an accurate measurement of the runtime of a function. The function
+ is run multiple times in order to account for variability in measurements, processor speed
+ or other external factors. Mean, median, standard deviation, min and max runtime are all
+ reported. On GPUs, CUDA and ROCm specifically, special on-device timers are used so that
+ synchronization and data transfer operations are not counted towards the runtime. This allows
+ for fair comparison of runtimes across different functions and models.
+
+ The benchmarking loop looks approximately like so:
+
+ .. code-block:: python
+
+ for r in range(repeat):
+     time_start = now()
+     for n in range(number):
+         func_name()
+     time_end = now()
+     total_times.append((time_end - time_start)/number)
+
+
+ Parameters
+ ----------
+ func_name : str
+ The function to benchmark
+
+ repeat : int
+ Number of times to run the outer loop of the timing code (see above). The output will
+ contain `repeat` number of datapoints.
+
+ number : int
+ Number of times to run the inner loop of the timing code. This inner loop is run in
+ between the timer starting and stopping. In order to amortize any timing overhead,
+ `number` should be increased when the runtime of the function is small (less than a 1/10
+ of a millisecond).
+
+ min_repeat_ms : Optional[float]
+ If set, the inner loop will be run until it takes longer than `min_repeat_ms`
+ milliseconds. This can be used to ensure that the function is run enough to get an
+ accurate measurement.
+
+ kwargs : Dict[str, Object]
+ Named arguments to the function. These are cached before running timing code, so that
+ data transfer costs are not counted in the runtime.
+
+ Returns
+ -------
+ timing_results : ProfileResult
Review comment:
I think this should be BenchmarkResult.
##########
File path: python/tvm/runtime/module.py
##########
@@ -30,8 +31,65 @@
from . import _ffi_api
-# profile result of time evaluator
-ProfileResult = namedtuple("ProfileResult", ["mean", "results"])
+class BenchmarkResult:
+ """Runtimes from benchmarking"""
+
+ def __init__(self, results: Sequence[float]):
+ """Construct a new BenchmarkResult from a sequence of runtimes.
+
+ Parameters
+ ----------
+ results : Sequence[float]
+ Raw times from benchmarking
+
+ Attributes
+ ----------
+ min : float
+ Minimum runtime in seconds of all results.
+ mean : float
+ Mean runtime in seconds of all results. Note that this mean is not
+ necessarily statistically correct as it is the mean of mean
+ runtimes.
+ median : float
+ Median runtime in seconds of all results. Note that this is not necessarily
+ statistically correct as it is the median of mean runtimes.
+ max : float
+ Maximum runtime in seconds of all results.
+ std : float
+ Standard deviation in seconds of runtimes. Note that this is not necessarily
+ correct as it is the std of mean runtimes.
+ results : Sequence[float]
+ The collected runtimes (in seconds). This may be a series of mean runtimes if
Review comment:
I think this needs more explanation. Currently a BenchmarkResult object
carries no information about the benchmark parameters that were used, and it
would be best (IMHO) to avoid surprises in how the results are interpreted
depending on how the object was created. My recommendation would be to either
fully document what it means to benchmark with `number > 1`, or ensure that
the BenchmarkResult object itself records the benchmark parameters used.
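For the second option, the shape I have in mind is roughly this (field names are illustrative only, not a concrete API proposal):

```python
from typing import Optional, Sequence

import numpy as np


class BenchmarkResult:
    """Sketch: keep the benchmark parameters on the result so the meaning of
    `results` does not depend on remembering how the run was launched."""

    def __init__(
        self,
        results: Sequence[float],
        number: int = 1,
        repeat: Optional[int] = None,
        min_repeat_ms: Optional[float] = None,
    ):
        self.results = list(results)
        self.number = number  # inner-loop iterations averaged into each datapoint
        self.repeat = repeat if repeat is not None else len(self.results)
        self.min_repeat_ms = min_repeat_ms  # inner-loop auto-extension threshold, if any
        self.mean = float(np.mean(self.results))
        self.std = float(np.std(self.results))
        self.median = float(np.median(self.results))
        self.min = float(np.min(self.results))
        self.max = float(np.max(self.results))
```

With that, __repr__ and __str__ could also print `number` and `repeat`, which removes the ambiguity around reporting a mean of means.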