This is an automated email from the ASF dual-hosted git repository.
areusch pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-rfcs.git
The following commit(s) were added to refs/heads/main by this push:
new fd98681 [UnitTests] Parametrized Unit Tests (#7)
fd98681 is described below
commit fd98681b91205ef9cf1d933897a83b2f5fd336fa
Author: Lunderberg <[email protected]>
AuthorDate: Tue Aug 31 03:26:42 2021 -0500
[UnitTests] Parametrized Unit Tests (#7)
* Initial RFC commit
* Update RFC with PR number and link to PR, now that the PR is open.
* [0007] Added reference to use of `copy.deepcopy`.
* [0007] Removed changes to .gitignore
Following comments, I agree that it doesn't belong in this PR, and may
include in a separate PR.
* [0007] Updated per @areusch's suggestions.
---
rfcs/0007-parametrized-unit-tests.md | 570 +++++++++++++++++++++++++++++++++++
1 file changed, 570 insertions(+)
diff --git a/rfcs/0007-parametrized-unit-tests.md b/rfcs/0007-parametrized-unit-tests.md
new file mode 100644
index 0000000..975c751
--- /dev/null
+++ b/rfcs/0007-parametrized-unit-tests.md
@@ -0,0 +1,570 @@
+- Feature Name: Parametrized Unit Tests
+- Start Date: 2021-05-10
+- RFC PR: [apache/tvm-rfcs#0007](https://github.com/apache/tvm-rfcs/pull/0007)
+- GitHub PR: [apache/tvm#8010](https://github.com/apache/tvm/issues/8010)
+
+# Summary
+[summary]: #summary
+
+This RFC documents how to implement unit tests that depend on input
+parameters, or have setup that depends on input parameters.
+
+# Motivation
+[motivation]: #motivation
+
+Some unit tests should be run across a variety of parameters for
+better coverage. For example, a unit test that does not depend on
+target-specific features could be run on all targets that the test
+platform supports. Alternatively, a unit test may need to pass
+different array sizes to a function in order to exercise different
+code paths within that function.
+
+The simplest implementation would be to write a test function that
+internally loops over all parameters and throws an exception when the
+test fails. However, this does not give the developer full
+information, because `pytest` does not necessarily include the
+parameter in the test report. Even when it does, the value is
+printed in a different location depending on how the internal loop is
+written. A unit test that fails for all targets requires different
+debugging than a unit test that fails on a single specific target,
+and so this information should be exposed.
+
+This RFC adds functionality for implementing parameterized unit tests,
+such that each set of parameters appears as a separate test result in
+the final output.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+## Parameters
+
+Before you can use a parameter in a test case, you need to register it
+with `pytest`. Do this with the `tvm.testing.parameter` function.
+For example, the following will define a parameter named `array_size`
+that has three possible values. This can appear either at global
+scope inside a test module to be usable by all test functions in that
+module, or in a directory's `conftest.py` to be usable by all tests in
+that directory.
+
+```python
+array_size = tvm.testing.parameter(8, 256, 1024)
+```
+
+To use a parameter, define a test function that accepts the parameter
+as an input, using the same argument name as was used above in the
+parameter registration. This test will be run once for each value of
+the parameter. For example, the `test_function` below would be run
+three times, each time with a different value of `array_size`
+according to the earlier definition. These would show up in the
+output report as `test_function[8]`, `test_function[256]`, and
+`test_function[1024]`, with the value of the parameter included in
+the test name.
+
+```python
+def test_function(array_size):
+ input_array = np.random.uniform(size=array_size)
+ # Test code here
+```
+
+If a parameter is used by a test function but isn't declared as a
+function argument, it will produce a `NameError` when accessed. This
+happens even if the parameter is defined at module scope, and would
+otherwise be accessible by the usual scoping rules. This is
+intentional: the global variable `array_size` holds the fixture
+function registered by `tvm.testing.parameter`, not a specific
+parameter value.
+
+```python
+def test_function_broken():
+ # Throws NameError, undefined variable "array_size"
+ input_array = np.random.uniform(size=array_size)
+ # Test code here
+```
+
+By default, a test function that accepts multiple parameters as
+arguments will be run for all combinations of values of those
+parameters. If only some combinations of parameters should be used,
+the `tvm.testing.parameters` function can be used to simultaneously
+define multiple parameters. A test function that accepts parameters
+that were defined through `tvm.testing.parameters` will only be called
+once for each set of parameters.
+
+```python
+array_size = tvm.testing.parameter(8, 256, 1024)
+dtype = tvm.testing.parameter('float32', 'int32')
+
+# Called 6 times, once for each combination of array_size and dtype.
+def test_function1(array_size, dtype):
+ assert(True)
+
+test_data, reference_result = tvm.testing.parameters(
+ ('test_data_1.dat', 'result_1.txt'),
+ ('test_data_2.dat', 'result_2.txt'),
+ ('test_data_3.dat', 'result_3.txt'),
+)
+
+# Called 3 times, once for each (test_data, reference_result) tuple.
+def test_function2(test_data, reference_result):
+ assert(True)
+```
+
+## Fixtures
+
+Fixtures in pytest separate setup code from test code, and are used
+for two primary purposes. The first is for improved readability when
+debugging, so that a failure in the setup is distinguishable from a
+failure in the test. The second is to avoid performing expensive test
+setup that is shared across multiple tests, letting the test suite run
+faster.
+
+For example, the following function first reads test data, and then
+performs tests that use the test data.
+
+```python
+# test_function_old() calls read_test_data(). If read_test_data()
+# throws an error, test_function_old() shows as a failed test.
+
+def test_function_old():
+ dataset = read_test_data()
+ assert(True) # Test succeeds
+```
+
+This setup can be pulled out into a separate fixture function, which
+the test function then accepts as an argument. Used this way, a bare
+`@tvm.testing.fixture` decorator is equivalent to a bare
+`@pytest.fixture` decorator. By default, the fixture value is
+recalculated for every test function, to minimize the potential for
+interaction between unit tests.
+
+```python
[email protected]
+def dataset():
+ print('Prints once for each test function that uses dataset.')
+ return read_test_data()
+
+# test_function_new() accepts the dataset fixture. If
+# read_test_data() throws an error, test_function_new() is
+# reported as an error rather than a failure.
+def test_function_new(dataset):
+ assert(True) # Test succeeds
+```
+
+If the fixture is more expensive to calculate, then it may be worth
+caching the computed fixture. This is done with the
+`cache_return_value=True` argument.
+
+```python
[email protected](cache_return_value=True)
+def dataset():
+ print('Prints once no matter how many test functions use dataset.')
+ return download_test_data()
+
+def test_function(dataset):
+ assert(True) # Test succeeds
+```
+
+The caching can be disabled entirely by setting the environment
+variable `TVM_TEST_DISABLE_CACHE` to a non-zero integer. This can be
+useful to re-run tests that failed, to check whether the failure is
+due to modification/re-use of a cached value.
+
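+For example, a previously failing test file could be re-run with
+caching disabled as follows (the file path is a placeholder):
+
+```bash
+# Re-run a test file with all fixture caching disabled.
+TVM_TEST_DISABLE_CACHE=1 python3 -mpytest path_to_my_test_file.py
+```
+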
+A fixture can also depend on parameters or on other fixtures. These
+dependencies are declared by accepting additional arguments. For
+example, consider the following test function, in which the
+calculation of `correct_output` depends on the test data, and the
+`schedule` depends on some block size. The `generate_output`
+function contains the functionality to be tested.
+
+```python
+def test_function_old():
+ dataset = download_test_data()
+ correct_output = calculate_correct_output(dataset)
+ for block_size in [8, 256, 1024]:
+ schedule = setup_schedule(block_size)
+ output = generate_output(dataset, schedule)
+ tvm.testing.assert_allclose(output, correct_output)
+```
+
+These can be split out into separate parameters and fixtures to
+isolate the functionality to be tested. Whether to split out the
+setup code, and whether to cache it, depends on the test function,
+how expensive the setup is to perform, whether other tests can share
+the same setup code, and so on.
+
+```python
[email protected](cache_return_value=True)
+def dataset():
+ return download_test_data()
+
[email protected]
+def correct_output(dataset):
+ return calculate_correct_output(dataset)
+
+block_size = tvm.testing.parameter(8, 256, 1024)
+
[email protected]
+def schedule(block_size):
+    return setup_schedule(block_size)
+
+def test_function_new(dataset, correct_output, schedule):
+ output = generate_output(dataset, schedule)
+ tvm.testing.assert_allclose(output, correct_output)
+```
+
+## Target/Device Parametrization
+
+The global TVM test configuration contains definitions for `target`
+and `dev`, which can be accepted as input by any test function. These
+replace the previous use of `tvm.testing.enabled_targets()`.
+
+```python
+def test_function_old():
+ for target, dev in tvm.testing.enabled_targets():
+ assert(True) # Target-based test
+
+def test_function_new(target, dev):
+ assert(True) # Target-based test
+```
+
+The parametrized values of `target` are read from the environment
+variable `TVM_TEST_TARGETS`, a semicolon-separated list of targets.
+If `TVM_TEST_TARGETS` is not defined, the target list falls back to
+`tvm.testing.DEFAULT_TEST_TARGETS`. All parametrized targets have
+appropriate markers for checking device capability
+(e.g. `@tvm.testing.uses_gpu`). If a platform cannot run a test, it
+is explicitly listed as being skipped.
+
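+As a sketch, a local run can be restricted to specific targets by
+setting the variable on the command line (the target names here are
+illustrative and must be enabled in the local build):
+
+```bash
+# Run a test file against only the listed targets.
+TVM_TEST_TARGETS="llvm;cuda" python3 -mpytest path_to_my_test_file.py
+```
+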
+It is expected both that enabling unit tests across additional
+targets may uncover several unit test failures, and that some unit
+tests may fail during the early implementation of support for a new
+runtime or hardware. In these cases, the
+`@tvm.testing.known_failing_targets` decorator can be used. This
+marks a test with `pytest.mark.xfail`,
+allowing the test suite to pass. This is intended for cases where an
+implementation will be improved in the future.
+
+```python
[email protected]_failing_targets("my_newly_implemented_target")
+def test_function(target, dev):
+ # Test fails on new target, but marking as xfail allows CI suite
+ # to pass during development.
+ assert(target != "my_newly_implemented_target")
+```
+
+If a test should be run on most targets, but isn't applicable to
+some particular targets, the test should be marked with
+`@tvm.testing.exclude_targets`. For example, a test that exercises
+GPU capabilities may need to run on all targets except for
+`llvm`.
+
+```python
[email protected]_targets("llvm")
+def test_gpu_functionality(target, dev):
+ # Test isn't run on llvm, is excluded from the report entirely.
+ assert(target != "llvm")
+```
+
+If a test should be run over only a specific set of targets and
+devices, the `@tvm.testing.parametrize_targets` decorator can be used.
+It is intended for use where a test is applicable only to a specific
+target, and is inapplicable to any others (e.g. verifying
+target-specific assembly code matches known assembly code). In most
+circumstances, `@tvm.testing.exclude_targets` or
+`@tvm.testing.known_failing_targets` should be used instead. For
+example, a test that verifies vulkan-specific code generation should
+be marked with `@tvm.testing.parametrize_targets("vulkan")`.
+
+```python
[email protected]_targets("vulkan")
+def test_vulkan_codegen(target):
+ f = tvm.build(..., target)
+ assembly = f.imported_modules[0].get_source()
+ assert("%v4bool = OpTypeVector %bool 4" in assembly)
+```
+
+The bare decorator `@tvm.testing.parametrize_targets` is maintained
+for backwards compatibility, but is no longer the preferred style.
+
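+For reference, a minimal sketch of the legacy bare-decorator style is
+shown below; new tests should instead accept `target` and `dev`
+directly, as in the earlier examples.
+
+```python
[email protected]_targets
+def test_function(target, dev):
+    assert(True)  # Runs once per enabled target.
+```
+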
+## Running Test Subsets
+
+Individual python test files are no longer executable outside of the
+pytest framework. To maintain the existing behavior of running the
+tests defined in a particular file, the following change should be
+made.
+
+```python
+# Before
+if __name__ == '__main__':
+    test_function_1()
+    test_function_2()
+    ...
+
+# After
+import sys
+
+import pytest
+
+if __name__ == '__main__':
+    sys.exit(pytest.main(sys.argv))
+```
+
+Alternatively, single files, single tests, or single parameterizations
+of tests can be explicitly specified when calling pytest.
+
+```bash
+# Run all tests in a file
+python3 -mpytest path_to_my_test_file.py
+
+# Run all parameterizations of a single test
+python3 -mpytest path_to_my_test_file.py::test_function_name
+
+# Run a single parameterization of a single test. The brackets should
+# contain the parameters as listed in the pytest verbose output.
+python3 -mpytest 'path_to_my_test_file.py::test_function_name[1024]'
+```
+
+
+## Cache-Related Debugging
+
+If a test failure is suspected to be due to multiple tests having
+access to the same cached value, the source of the cross-talk can be
+narrowed down with the following steps; a consolidated command-line
+sketch follows the list.
+
+1. Test with `TVM_TEST_DISABLE_CACHE=1`. If the error stops, then the
+ issue is due to some cache-related cross-talk.
+
+2. Reduce the number of parameters being used for a single unit test,
+ overriding the global parameter definition by marking it with
+ `@pytest.mark.parametrize`. If the error stops, then the issue is
+ due to cross-talk between different parametrizations of a single
+ test.
+
+3. Run a single test function using `python3 -mpytest
+ path/to/my/test_file.py::test_my_test_case`. If the error stops,
+ then the issue is due to cross-talk between the failing unit test
+ and some other unit test in the same file.
+
+ 1. If it is due to cross-talk between multiple unit tests, run the
+ failing unit test alongside each other unit test in the same
+      file that makes use of the cached fixture. This is the same
+      command line as above, but passing multiple test cases as
+      arguments. If the error stops when run with a particular unit
+ test, then that test is the one that is modifying the cached
+ fixture.
+
+4. Run a single test function on its own, with a single
+ parametrization, using `python3 -mpytest
+ path/to/my/test_file.py::test_my_test_case[parameter_value]`. If
+ the error still occurs, and is still avoided by using
+ `TVM_TEST_DISABLE_CACHE=1`, then the error is in
+ `tvm.testing._fixture_cache`.
+
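+The command-line portions of the steps above, collected into a
+single sketch (`test_other_test_case` and `parameter_value` are
+placeholders):
+
+```bash
+# Step 1: disable all fixture caching.
+TVM_TEST_DISABLE_CACHE=1 python3 -mpytest path/to/my/test_file.py
+
+# Step 3: run only the failing test function.
+python3 -mpytest path/to/my/test_file.py::test_my_test_case
+
+# Step 3.1: run the failing test alongside one other test that uses
+# the cached fixture.
+python3 -mpytest path/to/my/test_file.py::test_my_test_case \
+    path/to/my/test_file.py::test_other_test_case
+
+# Step 4: run a single parametrization of the failing test.
+python3 -mpytest 'path/to/my/test_file.py::test_my_test_case[parameter_value]'
+```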
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+Both `tvm.testing.parameter` and `tvm.testing.fixture` are implemented
+on top of `pytest.fixture`. A call to `tvm.testing.parameter` defines
+a fixture that takes specific values. The following two definitions
+of `array_size` are equivalent.
+
+```python
+# With new functionality
+array_size = tvm.testing.parameter(8, 256, 1024)
+
+# With vanilla pytest functionality
[email protected](params=[8, 256, 1024])
+def array_size(request):
+ return request.param
+```
+
+A bare `@tvm.testing.fixture` decorator is equivalent to a bare
+`@pytest.fixture` decorator. The following two definitions of
+`test_data` are equivalent.
+
+```python
+# With new functionality
[email protected]
+def test_data(array_size):
+    return np.random.uniform(size=array_size)
+
+# With vanilla pytest functionality
[email protected]
+def test_data(array_size):
+    return np.random.uniform(size=array_size)
+```
+
+The `@tvm.testing.fixture(cache_return_value=True)` does not have a
+direct analog in vanilla pytest. While pytest does allow for re-use
+of fixtures between functions, it only ever maintains [a single cached
+value of each
+fixture](https://docs.pytest.org/en/6.2.x/fixture.html#fixture-scopes).
+This works in cases where only a single cached value is required, but
+causes repeated calls to setup code if a test requires multiple
+different cached values. This can be reduced by careful ordering of
+the pytest fixture scopes, but cannot be completely eliminated. The
+possible cache behaviors in vanilla pytest, and with
+`tvm.testing.fixture`, are shown below.
+
+```python
+# Possible ordering of tests if `target` is defined in a tighter scope
+# than `array_size`. The call to `generate_setup2` is repeated.
+for array_size in array_sizes:
+ setup1 = generate_setup1(array_size)
+ for target in targets:
+ setup2 = generate_setup2(target)
+ run_test(setup1, setup2)
+
+# Possible ordering of tests if `array_size` is defined in a tighter
+# scope than `target`. The call to `generate_setup1` is repeated.
+for target in targets:
+ setup2 = generate_setup2(target)
+ for array_size in array_sizes:
+ setup1 = generate_setup1(array_size)
+ run_test(setup1, setup2)
+
+# Pseudo-code equivalent of `tvm.testing.fixture(cache_return_value=True)`.
+# No repeated calls to setup code.
+cache_setup1 = {}
+cache_setup2 = {}
+for array_size in array_sizes:
+    for target in targets:
+        if array_size in cache_setup1:
+            setup1 = cache_setup1[array_size]
+        else:
+            setup1 = cache_setup1[array_size] = generate_setup1(array_size)
+
+ if target in cache_setup2:
+ setup2 = cache_setup2[target]
+ else:
+ setup2 = cache_setup2[target] = generate_setup2(target)
+
+ run_test(setup1, setup2)
+
+del cache_setup1
+del cache_setup2
+```
+
+The cache for a fixture defined with `tvm.testing.fixture` is cleared
+after all tests using that fixture are completed, to avoid excessive
+memory usage.
+
+If a test function is marked with `@pytest.mark.parametrize` for a
+parameter that is also defined with `tvm.testing.parameter`, the test
+function uses only the parameters in `@pytest.mark.parametrize`. This
+allows an individual function to override the parameter definitions if
+needed. Any parameter-dependent fixtures are also determined based
+on the values in `@pytest.mark.parametrize`.
+
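+As a sketch of the override behavior described above (the test
+function name and the value 16 are illustrative):
+
+```python
+array_size = tvm.testing.parameter(8, 256, 1024)
+
+# The explicit mark takes precedence over the global definition, so
+# this test runs only once, with array_size=16.
[email protected]('array_size', [16])
+def test_with_override(array_size):
+    assert(array_size == 16)
+```
+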
+# Drawbacks
+[drawbacks]: #drawbacks
+
+- This makes the individual unit tests more dependent on the test
+  framework and setup. Incorrect setup may result in confusing test
+  results.
+
+- Caching setup between different tests introduces potential
+ cross-talk between tests. While this risk is also present when
+ looping over parameter values, separating cached values out into
+ fixtures hides that potential cross-talk.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+- Option: Explicitly loop over parameter values or
+  `tvm.testing.enabled_targets()` in the test function. (Most common
+  previous usage.)
+
+ - Pros:
+ - Explicit at the definition of a test function.
+
+ - Cons:
+    - Requires opt-in at each test function.
+ - Doesn't report information on which parameter value(s) failed.
+
+
+- Option: Use `@tvm.testing.parametrize_targets` as a bare decorator.
+ (Previously implemented behavior, less common usage.)
+
+ - Pros:
+ - Explicit at the definition of a test function.
+
+ - Cons:
+ - Requires opt-in at each test function.
+ - Doesn't provide functionality for shared setup.
+
+
+- Option: Parametrize using `@pytest.mark.parametrize` rather than
+ `tvm.testing.parameter`.
+
+ - Pros:
+    - Would explicitly show the parameter values next to the
+      functions they apply to.
+
+ - Cons:
+ - Must be explicitly added at each test function definition.
+ - Extending the parameters that apply across all tests in a
+ file/directory requires updating several locations.
+ - Similar parameters (e.g. 1000 vs 1024 for an array length) would
+ be defined at separate locations, and would then require
+ separate fixture setup.
+
+# Prior art
+[prior-art]: #prior-art
+
+- [`pytest.mark.parametrize`](https://docs.pytest.org/en/6.2.x/parametrize.html)
+  exists to combine several related unit tests into a single function
+  with varying parameters. However, it must be applied to each
+  individual python function.
+
+- [`pytest.fixture`](https://docs.pytest.org/en/6.2.x/reference.html#pytest.fixture)
+  is the basis for both TVM parameters and fixtures, which are built
+  on top of the existing pytest functionality for parametrization.
+  While pytest's default fixtures can be cached using the `scope`
+  parameter, only a single cached value is retained at any time,
+  which can lead to repetition of expensive fixture setup.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+- What values are appropriate to cache using
+ `@tvm.testing.fixture(cache_return_value=True)`? Should
+ non-serializable values be allowed?
+
+ If only serializable values are allowed to be cached, this may aid
+ in debugging, since the values of all test parameters and cached
+ fixtures could be saved and reproduced. Currently, nearly all cases
+  (e.g. datasets, array sizes, targets) are serializable. The only
+  non-serializable case identified so far is RPC server
+  connections. There is some concern that caching RPC server
+ connections could cause difficulties in reproducing test failures.
+
+  The current proposed answer is to cache only serializable values,
+  and to resume the discussion when there are other possible use
+  cases for caching non-serializable values.
+
+  For the time being, both to prevent non-serializable values from
+ being cached, and to maintain separation between unit tests, all
+ cached values will be copied using
+ [`copy.deepcopy`](https://docs.python.org/3/library/copy.html#copy.deepcopy)
+ prior to returning the generated value.
+
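+A minimal sketch of this caching behavior is shown below;
+`lookup_cached_value` is a hypothetical helper for illustration, not
+the actual implementation in `tvm.testing._fixture_cache`.
+
+```python
+import copy
+
+def lookup_cached_value(cache, key, setup_func):
+    # Compute the value once, then reuse it for later lookups.
+    if key not in cache:
+        cache[key] = setup_func()
+    # Return a deep copy, so that one test cannot modify the value
+    # seen by another test.
+    return copy.deepcopy(cache[key])
+```
+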
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+- Parameters common across many tests could be defined at a
+ larger scope (e.g. `${TVM_HOME}/conftest.py`) and be usable in a
+ file without additional declaration.
+
+- Parameters common across many tests could have additional randomly
+ generated values added to the list, adding fuzzing to the tests.
+
+- Parametrized unit tests interact very nicely with the
+ [pytest-benchmark](https://pytest-benchmark.readthedocs.io/en/stable/)
+ plugin for comparing low-level functionality. For example, the
+ definition below would benchmark and record statistics for the
+ runtime to copy data from a device to the CPU, with the benchmarks
+ tagged by the parameter values of `array_size`, `dtype`, and
+  `target`. The benchmarking can be disabled by default, and run
+  only when the `--benchmark-enable` command-line argument is given.
+
+ ```python
+ def test_copy_data_from_device(benchmark, array_size, dtype, dev):
+ A = tvm.te.placeholder((array_size,), name="A", dtype=dtype)
+ a_np = np.random.uniform(size=(array_size,)).astype(A.dtype)
+ a = tvm.nd.array(a_np, dev)
+
+ b_np = benchmark(a.numpy)
+ tvm.testing.assert_allclose(a_np, b_np)
+ ```