Lunderberg commented on a change in pull request #7:
URL: https://github.com/apache/tvm-rfcs/pull/7#discussion_r698491103



##########
File path: rfcs/0007-parametrized-unit-tests.md
##########
@@ -0,0 +1,568 @@
+- Feature Name: Parametrized Unit Tests
+- Start Date: 2021-05-10
+- RFC PR: [apache/tvm-rfcs#0007](https://github.com/apache/tvm-rfcs/pull/0007)
+- GitHub PR: [apache/tvm#8010](https://github.com/apache/tvm/issues/8010)
+
+# Summary
+[summary]: #summary
+
+This RFC documents how to implement unit tests that depend on input
+parameters, or have setup that depends on input parameters.
+
+# Motivation
+[motivation]: #motivation
+
+Some unit tests should be run across a variety of parameters for
+better coverage.  For example, a unit test that does not depend on
+target-specific features should be tested on all targets that the test
+platform supports.  Alternatively, a unit test may need to pass
+different array sizes to a function, in order to exercise different
+code paths within that function.
+
+The simplest implementation would be to write a test function that
+loops over all parameters, throwing an exception if any parameter
+fails the test.  However, this does not give full information to a
+developer, as a failure from any parameter results in the entire test
+being marked as failing.  A unit test that fails for all targets
+requires different debugging than a unit test that fails on a single
+specific target, and so this information should be exposed.
+
+This RFC adds functionality for implementing parameterized unit tests,
+such that each set of parameters appears as a separate test result in
+the final output.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+## Parameters
+
+To make a new parameter for unit tests to use, define it with the
+`tvm.testing.parameter` function.  For example, the following will
+define a parameter named `array_size` that has three possible values.
+This can appear either at global scope inside a test module to be
+usable by all test functions in that module, or in a directory's
+`conftest.py` to be usable by all tests in that directory.
+
+```python
+array_size = tvm.testing.parameter(8, 256, 1024)
+```
+
+To use a parameter, define a test function that accepts the parameter
+as an input.  This test will be run once for each value of the
+parameter.  For example, the `test_function` below would be run three
+times, each time with a different value of `array_size` according to
+the earlier definition.  These would show up in the output report as
+`test_function[8]`, `test_function[256]`, and `test_function[1024]`,
+with the parameter value included as part of the test name.
+
+```python
+def test_function(array_size):
+    input_array = np.random.uniform(size=array_size)
+    # Test code here
+```
+
+If a parameter is used by a test function, but isn't declared as a
+function argument, it will produce a `NameError` when accessed.  This
+happens even if the parameter is defined at module scope, and would
+otherwise be accessible by the usual scoping rules.  This is
+intentional: accessing the global variable directly would return the
+`array_size` function definition, rather than the specific parameter
+value.
+
+```python
+def test_function_broken():
+    # Throws NameError, undefined variable "array_size"
+    input_array = np.random.uniform(size=array_size)
+    # Test code here
+```
+
+By default, a test function that accepts multiple parameters as
+arguments will be run for all combinations of values of those
+parameters.  If only some combinations of parameters should be used,
+the `tvm.testing.parameters` function can be used to simultaneously
+define multiple parameters.  A test function that accepts parameters
+that were defined through `tvm.testing.parameters` will only be called
+once for each set of parameters.
+
+```python
+array_size = tvm.testing.parameter(8, 256, 1024)
+dtype = tvm.testing.parameter('float32', 'int32')
+
+# Called 6 times, once for each combination of array_size and dtype.
+def test_function1(array_size, dtype):
+    assert(True)
+
+test_data, reference_result = tvm.testing.parameters(
+    ('test_data_1.dat', 'result_1.txt'),
+    ('test_data_2.dat', 'result_2.txt'),
+    ('test_data_3.dat', 'result_3.txt'),
+)
+
+# Called 3 times, once for each (test_data, reference_result) tuple.
+def test_function3(test_data, reference_result):
+    assert(True)
+```
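Under the hood, helpers like these can be thin wrappers around pytest's parametrized fixtures.  The following is a minimal sketch of how a `parameter`-style helper could be written; it is an illustration only, not the actual `tvm.testing` implementation:

```python
import pytest

def parameter(*values, ids=None):
    # Sketch of a parameter helper: each test that accepts the returned
    # fixture runs once per value, with the value shown in the test id.
    @pytest.fixture(params=values, ids=ids)
    def as_fixture(request):
        return request.param
    return as_fixture

# Tests accepting `array_size` would run once for each of 8, 256, 1024.
array_size = parameter(8, 256, 1024)
```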
+
+## Fixtures
+
+Fixtures in pytest separate setup code from test code, and are used
+for two primary purposes.  The first is for improved readability when
+debugging, so that a failure in the setup is distinguishable from a
+failure in the test.  The second is to avoid performing expensive test
+setup that is shared across multiple tests, letting the test suite run
+faster.
+
+For example, the following function first reads test data, and then
+performs tests that use the test data.
+
+```python
+# test_function_old() calls read_test_data().  If read_test_data()
+# throws an error, test_function_old() shows as a failed test.
+
+def test_function_old():
+    dataset = read_test_data()
+    assert(True) # Test succeeds
+```
+
+This can be pulled out into a separate setup function, which the test
+function then accepts as an argument.  In this usage, this is
+equivalent to using a bare `@pytest.fixture` decorator.  By default,
+the fixture value is recalculated for every test function, to minimize
+the potential for interaction between unit tests.
+
+```python
[email protected]
+def dataset():
+    print('Prints once for each test function that uses dataset.')
+    return read_test_data()
+
+# test_function_new() accepts the dataset fixture.  If
+# read_test_data() throws an error, test_function_new() shows
+# as unrunnable.
+def test_function_new(dataset):
+    assert(True) # Test succeeds
+```
+
+If the fixture is more expensive to calculate, then it may be worth
+caching the computed fixture.  This is done with the
+`cache_return_value=True` argument.
+
+```python
[email protected](cache_return_value = True)
+def dataset():
+    print('Prints once no matter how many test functions use dataset.')
+    return download_test_data()
+
+def test_function(dataset):
+    assert(True) # Test succeeds
+```
+
+The caching can be disabled entirely by setting the environment
+variable `TVM_TEST_DISABLE_CACHE` to a non-zero integer.  This can be
+useful to re-run tests that failed, to check whether the failure is
+due to modification/re-use of a cached value.
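For illustration, the flag could be interpreted as follows.  Here `cache_enabled` is a hypothetical helper, not part of `tvm.testing`, and it assumes the variable holds an integer:

```python
import os

def cache_enabled():
    # Hypothetical helper: caching is disabled when TVM_TEST_DISABLE_CACHE
    # parses to a non-zero integer; unset or "0" leaves caching enabled.
    return int(os.environ.get("TVM_TEST_DISABLE_CACHE", "0")) == 0
```

For example, running `TVM_TEST_DISABLE_CACHE=1 pytest ...` would re-run the tests with freshly computed fixture values.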
+
+A fixture can depend on parameters, or on other fixtures.  These
+dependencies are declared by accepting them as arguments.  For
+example, consider the
+following test function.  In this example, the calculation of
+`correct_output` depends on the test data, and the `schedule` depends
+on some block size.  The `generate_output` function contains the
+functionality to be tested.
+
+```python
+def test_function_old():
+    dataset = download_test_data()
+    correct_output = calculate_correct_output(dataset)
+    for block_size in [8, 256, 1024]:
+        schedule = setup_schedule(block_size)
+        output = generate_output(dataset, schedule)
+        tvm.testing.assert_allclose(output, correct_output)
+```
+
+These can be split out into separate parameters and fixtures to
+isolate the functionality to be tested.  Whether to split out the
+setup code, and whether to cache it, depends on the test function,
+how expensive the setup is to perform, whether other tests can share
+the same setup code, and so on.
+
+```python
[email protected](cache_return_value = True)
+def dataset():
+    return download_test_data()
+    
[email protected]
+def correct_output(dataset):
+    return calculate_correct_output(dataset)
+    
+array_size = tvm.testing.parameter(8, 256, 1024)
+
[email protected]
+def schedule(array_size):
+    return setup_schedule(array_size)
+    
+def test_function_new(dataset, correct_output, schedule):
+    output = generate_output(dataset, schedule)
+    tvm.testing.assert_allclose(output, correct_output)
+```
+
+## Target/Device Parametrization
+
+The TVM test configuration contains definitions for `target` and
+`dev`, which can be accepted as input by any test function.  These
+replace the previous use of `tvm.testing.enabled_targets()`.
+
+```python
+def test_function_old():
+    for target, dev in tvm.testing.enabled_targets():
+        assert(True) # Target-based test
+        
+def test_function_new(target, dev):
+    assert(True) # Target-based test
+```
+
+The parametrized values of `target` are read from the environment
+variable `TVM_TEST_TARGETS`, a semicolon-separated list of targets.
+If `TVM_TEST_TARGETS` is not defined, the target list falls back to
+`tvm.testing.DEFAULT_TEST_TARGETS`.  All parametrized targets have
+appropriate markers for checking device capability
+(e.g. `@tvm.testing.uses_gpu`).  If a platform cannot run a test, it
+is explicitly listed as being skipped.
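The parsing described above can be sketched as follows.  Note that `enabled_test_targets` is a hypothetical helper, and the default list here is a stand-in (the real `tvm.testing.DEFAULT_TEST_TARGETS` is longer):

```python
import os

# Stand-in for tvm.testing.DEFAULT_TEST_TARGETS (the real list is longer).
DEFAULT_TEST_TARGETS = ["llvm", "cuda"]

def enabled_test_targets():
    # TVM_TEST_TARGETS is a semicolon-separated list of target strings.
    # Blank entries are dropped; an unset or empty variable falls back
    # to the default target list.
    env = os.environ.get("TVM_TEST_TARGETS", "")
    targets = [t.strip() for t in env.split(";") if t.strip()]
    return targets or list(DEFAULT_TEST_TARGETS)
```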
+
+It is expected both that enabling unit tests across additional targets
+may uncover several unit test failures, and that some unit tests may
+fail during the early implementation of support for a new runtime or
+hardware.  In these cases, the `@tvm.testing.known_failing_targets`
+decorator can be used.  This marks a test with `pytest.xfail`,
+allowing the test suite to pass.  This is intended for cases where an
+implementation will be improved in the future.
+
+```python
[email protected]_failing_targets("my_newly_implemented_target")
+def test_function(target, dev):
+    # Test fails on new target, but marking as xfail allows CI suite
+    # to pass during development.
+    assert(target != "my_newly_implemented_target")
+```
+
+If a test should be run on most targets, but isn't applicable to
+some particular targets, the test should be marked with
+`@tvm.testing.exclude_targets`.  For example, a test that exercises
+GPU capabilities may need to run against all targets except
+`llvm`.
+
+```python
[email protected]_targets("llvm")
+def test_gpu_functionality(target, dev):
+    # Test isn't run on llvm, is excluded from the report entirely.
+    assert(target != "llvm")
+```
+
+If a test should be run on only a specific set of targets and
+devices, the `@tvm.testing.parametrize_targets` decorator can be used.
+It is intended for use where a test is applicable only to a specific
+target, and is inapplicable to any others (e.g. verifying
+target-specific assembly code matches known assembly code).  In most
+circumstances, `@tvm.testing.exclude_targets` or
+`@tvm.testing.known_failing_targets` should be used instead.  For
+example, a test that verifies vulkan-specific code generation should
+be marked with `@tvm.testing.parametrize_targets("vulkan")`.
+
+```python
[email protected]_targets("vulkan")
+def test_vulkan_codegen(target):
+    f = tvm.build(..., target)
+    assembly = f.imported_modules[0].get_source()
+    assert("%v4bool = OpTypeVector %bool 4" in assembly)
+```
+
+The bare decorator `@tvm.testing.parametrize_targets` is maintained
+for backwards compatibility, but is no longer the preferred style.
+
+## Running Test Subsets
+
+Individual python test files are no longer executable outside of the
+pytest framework.  To maintain the existing behavior of running the
+tests defined in a particular file, the following change should be
+made.
+
+```python
+# Before
+if __name__=='__main__':
+    test_function_1()
+    test_function_2()
+    ...
+    
+# After
+if __name__=='__main__':
+    sys.exit(pytest.main(sys.argv))
+```

Review comment:
       Agreed, and that would help with readability.  I've added it to my todo 
list, though it's low in priority.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

