Repository: spark
Updated Branches:
refs/heads/master 7143e9d72 -> 7e3eb3cd2
[SPARK-26252][PYTHON] Add support to run specific unittests and/or doctests in
python/run-tests script
## What changes were proposed in this pull request?
This PR proposes to add a developer option, `--testnames`, to our testing script
to allow running a specific set of unittests and doctests.
**1. Run unittests in a class**
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests']
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (14s)
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (14s) ... 22 tests were skipped
Tests passed in 14 seconds
Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_fallback_enabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped
...
```
**2. Run a single unittest in a class.**
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (0s) ... 1 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (8s)
Tests passed in 8 seconds
Skipped tests in pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion with pypy:
    test_null_conversion (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```
**3. Run doctests in a single PySpark module.**
```bash
./run-tests --testnames pyspark.sql.dataframe
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.dataframe
Finished test(python2.7): pyspark.sql.dataframe (47s)
Finished test(pypy): pyspark.sql.dataframe (48s)
Tests passed in 48 seconds
```
Of course, you can mix them:
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests,pyspark.sql.dataframe'
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests', 'pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Starting test(python2.7): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (0s) ... 22 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (18s)
Finished test(python2.7): pyspark.sql.dataframe (50s)
Finished test(pypy): pyspark.sql.dataframe (52s)
Tests passed in 52 seconds
Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```
You can also use all other options (except `--modules`, which is ignored when `--testnames` is given):
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion' --python-executables=python
```
```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (12s)
Tests passed in 12 seconds
```
See help below:
```bash
./run-tests --help
```
```
Usage: run-tests [options]
Options:
...
  Developer Options:
    --testnames=TESTNAMES
                        A comma-separated list of specific modules, classes
                        and functions of doctest or unittest to test. For
                        example, 'pyspark.sql.foo' to run the module as
                        unittests or doctests, 'pyspark.sql.tests FooTests' to
                        run the specific class of unittests,
                        'pyspark.sql.tests FooTests.test_foo' to run the
                        specific unittest in the class. '--modules' option is
                        ignored if they are given.
```
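For background on the name resolution this option relies on (not the run-tests.py code itself, which shells out to `bin/pyspark`): unittest's loader can resolve a `Class` or `Class.test_method` string relative to a module, which matches the shape of the space-separated part of a `--testnames` entry. A minimal, self-contained sketch with a hypothetical `FooTests` class:

```python
import sys
import unittest

class FooTests(unittest.TestCase):
    """Hypothetical test class standing in for e.g. ArrowTests."""
    def test_foo(self):
        self.assertEqual(1 + 1, 2)

    def test_bar(self):
        self.assertTrue(True)

# The loader resolves 'Class' or 'Class.test_method' relative to a module,
# which is the shape of the space-separated part of a --testnames entry.
loader = unittest.defaultTestLoader
this_module = sys.modules[__name__]

whole_class = loader.loadTestsFromName("FooTests", module=this_module)
single_test = loader.loadTestsFromName("FooTests.test_foo", module=this_module)

print(whole_class.countTestCases())  # 2
print(single_test.countTestCases())  # 1
```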
I intentionally grouped it as a developer option to be more conservative.
## How was this patch tested?
Manually tested. Negative tests were also done.
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion1' --python-executables=python
```
```
...
AttributeError: type object 'ArrowTests' has no attribute 'test_null_conversion1'
...
```
```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowT' --python-executables=python
```
```
...
AttributeError: 'module' object has no attribute 'ArrowT'
...
```
```bash
./run-tests --testnames 'pyspark.sql.tests.test_ar' --python-executables=python
```
```
...
/.../python2.7: No module named pyspark.sql.tests.test_ar
```
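These three failure modes line up with the natural lookup chain: import the module, then walk the dotted class/method name with `getattr`. A small illustrative sketch (not the actual implementation), using the stdlib `unittest` module as a stand-in target:

```python
import importlib

def resolve(testname):
    """Illustrative lookup: import the module, then walk the dotted
    class/method path with getattr, mirroring the errors above."""
    parts = testname.split()
    obj = importlib.import_module(parts[0])  # bad module -> "No module named ..."
    if len(parts) > 1:
        for attr in parts[1].split('.'):
            obj = getattr(obj, attr)  # misspelled class/test -> AttributeError
    return obj

resolve("unittest TestCase.assertEqual")  # resolves cleanly
try:
    resolve("unittest TestCas")
except AttributeError as err:
    print("AttributeError:", err)
```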
Closes #23203 from HyukjinKwon/SPARK-26252.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7e3eb3cd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7e3eb3cd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7e3eb3cd
Branch: refs/heads/master
Commit: 7e3eb3cd209d83394ca2b2cec79b26b1bbe9d7ea
Parents: 7143e9d
Author: Hyukjin Kwon <[email protected]>
Authored: Wed Dec 5 15:22:08 2018 +0800
Committer: Hyukjin Kwon <[email protected]>
Committed: Wed Dec 5 15:22:08 2018 +0800
----------------------------------------------------------------------
python/run-tests-with-coverage | 2 --
python/run-tests.py | 68 +++++++++++++++++++++++++------------
2 files changed, 46 insertions(+), 24 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/7e3eb3cd/python/run-tests-with-coverage
----------------------------------------------------------------------
diff --git a/python/run-tests-with-coverage b/python/run-tests-with-coverage
index 6d74b56..4578210 100755
--- a/python/run-tests-with-coverage
+++ b/python/run-tests-with-coverage
@@ -50,8 +50,6 @@ export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
 # This environment variable enables the coverage.
 export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
 
-# If you'd like to run a specific unittest class, you could do such as
-# SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests
 ./run-tests "$@"
 
 # Don't run coverage for the coverage command itself
http://git-wip-us.apache.org/repos/asf/spark/blob/7e3eb3cd/python/run-tests.py
----------------------------------------------------------------------
diff --git a/python/run-tests.py b/python/run-tests.py
index 01a6e81..e45268c 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -19,7 +19,7 @@
 from __future__ import print_function
 import logging
-from optparse import OptionParser
+from optparse import OptionParser, OptionGroup
 import os
 import re
 import shutil
@@ -99,7 +99,7 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
     try:
         per_test_output = tempfile.TemporaryFile()
         retcode = subprocess.Popen(
-            [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
+            [os.path.join(SPARK_HOME, "bin/pyspark")] + test_name.split(),
             stderr=per_test_output, stdout=per_test_output, env=env).wait()
         shutil.rmtree(tmp_dir, ignore_errors=True)
     except:
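The one-line Popen change above is what makes space-separated testnames work: the testname is whitespace-split into separate arguments before being handed to `bin/pyspark`. A standalone illustration (`SPARK_HOME` here is a hypothetical location, purely for this sketch):

```python
import os

SPARK_HOME = "/opt/spark"  # hypothetical location, purely for illustration

test_name = "pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion"

# Before the patch: the whole testname was a single argv element.
before = [os.path.join(SPARK_HOME, "bin/pyspark"), test_name]

# After the patch: the module and the class/method selector become
# separate argv elements.
after = [os.path.join(SPARK_HOME, "bin/pyspark")] + test_name.split()

print(len(before), len(after))  # 2 3
```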
@@ -190,6 +190,20 @@ def parse_opts():
         help="Enable additional debug logging"
     )
 
+    group = OptionGroup(parser, "Developer Options")
+    group.add_option(
+        "--testnames", type="string",
+        default=None,
+        help=(
+            "A comma-separated list of specific modules, classes and functions of doctest "
+            "or unittest to test. "
+            "For example, 'pyspark.sql.foo' to run the module as unittests or doctests, "
+            "'pyspark.sql.tests FooTests' to run the specific class of unittests, "
+            "'pyspark.sql.tests FooTests.test_foo' to run the specific unittest in the class. "
+            "'--modules' option is ignored if they are given.")
+    )
+    parser.add_option_group(group)
+
     (opts, args) = parser.parse_args()
     if args:
         parser.error("Unsupported arguments: %s" % ' '.join(args))
@@ -213,25 +227,31 @@ def _check_coverage(python_exec):
 
 def main():
     opts = parse_opts()
-    if (opts.verbose):
+    if opts.verbose:
         log_level = logging.DEBUG
     else:
         log_level = logging.INFO
+    should_test_modules = opts.testnames is None
     logging.basicConfig(stream=sys.stdout, level=log_level, format="%(message)s")
     LOGGER.info("Running PySpark tests. Output is in %s", LOG_FILE)
     if os.path.exists(LOG_FILE):
         os.remove(LOG_FILE)
     python_execs = opts.python_executables.split(',')
-    modules_to_test = []
-    for module_name in opts.modules.split(','):
-        if module_name in python_modules:
-            modules_to_test.append(python_modules[module_name])
-        else:
-            print("Error: unrecognized module '%s'. Supported modules: %s" %
-                  (module_name, ", ".join(python_modules)))
-            sys.exit(-1)
     LOGGER.info("Will test against the following Python executables: %s", python_execs)
-    LOGGER.info("Will test the following Python modules: %s", [x.name for x in modules_to_test])
+
+    if should_test_modules:
+        modules_to_test = []
+        for module_name in opts.modules.split(','):
+            if module_name in python_modules:
+                modules_to_test.append(python_modules[module_name])
+            else:
+                print("Error: unrecognized module '%s'. Supported modules: %s" %
+                      (module_name, ", ".join(python_modules)))
+                sys.exit(-1)
+        LOGGER.info("Will test the following Python modules: %s", [x.name for x in modules_to_test])
+    else:
+        testnames_to_test = opts.testnames.split(',')
+        LOGGER.info("Will test the following Python tests: %s", testnames_to_test)
 
     task_queue = Queue.PriorityQueue()
     for python_exec in python_execs:
@@ -246,16 +266,20 @@ def main():
         LOGGER.debug("%s python_implementation is %s", python_exec, python_implementation)
         LOGGER.debug("%s version is: %s", python_exec, subprocess_check_output(
             [python_exec, "--version"], stderr=subprocess.STDOUT, universal_newlines=True).strip())
-        for module in modules_to_test:
-            if python_implementation not in module.blacklisted_python_implementations:
-                for test_goal in module.python_test_goals:
-                    heavy_tests = ['pyspark.streaming.tests', 'pyspark.mllib.tests',
-                                   'pyspark.tests', 'pyspark.sql.tests', 'pyspark.ml.tests']
-                    if any(map(lambda prefix: test_goal.startswith(prefix), heavy_tests)):
-                        priority = 0
-                    else:
-                        priority = 100
-                    task_queue.put((priority, (python_exec, test_goal)))
+        if should_test_modules:
+            for module in modules_to_test:
+                if python_implementation not in module.blacklisted_python_implementations:
+                    for test_goal in module.python_test_goals:
+                        heavy_tests = ['pyspark.streaming.tests', 'pyspark.mllib.tests',
+                                       'pyspark.tests', 'pyspark.sql.tests', 'pyspark.ml.tests']
+                        if any(map(lambda prefix: test_goal.startswith(prefix), heavy_tests)):
+                            priority = 0
+                        else:
+                            priority = 100
+                        task_queue.put((priority, (python_exec, test_goal)))
+        else:
+            for test_goal in testnames_to_test:
+                task_queue.put((0, (python_exec, test_goal)))
 
     # Create the target directory before starting tasks to avoid races.
     target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
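The scheduling branch added at the end can be exercised in isolation: module-mode goals keep the heavy-test prioritization, while every explicit `--testnames` goal is enqueued at the highest priority (0). A minimal sketch of the PriorityQueue behavior (standalone, with made-up goals):

```python
import queue  # Queue.PriorityQueue in the Python 2 code above

task_queue = queue.PriorityQueue()

# Module mode: heavy test goals get priority 0, everything else 100.
task_queue.put((100, ("python2.7", "pyspark.sql.dataframe")))
task_queue.put((0, ("python2.7", "pyspark.sql.tests")))
# --testnames mode: each explicit goal is enqueued at priority 0.
task_queue.put((0, ("python2.7", "pyspark.sql.tests.test_arrow ArrowTests")))

order = []
while not task_queue.empty():
    priority, (python_exec, test_goal) = task_queue.get()
    order.append((priority, test_goal))

# All priority-0 goals are dequeued before the priority-100 module goal.
print(order[-1])  # (100, 'pyspark.sql.dataframe')
```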