This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 227611d70d52 [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script
227611d70d52 is described below
commit 227611d70d5293bbb5d67b62af649e3bf36eaec6
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Tue Jan 23 10:55:01 2024 +0900
[SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script
### What changes were proposed in this pull request?
This PR cleans up obsolete code in the PySpark coverage script.
### Why are the changes needed?
We used to use `coverage_daemon.py` in Python workers to track coverage on the Python worker side (e.g., coverage within Python UDFs); it was added in https://github.com/apache/spark/pull/20204. However, it seems it no longer works; in fact, it stopped working multiple years ago. Replacing the Python worker itself was a hacky workaround to begin with, so we should just get rid of it first and then find a proper way.
This should also deflake the scheduled jobs and speed up the build.
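For reference, driver-side coverage still works through coverage.py's documented subprocess hook: `run-tests-with-coverage` keeps exporting `COVERAGE_PROCESS_START`, and a `sitecustomize.py` on the path starts measurement in each new Python process. A minimal sketch of such a hook (an illustration of the standard coverage.py mechanism, not necessarily the exact file in this repo):
```python
# sitecustomize.py -- imported automatically by Python at interpreter startup.
try:
    import coverage

    # Starts coverage measurement in this process when the
    # COVERAGE_PROCESS_START environment variable points at a config file
    # (e.g., .coveragerc); otherwise this call is a no-op.
    coverage.process_startup()
except ImportError:
    # coverage is not installed; run the process uninstrumented.
    pass
```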
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually tested via:
```bash
./run-tests-with-coverage --python-executables=python3 --testname="pyspark.sql.functions.builtin"
```
```
...
Finished test(python3): pyspark.sql.tests.test_functions (87s)
Tests passed in 87 seconds
Combining collected coverage data under
...
Creating XML report file at python/coverage.xml
Wrote XML report to coverage.xml
Reporting the coverage data at /.../spark/python/test_coverage/coverage_data/coverage
Name Stmts Miss Branch BrPart Cover
-------------------------------------------------------------------------
pyspark/__init__.py 48 7 10 3 76%
pyspark/_globals.py 16 3 4 2 75%
pyspark/accumulators.py 123 38 26 5 66%
pyspark/broadcast.py 121 79 40 3 33%
pyspark/conf.py 99 33 50 5 64%
pyspark/context.py 451 216 151 26 51%
pyspark/errors/__init__.py 3 0 0 0 100%
pyspark/errors/error_classes.py 3 0 0 0 100%
pyspark/errors/exceptions/__init__.py 0 0 0 0 100%
pyspark/errors/exceptions/base.py 91 15 24 4 83%
pyspark/errors/exceptions/captured.py 168 81 57 17 48%
pyspark/errors/utils.py 34 8 6 2 70%
pyspark/files.py 34 15 12 3 57%
pyspark/find_spark_home.py 30 24 12 2 19%
pyspark/java_gateway.py 114 31 30 12 69%
pyspark/join.py 66 58 58 0 6%
pyspark/profiler.py 244 182 92 3 22%
pyspark/rdd.py 1064 741 378 9 27%
pyspark/rddsampler.py 68 50 32 0 18%
pyspark/resource/__init__.py 5 0 0 0 100%
pyspark/resource/information.py 11 4 4 0 73%
pyspark/resource/profile.py 110 82 58 1 27%
pyspark/resource/requests.py 139 90 70 0 35%
pyspark/resultiterable.py 14 6 2 1 56%
pyspark/serializers.py 349 185 90 13 43%
pyspark/shuffle.py 397 322 180 1 13%
pyspark/sql/__init__.py 14 0 0 0 100%
pyspark/sql/catalog.py 203 127 66 2 30%
pyspark/sql/column.py 268 78 64 12 67%
pyspark/sql/conf.py 40 16 10 3 58%
pyspark/sql/context.py 170 95 58 2 47%
pyspark/sql/dataframe.py 900 475 459 40 45%
pyspark/sql/functions/__init__.py 3 0 0 0 100%
pyspark/sql/functions/builtin.py 1741 542 1126 26 76%
pyspark/sql/functions/partitioning.py 41 19 18 3 59%
pyspark/sql/group.py 81 30 32 3 65%
pyspark/sql/observation.py 54 37 22 1 26%
pyspark/sql/pandas/__init__.py 1 0 0 0 100%
pyspark/sql/pandas/conversion.py 277 249 156 2 8%
pyspark/sql/pandas/functions.py 67 49 34 0 18%
pyspark/sql/pandas/group_ops.py 89 65 22 2 25%
pyspark/sql/pandas/map_ops.py 37 27 10 2 26%
pyspark/sql/pandas/serializers.py 381 323 172 0 10%
pyspark/sql/pandas/typehints.py 41 32 26 1 15%
pyspark/sql/pandas/types.py 407 383 326 1 3%
pyspark/sql/pandas/utils.py 29 11 10 5 59%
pyspark/sql/profiler.py 80 47 54 1 39%
pyspark/sql/readwriter.py 362 253 146 7 27%
pyspark/sql/session.py 469 206 228 22 56%
pyspark/sql/sql_formatter.py 41 26 16 1 28%
pyspark/sql/streaming/__init__.py 4 0 0 0 100%
pyspark/sql/streaming/listener.py 400 200 186 1 61%
pyspark/sql/streaming/query.py 102 63 40 1 39%
pyspark/sql/streaming/readwriter.py 268 207 118 2 21%
pyspark/sql/streaming/state.py 100 68 44 0 29%
pyspark/sql/tests/__init__.py 0 0 0 0 100%
pyspark/sql/tests/test_functions.py 646 2 244 7 99%
pyspark/sql/types.py 1013 355 528 74 62%
pyspark/sql/udf.py 240 132 90 20 42%
pyspark/sql/udtf.py 152 98 52 2 33%
pyspark/sql/utils.py 160 83 54 10 45%
pyspark/sql/window.py 89 23 56 5 77%
pyspark/statcounter.py 79 58 20 0 21%
pyspark/status.py 36 13 6 0 55%
pyspark/storagelevel.py 41 9 0 0 78%
pyspark/taskcontext.py 111 63 40 1 40%
pyspark/testing/__init__.py 2 0 0 0 100%
pyspark/testing/sqlutils.py 149 44 52 1 75%
pyspark/testing/utils.py 312 238 162 2 17%
pyspark/traceback_utils.py 38 4 14 6 81%
pyspark/util.py 153 120 56 2 18%
pyspark/version.py 1 0 0 0 100%
...
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44842 from HyukjinKwon/SPARK-46802.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/run-tests-with-coverage | 3 --
python/test_coverage/conf/spark-defaults.conf | 21 ------------
python/test_coverage/coverage_daemon.py | 48 ---------------------------
3 files changed, 72 deletions(-)
diff --git a/python/run-tests-with-coverage b/python/run-tests-with-coverage
index d1c2dacbf9d8..aa23e16e8e43 100755
--- a/python/run-tests-with-coverage
+++ b/python/run-tests-with-coverage
@@ -44,9 +44,6 @@ export PYTHONPATH="$FWDIR:$PYTHONPATH"
# Also, our sitecustomize.py and coverage_daemon.py are included in the path.
export PYTHONPATH="$COVERAGE_DIR:$PYTHONPATH"
-# We use 'spark.python.daemon.module' configuration to insert the coverage supported workers.
-export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
-
# This environment variable enables the coverage.
export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
diff --git a/python/test_coverage/conf/spark-defaults.conf b/python/test_coverage/conf/spark-defaults.conf
deleted file mode 100644
index bf44ea6e7cfe..000000000000
--- a/python/test_coverage/conf/spark-defaults.conf
+++ /dev/null
@@ -1,21 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# This is used to generate PySpark coverage results. Seems there's no way to
-# add a configuration when SPARK_TESTING environment variable is set because
-# we will directly execute modules by python -m.
-spark.python.daemon.module coverage_daemon
diff --git a/python/test_coverage/coverage_daemon.py b/python/test_coverage/coverage_daemon.py
deleted file mode 100644
index 4372135d6fc3..000000000000
--- a/python/test_coverage/coverage_daemon.py
+++ /dev/null
@@ -1,48 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-import os
-import imp
-import platform
-
-
-# This is a hack to always refer the main code rather than built zip.
-main_code_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
-daemon = imp.load_source("daemon", "%s/pyspark/daemon.py" % main_code_dir)
-
-if "COVERAGE_PROCESS_START" in os.environ:
-    # PyPy with coverage makes the tests flaky, and CPython is enough for coverage report.
-    if "pypy" not in platform.python_implementation().lower():
-        worker = imp.load_source("worker", "%s/pyspark/worker.py" % main_code_dir)
-
-        def _cov_wrapped(*args, **kwargs):
-            import coverage
-            cov = coverage.coverage(
-                config_file=os.environ["COVERAGE_PROCESS_START"])
-            cov.start()
-            try:
-                worker.main(*args, **kwargs)
-            finally:
-                cov.stop()
-                cov.save()
-        daemon.worker_main = _cov_wrapped
-else:
-    raise RuntimeError("COVERAGE_PROCESS_START environment variable is not set, exiting.")
-
-
-if __name__ == '__main__':
-    daemon.manager()
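For context on the bit rot: independently of whatever broke it years ago, the removed daemon would not even import on Python 3.12+, since it used `imp.load_source` and the `imp` module (deprecated since Python 3.4) was removed in Python 3.12. A rough sketch of the `importlib.util` equivalent of that loading pattern, were it ever needed again (a hypothetical illustration, not part of this commit):
```python
import importlib.util
import os

# Load pyspark/daemon.py directly from the source tree under the module name
# "daemon" -- the importlib equivalent of the removed imp.load_source() call.
main_code_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
spec = importlib.util.spec_from_file_location(
    "daemon", os.path.join(main_code_dir, "pyspark", "daemon.py"))
daemon = importlib.util.module_from_spec(spec)
spec.loader.exec_module(daemon)
```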
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]