This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 227611d70d52 [SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script
227611d70d52 is described below
commit 227611d70d5293bbb5d67b62af649e3bf36eaec6
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Tue Jan 23 10:55:01 2024 +0900
[SPARK-46802][PYTHON][TESTS] Clean up obsolete code in PySpark coverage script
### What changes were proposed in this pull request?
This PR cleans up obsolete code in the PySpark coverage script.
### Why are the changes needed?
We used to use `coverage_daemon.py` in Python workers to track coverage on the Python worker side (e.g., coverage within Python UDFs); it was added in https://github.com/apache/spark/pull/20204. However, it seems it no longer works; in fact, it stopped working multiple years ago. Replacing the Python worker itself was a hacky workaround to begin with, so we should just get rid of it first and then find a proper way.
This should also deflake the scheduled jobs and speed up the build.
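For reference, driver-side coverage still works through coverage.py's documented subprocess hook: `run-tests-with-coverage` keeps exporting `COVERAGE_PROCESS_START`, and a `sitecustomize.py` on the path starts measurement in each new Python process. A minimal sketch of such a hook (an illustration of the standard coverage.py mechanism, not necessarily the exact file in this repo):
```python
# sitecustomize.py -- imported automatically by Python at interpreter startup.
try:
    import coverage

    # Starts coverage measurement in this process when the
    # COVERAGE_PROCESS_START environment variable points at a config file
    # (e.g., .coveragerc); otherwise this call is a no-op.
    coverage.process_startup()
except ImportError:
    # coverage is not installed; run the process uninstrumented.
    pass
```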
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually tested via:
```bash
./run-tests-with-coverage --python-executables=python3 --testname="pyspark.sql.functions.builtin"
```
```
...
Finished test(python3): pyspark.sql.tests.test_functions (87s)
Tests passed in 87 seconds
Combining collected coverage data under
...
Creating XML report file at python/coverage.xml
Wrote XML report to coverage.xml
Reporting the coverage data at /.../spark/python/test_coverage/coverage_data/coverage
Name Stmts Miss Branch BrPart Cover
-------------------------------------------------------------------------
pyspark/__init__.py 48 7 10 3 76%
pyspark/_globals.py 16 3 4 2 75%
pyspark/accumulators.py 123 38 26 5 66%
pyspark/broadcast.py 121 79 40 3 33%
pyspark/conf.py 99 33 50 5 64%
pyspark/context.py 451 216 151 26 51%
pyspark/errors/__init__.py 3 0 0 0 100%
pyspark/errors/error_classes.py 3 0 0 0 100%
pyspark/errors/exceptions/__init__.py 0 0 0 0 100%
pyspark/errors/exceptions/base.py 91 15 24 4 83%
pyspark/errors/exceptions/captured.py 168 81 57 17 48%
pyspark/errors/utils.py 34 8 6 2 70%
pyspark/files.py 34 15 12 3 57%
pyspark/find_spark_home.py 30 24 12 2 19%
pyspark/java_gateway.py 114 31 30 12 69%
pyspark/join.py 66 58 58 0 6%
pyspark/profiler.py 244 182 92 3 22%
pyspark/rdd.py 1064 741 378 9 27%
pyspark/rddsampler.py 68 50 32 0 18%
pyspark/resource/__init__.py 5 0 0 0 100%
pyspark/resource/information.py 11 4 4 0 73%
pyspark/resource/profile.py 110 82 58 1 27%
pyspark/resource/requests.py 139 90 70 0 35%
pyspark/resultiterable.py 14 6 2 1 56%
pyspark/serializers.py 349 185 90 13 43%
pyspark/shuffle.py 397 322 180 1 13%
pyspark/sql/__init__.py 14 0 0 0 100%
pyspark/sql/catalog.py 203 127 66 2 30%
pyspark/sql/column.py 268 78 64 12 67%
pyspark/sql/conf.py 40 16 10 3 58%
pyspark/sql/context.py 170 95 58 2 47%
pyspark/sql/dataframe.py 900 475 459 40 45%
pyspark/sql/functions/__init__.py 3 0 0 0 100%
pyspark/sql/functions/builtin.py 1741 542 1126 26 76%
pyspark/sql/functions/partitioning.py 41 19 18 3 59%
pyspark/sql/group.py 81 30 32 3 65%
pyspark/sql/observation.py 54 37 22 1 26%
pyspark/sql/pandas/__init__.py 1 0 0 0 100%
pyspark/sql/pandas/conversion.py 277 249 156 2 8%
pyspark/sql/pandas/functions.py 67 49 34 0 18%
pyspark/sql/pandas/group_ops.py 89 65 22 2 25%
pyspark/sql/pandas/map_ops.py 37 27 10 2 26%
pyspark/sql/pandas/serializers.py 381 323 172 0 10%
pyspark/sql/pandas/typehints.py 41 32 26 1 15%
pyspark/sql/pandas/types.py 407 383 326 1 3%
pyspark/sql/pandas/utils.py 29 11 10 5 59%
pyspark/sql/profiler.py 80 47 54 1 39%
pyspark/sql/readwriter.py 362 253 146 7 27%
pyspark/sql/session.py 469 206 228 22 56%
pyspark/sql/sql_formatter.py 41 26 16 1 28%
pyspark/sql/streaming/__init__.py 4 0 0 0 100%
pyspark/sql/streaming/listener.py 400 200 186 1 61%
pyspark/sql/streaming/query.py 102 63 40 1 39%
pyspark/sql/streaming/readwriter.py 268 207 118 2 21%
pyspark/sql/streaming/state.py 100 68 44 0 29%
pyspark/sql/tests/__init__.py 0 0 0 0 100%
pyspark/sql/tests/test_functions.py 646 2 244 7 99%
pyspark/sql/types.py 1013 355 528 74 62%
pyspark/sql/udf.py 240 132 90 20 42%
pyspark/sql/udtf.py 152 98 52 2 33%
pyspark/sql/utils.py 160 83 54 10 45%
pyspark/sql/window.py 89 23 56 5 77%
pyspark/statcounter.py 79 58 20 0 21%
pyspark/status.py 36 13 6 0 55%
pyspark/storagelevel.py 41 9 0 0 78%
pyspark/taskcontext.py 111 63 40 1 40%
pyspark/testing/__init__.py 2 0 0 0 100%
pyspark/testing/sqlutils.py 149 44 52 1 75%
pyspark/testing/utils.py 312 238 162 2 17%
pyspark/traceback_utils.py 38 4 14 6 81%
pyspark/util.py 153 120 56 2 18%
pyspark/version.py 1 0 0 0 100%
...
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44842 from HyukjinKwon/SPARK-46802.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/run-tests-with-coverage | 3 --
python/test_coverage/conf/spark-defaults.conf | 21 ------------
python/test_coverage/coverage_daemon.py | 48 ---------------------------
3 files changed, 72 deletions(-)
diff --git a/python/run-tests-with-coverage b/python/run-tests-with-coverage
index d1c2dacbf9d8..aa23e16e8e43 100755
--- a/python/run-tests-with-coverage
+++ b/python/run-tests-with-coverage
@@ -44,9 +44,6 @@ export PYTHONPATH="$FWDIR:$PYTHONPATH"
# Also, our sitecustomize.py and coverage_daemon.py are included in the path.
export PYTHONPATH="$COVERAGE_DIR:$PYTHONPATH"
-# We use 'spark.python.daemon.module' configuration to insert the coverage supported workers.
-export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
-
# This environment variable enables the coverage.
export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
diff --git a/python/test_coverage/conf/spark-defaults.conf b/python/test_coverage/conf/spark-defaults.conf
deleted file mode 100644
index bf44ea6e7cfe..000000000000
--- a/python/test_coverage/conf/spark-defaults.conf
+++ /dev/null
@@ -1,21 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# This is used to generate PySpark coverage results. Seems there's no way to
-# add a configuration when SPARK_TESTING environment variable is set because
-# we will directly execute modules by python -m.
-spark.python.daemon.module coverage_daemon
diff --git a/python/test_coverage/coverage_daemon.py b/python/test_coverage/coverage_daemon.py
deleted file mode 100644
index 4372135d6fc3..000000000000
--- a/python/test_coverage/coverage_daemon.py
+++ /dev/null
@@ -1,48 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-import os
-import imp
-import platform
-
-
-# This is a hack to always refer the main code rather than built zip.
-main_code_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
-daemon = imp.load_source("daemon", "%s/pyspark/daemon.py" % main_code_dir)
-
-if "COVERAGE_PROCESS_START" in os.environ:
-    # PyPy with coverage makes the tests flaky, and CPython is enough for coverage report.
-    if "pypy" not in platform.python_implementation().lower():
-        worker = imp.load_source("worker", "%s/pyspark/worker.py" % main_code_dir)
-
-        def _cov_wrapped(*args, **kwargs):
-            import coverage
-            cov = coverage.coverage(
-                config_file=os.environ["COVERAGE_PROCESS_START"])
-            cov.start()
-            try:
-                worker.main(*args, **kwargs)
-            finally:
-                cov.stop()
-                cov.save()
-        daemon.worker_main = _cov_wrapped
-else:
-    raise RuntimeError("COVERAGE_PROCESS_START environment variable is not set, exiting.")
-
-
-if __name__ == '__main__':
-    daemon.manager()
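For context on the bit rot: independently of whatever broke it years ago, the removed daemon would not even import on Python 3.12+, since it used `imp.load_source` and the `imp` module (deprecated since Python 3.4) was removed in Python 3.12. A rough sketch of the `importlib.util` equivalent of that loading pattern, were it ever needed again (a hypothetical illustration, not part of this commit):
```python
import importlib.util
import os

# Load pyspark/daemon.py directly from the source tree under the module name
# "daemon" -- the importlib equivalent of the removed imp.load_source() call.
main_code_dir = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
spec = importlib.util.spec_from_file_location(
    "daemon", os.path.join(main_code_dir, "pyspark", "daemon.py"))
daemon = importlib.util.module_from_spec(spec)
spec.loader.exec_module(daemon)
```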
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]