Repository: spark
Updated Branches:
refs/heads/branch-2.3 235ec9ee7 -> 52a420ff6
[SPARK-23853][PYSPARK][TEST] Run Hive-related PySpark tests only for `-Phive`
## What changes were proposed in this pull request?
When `PyArrow` or `Pandas` is not available, the corresponding PySpark tests
are skipped automatically. However, PySpark tests currently fail when Spark is
built without `-Phive`. This PR aims to skip Hive-related PySpark tests as well
when `-Phive` is not given.
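
The skip mechanism follows the standard `unittest` pattern: probe for the optional dependency once in `setUpClass`, then call `skipTest` in `setUp`. A minimal, Spark-free sketch of that pattern (the `import` probe here is a stand-in for the patch's real JVM `HiveConf` check, so the sketch runs anywhere):

```python
import unittest


def _optional_dep_available():
    # Stand-in probe: the actual patch tries to instantiate
    # org.apache.hadoop.hive.conf.HiveConf through the JVM gateway.
    # Here we just attempt an import so the sketch is self-contained.
    try:
        import json  # noqa: F401
        return True
    except ImportError:
        return False


class OptionalDepTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Probe once per test class, like the patch does with a
        # throwaway SparkContext.
        cls.dep_available = _optional_dep_available()

    def setUp(self):
        if not self.dep_available:
            self.skipTest("Dependency is not available.")

    def test_uses_dep(self):
        import json
        self.assertEqual(json.loads("1"), 1)
```

When the probe fails, every test in the class is reported as `skipped` rather than errored, which is exactly the BEFORE/AFTER difference shown below.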
**BEFORE**
```bash
$ build/mvn -DskipTests clean package
$ python/run-tests.py --python-executables python2.7 --modules pyspark-sql
File "/Users/dongjoon/spark/python/pyspark/sql/readwriter.py", line 295, in pyspark.sql.readwriter.DataFrameReader.table
...
IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':"
**********************************************************************
1 of 3 in pyspark.sql.readwriter.DataFrameReader.table
***Test Failed*** 1 failures.
```
**AFTER**
```bash
$ build/mvn -DskipTests clean package
$ python/run-tests.py --python-executables python2.7 --modules pyspark-sql
...
Tests passed in 138 seconds
Skipped tests in pyspark.sql.tests with python2.7:
...
test_hivecontext (pyspark.sql.tests.HiveSparkSubmitTests) ... skipped 'Hive is not available.'
```
## How was this patch tested?
This is a test-only change. First, it should pass Jenkins. Then, verify
manually by running the following.
```bash
build/mvn -DskipTests clean package
python/run-tests.py --python-executables python2.7 --modules pyspark-sql
```
Author: Dongjoon Hyun <[email protected]>
Closes #21141 from dongjoon-hyun/SPARK-23853.
(cherry picked from commit b857fb549f3bf4e6f289ba11f3903db0a3696dec)
Signed-off-by: hyukjinkwon <[email protected]>
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/52a420ff
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/52a420ff
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/52a420ff
Branch: refs/heads/branch-2.3
Commit: 52a420ff62fabcc649d7acf85a72a3c0cd6437e8
Parents: 235ec9e
Author: Dongjoon Hyun <[email protected]>
Authored: Tue May 1 09:06:23 2018 +0800
Committer: hyukjinkwon <[email protected]>
Committed: Tue May 1 09:06:51 2018 +0800
----------------------------------------------------------------------
python/pyspark/sql/readwriter.py | 2 +-
python/pyspark/sql/tests.py | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/52a420ff/python/pyspark/sql/readwriter.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index e70aa9e..28c10aa 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -956,7 +956,7 @@ def _test():
globs = pyspark.sql.readwriter.__dict__.copy()
sc = SparkContext('local[4]', 'PythonTest')
try:
- spark = SparkSession.builder.enableHiveSupport().getOrCreate()
+ spark = SparkSession.builder.getOrCreate()
except py4j.protocol.Py4JError:
spark = SparkSession(sc)
http://git-wip-us.apache.org/repos/asf/spark/blob/52a420ff/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index daa97e1..f8da7d8 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -2935,6 +2935,26 @@ class SQLTests(ReusedSQLTestCase):
class HiveSparkSubmitTests(SparkSubmitTests):
+ @classmethod
+ def setUpClass(cls):
+ # get a SparkContext to check for availability of Hive
+ sc = SparkContext('local[4]', cls.__name__)
+ cls.hive_available = True
+ try:
+ sc._jvm.org.apache.hadoop.hive.conf.HiveConf()
+ except py4j.protocol.Py4JError:
+ cls.hive_available = False
+ except TypeError:
+ cls.hive_available = False
+ finally:
+ # we don't need this SparkContext for the test
+ sc.stop()
+
+ def setUp(self):
+ super(HiveSparkSubmitTests, self).setUp()
+ if not self.hive_available:
+ self.skipTest("Hive is not available.")
+
def test_hivecontext(self):
# This test checks that HiveContext is using Hive metastore (SPARK-16224).
# It sets a metastore url and checks if there is a derby dir created by