This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new b2a7f14cbd8 [SPARK-42752][PYSPARK][SQL] Make PySpark exceptions printable during initialization

b2a7f14cbd8 is described below

commit b2a7f14cbd8fd3b1a51d7b53fc7c23fb71e9f370
Author: Gera Shegalov <g...@apache.org>
AuthorDate: Tue Mar 14 08:30:15 2023 -0500

    [SPARK-42752][PYSPARK][SQL] Make PySpark exceptions printable during initialization

Ignore SQLConf initialization exceptions during Python exception creation. Otherwise there are no diagnostics for the issue in the following scenario:

1. Download a standard "Hadoop Free" build.
2. Start the PySpark REPL with Hive support:

```bash
SPARK_DIST_CLASSPATH=$(~/dist/hadoop-3.4.0-SNAPSHOT/bin/hadoop classpath) \
  ~/dist/spark-3.2.3-bin-without-hadoop/bin/pyspark \
  --conf spark.sql.catalogImplementation=hive
```

3. Execute any simple DataFrame operation:

```Python
>>> spark.range(100).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 416, in range
    jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: <exception str() failed>
```

4.
In fact, just `spark.conf` already exhibits the issue:

```Python
>>> spark.conf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 347, in conf
    self._conf = RuntimeConfig(self._jsparkSession.conf())
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: <exception str() failed>
```

There are probably two issues here:

1) Hive support should be gracefully disabled if the dependency is not on the classpath, as claimed by https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
2) At the very least, the user should be able to see the exception to understand the issue and take action.

### What changes were proposed in this pull request?

Ignore exceptions during `CapturedException` creation.

### Why are the changes needed?
To make the cause visible to the user:

```Python
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/gits/apache/spark/python/pyspark/sql/session.py", line 679, in conf
    self._conf = RuntimeConfig(self._jsparkSession.conf())
  File "/home/user/gits/apache/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/home/user/gits/apache/spark/python/pyspark/errors/exceptions/captured.py", line 166, in deco
    raise converted from None
pyspark.errors.exceptions.captured.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':

JVM stacktrace:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
	at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1237)
	at org.apache.spark.sql.SparkSession.$anonfun$sessionState$2(SparkSession.scala:162)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:160)
	at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:157)
	at org.apache.spark.sql.SparkSession.conf$lzycompute(SparkSession.scala:185)
	at org.apache.spark.sql.SparkSession.conf(SparkSession.scala:185)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveSessionStateBuilder
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
	at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1232)
	... 18 more
```

### Does this PR introduce _any_ user-facing change?

The only semantic change is that the conf `spark.sql.pyspark.jvmStacktrace.enabled` is ignored if the SQLConf is broken.

### How was this patch tested?

Manual testing using the repro steps above.

Closes #40372 from gerashegalov/SPARK-42752.

Authored-by: Gera Shegalov <g...@apache.org>
Signed-off-by: Sean Owen <sro...@gmail.com>
---
 python/pyspark/errors/exceptions/captured.py | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/errors/exceptions/captured.py b/python/pyspark/errors/exceptions/captured.py
index 1764ed7d02c..6313665b3fe 100644
--- a/python/pyspark/errors/exceptions/captured.py
+++ b/python/pyspark/errors/exceptions/captured.py
@@ -65,8 +65,15 @@ class CapturedException(PySparkException):
         assert SparkContext._jvm is not None
 
         jvm = SparkContext._jvm
-        sql_conf = jvm.org.apache.spark.sql.internal.SQLConf.get()
-        debug_enabled = sql_conf.pysparkJVMStacktraceEnabled()
+
+        # SPARK-42752: default to True to see issues with initialization
+        debug_enabled = True
+        try:
+            sql_conf = jvm.org.apache.spark.sql.internal.SQLConf.get()
+            debug_enabled = sql_conf.pysparkJVMStacktraceEnabled()
+        except BaseException:
+            pass
+
         desc = self.desc
         if debug_enabled:
             desc = desc + "\n\nJVM stacktrace:\n%s" % self.stackTrace
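The patch guards a config lookup that runs while the exception message itself is being built; if that lookup throws, Python can only report `<exception str() failed>` instead of the real error. Below is a minimal standalone sketch of the same defensive-default pattern, using hypothetical stand-in classes (`BrokenConf`, `QuietConf`, `describe`) rather than Spark's actual ones:

```python
class BrokenConf:
    """Stand-in for a SQLConf whose initialization failed (hypothetical)."""
    def get(self):
        raise RuntimeError("SQLConf not initialized")


class QuietConf:
    """Stand-in for a healthy conf with the stacktrace flag disabled (hypothetical)."""
    def get(self):
        return False


def describe(conf, desc, stack_trace):
    # Defensive default: if the config lookup itself fails, keep the
    # stacktrace visible rather than losing the message entirely.
    debug_enabled = True
    try:
        debug_enabled = conf.get()
    except BaseException:
        pass
    if debug_enabled:
        return desc + "\n\nJVM stacktrace:\n" + stack_trace
    return desc


# Broken conf: lookup raises, default of True keeps the stacktrace attached.
print(describe(BrokenConf(), "Error while instantiating builder", "java.lang...")
      .splitlines()[0])  # -> Error while instantiating builder

# Healthy conf with the flag off: the short description is returned alone.
print(describe(QuietConf(), "Error while instantiating builder", "java.lang..."))
```

The key point is setting the default *before* the `try`, so a failure anywhere in the lookup leaves a usable value behind.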
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org