niketanpansare commented on issue #857: [SYSTEMML-2523] Update SystemML to Support Spark 2.3.0
URL: https://github.com/apache/systemml/pull/857#issuecomment-475394794

@romeokienzler You are getting this error because your setup contains two SystemML jars (with possibly conflicting dependencies). There are two possible solutions to your problem:

1. *Recommended:* Remove the older incubating jar and do not include the corresponding 1.2.0 or 1.3.0-snapshot jars alongside it (i.e. no need for the `ln -s` trick).
2. Use the python package compiled by this PR.
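As a side note, here is a quick way to check from the pyspark shell which jar the driver JVM actually resolved a class from. This is only a sketch (the `jar_of` helper is mine, not part of SystemML or Spark), but it makes classpath conflicts like this one easy to spot:

```python
# Diagnostic sketch: ask the JVM where it loaded a given class from,
# to detect duplicate or conflicting jars on the driver classpath.
def jar_of(spark, class_name):
    """Return the jar URL (or '<bootstrap>') that supplied class_name."""
    cls = spark.sparkContext._jvm.java.lang.Class.forName(class_name)
    src = cls.getProtectionDomain().getCodeSource()
    return src.getLocation().toString() if src is not None else "<bootstrap>"

# Which SystemML build won the classpath race?
print(jar_of(spark, "org.apache.sysml.api.mlcontext.MLContext"))
# Which ANTLR runtime will Spark's SQL parser pick up?
print(jar_of(spark, "org.antlr.v4.runtime.atn.ATN"))
```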
Since there is some weird behavior here, I am including the logs. I apologize in advance for the long traces, but I feel they shed some light on the error. Please ignore the logs below if you agree with the above statements:

Setup 1: With only the incubating jar (FAILS !!)

```
$ ~/spark-2.3.0-bin-hadoop2.7/bin/pyspark --driver-memory 20g --master local[*] --driver-class-path systemml-0.14.0-incubating.jar
Python 3.6.3 (default, Mar 20 2018, 13:50:41)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-03-21 13:07:11 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Python version 3.6.3 (default, Mar 20 2018 13:50:41)
SparkSession available as 'spark'.
>>> from systemml import MLContext
>>> ml = MLContext(spark)
2019-03-21 13:07:20 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException

Welcome to Apache SystemML!

>>> ml.version()
'0.14.0-incubating'
>>> df=spark.read.parquet('shake.parquet')
>>> df.show()
+-----+---------+-----+-----+-----+
|CLASS| SENSORID|    X|    Y|    Z|
+-----+---------+-----+-----+-----+
|    2| qqqqqqqq| 0.12| 0.12| 0.12|
|    2|aUniqueID| 0.03| 0.03| 0.03|
|    2| qqqqqqqq|-3.84|-3.84|-3.84|
|    2| 12345678| -0.1| -0.1| -0.1|
|    2| 12345678|-0.15|-0.15|-0.15|
|    2| 12345678| 0.47| 0.47| 0.47|
|    2| 12345678|-0.06|-0.06|-0.06|
|    2| 12345678|-0.09|-0.09|-0.09|
|    2| 12345678| 0.21| 0.21| 0.21|
|    2| 12345678|-0.08|-0.08|-0.08|
|    2| 12345678| 0.44| 0.44| 0.44|
|    2|    gholi| 0.76| 0.76| 0.76|
|    2|    gholi| 1.62| 1.62| 1.62|
|    2|    gholi| 5.81| 5.81| 5.81|
|    2| bcbcbcbc| 0.58| 0.58| 0.58|
|    2| bcbcbcbc|-8.24|-8.24|-8.24|
|    2| bcbcbcbc|-0.45|-0.45|-0.45|
|    2| bcbcbcbc| 1.03| 1.03| 1.03|
|    2|aUniqueID|-0.05|-0.05|-0.05|
|    2| qqqqqqqq|-0.44|-0.44|-0.44|
+-----+---------+-----+-----+-----+
only showing top 20 rows

>>> df.createOrReplaceTempView("df")
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.5.3
ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 176, in createOrReplaceTempView
    self._jdf.createOrReplaceTempView(name)
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o52.createOrReplaceTempView.
: java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:84)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseTableIdentifier(ParseDriver.scala:49)
    at org.apache.spark.sql.Dataset.createTempViewCommand(Dataset.scala:3079)
    at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3034)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    at org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:153)
    at org.apache.spark.sql.catalyst.parser.SqlBaseLexer.<clinit>(SqlBaseLexer.java:1153)
    ... 16 more
Caused by: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    ... 18 more
>>>
```

Setup 2: Put the older incubating jar before the current SystemML 1.2.0 jars (FAILS !!)

```
$ ~/spark-2.3.0-bin-hadoop2.7/bin/pyspark --driver-memory 20g --master local[*] --driver-class-path systemml-0.14.0-incubating.jar:systemml-1.2.0-extra.jar:systemml-1.2.0.jar
Python 3.6.3 (default, Mar 20 2018, 13:50:41)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-03-21 13:12:11 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Python version 3.6.3 (default, Mar 20 2018 13:50:41)
SparkSession available as 'spark'.
>>> from systemml import MLContext
>>> ml = MLContext(spark)
2019-03-21 13:12:21 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException

Welcome to Apache SystemML!

>>> ml.version()
'0.14.0-incubating'
>>> df=spark.read.parquet('shake.parquet')
>>> df.show()
+-----+---------+-----+-----+-----+
|CLASS| SENSORID|    X|    Y|    Z|
+-----+---------+-----+-----+-----+
|    2| qqqqqqqq| 0.12| 0.12| 0.12|
|    2|aUniqueID| 0.03| 0.03| 0.03|
|    2| qqqqqqqq|-3.84|-3.84|-3.84|
|    2| 12345678| -0.1| -0.1| -0.1|
|    2| 12345678|-0.15|-0.15|-0.15|
|    2| 12345678| 0.47| 0.47| 0.47|
|    2| 12345678|-0.06|-0.06|-0.06|
|    2| 12345678|-0.09|-0.09|-0.09|
|    2| 12345678| 0.21| 0.21| 0.21|
|    2| 12345678|-0.08|-0.08|-0.08|
|    2| 12345678| 0.44| 0.44| 0.44|
|    2|    gholi| 0.76| 0.76| 0.76|
|    2|    gholi| 1.62| 1.62| 1.62|
|    2|    gholi| 5.81| 5.81| 5.81|
|    2| bcbcbcbc| 0.58| 0.58| 0.58|
|    2| bcbcbcbc|-8.24|-8.24|-8.24|
|    2| bcbcbcbc|-0.45|-0.45|-0.45|
|    2| bcbcbcbc| 1.03| 1.03| 1.03|
|    2|aUniqueID|-0.05|-0.05|-0.05|
|    2| qqqqqqqq|-0.44|-0.44|-0.44|
+-----+---------+-----+-----+-----+
only showing top 20 rows

>>> df.createOrReplaceTempView("df")
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.5.3
ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 176, in createOrReplaceTempView
    self._jdf.createOrReplaceTempView(name)
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o52.createOrReplaceTempView.
: java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:84)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseTableIdentifier(ParseDriver.scala:49)
    at org.apache.spark.sql.Dataset.createTempViewCommand(Dataset.scala:3079)
    at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3034)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    at org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:153)
    at org.apache.spark.sql.catalyst.parser.SqlBaseLexer.<clinit>(SqlBaseLexer.java:1153)
    ... 16 more
Caused by: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    ... 18 more
>>>
```

Setup 3: Put the current SystemML 1.2.0 jars before the older incubating jar (FAILS !!)

```
$ ~/spark-2.3.0-bin-hadoop2.7/bin/pyspark --driver-memory 20g --master local[*] --driver-class-path systemml-1.2.0-extra.jar:systemml-1.2.0.jar:systemml-0.14.0-incubating.jar
Python 3.6.3 (default, Mar 20 2018, 13:50:41)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-03-21 13:14:49 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Python version 3.6.3 (default, Mar 20 2018 13:50:41)
SparkSession available as 'spark'.
>>> from systemml import MLContext
>>> ml = MLContext(spark)
2019-03-21 13:15:11 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException

Welcome to Apache SystemML!
Version 1.2.0

>>> ml.version()
'1.2.0'
>>> df=spark.read.parquet('shake.parquet')
>>> df.show()
+-----+---------+-----+-----+-----+
|CLASS| SENSORID|    X|    Y|    Z|
+-----+---------+-----+-----+-----+
|    2| qqqqqqqq| 0.12| 0.12| 0.12|
|    2|aUniqueID| 0.03| 0.03| 0.03|
|    2| qqqqqqqq|-3.84|-3.84|-3.84|
|    2| 12345678| -0.1| -0.1| -0.1|
|    2| 12345678|-0.15|-0.15|-0.15|
|    2| 12345678| 0.47| 0.47| 0.47|
|    2| 12345678|-0.06|-0.06|-0.06|
|    2| 12345678|-0.09|-0.09|-0.09|
|    2| 12345678| 0.21| 0.21| 0.21|
|    2| 12345678|-0.08|-0.08|-0.08|
|    2| 12345678| 0.44| 0.44| 0.44|
|    2|    gholi| 0.76| 0.76| 0.76|
|    2|    gholi| 1.62| 1.62| 1.62|
|    2|    gholi| 5.81| 5.81| 5.81|
|    2| bcbcbcbc| 0.58| 0.58| 0.58|
|    2| bcbcbcbc|-8.24|-8.24|-8.24|
|    2| bcbcbcbc|-0.45|-0.45|-0.45|
|    2| bcbcbcbc| 1.03| 1.03| 1.03|
|    2|aUniqueID|-0.05|-0.05|-0.05|
|    2| qqqqqqqq|-0.44|-0.44|-0.44|
+-----+---------+-----+-----+-----+
only showing top 20 rows

>>> df.createOrReplaceTempView("df")
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.5.3
ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 176, in createOrReplaceTempView
    self._jdf.createOrReplaceTempView(name)
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o52.createOrReplaceTempView.
: java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:84)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseTableIdentifier(ParseDriver.scala:49)
    at org.apache.spark.sql.Dataset.createTempViewCommand(Dataset.scala:3079)
    at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3034)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    at org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:153)
    at org.apache.spark.sql.catalyst.parser.SqlBaseLexer.<clinit>(SqlBaseLexer.java:1153)
    ... 16 more
Caused by: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    ... 18 more
>>>
```

Setup 4: Put the jar from the PR before the older incubating jar (SUCCEEDS !!)

```
$ ~/spark-2.3.0-bin-hadoop2.7/bin/pyspark --driver-memory 20g --master local[*] --driver-class-path systemml-1.3.0-SNAPSHOT-extra-pr.jar:systemml-1.3.0-SNAPSHOT-pr.jar:systemml-0.14.0-incubating.jar
Python 3.6.3 (default, Mar 20 2018, 13:50:41)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-03-21 13:19:59 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Python version 3.6.3 (default, Mar 20 2018 13:50:41)
SparkSession available as 'spark'.
>>> from systemml import MLContext
>>> ml = MLContext(spark)
2019-03-21 13:20:22 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException

Welcome to Apache SystemML!
Version 1.3.0-SNAPSHOT

>>> ml.version()
'1.3.0-SNAPSHOT'
>>> df=spark.read.parquet('shake.parquet')
>>> df.show()
+-----+---------+-----+-----+-----+
|CLASS| SENSORID|    X|    Y|    Z|
+-----+---------+-----+-----+-----+
|    2| qqqqqqqq| 0.12| 0.12| 0.12|
|    2|aUniqueID| 0.03| 0.03| 0.03|
|    2| qqqqqqqq|-3.84|-3.84|-3.84|
|    2| 12345678| -0.1| -0.1| -0.1|
|    2| 12345678|-0.15|-0.15|-0.15|
|    2| 12345678| 0.47| 0.47| 0.47|
|    2| 12345678|-0.06|-0.06|-0.06|
|    2| 12345678|-0.09|-0.09|-0.09|
|    2| 12345678| 0.21| 0.21| 0.21|
|    2| 12345678|-0.08|-0.08|-0.08|
|    2| 12345678| 0.44| 0.44| 0.44|
|    2|    gholi| 0.76| 0.76| 0.76|
|    2|    gholi| 1.62| 1.62| 1.62|
|    2|    gholi| 5.81| 5.81| 5.81|
|    2| bcbcbcbc| 0.58| 0.58| 0.58|
|    2| bcbcbcbc|-8.24|-8.24|-8.24|
|    2| bcbcbcbc|-0.45|-0.45|-0.45|
|    2| bcbcbcbc| 1.03| 1.03| 1.03|
|    2|aUniqueID|-0.05|-0.05|-0.05|
|    2| qqqqqqqq|-0.44|-0.44|-0.44|
+-----+---------+-----+-----+-----+
only showing top 20 rows

>>> df.createOrReplaceTempView("df")
>>>
```
Setup 5: No jar provided (SUCCEEDS !!)

```
$ ~/spark-2.3.0-bin-hadoop2.7/bin/pyspark --driver-memory 20g --master local[*]
Python 3.6.3 (default, Mar 20 2018, 13:50:41)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-03-21 13:23:26 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Python version 3.6.3 (default, Mar 20 2018 13:50:41)
SparkSession available as 'spark'.
>>> from systemml import MLContext
>>> ml = MLContext(spark)
2019-03-21 13:23:46 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException

Welcome to Apache SystemML!
Version 1.2.0

>>> ml.version()
'1.2.0'
>>> df=spark.read.parquet('shake.parquet')
>>> df.show()
+-----+---------+-----+-----+-----+
|CLASS| SENSORID|    X|    Y|    Z|
+-----+---------+-----+-----+-----+
|    2| qqqqqqqq| 0.12| 0.12| 0.12|
|    2|aUniqueID| 0.03| 0.03| 0.03|
|    2| qqqqqqqq|-3.84|-3.84|-3.84|
|    2| 12345678| -0.1| -0.1| -0.1|
|    2| 12345678|-0.15|-0.15|-0.15|
|    2| 12345678| 0.47| 0.47| 0.47|
|    2| 12345678|-0.06|-0.06|-0.06|
|    2| 12345678|-0.09|-0.09|-0.09|
|    2| 12345678| 0.21| 0.21| 0.21|
|    2| 12345678|-0.08|-0.08|-0.08|
|    2| 12345678| 0.44| 0.44| 0.44|
|    2|    gholi| 0.76| 0.76| 0.76|
|    2|    gholi| 1.62| 1.62| 1.62|
|    2|    gholi| 5.81| 5.81| 5.81|
|    2| bcbcbcbc| 0.58| 0.58| 0.58|
|    2| bcbcbcbc|-8.24|-8.24|-8.24|
|    2| bcbcbcbc|-0.45|-0.45|-0.45|
|    2| bcbcbcbc| 1.03| 1.03| 1.03|
|    2|aUniqueID|-0.05|-0.05|-0.05|
|    2| qqqqqqqq|-0.44|-0.44|-0.44|
+-----+---------+-----+-----+-----+
only showing top 20 rows

>>> df.createOrReplaceTempView("df")
>>>
```

Setup 6: Provide just the `1.2.0` jars (FAILS !!)

```
$ ~/spark-2.3.0-bin-hadoop2.7/bin/pyspark --driver-memory 20g --master local[*] --driver-class-path systemml-1.2.0
systemml-1.2.0-extra.jar  systemml-1.2.0.jar
[npansar@dml3 debug_classpath]$ ~/spark-2.3.0-bin-hadoop2.7/bin/pyspark --driver-memory 20g --master local[*] --driver-class-path systemml-1.2.0.jar:systemml-1.2.0-extra.jar
Python 3.6.3 (default, Mar 20 2018, 13:50:41)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-03-21 13:32:09 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Python version 3.6.3 (default, Mar 20 2018 13:50:41)
SparkSession available as 'spark'.
>>> from systemml import MLContext
>>> ml = MLContext(spark)
2019-03-21 13:32:25 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException

Welcome to Apache SystemML!
Version 1.2.0

>>> ml.version()
'1.2.0'
>>> df=spark.read.parquet('shake.parquet')
>>> df.show()
+-----+---------+-----+-----+-----+
|CLASS| SENSORID|    X|    Y|    Z|
+-----+---------+-----+-----+-----+
|    2| qqqqqqqq| 0.12| 0.12| 0.12|
|    2|aUniqueID| 0.03| 0.03| 0.03|
|    2| qqqqqqqq|-3.84|-3.84|-3.84|
|    2| 12345678| -0.1| -0.1| -0.1|
|    2| 12345678|-0.15|-0.15|-0.15|
|    2| 12345678| 0.47| 0.47| 0.47|
|    2| 12345678|-0.06|-0.06|-0.06|
|    2| 12345678|-0.09|-0.09|-0.09|
|    2| 12345678| 0.21| 0.21| 0.21|
|    2| 12345678|-0.08|-0.08|-0.08|
|    2| 12345678| 0.44| 0.44| 0.44|
|    2|    gholi| 0.76| 0.76| 0.76|
|    2|    gholi| 1.62| 1.62| 1.62|
|    2|    gholi| 5.81| 5.81| 5.81|
|    2| bcbcbcbc| 0.58| 0.58| 0.58|
|    2| bcbcbcbc|-8.24|-8.24|-8.24|
|    2| bcbcbcbc|-0.45|-0.45|-0.45|
|    2| bcbcbcbc| 1.03| 1.03| 1.03|
|    2|aUniqueID|-0.05|-0.05|-0.05|
|    2| qqqqqqqq|-0.44|-0.44|-0.44|
+-----+---------+-----+-----+-----+
only showing top 20 rows

>>> df.createOrReplaceTempView("df")
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.5.3
ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 176, in createOrReplaceTempView
    self._jdf.createOrReplaceTempView(name)
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/npansar/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o52.createOrReplaceTempView.
: java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:84)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseTableIdentifier(ParseDriver.scala:49)
    at org.apache.spark.sql.Dataset.createTempViewCommand(Dataset.scala:3079)
    at org.apache.spark.sql.Dataset.createOrReplaceTempView(Dataset.scala:3034)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    at org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:153)
    at org.apache.spark.sql.catalyst.parser.SqlBaseLexer.<clinit>(SqlBaseLexer.java:1153)
    ... 16 more
Caused by: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    ... 18 more
>>>
```
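And if you go with solution 2, here is a minimal sanity check to run after installing the python package built from this PR. This is just a sketch: the parquet file is the one from the logs above, and the expected version string assumes the PR's 1.3.0-SNAPSHOT build:

```python
# Minimal end-to-end sanity check, assuming the PR's python package
# (with its bundled jar) is the only SystemML on the classpath.
from pyspark.sql import SparkSession
from systemml import MLContext

spark = SparkSession.builder.getOrCreate()
ml = MLContext(spark)
print(ml.version())  # expect '1.3.0-SNAPSHOT', not '0.14.0-incubating'

# This is the exact call that triggers the ANTLR ATN mismatch above,
# so it fails fast if a conflicting jar is still being picked up.
df = spark.read.parquet('shake.parquet')
df.createOrReplaceTempView("df")
print(spark.sql("SELECT COUNT(*) FROM df").collect())
```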
