[
https://issues.apache.org/jira/browse/PHOENIX-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648565#comment-15648565
]
Josh Mahonin commented on PHOENIX-3333:
---------------------------------------
I've got a proof-of-concept version that works with Spark 2.0 here:
https://github.com/jmahonin/phoenix/tree/spark_2.0
Although this code compiles against either Spark 1.6 or Spark 2.0, the resulting JAR unfortunately isn't binary compatible with Spark versions below 2.0, due to Spark's changes to the DataFrame API as well as the change in Scala version.
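To make the incompatibility concrete: in Spark 1.6, org.apache.spark.sql.DataFrame is a real class, while in Spark 2.0 it only survives as a type alias for Dataset[Row] in the sql package object, so bytecode built against one line can't link against the other. A rough sketch (illustration only, not code from the branch above):
{code:scala}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Illustrative connector-style method, not actual phoenix-spark code.
// This compiles against both Spark 1.6 and Spark 2.0, but the compiled
// method descriptor differs: built against 1.6 it references the DataFrame
// class, which no longer exists on a 2.0 classpath (hence the
// NoClassDefFoundError: org/apache/spark/sql/DataFrame in the trace below),
// while built against 2.0 it references Dataset, which 1.6 can't resolve.
def loadTable(sqlContext: SQLContext, table: String): DataFrame =
  sqlContext.read
    .format("org.apache.phoenix.spark")
    .option("table", table)
    .load()
{code}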
Other projects have wrestled with this in a variety of ways, e.g. HBase
[1|https://issues.apache.org/jira/browse/HBASE-16179], Cassandra
[2|https://github.com/datastax/spark-cassandra-connector/pull/996] and
ElasticSearch
[3|https://github.com/elastic/elasticsearch-hadoop/commit/43017a2566f7b50ebca1e20e96820f0d037655ff]
In terms of simplicity, dropping all support for Spark 1.6 and below would be the easiest, but the least user friendly. Another option is to use Maven profiles to switch which Spark version Phoenix gets compiled against; the downside there is that it's not plainly obvious to those using the client JAR which version of Spark it will be compatible with. Yet another option is to create two client JARs, each compatible with a specific Spark version, but that adds more bloat and complexity to the existing assembly process.
I'm leaning towards using a Maven profile that defaults to Spark 2.0+, but I'd
be curious if other users (vendors?) have any opinions here.
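For illustration, a profile along these lines is what I have in mind; the profile id, versions, and property names below are placeholders, not what's in the Phoenix poms today:
{code:xml}
<!-- Sketch of a pom.xml fragment; names and versions are placeholders. -->
<properties>
  <!-- Default build: Spark 2.0 on Scala 2.11 -->
  <spark.version>2.0.0</spark.version>
  <scala.binary.version>2.11</scala.binary.version>
</properties>

<profiles>
  <!-- Opt back into Spark 1.6 with: mvn package -Pspark16 -->
  <profile>
    <id>spark16</id>
    <properties>
      <spark.version>1.6.1</spark.version>
      <scala.binary.version>2.10</scala.binary.version>
    </properties>
  </profile>
</profiles>

<dependencies>
  <!-- The Spark dependency then follows whichever profile is active. -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
{code}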
cc [~jamestaylor] [~sergey.soldatov] [~kalyanhadoop] [~ankit.singhal] [~devaraj]
> can not load phoenix table by spark 2.0 which is working well under spark 1.6
> -----------------------------------------------------------------------------
>
> Key: PHOENIX-3333
> URL: https://issues.apache.org/jira/browse/PHOENIX-3333
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.8.0
> Environment: Spark 2.0, Phoenix 4.8.0, OS is CentOS 6.7, Hadoop is HDP 2.5
> Reporter: dalin qin
>
> Spark version is 2.0.0.2.5.0.0-1245
> As mentioned by Josh, I believe Spark 2.0 changed its API in a way that breaks Phoenix. Please come up with an updated version that adapts to Spark's change.
> In [1]: df = sqlContext.read \
>    ...:     .format("org.apache.phoenix.spark") \
>    ...:     .option("table", "TABLE1") \
>    ...:     .option("zkUrl", "namenode:2181:/hbase-unsecure") \
>    ...:     .load()
> ---------------------------------------------------------------------------
> Py4JJavaError Traceback (most recent call last)
> <ipython-input-1-e5dfb7bbb28b> in <module>()
> ----> 1 df = sqlContext.read .format("org.apache.phoenix.spark") .option("table", "TABLE1") .option("zkUrl", "namenode:2181:/hbase-unsecure") .load()
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/readwriter.pyc in load(self, path, format, schema, **options)
> 151 return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
> 152 else:
> --> 153 return self._df(self._jreader.load())
> 154
> 155 @since(1.4)
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
> 931 answer = self.gateway_client.send_command(command)
> 932 return_value = get_return_value(
> --> 933 answer, self.gateway_client, self.target_id, self.name)
> 934
> 935 for temp_arg in temp_args:
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/utils.pyc in deco(*a, **kw)
> 61 def deco(*a, **kw):
> 62 try:
> ---> 63 return f(*a, **kw)
> 64 except py4j.protocol.Py4JJavaError as e:
> 65 s = e.java_exception.toString()
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
> 310 raise Py4JJavaError(
> 311 "An error occurred while calling {0}{1}{2}.\n".
> --> 312 format(target_id, ".", name), value)
> 313 else:
> 314 raise Py4JError(
> Py4JJavaError: An error occurred while calling o43.load.
> : java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
> at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
> at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:366)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:365)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
> at org.apache.spark.rdd.RDD.map(RDD.scala:365)
> at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:119)
> at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:59)
> at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:40)
> at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.DataFrame
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 45 more