Alexandru Barbulescu created SPARK-27623:
--------------------------------------------
Summary: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
Key: SPARK-27623
URL: https://issues.apache.org/jira/browse/SPARK-27623
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.4.2
Reporter: Alexandru Barbulescu
After updating to Spark 2.4.2, the
{code:java}
spark.read.format().options().load()
{code}
chain of methods fails with the following Avro-related error, regardless of what parameter is passed to "format":
{code:java}
.options(**load_options)
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o69.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
  at java.util.ServiceLoader.fail(ServiceLoader.java:232)
  at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
  at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
  at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
  at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
  at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
  at scala.collection.Iterator.foreach(Iterator.scala:941)
  at scala.collection.Iterator.foreach$(Iterator.scala:941)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:250)
  at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:248)
  at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
  at scala.collection.TraversableLike.filter(TraversableLike.scala:262)
  at scala.collection.TraversableLike.filter$(TraversableLike.scala:262)
  at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
  at py4j.Gateway.invoke(Gateway.java:282)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:238)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat$class
  at org.apache.spark.sql.avro.AvroFileFormat.<init>(AvroFileFormat.scala:44)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at java.lang.Class.newInstance(Class.java:442)
  at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
  ... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.FileFormat$class
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 36 more
{code}
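The missing class in the root cause, org.apache.spark.sql.execution.datasources.FileFormat$class, follows Scala 2.11's naming scheme for trait implementation classes, which no longer exists under Scala 2.12's trait encoding; that pattern typically means a data-source jar was built against a different Scala binary version than the Spark runtime. As a minimal sketch of that compatibility rule (the helper names here are hypothetical, not part of Spark), the check amounts to comparing the artifact's Scala suffix against the runtime version:

```python
import re

# Hypothetical helper: extract the Scala binary-version suffix ("2.11",
# "2.12", ...) from a Maven artifact name such as "spark-avro_2.11".
def scala_suffix(artifact):
    match = re.search(r"_(2\.\d+)(?:-|$)", artifact)
    return match.group(1) if match else None

# Hypothetical check: a jar built for one Scala binary version cannot be
# loaded by a Spark build that uses another (traits are compiled
# differently, hence NoClassDefFoundError: ...FileFormat$class).
def compatible(artifact, spark_scala_version):
    suffix = scala_suffix(artifact)
    return suffix is not None and spark_scala_version.startswith(suffix)

print(compatible("spark-avro_2.11", "2.12.8"))  # False: 2.11 jar, 2.12 runtime
print(compatible("spark-avro_2.12", "2.12.8"))  # True: suffix matches runtime
```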
The code we run looks like this:
{code:python}
from pyspark.sql import SparkSession

spark_session = (
    SparkSession.builder
    .appName(APPLICATION_NAME)
    .master(MASTER_URL)
    .config('spark.cassandra.connection.host', SERVER_IP_ADDRESS)
    .config('spark.cassandra.auth.username', CASSANDRA_USERNAME)
    .config('spark.cassandra.auth.password', CASSANDRA_PASSWORD)
    .config('spark.sql.shuffle.partitions', 16)
    .config('parquet.enable.summary-metadata', 'true')
    .getOrCreate())

load_options = {
    'keyspace': CASSANDRA_KEYSPACE,
    'table': TABLE_NAME,
    'spark.cassandra.input.fetch.size_in_rows': '150',
}

df = (spark_session.read.format('org.apache.spark.sql.cassandra')
      .options(**load_options)
      .load())
{code}
We get the exact same error when trying to read a local .avro file instead of reading from Cassandra.
Until now we have included the Spark-Avro .jar via the spark-submit --jars option. The Spark-Avro version we used, which worked with Spark 2.4.1, was 2.4.0.
In an attempt to fix the problem we tried updating the .jar file version, and we also tried the --packages option with different version combinations, but none of these worked: the same error shows up every time.
When rolling back to Spark 2.4.1 with the exact same setup and code, the error
doesn't show up and everything works fine.
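For what it's worth, the Spark 2.4.2 convenience binaries were built against Scala 2.12, unlike 2.4.1, which defaults to Scala 2.11, so the rollback behavior above is consistent with a Scala-version mismatch. A hedged workaround sketch, assuming the deployed Spark 2.4.2 build really is a Scala 2.12 one (the script name is a placeholder), would be to request the _2.12 Avro artifact explicitly:

```shell
# Assumption: the running Spark 2.4.2 build uses Scala 2.12. The Avro
# package's Scala suffix must match the runtime, so request the _2.12
# artifact rather than a _2.11 jar:
spark-submit \
  --packages org.apache.spark:spark-avro_2.12:2.4.2 \
  your_job.py
```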
Any ideas on what could be causing this?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)