Alexandru Barbulescu created SPARK-27623:
--------------------------------------------

             Summary: Provider org.apache.spark.sql.avro.AvroFileFormat could 
not be instantiated
                 Key: SPARK-27623
                 URL: https://issues.apache.org/jira/browse/SPARK-27623
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.2
            Reporter: Alexandru Barbulescu


After updating to Spark 2.4.2, when using the
{code:java}
spark.read.format().options().load()
{code}
chain of methods, regardless of what parameter is passed to {{format}}, we get the following Avro-related error:

 
{code:java}
.options(**load_options)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o69.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:250)
    at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:248)
    at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
    at scala.collection.TraversableLike.filter(TraversableLike.scala:262)
    at scala.collection.TraversableLike.filter$(TraversableLike.scala:262)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat$class
    at org.apache.spark.sql.avro.AvroFileFormat.<init>(AvroFileFormat.scala:44)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.FileFormat$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 36 more
{code}
 

The code we run looks like this:

 
{code:java}
spark_session = (
    SparkSession.builder
    .appName(APPLICATION_NAME)
    .master(MASTER_URL)
    .config('spark.cassandra.connection.host', SERVER_IP_ADDRESS)
    .config('spark.cassandra.auth.username', CASSANDRA_USERNAME)
    .config('spark.cassandra.auth.password', CASSANDRA_PASSWORD)
    .config('spark.sql.shuffle.partitions', 16)
    .config('parquet.enable.summary-metadata', 'true')
    .getOrCreate())

load_options = {
    'keyspace': CASSANDRA_KEYSPACE,
    'table': TABLE_NAME,
    'spark.cassandra.input.fetch.size_in_rows': '150'}

df = (spark_session.read.format('org.apache.spark.sql.cassandra')
      .options(**load_options)
      .load())
{code}
 

We get the exact same error when trying to read a local .avro file instead of reading from Cassandra.

Up to now we have included the spark-avro .jar using the spark-submit --jars option. The spark-avro version we used, which worked with Spark 2.4.1, was 2.4.0.

In an attempt to fix this problem we tried updating the .jar file version, and we also tried the --packages option with different version combinations, but none of these worked; the same error shows up every time.
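One detail that may be worth checking when retrying the {{--packages}} route: the {{NoClassDefFoundError}} names {{FileFormat$class}}, and {{...$class}} implementation classes are only emitted by the Scala 2.11 compiler, which hints at a Scala version mismatch between the spark-avro artifact and the Spark build itself. The sketch below is a hedged illustration, not a confirmed fix; the artifact suffix and the job script name are assumptions to verify against your own environment.

```shell
# Hedged sketch: align the spark-avro artifact's Scala suffix (_2.11 vs _2.12)
# with the Scala version your Spark 2.4.2 distribution reports via
# `spark-submit --version`. Assumes a Scala 2.12 build and a hypothetical
# job script `my_job.py`.
spark-submit \
  --packages org.apache.spark:spark-avro_2.12:2.4.2 \
  my_job.py
```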

When rolling back to Spark 2.4.1 with the exact same setup and code, the error does not appear and everything works fine.

Any ideas on what could be causing this?


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
