[ https://issues.apache.org/jira/browse/SPARK-27623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951375#comment-16951375 ]
Tom Tang edited comment on SPARK-27623 at 10/14/19 9:41 PM:
------------------------------------------------------------

I found the same issue with Spark 2.4.3; falling back to the Scala 2.11 build of spark-avro seems to solve it:

{code}
spark-sql --packages org.apache.spark:spark-avro_2.11:2.4.3
pyspark --packages org.apache.spark:spark-avro_2.11:2.4.3
{code}

> Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
> ---------------------------------------------------------------------------
>
> Key: SPARK-27623
> URL: https://issues.apache.org/jira/browse/SPARK-27623
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.2
> Reporter: Alexandru Barbulescu
> Priority: Major
>
> After updating to Spark 2.4.2, when using the
> {code:java}
> spark.read.format().options().load()
> {code}
> chain of methods, regardless of what parameter is passed to "format", we get
> the following error related to Avro:
>
> {code:java}
> .options(**load_options)
> File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
> File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
> File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
> File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o69.load.
> : java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
> at java.util.ServiceLoader.fail(ServiceLoader.java:232)
> at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
> at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
> at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
> at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
> at scala.collection.Iterator.foreach(Iterator.scala:941)
> at scala.collection.Iterator.foreach$(Iterator.scala:941)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:250)
> at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:248)
> at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
> at scala.collection.TraversableLike.filter(TraversableLike.scala:262)
> at scala.collection.TraversableLike.filter$(TraversableLike.scala:262)
> at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
> at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:282)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:238)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat$class
> at org.apache.spark.sql.avro.AvroFileFormat.<init>(AvroFileFormat.scala:44)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at java.lang.Class.newInstance(Class.java:442)
> at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
> ... 29 more
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.FileFormat$class
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 36 more
> {code}
>
> The code we run looks like this:
>
> {code:java}
> spark_session = (
>     SparkSession.builder
>     .appName(APPLICATION_NAME)
>     .master(MASTER_URL)
>     .config('spark.cassandra.connection.host', SERVER_IP_ADDRESS)
>     .config('spark.cassandra.auth.username', CASSANDRA_USERNAME)
>     .config('spark.cassandra.auth.password', CASSANDRA_PASSWORD)
>     .config('spark.sql.shuffle.partitions', 16)
>     .config('parquet.enable.summary-metadata', 'true')
>     .getOrCreate())
>
> load_options = {
>     'keyspace': CASSANDRA_KEYSPACE,
>     'table': TABLE_NAME,
>     'spark.cassandra.input.fetch.size_in_rows': '150'
> }
>
> df = (spark_session.read.format('org.apache.spark.sql.cassandra')
>     .options(**load_options)
>     .load())
> {code}
>
> We get the exact same error when trying to read a local .avro file instead of reading from Cassandra.
> Up to now we included the .jar file for spark-avro using the spark-submit --jars option. The version of spark-avro that we used, and which worked with Spark 2.4.1, was spark-avro 2.4.0.
> In an attempt to fix this problem we tried updating the .jar file version. We also tried the --packages option, with different version combinations, but none of these solutions worked; the same error shows up every time.
> When rolling back to Spark 2.4.1 with the exact same setup and code, the error doesn't show up and everything works fine.
> Any ideas on what could be causing this?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
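The {{NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat$class}} line in the trace is the telltale detail: {{Trait$class}} helper classes are how Scala 2.11 compiles traits, and they do not exist in Scala 2.12 builds, so a spark-avro jar built for one Scala binary version cannot load against a Spark built for the other. This matches the reports above if the installed Spark 2.4.2 binaries were built with a different Scala version than the spark-avro jar. The practical rule is that the artifact's Scala suffix must match the Scala binary version of the running Spark. A minimal sketch of that rule, using a hypothetical helper (not part of Spark or spark-avro) to build the matching {{--packages}} coordinate:

```python
# Hypothetical helper: build the spark-avro Maven coordinate whose Scala
# suffix matches the Scala binary version the Spark distribution was built
# with. If the suffixes disagree, ServiceLoader fails at class-load time
# with NoClassDefFoundError: ...FileFormat$class, as in the trace above.

def spark_avro_coordinate(spark_version: str, scala_binary_version: str) -> str:
    """Return the --packages coordinate for the matching spark-avro artifact."""
    return f"org.apache.spark:spark-avro_{scala_binary_version}:{spark_version}"

# The coordinate used in the comment above (Spark 2.4.3 built with Scala 2.11):
print(spark_avro_coordinate("2.4.3", "2.11"))
# org.apache.spark:spark-avro_2.11:2.4.3
```

One way to check which Scala version a running Spark was built with is the spark-shell startup banner; from PySpark, {{spark.sparkContext._jvm.scala.util.Properties.versionString()}} reports the same information.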