[ https://issues.apache.org/jira/browse/SPARK-26801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758869#comment-16758869 ]
Hyukjin Kwon commented on SPARK-26801:
--------------------------------------

Thanks for reporting this. Would you be interested in narrowing down the problem?

> Spark unable to read valid avro types
> -------------------------------------
>
>                 Key: SPARK-26801
>                 URL: https://issues.apache.org/jira/browse/SPARK-26801
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dhruve Ashar
>            Priority: Major
>
> Currently the external avro package reads avro schemas for type records only.
> This is probably because of representation of InternalRow in spark sql. As a
> result, if the avro file has anything other than a sequence of records it
> fails to read it.
> We faced this issue earlier while trying to read primitive types. We
> encountered this again while trying to read an array of records. Below are
> code examples trying to read valid avro data showing the stack traces.
> {code:java}
> spark.read.format("avro").load("avroTypes/randomInt.avro").show
> java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
> "int"
>   at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
>   at scala.Option.orElse(Option.scala:289)
>   at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
>   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   ... 49 elided
> ======================================================================
> scala> spark.read.format("avro").load("avroTypes/randomEnum.avro").show
> java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
> {
>   "type" : "enum",
>   "name" : "Suit",
>   "symbols" : [ "SPADES", "HEARTS", "DIAMONDS", "CLUBS" ]
> }
>   at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
>   at scala.Option.orElse(Option.scala:289)
>   at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
>   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
>   ... 49 elided
> {code}