Have you seen this thread? http://search-hadoop.com/m/q3RTtWmyYB5fweR&subj=Re+Best+way+to+store+Avro+Objects+as+Parquet+using+SPARK
On Thu, May 26, 2016 at 6:55 AM, Govindasamy, Nagarajan <ngovindas...@turbine.com> wrote:

> Hi,
>
> I am trying to save an RDD of Avro GenericRecord as Parquet. I am using Spark 1.6.1.
>
>     DStreamOfAvroGenericRecord.foreachRDD(rdd =>
>       rdd.toDF().write.parquet("s3://bucket/data.parquet"))
>
> Getting the following exception. Is there a way to save Avro GenericRecord as a Parquet or ORC file?
>
>     java.lang.UnsupportedOperationException: Schema for type org.apache.avro.generic.GenericRecord is not supported
>         at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:715)
>         at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
>         at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:690)
>         at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:689)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>         at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:689)
>         at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
>         at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:642)
>         at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
>         at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:414)
>         at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:155)
>
> Thanks,
>
> Raj
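For reference, the failure comes from `toDF()`, which uses Scala reflection to derive a schema and cannot do so for `GenericRecord` (it is not a case class). A common workaround is to build the Spark SQL schema from the Avro schema and map each record to a `Row` explicitly. Below is an untested sketch, assuming Spark 1.6 with the Databricks spark-avro library on the classpath; `avroSchemaJson` is a hypothetical name for your writer schema as a JSON string, and `sqlContext` is the usual `SQLContext` from the streaming app:

```scala
import scala.collection.JavaConverters._
import com.databricks.spark.avro.SchemaConverters
import org.apache.avro.Schema
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType

// Parse the Avro writer schema and convert it to a Spark SQL StructType.
// avroSchemaJson is a placeholder for your schema definition.
val avroSchema = new Schema.Parser().parse(avroSchemaJson)
val sqlSchema = SchemaConverters.toSqlType(avroSchema).dataType.asInstanceOf[StructType]

DStreamOfAvroGenericRecord.foreachRDD { rdd =>
  // GenericRecord cannot be handled by toDF()'s reflection, so map each
  // record to a Row and pass the schema to createDataFrame explicitly.
  // Note: this flat field extraction only covers records with primitive
  // fields; nested records/arrays need recursive conversion, and Avro
  // strings come back as org.apache.avro.util.Utf8 (call .toString).
  val rows = rdd.map { record =>
    Row.fromSeq(avroSchema.getFields.asScala.map(f => record.get(f.name())))
  }
  sqlContext.createDataFrame(rows, sqlSchema)
    .write.mode("append").parquet("s3://bucket/data.parquet")
}
```

Using `mode("append")` here because `foreachRDD` writes to the same path on every batch; the default `ErrorIfExists` mode would fail on the second micro-batch.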