Any idea what causing this error 15/08/28 21:03:03 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 9.0 (TID 20, dtord01hdw0228p.dc.dotomi.net): java.lang.RuntimeException: cannot find field message_campaign_id from [0:error_error_error_error_error_error_error, 1:cannot_determine_schema, 2:check, 3:schema, 4:url, 5:and, 6:literal] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:278) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:277) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.sql.hive.HadoopTableReader$.fillObject(TableReader.scala:277) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$4$$anonfun$9.apply(TableReader.scala:194) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$4$$anonfun$9.apply(TableReader.scala:188) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
On Thu, Aug 27, 2015 at 12:02 PM, Michael Armbrust <mich...@databricks.com> wrote: > BTY, spark-avro works great for our experience, but still, some non-tech >> people just want to use as a SQL shell in spark, like HIVE-CLI. >> > > To clarify: you can still use the spark-avro library with pure SQL. Just > use the CREATE TABLE ... USING com.databricks.spark.avro OPTIONS (path > '...') syntax. >