Alexander Bezzubov created ZEPPELIN-194:
-------------------------------------------

             Summary: Exception on %sql over CSV dataframe
                 Key: ZEPPELIN-194
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-194
             Project: Zeppelin
          Issue Type: Bug
            Reporter: Alexander Bezzubov


In local spark exception happens on {{%sql select * from ... limit 10}} but 
{{z.show(df)}} for same dataset shows well.

Steps to reproduce:
{code}
%dep
z.load("com.databricks:spark-csv_2.10:1.1.0")

%sh
wget https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv

%spark
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", 
"true").option("delimiter", "\t").load("bank.csv")
df.registerTempTable("bank")

z.show(df)

%sql select * from bank limit 10
{code}

Exception:
{code}
INFO [2015-08-01 18:06:38,986] ({pool-2-thread-3} Logging.scala[logInfo]:59) - 
Created broadcast 2 from textFile at CsvRelation.scala:66
ERROR [2015-08-01 18:06:39,002] ({pool-2-thread-3} Job.java[run]:183) - Job 
failed
org.apache.zeppelin.interpreter.InterpreterException: 
java.lang.reflect.InvocationTargetException
        at 
org.apache.zeppelin.spark.ZeppelinContext.showRDD(ZeppelinContext.java:301)
        at 
org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:134)
        at 
org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
        at 
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
        at 
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
        at 
org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.zeppelin.spark.ZeppelinContext.showRDD(ZeppelinContext.java:296)
        ... 13 more
Caused by: java.lang.ClassNotFoundException: 
com.databricks.spark.csv.CsvRelation$$anonfun$buildScan$1$$anonfun$1
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at 
org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:455)
        at 
com.esotericsoftware.reflectasm.shaded.org.objectweb.asm.ClassReader.accept(Unknown
 Source)
        at 
com.esotericsoftware.reflectasm.shaded.org.objectweb.asm.ClassReader.accept(Unknown
 Source)
        at 
org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:101)
        at 
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:197)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
        at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:683)
        at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:682)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
        at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:682)
        at com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:83)
        at 
org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:101)
        at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
        at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
        at 
org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:314)
        at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at 
org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:943)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:941)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:947)
        at 
org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:947)
        at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1269)
        at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1203)
        at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1262)
        ... 18 more
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to