Hi,
I noticed that the following code compiles:
val df = spark.read.format("com.databricks.spark.avro").load("/tmp/whatever/output")
val count = df.filter(x => x.getAs[Int]("day") == 2).count
This surprises me, as I thought `filter()` takes a Column, not a `Row => Boolean`.
The code also returns the correct result, but it takes 1m30s to run (versus
less than 1 second when using `$"day" === 2`) and prints the error pasted
at the bottom of this message.
I was wondering why it compiles at all (an implicit conversion?), why it
is so slow, and why the error occurs.
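For context, here is a minimal sketch of the two variants I am comparing
(column name taken from the schema in the plan below):

```scala
// Column-based filter: a Catalyst expression the optimizer can analyze
// and potentially push down to the data source.
val fast = df.filter($"day" === 2).count

// Function-based filter: an opaque Row => Boolean closure, which (as far
// as I understand) Spark can only apply by materializing each Row and
// invoking the function, with no pushdown or expression optimization.
val slow = df.filter(x => x.getAs[Int]("day") == 2).count
```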
Can someone explain please?
Thank you,
Samy
--
[error] org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 398, Column 41: Expression "scan_isNull10" is not an rvalue
[error] at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:10174)
[error] at org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:6036)
[error] at org.codehaus.janino.UnitCompiler.getConstantValue2(UnitCompiler.java:4440)
[error] at org.codehaus.janino.UnitCompiler.access$9900(UnitCompiler.java:185)
[error] at org.codehaus.janino.UnitCompiler$11.visitAmbiguousName(UnitCompiler.java:4417)
[error] at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:3138)
[error] at org.codehaus.janino.UnitCompiler.getConstantValue(UnitCompiler.java:4427)
[error] at org.codehaus.janino.UnitCompiler.getConstantValue2(UnitCompiler.java:4634)
[error] at org.codehaus.janino.UnitCompiler.access$8900(UnitCompiler.java:185)
[error] at org.codehaus.janino.UnitCompiler$11.visitBinaryOperation(UnitCompiler.java:4394)
[error] at org.codehaus.janino.Java$BinaryOperation.accept(Java.java:3768)
[error] at org.codehaus.janino.UnitCompiler.getConstantValue(UnitCompiler.java:4427)
[error] at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4360)
[error] at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1845)
[error] at org.codehaus.janino.UnitCompiler.access$2000(UnitCompiler.java:185)
[error] at org.codehaus.janino.UnitCompiler$4.visitLocalVariableDeclarationStatement(UnitCompiler.java:945)
[error] at org.codehaus.janino.Java$LocalVariableDeclarationStatement.accept(Java.java:2508)
[error] at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:958)
[error] at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1007)
[error] at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2293)
[error] at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:822)
[error] at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:794)
[error] at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:507)
[error] at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:658)
[error] at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:662)
[error] at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:185)
[error] at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:350)
[error] at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1035)
[error] at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:354)
[error] at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:769)
[error] at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:532)
[error] at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:393)
[error] at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:185)
[error] at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:347)
[error] at org.codehaus.janino.Java$PackageMemberClassDeclaration.accept(Java.java:1139)
[error] at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:354)
[error] at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:322)
[error] at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:383)
[error] at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:315)
[error] at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:233)
[error] at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:192)
[error] at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:84)
[error] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:883)
[error] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
[error] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
[error] at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
[error] at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
[error] at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
[error] at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
[error] at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
[error] at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
[error] at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
[error] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:837)
[error] at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:350)
[error] at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
[error] at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
[error] at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
[error] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[error] at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
[error] at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
[error] at org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:86)
[error] at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:122)
[error] at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:113)
[error] at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
[error] at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:113)
[error] at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
[error] at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
[error] at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
[error] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[error] at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
[error] at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
[error] at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:233)
[error] at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:138)
[error] at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:361)
[error] at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
[error] at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
[error] at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
[error] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[error] at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
[error] at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
[error] at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:240)
[error] at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:287)
[error] at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2183)
[error] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
[error] at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)
[error] at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2182)
[error] at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2189)
[error] at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2217)
[error] at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2216)
[error] at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2545)
[error] at org.apache.spark.sql.Dataset.count(Dataset.scala:2216)
[error] at com.sam4m.kafkafsconnector.Foo$.delayedEndpoint$com$sam4m$kafkafsconnector$Foo$1(App.scala:92)
[error] at com.sam4m.kafkafsconnector.Foo$delayedInit$body.apply(App.scala:80)
[error] at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
[error] at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
[error] at scala.App$$anonfun$main$1.apply(App.scala:76)
[error] at scala.App$$anonfun$main$1.apply(App.scala:76)
[error] at scala.collection.immutable.List.foreach(List.scala:381)
[error] at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
[error] at scala.App$class.main(App.scala:76)
[error] at com.sam4m.kafkafsconnector.Foo$.main(App.scala:80)
[error] at com.sam4m.kafkafsconnector.Foo.main(App.scala)
[error] 16/09/29 17:49:49 WARN WholeStageCodegenExec: Whole-stage codegen disabled for this plan:
[error] *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#41L])
[error] +- *Project
[error] +- *Filter <function1>.apply
[error] +- *Scan avro [minute#0,second#1,info#2,status#3,year#4,month#5,day#6,hour#7] Format: com.databricks.spark.avro.DefaultSource@5864e8bf, InputPaths: file:/tmp/k2d-tests/output, PushedFilters: [], ReadSchema: struct<minute:int,second:int,info:struct<date:bigint,statID:string,eventType:string,deviceAdverti...