David Vogelbacher created SPARK-23598:
-----------------------------------------
Summary: WholeStageCodegen can lead to IllegalAccessError calling
append for HashAggregateExec
Key: SPARK-23598
URL: https://issues.apache.org/jira/browse/SPARK-23598
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.0
Reporter: David Vogelbacher
I got the following stack trace for a large query plan using WholeStageCodegen:
{code:java}
java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
{code}
After disabling codegen, everything works.
The root cause seems to be that we are trying to call the protected _append_
method of
[BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68]
from an inner class of a subclass that is loaded by a different
class loader (after codegen compilation).
[https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4]
states that a protected method _R_ can be accessed from a class _D_ only if one
of the following two conditions is fulfilled:
# R is protected and is declared in a class C, and D is either a subclass of C
or C itself. Furthermore, if R is not static, then the symbolic reference to R
must contain a symbolic reference to a class T, such that T is either a
subclass of D, a superclass of D, or D itself.
# R is either protected or has default access (that is, neither public nor
protected nor private), and is declared by a class in the same run-time package
as D.
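Condition 2 doesn't apply because the generated class is loaded by a different
class loader (and is in a different package), so it is not in the same run-time
package as _BufferedRowIterator_. Condition 1 doesn't apply because we are
apparently trying to call the method from an inner class of a subclass of
_BufferedRowIterator_, and that inner class is not itself a subclass of
_BufferedRowIterator_.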
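The two conditions can be sketched as a small predicate. This is a simplified model and not the JVM's actual check: it ignores the additional requirement on the class T for non-static methods, and the names Base, Generated and Nested are hypothetical stand-ins for _BufferedRowIterator_, the generated iterator and its nested class.

```java
public class ProtectedAccessCheck {

    // Hypothetical stand-in for BufferedRowIterator (class C, declares R).
    public static class Base {
        protected void append(Object row) { }
    }

    // Hypothetical stand-in for the generated iterator (a subclass of C).
    public static class Generated extends Base {
        // Hypothetical stand-in for the nested class: it lives inside
        // Generated but does not extend Base.
        public class Nested { }
    }

    /**
     * Simplified version of the two conditions quoted above.
     * d is the class making the call, c is the class declaring the
     * protected method. Whether d and c share a run-time package is
     * passed in as a flag, because it depends on the class loaders.
     */
    public static boolean mayCallProtected(Class<?> d, Class<?> c,
                                           boolean sameRuntimePackage) {
        boolean condition1 = c.isAssignableFrom(d); // D is C or a subclass of C
        boolean condition2 = sameRuntimePackage;    // same package and loader
        return condition1 || condition2;
    }

    public static void main(String[] args) {
        // The generated subclass itself may call append (condition 1).
        System.out.println(mayCallProtected(Generated.class, Base.class, false));
        // The nested class may not: neither condition holds.
        System.out.println(mayCallProtected(Generated.Nested.class, Base.class, false));
        // It would be allowed if both classes shared a run-time package.
        System.out.println(mayCallProtected(Generated.Nested.class, Base.class, true));
    }
}
```

Under these assumptions only the middle case is rejected, which matches the situation in the stack trace: the call comes from the nested class, not from the generated subclass itself.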
Looking at the code path of _WholeStageCodegen_, the following happens:
# In
[WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527],
we create the subclass of _BufferedRowIterator_, along with a _processNext_
method for processing the output of the child plan.
# In the child, which is a
[HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517],
we create the method which shows up at the top of the stack trace (called
_doAggregateWithKeysOutput_ )
# We add this method to the compiled code invoking _addNewFunction_ of
[CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]
Now, the documentation of _addNewFunction_ states that:
{noformat}
If the code for the `OuterClass` grows too large, the function will be inlined
into a new private, inner class
{noformat}
This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into
a new private inner class. Thus, it doesn't have access to the protected
_append_ method anymore but still tries to call it, which results in the
_IllegalAccessError_.
Possible fixes:
* Pass in the _inlineToOuterClass_ flag when invoking _addNewFunction_
* Make the _append_ method public
* Re-declare the _append_ method in the generated subclass (just invoking
_super_). This way, inner classes should have access to it.
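The third option could look roughly like the sketch below. It only illustrates the pattern and is not Spark's generated code: RowIteratorBase, GeneratedIterator and NestedClass are hypothetical stand-ins, and because everything here lives in one file and one package, the sketch cannot reproduce the original error.

```java
import java.util.ArrayList;
import java.util.List;

public class AppendRedeclareSketch {

    // Hypothetical stand-in for BufferedRowIterator.
    public abstract static class RowIteratorBase {
        private final List<Object> rows = new ArrayList<>();

        protected void append(Object row) {
            rows.add(row);
        }

        public int bufferedCount() {
            return rows.size();
        }

        public abstract void processNext();
    }

    // Hypothetical stand-in for the generated iterator subclass.
    public static class GeneratedIterator extends RowIteratorBase {
        private final NestedClass nested = new NestedClass();

        // Re-declared in the subclass and simply delegating to super, so
        // that the nested class calls a method declared by its own
        // enclosing class instead of the one declared in the base class.
        @Override
        protected void append(Object row) {
            super.append(row);
        }

        @Override
        public void processNext() {
            nested.doAggregateWithKeysOutput("row-1");
        }

        // Stand-in for the private inner class that receives functions
        // once the outer class grows too large.
        private class NestedClass {
            void doAggregateWithKeysOutput(Object row) {
                append(row); // resolves to GeneratedIterator.append
            }
        }
    }

    public static void main(String[] args) {
        GeneratedIterator it = new GeneratedIterator();
        it.processNext();
        System.out.println(it.bufferedCount());
    }
}
```

The intent is that the nested class and the re-declared method end up in the same generated compilation unit, so the call would no longer need access to the protected method declared in the base class.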