Do you want to see the code that whole-stage codegen produces?
You can prepend a SQL statement with EXPLAIN CODEGEN ...
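For the SQL route, here is a minimal sketch of how that could look from Scala. The view name "numbers", the filter, the app name, and the local master are made up for illustration; only EXPLAIN CODEGEN itself comes from the suggestion above. The result of an EXPLAIN statement comes back as a single string column, so the sketch just prints each row.

```scala
import org.apache.spark.sql.SparkSession

object ExplainCodegenExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explain-codegen-example") // hypothetical app name
      .getOrCreate()

    // Hypothetical temp view so that there is something to query with SQL.
    spark.range(0, 100).createOrReplaceTempView("numbers")

    // EXPLAIN CODEGEN returns the generated code as text rows.
    spark.sql("EXPLAIN CODEGEN SELECT id FROM numbers WHERE id > 10")
      .collect()
      .foreach(row => println(row.getString(0)))

    spark.stop()
  }
}
```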
Alternatively, for a DataFrame/Dataset, add the following import:
import org.apache.spark.sql.execution.debug._
and call debugCodegen() on the DataFrame/Dataset, for example:
spark.range(0, 100).debugCodegen()
...
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Range (0, 100, splits=8)
Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */ return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */ private Object[] references;
/* 007 */ private org.apache.spark.sql.execution.metric.SQLMetric range_numOutputRows;
/* 008 */ private boolean range_initRange;
/* 009 */ private long range_partitionEnd;
...
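If you would rather run this outside the shell, a self-contained sketch might look like the following. The object name, app name, and local master are assumptions for illustration. It also uses codegenString from the same debug package to get the generated code as a String instead of printing it, which I believe is available in Spark 2.0; check your version before relying on it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._

object DebugCodegenExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("debug-codegen-example") // hypothetical app name
      .getOrCreate()

    val ds = spark.range(0, 100)

    // Prints the WholeStageCodegen subtrees and their generated Java source to stdout.
    ds.debugCodegen()

    // Same information as a String, e.g. to write to a file or a logger.
    val generated: String = codegenString(ds.queryExecution.executedPlan)
    println(s"Captured ${generated.length} characters of generated code")

    spark.stop()
  }
}
```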
On Fri, Aug 5, 2016 at 9:55 AM, Maciej Bryński <[email protected]> wrote:
> Hi,
> I have some operation on DataFrame / Dataset.
> How can I see source code for whole stage codegen ?
> Is there any API for this ? Or maybe I should configure log4j in specific
> way ?
>
> Regards,
> --
> Maciek Bryński
>