Do you want to see the code that whole-stage codegen produces? You can prepend EXPLAIN CODEGEN to a SQL statement ...
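If you want the same output from a compiled application instead of the shell, here is a minimal sketch. It assumes Spark 2.x with spark-sql on the classpath. The object name ShowCodegen, the helper explainCodegen and the temp view name numbers are hypothetical names chosen for illustration. The helper runs EXPLAIN CODEGEN through spark.sql and joins the returned rows into one string.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; assumes Spark 2.x (which supports EXPLAIN CODEGEN) on the classpath.
object ShowCodegen {
  // Runs EXPLAIN CODEGEN on `query` and returns the physical plan plus the generated Java source.
  // EXPLAIN returns a DataFrame with a single string column, so each row is read with getString(0).
  def explainCodegen(spark: SparkSession, query: String): String =
    spark.sql(s"EXPLAIN CODEGEN $query")
      .collect()
      .map(_.getString(0))
      .mkString("\n")

  def main(args: Array[String]): Unit = {
    // A local session is enough for inspecting generated code.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("show-codegen") // illustrative app name
      .getOrCreate()
    try {
      // Register a small table to query; "numbers" is an illustrative view name.
      spark.range(0, 100).createOrReplaceTempView("numbers")
      println(explainCodegen(spark, "SELECT id * 2 AS doubled FROM numbers"))
    } finally {
      spark.stop()
    }
  }
}
```

Returning a String rather than printing directly makes it easy to log the generated code or search it, for example for a specific operator.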
Or you can import org.apache.spark.sql.execution.debug._ and call debugCodegen() on a DataFrame/Dataset, for example:

spark.range(0, 100).debugCodegen()
...
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Range (0, 100, splits=8)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private org.apache.spark.sql.execution.metric.SQLMetric range_numOutputRows;
/* 008 */   private boolean range_initRange;
/* 009 */   private long range_partitionEnd;
...

On Fri, Aug 5, 2016 at 9:55 AM, Maciej Bryński <mac...@brynski.pl> wrote:
> Hi,
> I have some operation on DataFrame / Dataset.
> How can I see source code for whole stage codegen ?
> Is there any API for this ? Or maybe I should configure log4j in specific
> way ?
>
> Regards,
> --
> Maciek Bryński