[spark] branch master updated: [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId

yamamuro Thu, 19 Mar 2020 04:55:25 -0700

This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new a177628  [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId
a177628 is described below

commit a1776288f48d450fea28f50fef78fd6aa10a8160
Author: Kris Mok <[email protected]>
AuthorDate: Thu Mar 19 20:53:01 2020 +0900

    [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId
    
    ### What changes were proposed in this pull request?
    
    Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code
    to help with debugging. One way to get the generated code is through
    `df.queryExecution.debug.codegen`; another is the SQL `EXPLAIN CODEGEN`
    statement.
    
    The generated code is currently printed in no particular order, which
    makes debugging harder than it needs to be. This PR makes a minor
    improvement: it sorts the codegen dump by `codegenStageId`, ascending.
    
    After this change, the following query:
    ```scala
    spark.range(10).agg(sum('id)).queryExecution.debug.codegen
    ```
    will always dump the generated code in a natural, stable order. A version
    of this example with shorter output is:
    ```
    spark.range(10).agg(sum('id)).queryExecution.debug.codegenToSeq.map(_._1).foreach(println)
    *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L])
    +- *(1) Range (0, 10, step=1, splits=16)

    *(2) HashAggregate(keys=[], functions=[sum(id#8L)], output=[sum(id)#12L])
    +- Exchange SinglePartition, true, [id=#30]
       +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L])
          +- *(1) Range (0, 10, step=1, splits=16)
    ```
    
    The number of codegen stages within a single SQL query tends to be very
    small, most likely < 50, so the overhead of the added sort should not be
    significant.
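    
    As a rough illustration of why the sort gives a stable order, here is a
    minimal, self-contained sketch. It assumes the subtrees are gathered into
    an unordered collection before being printed. `Stage` and
    `SortByStageIdSketch` are hypothetical stand-ins invented for this
    example, not Spark classes; only the `toSeq.sortBy(_.codegenStageId)`
    shape is taken from the patch.
    
    ```scala
    import scala.collection.mutable

    // Hypothetical stand-in for a whole-stage codegen subtree; not a Spark class.
    final case class Stage(codegenStageId: Int, source: String)

    object SortByStageIdSketch {
      // Same shape as the patched line: convert to a Seq, then sort by stage id.
      def ordered(stages: Iterable[Stage]): Seq[Stage] =
        stages.toSeq.sortBy(_.codegenStageId)

      def main(args: Array[String]): Unit = {
        // A hash set does not guarantee any particular iteration order.
        val collected = mutable.HashSet(
          Stage(3, "/* stage 3 */"),
          Stage(1, "/* stage 1 */"),
          Stage(2, "/* stage 2 */"))

        // After sorting, the stages come out by ascending id.
        ordered(collected).foreach { s =>
          println(s"${s.codegenStageId}: ${s.source}")
        }
      }
    }
    ```
    
    With only a handful of stages per query, a sort like this is cheap
    compared with generating and compiling the code for each stage.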
    
    ### Why are the changes needed?
    
    Minor improvement to aid WSCG debugging.
    
    ### Does this PR introduce any user-facing change?
    
    No user-facing change for end users; minor change for developers who debug
    WSCG-generated code.
    
    ### How was this patch tested?
    
    Manually tested the output; all other tests still pass.
    
    Closes #27955 from rednaxelafx/codegen.
    
    Authored-by: Kris Mok <[email protected]>
    Signed-off-by: Takeshi Yamamuro <[email protected]>
---
 .../src/main/scala/org/apache/spark/sql/execution/debug/package.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
index 6a57ef2..6c40104 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
@@ -113,7 +113,7 @@ package object debug {
         s
       case s => s
     }
-    codegenSubtrees.toSeq.map { subtree =>
+    codegenSubtrees.toSeq.sortBy(_.codegenStageId).map { subtree =>
       val (_, source) = subtree.doCodeGen()
       val codeStats = try {
         CodeGenerator.compile(source)._2


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
