parthchandra commented on code in PR #577:
URL: https://github.com/apache/datafusion-comet/pull/577#discussion_r1643449854


##########
spark/inspections/CometTPCHQueriesList-results.txt:
##########
@@ -1,133 +1,133 @@
-Query: q1 TPCH Snappy. Comet Exec: Enabled (CometHashAggregate, CometProject)
+Query: q1 TPCH Snappy. Comet Exec: Enabled (CometHashAggregate, CometFilter, 
CometProject)
 Query: q1 TPCH Snappy: ExplainInfo:
 Comet shuffle is not enabled: spark.sql.adaptive.coalescePartitions.enabled is 
enabled and spark.comet.shuffle.enforceMode.enabled is not enabled
 
 Query: q2 TPCH Snappy. Comet Exec: Enabled (CometFilter, CometProject)
 Query: q2 TPCH Snappy: ExplainInfo:
-BroadcastExchange is not supported
+BroadcastHashJoin is not enabled because not all child plans are native 

Review Comment:
   TL;DR To get the reason why a specific operation in a specific query is not 
native we must look at the extended explain info in conjunction with the plan.
   
   One of the goals of the extended explain plan output (perhaps the primary 
goal) was to run this over a large number of queries and aggregate which 
operations end users are likely to benefit from. For instance, `xxhash64` 
appears to be an expression which prevents several TPC-H queries from running 
natively, so we were able to prioritise the work on it.
   Getting detailed information  for specific queries was not originally a 
goal, so we subsume some of the extended explain information, preferring to 
only output root causes. This is also the reason why we do not associate the 
information with any specific node.
   
   In TPC-H q2, there are _seven_ BroadcastHashJoins in the plan. None of them 
are converted to native because the child plan is not native. Some of the nodes 
have one arm which is native but the other is a BroadcastExchange, other nodes 
have a Project as one arm and a BroadcastExchange  as the other. Now the 
Project arm in the node is not native because its child is not native. But 
there are different reasons for why the Project has a child plan that is not 
native. In this case, some have a child which is a BroadcastHashJoin and others 
have a SortMergeJoin. And so on.
   The only good way to make this something that end users can leverage is to 
print the entire plan with the explanation on the same line as the node. That 
has two disadvantages. Firstly, we can no longer aggregate this information 
because each extended explain plan is unique to the query. Secondly, the 
extended explain plan becomes verbose and hard to read.
   
   I've made an attempt that makes it a little more verbose without getting too 
query specific. For TPC-H q2 it looks like this - 
   ```
   BroadcastHashJoin is not enabled because the following children are not 
native (BroadcastExchange)
   BroadcastHashJoin is not enabled because the following children are not 
native (Project, BroadcastExchange)
   Comet shuffle is not enabled: spark.sql.adaptive.coalescePartitions.enabled 
is enabled and spark.comet.shuffle.enforceMode.enabled is not enabled
   Filter is not native because the following children are not native 
(HashAggregate)
   HashAggregate is not native because the following children are not native 
(Project)
   ObjectHashAggregate is not supported
   Project is not native because the following children are not native 
(BroadcastHashJoin)
   Project is not native because the following children are not native 
(SortMergeJoin)
   Sort is not native because the following children are not native (Exchange)
   SortMergeJoin is not enabled because the following children are not native 
(Sort, Sort)
   TakeOrderedAndProject requires shuffle to be enabled
   xxhash64 is disabled by default. Set spark.comet.xxhash64.enabled=true to 
enable it.
   ```
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to