Re: [PR] [VL] Add independent operator for top-n processing in TakeOrderedAndProjectExecTransformer [incubator-gluten]

via GitHub Tue, 16 Apr 2024 00:48:37 -0700


zhztheplayer commented on code in PR #5409:
URL: https://github.com/apache/incubator-gluten/pull/5409#discussion_r1566884037



##########
backends-velox/src/main/scala/org/apache/gluten/execution/TopNTransformer.scala:
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gluten.execution
+
+import org.apache.gluten.backendsapi.BackendsApiManager
+import org.apache.gluten.expression.{ConverterUtils, ExpressionConverter}
+import org.apache.gluten.extension.ValidationResult
+import org.apache.gluten.metrics.{MetricsUpdater, NoopMetricsUpdater}
+import org.apache.gluten.substrait.`type`.TypeBuilder
+import org.apache.gluten.substrait.SubstraitContext
+import org.apache.gluten.substrait.extensions.ExtensionBuilder
+import org.apache.gluten.substrait.rel.{RelBuilder, RelNode}
+
+import org.apache.spark.sql.catalyst.expressions.{Attribute, SortOrder}
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, Distribution, 
Partitioning, UnspecifiedDistribution}
+import org.apache.spark.sql.catalyst.util.truncatedString
+import org.apache.spark.sql.execution.SparkPlan
+
+import io.substrait.proto.SortField
+
+import scala.collection.JavaConverters._
+
+case class TopNTransformer(

Review Comment:
   > I remember a transformer aims to convert Spark plan as Substrait plan. As 
Spark does not keep a TopN operator, I'm wondering why we need a 
TopNTransformer. With this change, it seems we are corresponding with Velox 
operator instead of Spark. How do you think?
   
   This is what I was thinking about either. My opinion is that we have some 
debatable design from the very beginning.
   
   Physical plan is basically an abstraction for the actual query execution 
procedure. The planner / optimizer tend to obtain more information form 
physical plan for better optimization. A simplest example, optimizer may place 
exchanges to alter data distribution in the middle of two physical operators 
on-demand. Top-N can also be an example, if optimizer can distinguish between 
OrderBy+Limit and TopN, it could optimize the former to the latter when it's 
needed. Say if N in TopN is abnormally large and TopN algorithm may not help 
much in that case, optimizer can decide not converting OrderBy+Limit into it.
   
   We have limited capability to extend our physical plan if we don't extend 
Substrait or using upstream Substrait. I am inclined that in long term we 
should not follow our previous criteria and allow developers to extend Gluten's 
Substrait proto whenever it's needed - then the physical plan can carry more 
execution detail and create more potential to optimize.
   
   In the case of we want to keep compatibility with Substrait ecosystem, we'd 
have a new criteria that when we extend Gluten's Substrait proto file, it's 
recommended to add new operators instead of modifying the old one. By doing 
this if we want to support a new backend that compatible to upstream Substrait, 
we still can use the same proto definition for it. Indeed, better solutions may 
include one Substrait proto (or whatever Spark physical plan serde you like) 
per backend, but that may require for more refactor work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [VL] Add independent operator for top-n processing in TakeOrderedAndProjectExecTransformer [incubator-gluten]

Reply via email to