Re: [PR] [GLUTEN-8580][CORE][Part-1] Clean up unnecessary code related to input file expression [incubator-gluten]

via GitHub Tue, 21 Jan 2025 19:01:40 -0800


zml1206 commented on code in PR #8584:
URL: https://github.com/apache/incubator-gluten/pull/8584#discussion_r1924628747



##########
gluten-substrait/src/main/scala/org/apache/gluten/execution/GlutenWholeStageColumnarRDD.scala:
##########
@@ -62,17 +59,6 @@ class GlutenWholeStageColumnarRDD(
   private val numaBindingInfo = GlutenConfig.get.numaBindingInfo
 
   override def compute(split: Partition, context: TaskContext): Iterator[ColumnarBatch] = {
-
-    // To support input_file_name(). According to semantic we should return
-    // the exact file name a row belongs to. However in columnar engine it's
-    // not easy to accomplish this. so we return a list of file(part) names
-    split match {
-      case FirstZippedPartitionsPartition(_, g: GlutenPartition, _) =>
-        InputFileBlockHolderProxy.set(g.files.mkString(","))
-      case _ =>
-        InputFileBlockHolderProxy.unset()
-    }
-

Review Comment:
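   The previous implementation of the input file expressions had problems. As
the removed comment notes, it could not return the exact file a row belongs to
and instead set a comma-separated list of all file names in the partition.
#7124 reworked the approach and pushes the input file expressions down into
scanTransform or into the project directly before the scan. The values now come
from the native scan or from Spark's thread-local state, so there is no longer
any need to set this information in InputFileBlockHolder here.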
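To make the difference concrete, here is a minimal, hypothetical Scala model. It is not Gluten or Spark code: `ScannedRow`, `oldInputFileNames`, and `pushedDownInputFileNames` are illustrative names, and the `ThreadLocal` only stands in for a holder like InputFileBlockHolder. It shows why one comma-joined value per partition cannot give each row its own file name, while a value emitted by the scan alongside each row can.

```scala
// Hypothetical model, not Gluten or Spark code: contrasts the removed
// per-partition approach with per-row file names produced by the scan.
object InputFileSketch {
  // Hypothetical stand-in for a scanned row that carries its source file.
  final case class ScannedRow(value: Int, file: String)

  // Hypothetical stand-in for a thread-local holder such as InputFileBlockHolder.
  private val holder = new ThreadLocal[String] {
    override def initialValue(): String = ""
  }

  // Old approach (mirrors the removed code): one comma-joined string per
  // partition, so every row in the partition sees all file names.
  def oldInputFileNames(partitionFiles: Seq[String], rows: Seq[ScannedRow]): Seq[String] = {
    holder.set(partitionFiles.mkString(","))
    try rows.map(_ => holder.get())
    finally holder.remove()
  }

  // Pushed-down approach (sketch): the scan emits the file name with each row,
  // so the expression is evaluated per row without shared thread-local state.
  def pushedDownInputFileNames(rows: Seq[ScannedRow]): Seq[String] =
    rows.map(_.file)

  def main(args: Array[String]): Unit = {
    val files = Seq("part-0.parquet", "part-1.parquet")
    val rows = Seq(ScannedRow(1, files(0)), ScannedRow(2, files(1)))
    println(oldInputFileNames(files, rows))
    println(pushedDownInputFileNames(rows))
  }
}
```

In this model the first call yields the same joined string for both rows, while the second yields each row's own file, which is the per-row semantics input_file_name() is expected to have.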



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

