[GitHub] [iceberg] kbendick commented on a change in pull request #2584: Spark: Fix MergeInto when Source and Target are SinglePartition Distr…

GitBox Thu, 13 May 2021 13:49:35 -0700


kbendick commented on a change in pull request #2584:
URL: https://github.com/apache/iceberg/pull/2584#discussion_r631617870




##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DynamicFileFilterExec.scala
##########
@@ -48,8 +48,32 @@ abstract class DynamicFileFilterExecBase(
   override def outputOrdering: Seq[SortOrder] = scanExec.outputOrdering
   override def supportsColumnar: Boolean = scanExec.supportsColumnar
 
-  override protected def doExecute(): RDD[InternalRow] = scanExec.execute()
-  override protected def doExecuteColumnar(): RDD[ColumnarBatch] = scanExec.executeColumnar()
+  /*
+  If both target and source have the same partitioning we can have a problem here if our filter exec actually
+  changes the output partitioning of the node. Currently this can only occur in the SinglePartition distribution is
+  in use which only happens if both the target and source have a single partition, but if it does we have the potential
+  of eliminating the only partition in the target. If there are no partitions in the target then we will throw an
+  exception because the partitioning was assumed to be the same 1 partition in source and target. We fix this by making
+  sure that we always return at least 1 empty partition, in the future we may need to handle more complicated
+  partitioner scenarios.
+   */

Review comment:
       Nit: Should this use either a `/** ... */` comment or a series of `//` line comments?
   
   I don't normally see block comments written as `/* ... */` in the iceberg repo. If this is standard practice in the repo, please ignore this comment.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



