iemejia commented on code in PR #12390:
URL: https://github.com/apache/gluten/pull/12390#discussion_r3491498041
##########
gluten-delta/src-delta33/main/scala/org/apache/gluten/delta/DeltaDeletionVectorScanInfo.scala:
##########
@@ -61,10 +63,23 @@ object DeltaDeletionVectorScanInfo {
* Materializes per-file Delta DV read options for a split, alongside each
file's metadata with
* the DV bookkeeping keys stripped. Returns None when no file in the split
carries a deletion
* vector, so callers can keep the generic split representation.
+ *
+ * Performance: resolves the table path once (using the first file) and
reuses a single Hadoop
+ * Configuration instance across all files in the partition to avoid
redundant filesystem I/O and
+ * object allocation.
*/
def normalize(partitionColumnCount: Int, partitionFiles:
Seq[PartitionedFile])
: Option[(Seq[JMap[String, Object]], Seq[DeltaFileReadOptions])] = {
- val scanInfos = extractAll(activeSparkSession, partitionColumnCount,
partitionFiles)
+ val spark = activeSparkSession
+ // Create a single Hadoop Configuration for the entire partition.
+ val hadoopConf = spark.sessionState.newHadoopConf()
+ // Resolve table path once using the first file -- all files in a Delta
table share the same
+ // root, so this avoids N-1 redundant filesystem existence checks.
+ val cachedTablePath = resolveTablePath(hadoopConf, partitionColumnCount,
partitionFiles.head)
Review Comment:
Fixed. Same empty-input guard added.
##########
gluten-delta/src/test/scala/org/apache/gluten/execution/DeltaSuite.scala:
##########
@@ -794,4 +796,67 @@ abstract class DeltaSuite extends
WholeStageTransformerSuite {
checkAnswer(df, Row(0, null) :: Row(101, Seq(Row("a", 1), null)) :: Nil)
}
}
+
+ test("post-transform rules are no-op on non-Delta plans") {
+ withTempPath {
+ p =>
+ val path = p.getCanonicalPath
+ spark.range(100).selectExpr("id", "id * 2 as
value").write.parquet(path)
+ val df = spark.read.parquet(path)
+ val plan = df.queryExecution.executedPlan
+
+ // Rules should return the plan unchanged (early-exit guard)
+ val transformed = DeltaPostTransformRules.rules.foldLeft(plan) {
+ (p, rule) => rule(p)
+ }
+ // No DeltaScanTransformer in the plan, so rules should be identity
+ assert(
+ !transformed.exists(_.isInstanceOf[DeltaScanTransformer]),
+ "Non-Delta plan should not contain DeltaScanTransformer")
+ assert(
+ !plan.exists(_.isInstanceOf[DeltaScanTransformer]),
+ "Original plan should not contain DeltaScanTransformer")
Review Comment:
Fixed. The test now applies only the Delta-specific rules (skipping
`RemoveTransitions` which is generic) and asserts referential identity
(`transformed eq plan`), which directly validates the early-exit guard returns
the same object.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]