amogh-jahagirdar commented on code in PR #6899:
URL: https://github.com/apache/iceberg/pull/6899#discussion_r1114953978
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkUtil.java:
##########
@@ -292,4 +292,8 @@ public static List<Expression> partitionMapToExpression(
public static String toColumnName(NamedReference ref) {
return DOT.join(ref.fieldNames());
}
+
+ public static boolean caseSensitive(SparkSession spark) {
+ return Boolean.parseBoolean(spark.conf().get("spark.sql.caseSensitive"));
Review Comment:
We don't need to take it in this PR, but I see hardcoded
"spark.sql.caseSensitive" in a few places in the spark code, is there a good
place to define this constant? I know it's a general spark sql option and not
Iceberg specific so `SparkSQLProperties` is probably not the right place
##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java:
##########
@@ -91,6 +97,35 @@ public void removeTables() {
sql("DROP TABLE IF EXISTS deleted_dep");
}
+ @Test
+ public void testDeleteWithoutScanningTable() throws Exception {
+ createAndInitPartitionedTable();
+
+ append(new Employee(1, "hr"), new Employee(3, "hr"));
+ append(new Employee(1, "hardware"), new Employee(2, "hardware"));
+
+ Table table = validationCatalog.loadTable(tableIdent);
+
+ List<String> manifestLocations =
+ table.currentSnapshot().allManifests(table.io()).stream()
+ .map(ManifestFile::path)
+ .collect(Collectors.toList());
+
+ withUnavailableLocations(
+ manifestLocations,
+ () -> {
+ LogicalPlan parsed = parsePlan("DELETE FROM %s WHERE dep = 'hr'",
tableName);
+
+ DeleteFromIcebergTable analyzed =
+ (DeleteFromIcebergTable)
spark.sessionState().analyzer().execute(parsed);
+ Assert.assertTrue("Should have rewrite plan",
analyzed.rewritePlan().isDefined());
+
+ DeleteFromIcebergTable optimized =
+ (DeleteFromIcebergTable)
OptimizeMetadataOnlyDeleteFromIcebergTable.apply(analyzed);
+ Assert.assertTrue("Should discard rewrite plan",
optimized.rewritePlan().isEmpty());
Review Comment:
Asserting the plan is good and gets the crux of the optimization but should
we still actually perform the delete and read the table with an assertion on
the expected records as a sanity validation?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]