shangxinli commented on code in PR #289:
URL: https://github.com/apache/iceberg-cpp/pull/289#discussion_r2511809361


##########
src/iceberg/expression/predicate.cc:
##########
@@ -246,7 +269,69 @@ Result<std::shared_ptr<Expression>> UnboundPredicate<B>::BindLiteralOperation(
     }
   }
 
-  // TODO(gangwu): translate truncate(col) == value to startsWith(value)
+  // Optimize: translate truncate(col, width) == value to col startsWith(value)
+  // This optimization allows better predicate pushdown and index usage
+  // IMPORTANT: Only valid when literal has exactly `width` UTF-8 code points
+  //
+  // NOTE: This rewrite is safe because:
+  // - Iceberg string comparisons are binary (byte-for-byte), no collation
+  // - STARTS_WITH uses the same binary comparison semantics as equality
+  // - truncate(col, w) == "value" ⟺ col STARTS_WITH "value" when len(value) == w
+  // - When source has < w code points, truncate returns full string; equality
+  //   implies exact match, so STARTS_WITH remains valid (short-string invariance)
+  if (BASE::op() == Expression::Operation::kEq &&
+      bound_term->kind() == Term::Kind::kTransform) {
+    // Safe to cast after kind check confirms it's a transform
+    auto* transform_term = dynamic_cast<BoundTransform*>(bound_term.get());
+    if (!transform_term) {
+      // Should never happen after kind check, but be defensive
+      return std::make_shared<BoundLiteralPredicate>(BASE::op(), std::move(bound_term),
+                                                     std::move(literal));
+    }

Review Comment:
   sure



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


