dimas-b commented on code in PR #1686:
URL: https://github.com/apache/polaris/pull/1686#discussion_r2124454587


##########
extension/persistence/relational-jdbc/src/test/java/org/apache/polaris/extension/persistence/relational/jdbc/QueryGeneratorTest.java:
##########
@@ -222,4 +222,29 @@ void testGenerateWhereClause_emptyMap() {
     Map<String, Object> whereClause = Collections.emptyMap();
     assertEquals("", QueryGenerator.generateWhereClause(whereClause));
   }
+
+  @Test
+  void testGenerateOverlapQuery() {
+    String realmId = "realmId";
+    int parentId = "polaris".hashCode();
+
+    assertEquals(
+        "SELECT entity_version, to_purge_timestamp, internal_properties, "
+            + "catalog_id, purge_timestamp, sub_type_code, create_timestamp, 
last_update_timestamp, "
+            + "parent_id, name, location, id, drop_timestamp, properties, 
grant_records_version, "
+            + "type_code FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 
'realmId' AND parent_id = "
+            + "-398224152 AND (1 = 2 OR location = '/' OR location = '/tmp/' 
OR location = '/tmp/location/' "
+            + "OR location LIKE '/tmp/location/%')",
+        QueryGenerator.generateOverlapQuery(realmId, parentId, 
"/tmp/location/"));
+
+    assertEquals(
+        "SELECT entity_version, to_purge_timestamp, internal_properties, 
catalog_id, "
+            + "purge_timestamp, sub_type_code, create_timestamp, 
last_update_timestamp, parent_id, "
+            + "name, location, id, drop_timestamp, properties, 
grant_records_version, type_code "
+            + "FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 'realmId' AND 
parent_id = -398224152 "
+            + "AND (1 = 2 OR location = 's3:/' OR location = 's3://' OR 
location = 's3://bucket/' OR "
+            + "location = 's3://bucket/tmp/' OR location = 
's3://bucket/tmp/location/' OR location "

Review Comment:
   > Do you mean what if a storage technology does that?
   
   If we later support a storage technology with case-insensitive file paths, 
the existing implementation, which is grounded in a database-level string 
comparison feature, will have a hard time supporting it (initially it will not 
work correctly, as far as I can tell).
   
   Conversely, if the admin user for some reason (accidentally?) configures the 
database to perform case-insensitive comparisons, it will break Polaris 
Management API contracts (behaviour) for the current set of storage 
technologies.
   
   All in all, current approach appear to introduce too many obscure 
dependencies between API-level behaviours and backend behaviours (IMHO).
   
   I support the idea of leveraging database capabilities to facilitate finding 
overlapping catalogs. However, I think it should be done in a way and any 
oddities at the database level do not affect the correctness of user-visible 
Polaris behaviour. In other words, if something is not aligned at the database 
level, it would be acceptable for Polaris performance to degrade as long as 
correctness is not affected.
   
   In that regard, the presence of any entity with a location, but without an 
indexed location, should probably invalidate the optimized lookup result. If 
this requires admin user intervention, we have to make it explicit.
   
   Also, the parsing of the URI into path components (prefixes) should probably 
be generalized outside the Persistence layer (for reuse by all Persistence 
implementations).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@polaris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to