dimas-b commented on code in PR #1686: URL: https://github.com/apache/polaris/pull/1686#discussion_r2124454587
########## extension/persistence/relational-jdbc/src/test/java/org/apache/polaris/extension/persistence/relational/jdbc/QueryGeneratorTest.java: ########## @@ -222,4 +222,29 @@ void testGenerateWhereClause_emptyMap() { Map<String, Object> whereClause = Collections.emptyMap(); assertEquals("", QueryGenerator.generateWhereClause(whereClause)); } + + @Test + void testGenerateOverlapQuery() { + String realmId = "realmId"; + int parentId = "polaris".hashCode(); + + assertEquals( + "SELECT entity_version, to_purge_timestamp, internal_properties, " + + "catalog_id, purge_timestamp, sub_type_code, create_timestamp, last_update_timestamp, " + + "parent_id, name, location, id, drop_timestamp, properties, grant_records_version, " + + "type_code FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 'realmId' AND parent_id = " + + "-398224152 AND (1 = 2 OR location = '/' OR location = '/tmp/' OR location = '/tmp/location/' " + + "OR location LIKE '/tmp/location/%')", + QueryGenerator.generateOverlapQuery(realmId, parentId, "/tmp/location/")); + + assertEquals( + "SELECT entity_version, to_purge_timestamp, internal_properties, catalog_id, " + + "purge_timestamp, sub_type_code, create_timestamp, last_update_timestamp, parent_id, " + + "name, location, id, drop_timestamp, properties, grant_records_version, type_code " + + "FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 'realmId' AND parent_id = -398224152 " + + "AND (1 = 2 OR location = 's3:/' OR location = 's3://' OR location = 's3://bucket/' OR " + + "location = 's3://bucket/tmp/' OR location = 's3://bucket/tmp/location/' OR location " Review Comment: > Do you mean what if a storage technology does that? If we later support a storage technology with case-insensitive file paths, the existing implementation, which is grounded in a database-level string comparison feature, will have a hard time supporting it (initially it will not work correctly, as far as I can tell). Conversely, if the admin user for some reason (accidentally?) configures the database to perform case-insensitive comparisons, it will break Polaris Management API contracts (behaviour) for the current set of storage technologies. All in all, current approach appear to introduce too many obscure dependencies between API-level behaviours and backend behaviours (IMHO). I support the idea of leveraging database capabilities to facilitate finding overlapping catalogs. However, I think it should be done in a way and any oddities at the database level do not affect the correctness of user-visible Polaris behaviour. In other words, if something is not aligned at the database level, it would be acceptable for Polaris performance to degrade as long as correctness is not affected. In that regard, the presence of any entity with a location, but without an indexed location, should probably invalidate the optimized lookup result. If this requires admin user intervention, we have to make it explicit. Also, the parsing of the URI into path components (prefixes) should probably be generalized outside the Persistence layer (for reuse by all Persistence implementations). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@polaris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org