dimas-b commented on code in PR #1686: URL: https://github.com/apache/polaris/pull/1686#discussion_r2121804883
########## extension/persistence/relational-jdbc/src/main/resources/h2/schema-v2.sql: ########## Review Comment: Could you summarize schema changes between v1 and v2? ########## extension/persistence/relational-jdbc/src/main/java/org/apache/polaris/extension/persistence/relational/jdbc/JdbcBasePersistenceImpl.java: ########## @@ -556,6 +563,51 @@ public boolean hasChildren( } } + private int loadVersion() { + String query = generateVersionQuery(); + try { + List<SchemaVersion> schemaVersion = + datasourceOperations.executeSelect(query, new SchemaVersion()); Review Comment: Using "capability" flags instead of (simple) versions would be preferable from my POV. In this case it looks like we need to check specifically for the presence of the new index, other schema changes (if they existed) would be irrelevant. ########## extension/persistence/relational-jdbc/src/main/java/org/apache/polaris/extension/persistence/relational/jdbc/DatabaseType.java: ########## @@ -44,6 +44,6 @@ public static DatabaseType fromDisplayName(String displayName) { } public String getInitScriptResource() { - return String.format("%s/schema-v1.sql", this.getDisplayName()); + return String.format("%s/schema-v2.sql", this.getDisplayName()); Review Comment: Schema changes (and upgrade implications) would be nice to discuss on the `dev` ML for awareness. ########## getting-started/jdbc/docker-compose.yml: ########## @@ -28,6 +28,7 @@ services: - "8182:8182" # Optional, allows attaching a debugger to the Polaris JVM - "5005:5005" + shm_size: 1gb Review Comment: I'm ok with this change, but I'd propose to make it in a dedicated PR to avoid conflating with the "optimization" changes. ########## polaris-core/src/main/java/org/apache/polaris/core/persistence/BasePersistence.java: ########## @@ -403,6 +404,20 @@ boolean hasChildren( long catalogId, long parentId); + /** + * Check if the specified IcebergTableLikeEntity has any same-namespace siblings which share a + * location + * + * @param callContext the polaris call context + * @param parentId the parent entity to look for duplicates inside + * @param location the location to check for overlaps against + * @return Optional.of(Optional.of(location)) if the parent entity has children, + * Optional.of(Optional.empty()) if not, and Optional.empty() if the metastore doesn't support + * this operation + */ + Optional<Optional<String>> hasOverlappingSiblings( + @Nonnull PolarisCallContext callContext, long parentId, String location); Review Comment: This method implies that all Persistence implementations has a common understanding of what `location` means in entities. However, ATM, the location is just an opaque entity property (part of a map of generic properties). I think, we need to, at a minimum, to add javadoc about how this location is defined. Ideally, since it is used as a first interface parameter, it should be exposes as a first class accessor method in entities. ########## polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java: ########## @@ -287,4 +287,22 @@ public static void enforceFeatureEnabledOrThrow( + "This should only be set to 'true' for tests!") .defaultValue(false) .buildFeatureConfiguration(); + + public static final FeatureConfiguration<Boolean> ADD_TRAILING_SLASH_TO_LOCATION = + PolarisConfiguration.<Boolean>builder() + .key("ADD_TRAILING_SLASH_TO_LOCATION") + .catalogConfig("polaris.config.add-trailing-slash-to-location") + .description( + "When set, the base location for a table or namespace will have `/` appended to it") + .defaultValue(true) + .buildFeatureConfiguration(); + + public static final FeatureConfiguration<Boolean> OPTIMIZED_SIBLING_CHECK = Review Comment: Does this feature rely on `ADD_TRAILING_SLASH_TO_LOCATION` for correct operation? ########## extension/persistence/relational-jdbc/src/test/java/org/apache/polaris/extension/persistence/relational/jdbc/QueryGeneratorTest.java: ########## @@ -222,4 +222,29 @@ void testGenerateWhereClause_emptyMap() { Map<String, Object> whereClause = Collections.emptyMap(); assertEquals("", QueryGenerator.generateWhereClause(whereClause)); } + + @Test + void testGenerateOverlapQuery() { + String realmId = "realmId"; + int parentId = "polaris".hashCode(); + + assertEquals( + "SELECT entity_version, to_purge_timestamp, internal_properties, " + + "catalog_id, purge_timestamp, sub_type_code, create_timestamp, last_update_timestamp, " + + "parent_id, name, location, id, drop_timestamp, properties, grant_records_version, " + + "type_code FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 'realmId' AND parent_id = " + + "-398224152 AND (1 = 2 OR location = '/' OR location = '/tmp/' OR location = '/tmp/location/' " + + "OR location LIKE '/tmp/location/%')", + QueryGenerator.generateOverlapQuery(realmId, parentId, "/tmp/location/")); + + assertEquals( + "SELECT entity_version, to_purge_timestamp, internal_properties, catalog_id, " + + "purge_timestamp, sub_type_code, create_timestamp, last_update_timestamp, parent_id, " + + "name, location, id, drop_timestamp, properties, grant_records_version, type_code " + + "FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 'realmId' AND parent_id = -398224152 " + + "AND (1 = 2 OR location = 's3:/' OR location = 's3://' OR location = 's3://bucket/' OR " Review Comment: I believe the `s3:/` and `s3://` cases are conceptually incorrect. ########## extension/persistence/relational-jdbc/src/main/java/org/apache/polaris/extension/persistence/relational/jdbc/JdbcBasePersistenceImpl.java: ########## @@ -556,6 +563,51 @@ public boolean hasChildren( } } + private int loadVersion() { + String query = generateVersionQuery(); + try { + List<SchemaVersion> schemaVersion = + datasourceOperations.executeSelect(query, new SchemaVersion()); + if (schemaVersion == null || schemaVersion.size() != 1) { + throw new RuntimeException("Failed to retrieve schema version"); + } + return schemaVersion.getFirst().getValue(); + } catch (SQLException e) { + LOGGER.error("Failed to load schema version due to {}", e.getMessage(), e); + throw new RuntimeException("Failed to retrieve schema version", e); + } + } + + /** {@inheritDoc} */ + @Override + public Optional<Optional<String>> hasOverlappingSiblings( + @Nonnull PolarisCallContext callContext, long parentId, String location) { + if (this.version < 2) { + return Optional.empty(); + } + if (location.chars().filter(ch -> ch == '/').count() > MAX_LOCATION_COMPONENTS) { + return Optional.empty(); + } + + String query = QueryGenerator.generateOverlapQuery(realmId, parentId, location); + try { + var results = datasourceOperations.executeSelect(query, new ModelEntity()); + if (results.isEmpty()) { Review Comment: What if the version is `2` but the location data is not present in tables? ########## extension/persistence/relational-jdbc/src/test/java/org/apache/polaris/extension/persistence/relational/jdbc/QueryGeneratorTest.java: ########## @@ -222,4 +222,29 @@ void testGenerateWhereClause_emptyMap() { Map<String, Object> whereClause = Collections.emptyMap(); assertEquals("", QueryGenerator.generateWhereClause(whereClause)); } + + @Test + void testGenerateOverlapQuery() { + String realmId = "realmId"; + int parentId = "polaris".hashCode(); + + assertEquals( + "SELECT entity_version, to_purge_timestamp, internal_properties, " + + "catalog_id, purge_timestamp, sub_type_code, create_timestamp, last_update_timestamp, " + + "parent_id, name, location, id, drop_timestamp, properties, grant_records_version, " + + "type_code FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 'realmId' AND parent_id = " + + "-398224152 AND (1 = 2 OR location = '/' OR location = '/tmp/' OR location = '/tmp/location/' " + + "OR location LIKE '/tmp/location/%')", + QueryGenerator.generateOverlapQuery(realmId, parentId, "/tmp/location/")); + + assertEquals( + "SELECT entity_version, to_purge_timestamp, internal_properties, catalog_id, " + + "purge_timestamp, sub_type_code, create_timestamp, last_update_timestamp, parent_id, " + + "name, location, id, drop_timestamp, properties, grant_records_version, type_code " + + "FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 'realmId' AND parent_id = -398224152 " + + "AND (1 = 2 OR location = 's3:/' OR location = 's3://' OR location = 's3://bucket/' OR " + + "location = 's3://bucket/tmp/' OR location = 's3://bucket/tmp/location/' OR location " Review Comment: What about case differences in the URI scheme or bucket name? What is a particular storage technology treats path in a case-insensitive way? What about special chars in the URI? ########## extension/persistence/relational-jdbc/src/test/java/org/apache/polaris/extension/persistence/relational/jdbc/QueryGeneratorTest.java: ########## @@ -222,4 +222,29 @@ void testGenerateWhereClause_emptyMap() { Map<String, Object> whereClause = Collections.emptyMap(); assertEquals("", QueryGenerator.generateWhereClause(whereClause)); } + + @Test + void testGenerateOverlapQuery() { + String realmId = "realmId"; + int parentId = "polaris".hashCode(); + + assertEquals( + "SELECT entity_version, to_purge_timestamp, internal_properties, " + + "catalog_id, purge_timestamp, sub_type_code, create_timestamp, last_update_timestamp, " + + "parent_id, name, location, id, drop_timestamp, properties, grant_records_version, " + + "type_code FROM POLARIS_SCHEMA.ENTITIES WHERE realm_id = 'realmId' AND parent_id = " + + "-398224152 AND (1 = 2 OR location = '/' OR location = '/tmp/' OR location = '/tmp/location/' " Review Comment: Why `1 = 2`?.. it can be statically optimized out. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@polaris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org