kbendick commented on a change in pull request #3326: URL: https://github.com/apache/iceberg/pull/3326#discussion_r739878080
########## File path: core/src/main/java/org/apache/iceberg/CatalogType.java ########## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg; + +import java.util.Arrays; +import java.util.Locale; +import java.util.stream.Collectors; + +public enum CatalogType { + HIVE("org.apache.iceberg.hive.HiveCatalog"), + HADOOP("org.apache.iceberg.hadoop.HadoopCatalog"), + GLUE("org.apache.iceberg.aws.glue.GlueCatalog"), + NESSIE("org.apache.iceberg.nessie.NessieCatalog"), + DYNAMODB("org.apache.iceberg.aws.dynamodb.DynamoDbCatalog"), + JDBC("org.apache.iceberg.jdbc.JdbcCatalog"); + + private final String catalogImpl; + + CatalogType(String catalogImpl) { + this.catalogImpl = catalogImpl; + } + + public String getTypeName() { + return name().toLowerCase(Locale.ENGLISH); + } + + public String getCatalogImpl() { + return catalogImpl; + } + + /** + * Returns catalog-impl class name for the input type name. + * + * @param inputType Non-null input type. + * @return catalog-impl class name + */ + public static String getCatalogImpl(String inputType) { Review comment: Same note here about separation of concerns between the config named `catalog-impl` and the values handled by this class. Maybe this is the one that could be named `CatalogType.of(String inputType)`? ########## File path: site/docs/spark-configuration.md ########## @@ -54,7 +54,7 @@ Both catalogs are configured using properties nested under the catalog name. Com | Property | Values | Description | | -------------------------------------------------- | ----------------------------- | -------------------------------------------------------------------- | -| spark.sql.catalog._catalog-name_.type | `hive` or `hadoop` | The underlying Iceberg catalog implementation, `HiveCatalog`, `HadoopCatalog` or left unset if using a custom catalog | +| spark.sql.catalog._catalog-name_.type | `hive`, `hadoop`, `nessie`, `jdbc`, `glue` or `dynamodb`| Represents the underlying Iceberg catalog implementation. For example - `HiveCatalog` for `hive` and `HadoopCatalog` for `hadoop`.<br>Leave unset if using a custom catalog via the conf `catalog-impl`. | Review comment: > Is there a place, where I can check the do's and don'ts for doc changes? Unfortunately, none that I'm aware of at present. As the docs are undergoing a large refactor lead by @samredai, that might be a good thing to bring up at the next community sync up. ########## File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java ########## @@ -208,23 +201,14 @@ public static Catalog loadCatalog( */ public static Catalog buildIcebergCatalog(String name, Map<String, String> options, Configuration conf) { String catalogImpl = options.get(CatalogProperties.CATALOG_IMPL); + String catalogType = options.get(CatalogProperties.CATALOG_TYPE); + Preconditions.checkArgument(catalogImpl == null || catalogType == null, + "Cannot create catalog %s, both type and catalog-impl are set: type=%s, catalog-impl=%s", + name, catalogType, catalogImpl); + if (catalogImpl == null) { - String catalogType = PropertyUtil.propertyAsString(options, ICEBERG_CATALOG_TYPE, ICEBERG_CATALOG_TYPE_HIVE); - switch (catalogType.toLowerCase(Locale.ENGLISH)) { - case ICEBERG_CATALOG_TYPE_HIVE: - catalogImpl = ICEBERG_CATALOG_HIVE; - break; - case ICEBERG_CATALOG_TYPE_HADOOP: - catalogImpl = ICEBERG_CATALOG_HADOOP; - break; - default: - throw new UnsupportedOperationException("Unknown catalog type: " + catalogType); - } - } else { - String catalogType = options.get(ICEBERG_CATALOG_TYPE); - Preconditions.checkArgument(catalogType == null, - "Cannot create catalog %s, both type and catalog-impl are set: type=%s, catalog-impl=%s", - name, catalogType, catalogImpl); + catalogType = (catalogType == null) ? CatalogType.HIVE.getTypeName() : catalogType; Review comment: Nit: Can we add a log if neither of them are set and then mention that we will be using `hive`? I know that this isn't a change from the present behavior of falling back to `hive`, but I have heard from a few users seeking support in the Iceberg slack who didn't fully setup their catalog properly and I think the log would benefit them. Consider perhaps logging something like `Neither "type" nor "catalog-impl" were configured. Falling back to the default of "%s"`. Also, perhaps consider making `CatalogType.Hive` marked as a default value somewhere? Usually with configs, we have the config name and then the default value as well. I think that would make the code more readable and in line with the other code in Iceberg. Previously, the old code at least used the default value argument of `propertyAsString`. ########## File path: core/src/main/java/org/apache/iceberg/CatalogType.java ########## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg; + +import java.util.Arrays; +import java.util.Locale; +import java.util.stream.Collectors; + +public enum CatalogType { + HIVE("org.apache.iceberg.hive.HiveCatalog"), + HADOOP("org.apache.iceberg.hadoop.HadoopCatalog"), + GLUE("org.apache.iceberg.aws.glue.GlueCatalog"), + NESSIE("org.apache.iceberg.nessie.NessieCatalog"), + DYNAMODB("org.apache.iceberg.aws.dynamodb.DynamoDbCatalog"), + JDBC("org.apache.iceberg.jdbc.JdbcCatalog"); + + private final String catalogImpl; + + CatalogType(String catalogImpl) { + this.catalogImpl = catalogImpl; + } + + public String getTypeName() { + return name().toLowerCase(Locale.ENGLISH); + } + + public String getCatalogImpl() { + return catalogImpl; + } + + /** + * Returns catalog-impl class name for the input type name. + * + * @param inputType Non-null input type. + * @return catalog-impl class name + */ + public static String getCatalogImpl(String inputType) { + try { + CatalogType type = CatalogType.valueOf(inputType.toUpperCase(Locale.ENGLISH)); + return type.getCatalogImpl(); + } catch (IllegalArgumentException e) { + throw new UnsupportedOperationException( + String.format("Unknown catalog type: %s. Valid values are [%s]", inputType, + Arrays.stream(values()).map(CatalogType::getTypeName).collect(Collectors.joining(", ")))); Review comment: For exception messages, we usually try to also add an explanation of how the user could fix the error, as that's much more helpful. Specifically, for users who might implement a custom catalog, it would be good to call out that they should be setting `catalog-impl` or whatever config they should be setting. So maybe something to the effect of `Unknown catalog type: %s. Valid values are [%s]. To use a custom catalog, please use the field "catalog-impl with the fully qualified class name"? We should try to make it clear that custom catalogs are still supported and how to use them if a user configures their catalogs incorrectly - especially as this is a deviating change in the way that users will configure some of the commonly used catalogs starting with the next major release (e.g. people will need to change the way that they have their Dynamo catalogs configured, at least if they want to be in line with the assumptions we'll be making about configuration going forward). ########## File path: site/docs/spark-configuration.md ########## @@ -54,7 +54,7 @@ Both catalogs are configured using properties nested under the catalog name. Com | Property | Values | Description | | -------------------------------------------------- | ----------------------------- | -------------------------------------------------------------------- | -| spark.sql.catalog._catalog-name_.type | `hive` or `hadoop` | The underlying Iceberg catalog implementation, `HiveCatalog`, `HadoopCatalog` or left unset if using a custom catalog | +| spark.sql.catalog._catalog-name_.type | `hive`, `hadoop`, `nessie`, `jdbc`, `glue` or `dynamodb`| Represents the underlying Iceberg catalog implementation. For example - `HiveCatalog` for `hive` and `HadoopCatalog` for `hadoop`.<br>Leave unset if using a custom catalog via the conf `catalog-impl`. | Review comment: [nit] Looking at this one further, I think it's adding a bit more information than is needed. When I look at the rich diff, the usage of the class name (e.g. `HiveCatalog` for `hive`) given all of the additional values seems to add some noise. When I look at the rich diff, the usage of the class name (e.g. `HiveCatalog` for `hive`) doesn't necessarily add more value, but in my opinion could be somewhat confusing. I'd personally leave it as is and just use the classname in the same order as the values are given (like it is presently). Or possibly removing it entirely given how many there are. To that point, there's no additional description in the `hive.md` docs and I think they're relatively clear. There's also not a `<br>` used there. ########## File path: core/src/main/java/org/apache/iceberg/CatalogType.java ########## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iceberg; + +import java.util.Arrays; +import java.util.Locale; +import java.util.stream.Collectors; + +public enum CatalogType { + HIVE("org.apache.iceberg.hive.HiveCatalog"), + HADOOP("org.apache.iceberg.hadoop.HadoopCatalog"), + GLUE("org.apache.iceberg.aws.glue.GlueCatalog"), + NESSIE("org.apache.iceberg.nessie.NessieCatalog"), + DYNAMODB("org.apache.iceberg.aws.dynamodb.DynamoDbCatalog"), + JDBC("org.apache.iceberg.jdbc.JdbcCatalog"); + + private final String catalogImpl; + + CatalogType(String catalogImpl) { + this.catalogImpl = catalogImpl; + } + + public String getTypeName() { + return name().toLowerCase(Locale.ENGLISH); + } + + public String getCatalogImpl() { + return catalogImpl; Review comment: Given that `catalog-impl` is still a configuration value, is the naming of this method (and the corresponding private field name) possibly a bit confusing? I think it would be possible to avoid having this entirely, as well as the private field, and use a static method like `CatalogType.from(String inputType)`. This would convey the same thing but give a bit more separation from the existing config value of `catalog-impl`? Is there somewhere that I missed that we need to pass around a stateful / instantiated `CatalogType` outside of a single method call? Possible I just need more coffee 😅 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
