kbendick commented on a change in pull request #3326:
URL: https://github.com/apache/iceberg/pull/3326#discussion_r739878080



##########
File path: core/src/main/java/org/apache/iceberg/CatalogType.java
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Arrays;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+public enum CatalogType {
+  HIVE("org.apache.iceberg.hive.HiveCatalog"),
+  HADOOP("org.apache.iceberg.hadoop.HadoopCatalog"),
+  GLUE("org.apache.iceberg.aws.glue.GlueCatalog"),
+  NESSIE("org.apache.iceberg.nessie.NessieCatalog"),
+  DYNAMODB("org.apache.iceberg.aws.dynamodb.DynamoDbCatalog"),
+  JDBC("org.apache.iceberg.jdbc.JdbcCatalog");
+
+  private final String catalogImpl;
+
+  CatalogType(String catalogImpl) {
+    this.catalogImpl = catalogImpl;
+  }
+
+  public String getTypeName() {
+    return name().toLowerCase(Locale.ENGLISH);
+  }
+
+  public String getCatalogImpl() {
+    return catalogImpl;
+  }
+
+  /**
+   * Returns catalog-impl class name for the input type name.
+   *
+   * @param inputType Non-null input type.
+   * @return catalog-impl class name
+   */
+  public static String getCatalogImpl(String inputType) {

Review comment:
       Same note here about separation of concerns between the config named 
`catalog-impl` and the values handled by this class. Maybe this is the one that 
could be named `CatalogType.of(String inputType)`?

##########
File path: site/docs/spark-configuration.md
##########
@@ -54,7 +54,7 @@ Both catalogs are configured using properties nested under 
the catalog name. Com
 
 | Property                                           | Values                  
      | Description                                                          |
 | -------------------------------------------------- | 
----------------------------- | 
-------------------------------------------------------------------- |
-| spark.sql.catalog._catalog-name_.type              | `hive` or `hadoop`      
      | The underlying Iceberg catalog implementation, `HiveCatalog`, 
`HadoopCatalog` or left unset if using a custom catalog |
+| spark.sql.catalog._catalog-name_.type              | `hive`, `hadoop`, 
`nessie`, `jdbc`, `glue` or `dynamodb`| Represents the underlying Iceberg 
catalog implementation. For example - `HiveCatalog` for `hive` and 
`HadoopCatalog` for `hadoop`.<br>Leave unset if using a custom catalog via the 
conf `catalog-impl`. |

Review comment:
       > Is there a place, where I can check the do's and don'ts for doc 
changes?
   
   Unfortunately, none that I'm aware of at present. As the docs are undergoing 
a large refactor lead by @samredai, that might be a good thing to bring up at 
the next community sync up.

##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -208,23 +201,14 @@ public static Catalog loadCatalog(
    */
   public static Catalog buildIcebergCatalog(String name, Map<String, String> 
options, Configuration conf) {
     String catalogImpl = options.get(CatalogProperties.CATALOG_IMPL);
+    String catalogType = options.get(CatalogProperties.CATALOG_TYPE);
+    Preconditions.checkArgument(catalogImpl == null || catalogType == null,
+        "Cannot create catalog %s, both type and catalog-impl are set: 
type=%s, catalog-impl=%s",
+        name, catalogType, catalogImpl);
+
     if (catalogImpl == null) {
-      String catalogType = PropertyUtil.propertyAsString(options, 
ICEBERG_CATALOG_TYPE, ICEBERG_CATALOG_TYPE_HIVE);
-      switch (catalogType.toLowerCase(Locale.ENGLISH)) {
-        case ICEBERG_CATALOG_TYPE_HIVE:
-          catalogImpl = ICEBERG_CATALOG_HIVE;
-          break;
-        case ICEBERG_CATALOG_TYPE_HADOOP:
-          catalogImpl = ICEBERG_CATALOG_HADOOP;
-          break;
-        default:
-          throw new UnsupportedOperationException("Unknown catalog type: " + 
catalogType);
-      }
-    } else {
-      String catalogType = options.get(ICEBERG_CATALOG_TYPE);
-      Preconditions.checkArgument(catalogType == null,
-          "Cannot create catalog %s, both type and catalog-impl are set: 
type=%s, catalog-impl=%s",
-          name, catalogType, catalogImpl);
+      catalogType = (catalogType == null) ? CatalogType.HIVE.getTypeName() : 
catalogType;

Review comment:
       Nit: Can we add a log if neither of them are set and then mention that 
we will be using `hive`?
   
   I know that this isn't a change from the present behavior of falling back to 
`hive`, but I have heard from a few users seeking support in the Iceberg slack 
who didn't fully setup their catalog properly and I think the log would benefit 
them.
   
   Consider perhaps logging something like `Neither "type" nor "catalog-impl" 
were configured. Falling back to the default of "%s"`.
   
   Also, perhaps consider making `CatalogType.Hive` marked as a default value 
somewhere? Usually with configs, we have the config name and then the default 
value as well. I think that would make the code more readable and in line with 
the other code in Iceberg.
   
   Previously, the old code at least used the default value argument of 
`propertyAsString`.

##########
File path: core/src/main/java/org/apache/iceberg/CatalogType.java
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Arrays;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+public enum CatalogType {
+  HIVE("org.apache.iceberg.hive.HiveCatalog"),
+  HADOOP("org.apache.iceberg.hadoop.HadoopCatalog"),
+  GLUE("org.apache.iceberg.aws.glue.GlueCatalog"),
+  NESSIE("org.apache.iceberg.nessie.NessieCatalog"),
+  DYNAMODB("org.apache.iceberg.aws.dynamodb.DynamoDbCatalog"),
+  JDBC("org.apache.iceberg.jdbc.JdbcCatalog");
+
+  private final String catalogImpl;
+
+  CatalogType(String catalogImpl) {
+    this.catalogImpl = catalogImpl;
+  }
+
+  public String getTypeName() {
+    return name().toLowerCase(Locale.ENGLISH);
+  }
+
+  public String getCatalogImpl() {
+    return catalogImpl;
+  }
+
+  /**
+   * Returns catalog-impl class name for the input type name.
+   *
+   * @param inputType Non-null input type.
+   * @return catalog-impl class name
+   */
+  public static String getCatalogImpl(String inputType) {
+    try {
+      CatalogType type = 
CatalogType.valueOf(inputType.toUpperCase(Locale.ENGLISH));
+      return type.getCatalogImpl();
+    } catch (IllegalArgumentException e) {
+      throw new UnsupportedOperationException(
+          String.format("Unknown catalog type: %s. Valid values are [%s]", 
inputType,
+              
Arrays.stream(values()).map(CatalogType::getTypeName).collect(Collectors.joining(",
 "))));

Review comment:
       For exception messages, we usually try to also add an explanation of how 
the user could fix the error, as that's much more helpful.
   
   Specifically, for users who might implement a custom catalog, it would be 
good to call out that they should be setting `catalog-impl` or whatever config 
they should be setting.
   
   So maybe something to the effect of `Unknown catalog type: %s. Valid values 
are [%s]. To use a custom catalog, please use the field "catalog-impl with the 
fully qualified class name"?
   
   We should try to make it clear that custom catalogs are still supported and 
how to use them if a user configures their catalogs incorrectly - especially as 
this is a deviating change in the way that users will configure some of the 
commonly used catalogs starting with the next major release (e.g. people will 
need to change the way that they have their Dynamo catalogs configured, at 
least if they want to be in line with the assumptions we'll be making about 
configuration going forward).

##########
File path: site/docs/spark-configuration.md
##########
@@ -54,7 +54,7 @@ Both catalogs are configured using properties nested under 
the catalog name. Com
 
 | Property                                           | Values                  
      | Description                                                          |
 | -------------------------------------------------- | 
----------------------------- | 
-------------------------------------------------------------------- |
-| spark.sql.catalog._catalog-name_.type              | `hive` or `hadoop`      
      | The underlying Iceberg catalog implementation, `HiveCatalog`, 
`HadoopCatalog` or left unset if using a custom catalog |
+| spark.sql.catalog._catalog-name_.type              | `hive`, `hadoop`, 
`nessie`, `jdbc`, `glue` or `dynamodb`| Represents the underlying Iceberg 
catalog implementation. For example - `HiveCatalog` for `hive` and 
`HadoopCatalog` for `hadoop`.<br>Leave unset if using a custom catalog via the 
conf `catalog-impl`. |

Review comment:
       [nit] Looking at this one further, I think it's adding a bit more 
information than is needed. When I look at the rich diff, the usage of the 
class name (e.g. `HiveCatalog` for `hive`) given all of the additional values 
seems to add some noise.
   
   When I look at the rich diff, the usage of the class name (e.g. 
`HiveCatalog` for `hive`) doesn't necessarily add more value, but in my opinion 
could be somewhat confusing. I'd personally leave it as is and just use the 
classname in the same order as the values are given (like it is presently). Or 
possibly removing it entirely given how many there are.
   
   To that point, there's no additional description in the `hive.md` docs and I 
think they're relatively clear. There's also not a `<br>` used there.

##########
File path: core/src/main/java/org/apache/iceberg/CatalogType.java
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.util.Arrays;
+import java.util.Locale;
+import java.util.stream.Collectors;
+
+public enum CatalogType {
+  HIVE("org.apache.iceberg.hive.HiveCatalog"),
+  HADOOP("org.apache.iceberg.hadoop.HadoopCatalog"),
+  GLUE("org.apache.iceberg.aws.glue.GlueCatalog"),
+  NESSIE("org.apache.iceberg.nessie.NessieCatalog"),
+  DYNAMODB("org.apache.iceberg.aws.dynamodb.DynamoDbCatalog"),
+  JDBC("org.apache.iceberg.jdbc.JdbcCatalog");
+
+  private final String catalogImpl;
+
+  CatalogType(String catalogImpl) {
+    this.catalogImpl = catalogImpl;
+  }
+
+  public String getTypeName() {
+    return name().toLowerCase(Locale.ENGLISH);
+  }
+
+  public String getCatalogImpl() {
+    return catalogImpl;

Review comment:
       Given that `catalog-impl` is still a configuration value, is the naming 
of this method (and the corresponding private field name) possibly a bit 
confusing?
   
   I think it would be possible to avoid having this entirely, as well as the 
private field, and use a static method like `CatalogType.from(String 
inputType)`. This would convey the same thing but give a bit more separation 
from the existing config value of `catalog-impl`?
   
   Is there somewhere that I missed that we need to pass around a stateful / 
instantiated `CatalogType` outside of a single method call? Possible I just 
need more coffee 😅 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to