johnclara commented on a change in pull request #1844:
URL: https://github.com/apache/iceberg/pull/1844#discussion_r532278408
##########
File path: core/src/main/java/org/apache/iceberg/CatalogUtil.java
##########
@@ -171,45 +172,57 @@ public static Catalog loadCatalog(
/**
* Load a custom {@link FileIO} implementation.
+ */
+ public static FileIO loadFileIO(
+ String impl,
+ Map<String, String> properties,
+ Configuration hadoopConf) {
+ return loadCatalogConfigurable(impl, properties, hadoopConf, FileIO.class);
+ }
+
+ /**
+ * Load a custom implementation of a {@link CatalogConfigurable}.
* <p>
* The implementation must have a no-arg constructor.
- * If the class implements {@link Configurable},
+ * If the class implements Hadoop's {@link Configurable},
* a Hadoop config will be passed using {@link Configurable#setConf(Configuration)}.
- * {@link FileIO#initialize(Map properties)} is called to complete the initialization.
+ * {@link CatalogConfigurable#initialize(Map properties)} is called to complete the initialization.
*
* @param impl full class name of a custom FileIO implementation
* @param hadoopConf hadoop configuration
+ * @param resultClass the final return class type
* @return FileIO class
* @throws IllegalArgumentException if class path not found or
* right constructor not found or
* the loaded class cannot be casted to the given interface type
*/
- public static FileIO loadFileIO(
+ public static <T extends CatalogConfigurable> T loadCatalogConfigurable(
String impl,
Map<String, String> properties,
- Configuration hadoopConf) {
- LOG.info("Loading custom FileIO implementation: {}", impl);
- DynConstructors.Ctor<FileIO> ctor;
+ Configuration hadoopConf,
+ Class<T> resultClass) {
+ LOG.info("Loading custom {} implementation: {}", resultClass.getName(),
impl);
+ DynConstructors.Ctor<T> ctor;
try {
- ctor = DynConstructors.builder(FileIO.class).impl(impl).buildChecked();
+ ctor = DynConstructors.builder(resultClass).impl(impl).buildChecked();
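For context, a minimal usage sketch of the loader as refactored in this diff; the class name com.example.CustomFileIO and the property value are hypothetical, and only the `loadFileIO` signature from the hunk above is assumed:
```
import java.util.Collections;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.io.FileIO;

public class LoadFileIOSketch {
  public static FileIO load() {
    // "com.example.CustomFileIO" is a hypothetical FileIO implementation on the classpath.
    Map<String, String> properties = Collections.singletonMap("warehouse", "s3://bucket/warehouse");
    Configuration hadoopConf = new Configuration();

    // Per the javadoc above: the class is instantiated through its no-arg constructor,
    // setConf(hadoopConf) is applied if it implements Hadoop's Configurable, and then
    // initialize(properties) completes the initialization.
    return CatalogUtil.loadFileIO("com.example.CustomFileIO", properties, hadoopConf);
  }
}
```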
Review comment:
It's also a little ridiculous having to serialize all the config for FileIO and the other pluggable pieces into a Map<String, String> just so it can be loaded back from properties, when the object is already serializable.
Is there some way Spark could support a Map<String, Serializable> that gets passed through to DataSourceV2? Is that already being discussed somewhere?
In the short term, could we make builders loadable, and have the builder construct the object? That way, whether a config value is valid or null is decided by the object/builder during initialization rather than after it.
It also lets configuration fields be final, instead of this weird private null for the first couple of seconds of the object's lifetime.
What I've been working on:
```
/**
 * Implementations must have a no-arg constructor.
 *
 * Necessary for getting past the Hadoop Configuration / Spark Options /
 * Iceberg Options configuration barriers.
 */
public interface LoadableBuilder<T extends LoadableBuilder<T, K>, K extends Dumpable>
    extends Serializable {
  /**
   * Loads configuration from our custom {@link com.box.dataplatform.util.Conf}.
   *
   * @param conf the serialized configuration to read from
   */
  T load(Conf conf);

  /**
   * Builds the fully-configured object once loading is complete.
   */
  K build();
}

public interface Dumpable extends Serializable {
  /**
   * Writes the current configuration into our custom {@link com.box.dataplatform.util.Conf}.
   *
   * @param conf the configuration to write to
   */
  void dump(Conf conf);
}
```
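A rough usage sketch (not from the PR) of how a builder like this might be loaded by class name and used to produce a fully-initialized object; the class and method names are hypothetical, assuming the interfaces above are available:
```
public class BuilderLoadingSketch {
  // Hypothetical helper, not from the PR: loads a LoadableBuilder by class name,
  // lets it read its config, and returns the object it builds.
  @SuppressWarnings("unchecked")
  public static <K extends Dumpable> K loadAndBuild(String builderImpl, Conf conf)
      throws ReflectiveOperationException {
    // The interface requires a no-arg constructor, so plain reflection is enough here.
    LoadableBuilder<?, K> builder =
        (LoadableBuilder<?, K>) Class.forName(builderImpl).getDeclaredConstructor().newInstance();

    // The builder reads and validates its config during load(), so the object
    // returned by build() can keep its configuration in final fields instead of
    // starting life with private nulls.
    return builder.load(conf).build();
  }
}
```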
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]