snazy opened a new issue #3254:
URL: https://github.com/apache/iceberg/issues/3254
Idea: have a mechanism that allows plugging additional implementations of
`org.apache.spark.sql.connector.iceberg.catalog.Procedure` for all users of
`SparkCatalog` and `SparkSessionCatalog` by "just dropping an additional jar".
`Procedure`s offer a very flexible way (generic input parameters + result
row(s)) to implement additional functionality that is, say, "too advanced" for
everyday use and/or does not justify the overhead of implementing a full
Spark SQL extension.
The only way to add custom procedures I could find is to extend
`SparkCatalog` and/or `SparkSessionCatalog` and override `loadProcedure`, which
requires users to configure the subclasses of `Spark[Session]Catalog` in their
Spark configuration.
`Procedure`s already use namespaces, although `system` is the only supported
namespace via `BaseCatalog.loadProcedure`.
`BaseCatalog` could use Java's `java.util.ServiceLoader` to find
`ProcedureProvider`s. A `ProcedureProvider` would be an interface like this:
```java
import java.util.Map;
import java.util.function.Supplier;

public interface ProcedureProvider {
  /**
   * Short string used to identify this provider. Config items from the
   * catalog configuration that match this name are passed into
   * getProcedureBuilders.
   */
  String getName();

  /** Human-readable description. */
  String getDescription();

  /** The namespace under which the procedures shall be available. */
  Namespace getNamespace();

  Map<String, Supplier<ProcedureBuilder>> getProcedureBuilders(Map<String, String> config);
  // ProcedureBuilder moved out of SparkProcedures
}
```
When `BaseCatalog` gets initialized, it uses `ServiceLoader` to find all
`ProcedureProvider`s and calls `getProcedureBuilders` with the configuration
extracted from `initialize()`'s `options` map (e.g. those that have
`procedures.${procedureProvider.name}.` as the prefix).
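
To make the initialization step more concrete, here is a rough, self-contained sketch of how the discovery and config extraction could look. It is not existing Iceberg code: `ProcedureProviderDiscovery`, `providerConfig` and `loadBuilders` are made-up names, the nested `ProcedureBuilder` and `ProcedureProvider` types are simplified stand-ins (the namespace is a plain `String`, the description is omitted), and keying the result by `<namespace>.<procedure name>` is just one possible choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.function.Supplier;

class ProcedureProviderDiscovery {

  /** Hypothetical stand-in for the ProcedureBuilder moved out of SparkProcedures. */
  public interface ProcedureBuilder {}

  /**
   * Simplified stand-in for the proposed interface. In a real setup this would
   * be a public top-level type so that providers in other jars can implement it.
   */
  public interface ProcedureProvider {
    String getName();

    String getNamespace(); // assumption: plain String instead of a Namespace type

    Map<String, Supplier<ProcedureBuilder>> getProcedureBuilders(Map<String, String> config);
  }

  /** Returns the options under "procedures.<providerName>." with that prefix stripped. */
  static Map<String, String> providerConfig(Map<String, String> options, String providerName) {
    String prefix = "procedures." + providerName + ".";
    Map<String, String> config = new HashMap<>();
    for (Map.Entry<String, String> entry : options.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        config.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return config;
  }

  /** Asks every provider found via ServiceLoader for its builders. */
  static Map<String, Supplier<ProcedureBuilder>> loadBuilders(Map<String, String> options) {
    Map<String, Supplier<ProcedureBuilder>> builders = new HashMap<>();
    for (ProcedureProvider provider : ServiceLoader.load(ProcedureProvider.class)) {
      Map<String, String> config = providerConfig(options, provider.getName());
      provider.getProcedureBuilders(config).forEach(
          (name, builder) -> builders.put(provider.getNamespace() + "." + name, builder));
    }
    return builders;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("procedures.audit.retention-days", "7"); // hypothetical provider "audit"
    options.put("warehouse", "/tmp/warehouse");

    System.out.println("config for 'audit': " + providerConfig(options, "audit"));
    System.out.println("providers found: " + loadBuilders(options).size());
  }
}
```

With `ServiceLoader`, the "just dropping an additional jar" part would come from each jar shipping a `META-INF/services/<fully qualified interface name>` file that lists its provider implementation class(es), so no subclass of `Spark[Session]Catalog` needs to be configured.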
Side notes:
* `org.apache.spark.sql.connector.iceberg.catalog.ProcedureCatalog` is only
implemented by `org.apache.iceberg.spark.BaseCatalog` (base class for
`SparkCatalog` + `SparkSessionCatalog`).
* `org.apache.iceberg.spark.procedures.SparkProcedures` is the only source
of procedures at the moment.