snazy opened a new issue #3254:
URL: https://github.com/apache/iceberg/issues/3254


   Idea: have a mechanism that allows plugging additional implementations of 
`org.apache.spark.sql.connector.iceberg.catalog.Procedure` for all users of 
`SparkCatalog` and `SparkSessionCatalog` by "just dropping an additional jar".
   
   `Procedure`s offer a very flexible way (generic input parameters plus result 
row(s)) to implement additional functionality that is, say, "too advanced" for 
everyday use and/or does not justify the overhead of implementing a full 
Spark SQL extension.
   
   The only way to add custom procedures I could find is to extend 
`SparkCatalog` and/or `SparkSessionCatalog` and override `loadProcedure`, which 
requires users to configure the subclasses of `Spark[Session]Catalog` in their 
Spark configuration.
   
   `Procedure`s already use namespaces, although `system` is the only supported 
namespace via `BaseCatalog.loadProcedure`.
   
   `BaseCatalog` could use Java's `java.util.ServiceLoader` to find 
`ProcedureProvider`s.  A `ProcedureProvider` would be an interface like this:
   ```java
   import java.util.Map;
   import java.util.function.Supplier;

   public interface ProcedureProvider {
     /**
      * Short string used to identify this provider; config items from the
      * catalog configuration matching this name are passed into
      * {@link #getProcedureBuilders(Map)}.
      */
     String getName();

     /** Human-readable description. */
     String getDescription();

     /** The namespace under which the procedures shall be available. */
     Namespace getNamespace();

     /** Builders for the provided procedures, keyed by procedure name. */
     Map<String, Supplier<ProcedureBuilder>> getProcedureBuilders(Map<String, String> config);
     // ProcedureBuilder moved out of SparkProcedures
   }
   ```
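   Implementations could then be registered via Java's standard service-registration mechanism: a provider jar ships a file under `META-INF/services/` named after the fully qualified interface (the package and provider class below are illustrative):
   ```
   # file: META-INF/services/org.apache.iceberg.spark.ProcedureProvider
   com.example.MyProcedureProvider
   ```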
   When `BaseCatalog` gets initialized, it uses `ServiceLoader` to find all 
`ProcedureProvider`s and calls `getProcedureBuilders` with the configuration 
extracted from `initialize()`'s `options` map (e.g. the entries with the 
`procedures.${procedureProvider.name}.` prefix).
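   The prefix extraction could look like this. A minimal sketch; the class and method names are hypothetical, not existing Iceberg API:
   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical helper sketching how BaseCatalog could slice one provider's
   // options out of initialize()'s options map.
   public class ProcedureProviderConfig {
     // Returns the entries of `options` whose keys start with
     // "procedures.<providerName>.", with that prefix stripped.
     public static Map<String, String> forProvider(String providerName, Map<String, String> options) {
       String prefix = "procedures." + providerName + ".";
       Map<String, String> result = new HashMap<>();
       for (Map.Entry<String, String> e : options.entrySet()) {
         if (e.getKey().startsWith(prefix)) {
           result.put(e.getKey().substring(prefix.length()), e.getValue());
         }
       }
       return result;
     }
   }
   ```
   The provider then only ever sees its own, un-prefixed config keys, so unrelated catalog options never leak into `getProcedureBuilders`.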
   
   Side notes:
   * `org.apache.spark.sql.connector.iceberg.catalog.ProcedureCatalog` is only 
implemented by `org.apache.iceberg.spark.BaseCatalog` (base class for 
`SparkCatalog` + `SparkSessionCatalog`).
   * `org.apache.iceberg.spark.procedures.SparkProcedures` is the only source 
of procedures at the moment.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]