I think this proposal is a good set of trade-offs, and it has been under discussion in the community for a long time. I especially appreciate how the design is focused on a minimal useful component, with future optimizations considered from the point of view of keeping the API flexible, while actual concrete decisions are left for later, once we see how the API is used. If we try to optimize everything right out of the gate, we'll quickly get stuck (again) and not make any progress.
On Mon, Feb 8, 2021 at 10:46 AM Ryan Blue <b...@apache.org> wrote:

> Hi everyone,
>
> I'd like to start a discussion for adding a FunctionCatalog interface to
> catalog plugins. This will allow catalogs to expose functions to Spark,
> similar to how the TableCatalog interface allows a catalog to expose
> tables. The proposal doc is available here:
> https://docs.google.com/document/d/1PLBieHIlxZjmoUB0ERF-VozCRJ0xw2j3qKvUNWpWA2U/edit
>
> Here's a high-level summary of some of the main design choices:
> * Adds the ability to list and load functions, not to create or modify
>   them in an external catalog
> * Supports scalar, aggregate, and partial aggregate functions
> * Uses load and bind steps for better error messages and simpler
>   implementations
> * Like the DSv2 table read and write APIs, it uses InternalRow to pass data
> * Can be extended using mix-in interfaces to add vectorization, codegen,
>   and other future features
>
> There is also a PR with the proposed API:
> https://github.com/apache/spark/pull/24559/files
>
> Let's discuss the proposal here rather than on that PR, to get better
> visibility. Also, please take the time to read the proposal first. That
> really helps clear up misconceptions.
>
> --
> Ryan Blue

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
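For readers skimming the thread, the load-then-bind flow from the summary can be sketched roughly as follows. This is a hypothetical Python stand-in for illustration only — the class names (FunctionCatalog, UnboundFunction, BoundFunction), the `strlen` example function, and the method names are my own placeholders, not the proposed Java API in the linked PR:

```python
# Hypothetical sketch of the load/bind pattern described in the proposal.
# All names here are illustrative stand-ins, not the actual Spark interfaces.
from abc import ABC, abstractmethod

class UnboundFunction(ABC):
    """A function loaded from a catalog, not yet bound to input types."""
    @abstractmethod
    def bind(self, input_types):
        """Check input types early; return a BoundFunction or raise a
        clear error (the point of the separate bind step)."""

class BoundFunction(ABC):
    @abstractmethod
    def produce_result(self, row):
        """Evaluate the function on one input row."""

class StrLen(UnboundFunction):
    def bind(self, input_types):
        if input_types != ["string"]:
            raise TypeError(f"strlen expects (string), got {input_types}")
        return _BoundStrLen()

class _BoundStrLen(BoundFunction):
    def produce_result(self, row):
        return len(row[0])

class FunctionCatalog:
    """Exposes functions by name, analogous to TableCatalog for tables.
    Note: list and load only -- no create or modify, per the proposal."""
    def __init__(self):
        self._funcs = {"strlen": StrLen()}

    def list_functions(self):
        return sorted(self._funcs)

    def load_function(self, name):
        return self._funcs[name]

catalog = FunctionCatalog()
unbound = catalog.load_function("strlen")  # load step: look up by name
bound = unbound.bind(["string"])           # bind step: validate input types
print(bound.produce_result(["spark"]))     # prints 5
```

Binding against the wrong input types (e.g. `unbound.bind(["int"])`) fails immediately with a descriptive TypeError, rather than at evaluation time, which is the error-message benefit the proposal calls out.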