I wrote a quick patch and attached it in case anyone wants to think about this in context. I can always rebase it to master.
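To make the idea concrete without applying the patch, here is a minimal sketch of the kind of hook discussed below (the trait name and wiring are illustrative only, not necessarily what the attached patch does):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.FunctionIdentifier
    import org.apache.spark.sql.catalyst.expressions.{Expression, Upper}

    // Hypothetical hook, not an existing Spark API: called once the
    // SessionState is fully built, so registrations land in the live
    // session's FunctionRegistry.
    trait PostSessionInitExtension {
      def postSessionInit(session: SparkSession): Unit
    }

    // Example: expose a "shout" function backed by the built-in Upper
    // expression by registering it directly in the FunctionRegistry.
    class MyFunctions extends PostSessionInitExtension {
      override def postSessionInit(session: SparkSession): Unit = {
        session.sessionState.functionRegistry.registerFunction(
          FunctionIdentifier("shout"),
          (children: Seq[Expression]) => Upper(children.head))
      }
    }

Implementations could then be named on the command line much the way extensions are wired in today (something analogous to the existing spark.sql.extensions config), so it stays transparent to users of the submit scripts.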
On Thu, Sep 27, 2018 at 1:39 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> And in case anyone is wondering, the reason I want this may be avoided
> with DataSourceV2, depending on some of the function pushdown discussions.
> We want to add functions which work only with the Cassandra DataSource
> (ttl and writetime). I've done the work to add in the custom expressions
> and analysis rules, but I want to make sure they get into the SQL
> interface.
>
> On Thu, Sep 27, 2018 at 1:35 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>
>> It would be a @dev internal API, I think.
>>
>> If we wanted to go extremely general with post-session init, it could
>> be added to SparkExtensions:
>>
>>     def postSessionInit(session: SparkSession): Unit
>>
>> which would allow you to do just about anything after sessionState has
>> finished initializing.
>>
>> Or, if we specifically wanted to allow just functions:
>>
>>     def injectFunction(name: String, function: Seq[Expression] => Expression) {
>>       sparkSession.registerFunction(name, function) // Or add to a buffer which is registered later
>>     }
>>
>> On Thu, Sep 27, 2018 at 1:16 PM Reynold Xin <r...@databricks.com> wrote:
>>
>>> Thoughts on what the API would look like?
>>>
>>> On Thu, Sep 27, 2018 at 11:13 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>
>>>> While that's easy for some users, we basically want to load some
>>>> functions by default into all session catalogues, regardless of who
>>>> created them. We do this with certain rules and strategies using the
>>>> SparkExtensions, so all apps that run through our submit scripts get a
>>>> config parameter added and it's transparent to the user. I think we'll
>>>> probably have to fork some things (at least the CliDriver); the
>>>> Thriftserver also has a bunch of code which doesn't run under
>>>> "startWithContext", so we may have an issue there as well.
>>>>
>>>> On Wed, Sep 26, 2018, 6:21 PM Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>
>>>>> You're talking about users starting the Thriftserver or SqlShell from
>>>>> the command line, right? It's much easier if you are starting a
>>>>> Thriftserver programmatically, since you can register functions when
>>>>> initializing a SparkContext and then call
>>>>> HiveThriftServer2.startWithContext using that context.
>>>>>
>>>>> On Wed, Sep 26, 2018 at 3:30 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>>
>>>>>> I've been looking recently at possible avenues for loading new
>>>>>> functions into the Thriftserver and SqlShell at launch time. I
>>>>>> basically want to preload a set of functions in addition to those
>>>>>> already present in the Spark code. I'm not sure there is at present
>>>>>> a way to do this, and I was wondering if anyone had any ideas.
>>>>>>
>>>>>> I would basically want to make it so that any user launching either
>>>>>> of these tools automatically has access to some custom functions.
>>>>>> In the SparkShell I can do this by adding additional lines to the
>>>>>> init section, but I think it would be nice if we could pass in a
>>>>>> parameter pointing to a class with a list of additional functions
>>>>>> to add to all new session states.
>>>>>>
>>>>>> An interface like SparkSessionExtensions, but instead of running
>>>>>> during session init, it would run after session init has completed.
>>>>>>
>>>>>> Thanks for your time; I would be glad to hear any opinions or ideas
>>>>>> on this.
>>>
>>> --
>>> excuse the brevity and lower case due to wrist injury
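For comparison, the programmatic route Mark suggests above looks roughly like this (a sketch only; the plain UDF is a stand-in for the real catalyst expressions, and whether new JDBC sessions see it still depends on the thriftserver's session-state handling):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    object ThriftServerWithFunctions {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("thriftserver-with-custom-functions")
          .enableHiveSupport()
          .getOrCreate()

        // Register functions before any JDBC client connects.
        spark.udf.register("shout", (s: String) => s.toUpperCase)

        // Start the Thriftserver against the already-configured context.
        HiveThriftServer2.startWithContext(spark.sqlContext)
      }
    }

As noted above, the CliDriver and the thriftserver code paths that don't run under startWithContext wouldn't be covered by this.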
Attachment: InjectFunctions.patch