I wrote a quick patch and attached it, in case anyone wants to think about
this in context. I can always rebase it to master.

On Thu, Sep 27, 2018 at 1:39 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> And in case anyone is wondering, the need for this may go away with
> DataSourceV2, depending on how some of the function pushdown discussions
> land. We want to add functions which work only with the Cassandra
> DataSource (ttl and writetime). I've done the work to add in the custom
> expressions and analysis rules, but I want to make sure they make it into
> the SQL interface.
>
> On Thu, Sep 27, 2018 at 1:35 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> It would be a @dev internal API, I think.
>>
>> If we wanted to go extremely general with post-session init, we could add
>> this to SparkExtensions:
>>
>> def postSessionInit(session: SparkSession): Unit
>>
>> which would allow you to do just about anything after sessionState has
>> finished initializing.
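>>
>> As a purely hypothetical sketch of what that hook could look like (none
>> of this exists in Spark today; SparkExtensions here stands in for the
>> real SparkSessionExtensions, just to make the shape of the API concrete):
>>
>> import scala.collection.mutable
>> import org.apache.spark.sql.SparkSession
>>
>> class SparkExtensions {
>>   // Hooks to run once sessionState is fully built.
>>   private val postInitHooks = mutable.Buffer.empty[SparkSession => Unit]
>>
>>   def injectPostSessionInit(hook: SparkSession => Unit): Unit =
>>     postInitHooks += hook
>>
>>   // Spark itself would call this (as private[sql]) at the very end of
>>   // session construction.
>>   def applyPostSessionInit(session: SparkSession): Unit =
>>     postInitHooks.foreach(_(session))
>> }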
>>
>> Or, if we specifically wanted to allow just functions:
>>
>> def injectFunction(name: String, function: Seq[Expression] => Expression): Unit = {
>>   sparkSession.registerFunction(name, function) // or add to a buffer which is registered later
>> }
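>>
>> Which could then be used along the lines of (Ttl standing in for one of
>> our custom Cassandra expressions, purely for illustration):
>>
>> extensions.injectFunction("ttl", (args: Seq[Expression]) => Ttl(args.head))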
>>
>>
>>
>> On Thu, Sep 27, 2018 at 1:16 PM Reynold Xin <r...@databricks.com> wrote:
>>
>>> Thoughts on what the API would look like?
>>>
>>> On Thu, Sep 27, 2018 at 11:13 AM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> While that's easy for some users, we basically want to load a set of
>>>> functions by default into all session catalogues, regardless of who made
>>>> them. We already do this for certain rules and strategies using the
>>>> SparkExtensions: all apps that run through our submit scripts get a
>>>> config parameter added, and it's transparent to the user. I think we'll
>>>> probably have to do some forks (at least for the CliDriver); the
>>>> Thriftserver also has a bunch of code which doesn't run under
>>>> "startWithContext", so we may have an issue there as well.
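>>>>
>>>> For reference, roughly what the existing mechanism looks like.
>>>> injectResolutionRule and the spark.sql.extensions conf are real Spark
>>>> APIs (2.2+); the rule and class names below are just illustrative
>>>> stand-ins for ours:
>>>>
>>>> import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
>>>> import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
>>>> import org.apache.spark.sql.catalyst.rules.Rule
>>>>
>>>> // Stand-in for our real ttl/writetime analysis rule.
>>>> case class TtlAnalysisRule(session: SparkSession) extends Rule[LogicalPlan] {
>>>>   override def apply(plan: LogicalPlan): LogicalPlan = plan // no-op here
>>>> }
>>>>
>>>> class OurExtensions extends (SparkSessionExtensions => Unit) {
>>>>   override def apply(e: SparkSessionExtensions): Unit =
>>>>     e.injectResolutionRule(session => TtlAnalysisRule(session))
>>>> }
>>>>
>>>> // Our submit scripts then add, transparently to the user:
>>>> //   --conf spark.sql.extensions=com.example.OurExtensions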
>>>>
>>>>
>>>>
>>>> On Wed, Sep 26, 2018, 6:21 PM Mark Hamstra <m...@clearstorydata.com>
>>>> wrote:
>>>>
>>>>> You're talking about users starting the Thriftserver or SqlShell from
>>>>> the command line, right? It's much easier if you start a Thriftserver
>>>>> programmatically, since you can register functions when initializing a
>>>>> SparkContext and then call HiveThriftServer2.startWithContext with that
>>>>> context.
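>>>>>
>>>>> A minimal sketch of that route (assuming a Hive-enabled build;
>>>>> startWithContext lives in the spark-hive-thriftserver module, and the
>>>>> function here is just an example):
>>>>>
>>>>> import org.apache.spark.sql.SparkSession
>>>>> import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
>>>>>
>>>>> object ThriftServerWithFunctions {
>>>>>   def main(args: Array[String]): Unit = {
>>>>>     val spark = SparkSession.builder()
>>>>>       .appName("thriftserver-with-custom-functions")
>>>>>       .enableHiveSupport()
>>>>>       .getOrCreate()
>>>>>
>>>>>     // Register custom functions before any JDBC client connects.
>>>>>     spark.udf.register("shout", (s: String) => s.toUpperCase + "!")
>>>>>
>>>>>     // Start the Thriftserver on the already-configured context.
>>>>>     HiveThriftServer2.startWithContext(spark.sqlContext)
>>>>>   }
>>>>> }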
>>>>>
>>>>> On Wed, Sep 26, 2018 at 3:30 PM Russell Spitzer <
>>>>> russell.spit...@gmail.com> wrote:
>>>>>
>>>>>> I've been looking recently at possible avenues for loading new
>>>>>> functions into the Thriftserver and SqlShell at launch time. I
>>>>>> basically want to preload a set of functions in addition to those
>>>>>> already present in the Spark code. I'm not sure there is currently a
>>>>>> way to do this, and I was wondering if anyone had any ideas.
>>>>>>
>>>>>> I would basically want to make it so that any user launching either
>>>>>> of these tools would automatically have access to some custom
>>>>>> functions. In the SparkShell I can do this by adding additional lines
>>>>>> to the init section, but I think it would be nice if we could pass in
>>>>>> a parameter pointing to a class with a list of additional functions to
>>>>>> add to all new session states.
>>>>>>
>>>>>> An interface like SparkSessionExtensions, but instead of running
>>>>>> during session init, it would run after session init has completed.
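>>>>>>
>>>>>> For example (all hypothetical, just to sketch the idea): a conf such
>>>>>> as spark.sql.postSessionInitClass=com.example.MyFunctions pointing at
>>>>>> something like
>>>>>>
>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>
>>>>>> class MyFunctions extends (SparkSession => Unit) {
>>>>>>   override def apply(session: SparkSession): Unit = {
>>>>>>     // session.udf.register is existing public API; the conf and the
>>>>>>     // callback interface above are the new parts being proposed.
>>>>>>     session.udf.register("ttl_plus_one", (t: Long) => t + 1)
>>>>>>   }
>>>>>> }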
>>>>>>
>>>>>> Thanks for your time; I would be glad to hear any opinions or ideas
>>>>>> on this.
>>>>>>
>>> --
>>> excuse the brevity and lower case due to wrist injury
>>>
>>

Attachment: InjectFunctions.patch
Description: Binary data
