I'm assuming some things here, but hopefully I understand: basically you
have a big table of data distributed across a bunch of executors, and you
want an efficient way to call a native method for each row.

It sounds similar to a dataframe writer to me, except that instead of
writing to disk or the network, you're 'writing' to a native function.
Would a custom dataframe writer work? That's what I'd try first.
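For batch jobs the quickest way to get writer-like behavior is foreachPartition on the Dataset, which hands you each partition's iterator just like mapPartitions does. A rough sketch (NativeLib.bulkCall is a hypothetical wrapper around your JNI method; the batch size is arbitrary):

```scala
import org.apache.spark.sql.{DataFrame, Row}

object NativeSink {
  // Hypothetical JNI wrapper that accepts a batch of rows in one call.
  def bulkCall(rows: Seq[Row]): Unit = { /* native call here */ }

  def writeToNative(df: DataFrame): Unit = {
    df.foreachPartition { rows: Iterator[Row] =>
      // Buffer the partition into chunks and hand each chunk to the
      // native function in one call, mirroring mapPartitions on an RDD.
      rows.grouped(10000).foreach(batch => bulkCall(batch))
    }
  }
}
```

foreachPartition is an action, so the native call is guaranteed to run for every partition; nothing downstream can optimize it away.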

If that doesn't work for your case, you could also try adding a column
where the column function does the native call. If you go that route,
though, you'd have to ensure that the column function actually gets called
for all rows. (An interesting side effect is that you could catch
JNI/WinAPI errors there and set the column value to the result.)
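That column-function idea can be sketched with a UDF that wraps the native call and returns a status string (nativeCall and the "payload" column are hypothetical names here; adapt to your schema):

```scala
import org.apache.spark.sql.functions.udf
import spark.implicits._

// Hypothetical JNI wrapper for a single record.
def nativeCall(payload: String): Unit = { /* native call here */ }

val nativeCallUdf = udf { payload: String =>
  // Capture native-side failures as the column's value instead of
  // failing the task.
  try { nativeCall(payload); "OK" }
  catch { case e: Exception => s"ERR: ${e.getMessage}" }
}

val withStatus = df.withColumn("native_status", nativeCallUdf($"payload"))

// The UDF only runs when the column is materialized, so follow it with
// an action (write, count on the selected column, etc.) to force it.
withStatus.write.mode("overwrite").parquet("/tmp/native_status")
```

Note the per-row cost: a plain UDF is invoked record by record, so this variant doesn't by itself give you the bulk invocation you get from mapPartitions.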

There are other ways, too, if those options don't work...
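For instance, Dataset itself exposes mapPartitions, so you can get exactly the bulk-per-partition call without leaving the SQL/DataFrame world by dropping to a typed Dataset first (a sketch; Record and NativeLib.bulkCall are hypothetical):

```scala
import spark.implicits._

case class Record(id: Long, payload: String)

val processed = df.as[Record].mapPartitions { it =>
  // One native call per partition, then pass the rows through so the
  // result is still a Dataset you can query with SQL afterwards.
  val batch = it.toSeq
  NativeLib.bulkCall(batch.map(_.payload))
  batch.iterator
}
```

The result of mapPartitions is again a Dataset, so you can register it as a view and continue in SQL from there.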

On Sun, Jun 25, 2017 at 8:07 PM jeff saremi <jeffsar...@hotmail.com> wrote:

> My specific and immediate need is this: We have a native function wrapped
> in JNI. To increase performance we'd like to avoid calling it record by
> record. mapPartitions() give us the ability to invoke this in bulk. We're
> looking for a similar approach in SQL.
>
>
> ------------------------------
> *From:* Ryan <ryan.hd....@gmail.com>
> *Sent:* Sunday, June 25, 2017 7:18:32 PM
> *To:* jeff saremi
> *Cc:* user@spark.apache.org
> *Subject:* Re: What is the equivalent of mapPartitions in SparkSQL?
>
> Why would you like to do so? I think there's no need to explicitly ask
> for a forEachPartition in Spark SQL, because Tungsten is smart enough to
> figure out whether a SQL operation can be applied per partition or
> whether a shuffle is required.
>
> On Sun, Jun 25, 2017 at 11:32 PM, jeff saremi <jeffsar...@hotmail.com>
> wrote:
>
>> You can do a map() using a select and functions/UDFs. But how do you
>> process a partition using SQL?
>>
>>
>>
>
