[ https://issues.apache.org/jira/browse/SPARK-26331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Chirico updated SPARK-26331:
------------------------------------
    Description: 
As described here: [https://stackoverflow.com/q/53702727/3576984]

I have a UDF that I would like to be flexible enough to accept 3 arguments (or,
in general, n+k), even though in most calls only 2 (in general, n) are needed.
The natural approach is to implement the UDF with 3 arguments, one of which
has a standard default value.

Copying a toy example from SO:
{code:java}
// Scala
package myUDFs

import org.apache.spark.sql.api.java.UDF3

// The default value on c is the point of this request: ideally Spark's
// SQL registration would recognize it.
class my_udf extends UDF3[Int, Int, Int, Int] {
  override def call(a: Int, b: Int, c: Int = 6): Int = {
    c * (a + b)
  }
}{code}
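
As far as I can tell, the root cause is that scalac compiles the default into a separate synthetic method (here, call$default$3), so the UDF3 interface method that Spark resolves is still a strict three-argument call. A quick sketch to confirm this (the Inspect object is purely illustrative, not part of the proposal):
{code:java}
// Scala: show that the default value lives in a synthetic method that
// Spark's registration path never consults.
import myUDFs.my_udf

object Inspect {
  def main(args: Array[String]): Unit = {
    classOf[my_udf].getMethods
      .filter(_.getName.startsWith("call"))
      .foreach(println)
    // Prints call(int, int, int) (plus its bridge method) and
    // call$default$3(), which returns 6 -- only the former is the UDF3
    // interface method Spark invokes.
  }
}{code}
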
I would prefer the following to give the expected output of 18:
{code:java}
# Python
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
# Note: the return type is IntegerType; pyspark.sql.types has no IntType
from pyspark.sql.types import IntegerType

spark_conf = SparkConf().setAll([('spark.jars', 'myUDFs-assembly-0.1.1.jar')])
spark = (SparkSession.builder
         .appName('my_app')
         .config(conf=spark_conf)
         .enableHiveSupport()
         .getOrCreate())
spark.udf.registerJavaFunction("my_udf", "myUDFs.my_udf", IntegerType())

# Desired: the Scala default c = 6 is applied, so this returns 18
spark.sql('select my_udf(1, 2)').collect()
{code}
But it seems this is currently impossible.
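
As far as I can tell, the current workaround is to spell the default out in a delegating 2-argument class and register it separately (a sketch; the MyUdf2/MyUdf3 names are illustrative only):
{code:java}
// Scala workaround sketch: hard-code the default in a UDF2 that delegates
// to the full 3-argument implementation.
package myUDFs

import org.apache.spark.sql.api.java.{UDF2, UDF3}

class MyUdf3 extends UDF3[Int, Int, Int, Int] {
  override def call(a: Int, b: Int, c: Int): Int = c * (a + b)
}

class MyUdf2 extends UDF2[Int, Int, Int] {
  private val impl = new MyUdf3
  // The intended default c = 6 lives here rather than in a default argument.
  override def call(a: Int, b: Int): Int = impl.call(a, b, 6)
}{code}
Each class then needs its own registerJavaFunction name on the Python side, since registerJavaFunction binds one class per SQL name and the name cannot be overloaded by arity.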


> Allow SQL UDF registration to recognize default function values from Scala
> --------------------------------------------------------------------------
>
>                 Key: SPARK-26331
>                 URL: https://issues.apache.org/jira/browse/SPARK-26331
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 2.4.0
>            Reporter: Michael Chirico
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
