Re: how to call udf with parameters

2017-06-18 Thread Yong Zhang
What version of Spark are you using? I cannot reproduce your error:


scala> spark.version
res9: String = 2.1.1
scala> val dataset = Seq((0, "hello"), (1, "world")).toDF("id", "text")
dataset: org.apache.spark.sql.DataFrame = [id: int, text: string]
scala> import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions.udf

// define a method in a similar way to yours
scala> def len = udf { (data: String) => data.length > 0 }
len: org.apache.spark.sql.expressions.UserDefinedFunction

// use it
scala> dataset.select(len($"text").as('length)).show
+------+
|length|
+------+
|  true|
|  true|
+------+
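
For the multi-argument case in the original question, the same session works once the plain Scala values are wrapped in lit(...), so that every argument handed to the UDF is a Column. A rough sketch (the UDF below is a made-up example, not the original ssplit2):

scala> import org.apache.spark.sql.functions.lit

// extra non-Column argument (minLen) is passed as lit(3) at call time
scala> def longerThan = udf { (data: String, minLen: Int) => data.length > minLen }

scala> dataset.select(longerThan($"text", lit(3)).as('longer)).show
+------+
|longer|
+------+
|  true|
|  true|
+------+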


Yong







Re: Re: Re: how to call udf with parameters

2017-06-15 Thread lk_spark
Thanks Kumar, that is really helpful!


2017-06-16 

lk_spark 




Re: Re: how to call udf with parameters

2017-06-15 Thread Pralabh Kumar
import org.apache.spark.sql.functions.{udf, lit}

val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2))

data.select(getlength(lit(1), lit(2), data("col1"))).collect
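
As a rough end-to-end sketch in the spark-shell (the DataFrame below is made-up sample data, not from the thread), using the getlength defined above:

val df = Seq("spark", "hadoop").toDF("col1")
// lit(1) and lit(2) wrap the plain Ints into Columns, which is what the UDF call expects
df.select(getlength(lit(1), lit(2), df("col1"))).show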



Re: Re: how to call udf with parameters

2017-06-15 Thread Pralabh Kumar
Use lit. Give me some time, I'll provide an example.



Re: Re: how to call udf with parameters

2017-06-15 Thread lk_spark
Thanks Kumar. I want to know how to call a UDF with multiple parameters; for example, a UDF that works like substr. How can I pass the begin and end indexes as parameters? I tried it but got errors. Can UDF parameters only be of Column type?
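
One alternative, sketched here as an assumption rather than taken from this thread: only the arguments supplied at call time have to be Columns, so fixed settings such as the begin and end index can be plain Scala parameters that the UDF closes over:

import org.apache.spark.sql.functions.udf

// begin and end are captured by the closure; only the text column is passed at call time
def substrUdf(begin: Int, end: Int) =
  udf { (data: String) => data.substring(begin, end) }

df.select(substrUdf(1, 3)(df("col1")))   // df and col1 are assumed example names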

2017-06-16 

lk_spark 




Re: how to call udf with parameters

2017-06-15 Thread Pralabh Kumar
sample UDF

import org.apache.spark.sql.functions.udf

val getlength = udf((data: String) => data.length())
data.select(getlength(data("col1")))


how to call udf with parameters

2017-06-15 Thread lk_spark
Hi all,
 I defined a UDF with multiple parameters, but I don't know how to call it with a DataFrame.
 
UDF:

def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
  val terms = HanLP.segment(sentence).asScala
  ...

Call :

scala> val output = input.select(ssplit2($"text",true,true,2).as('words))
:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
   val output = input.select(ssplit2($"text",true,true,2).as('words))
 ^
:40: error: type mismatch;
 found   : Boolean(true)
 required: org.apache.spark.sql.Column
   val output = input.select(ssplit2($"text",true,true,2).as('words))
  ^
:40: error: type mismatch;
 found   : Int(2)
 required: org.apache.spark.sql.Column
   val output = input.select(ssplit2($"text",true,true,2).as('words))
   ^

scala> val output = input.select(ssplit2($"text",$"true",$"true",$"2").as('words))
org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input 
columns: [id, text];;
'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
+- Project [_1#2 AS id#5, _2#3 AS text#6]
   +- LocalRelation [_1#2, _2#3]


I need help!!
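
For reference, the answers earlier in this thread point to lit: $"true" is parsed as a reference to a column literally named true, which is why the analyzer cannot resolve it. A sketch of the corrected call, assuming the ssplit2 definition above:

import org.apache.spark.sql.functions.lit

// every argument must be a Column, so the Booleans and the Int are wrapped in lit(...)
val output = input.select(ssplit2($"text", lit(true), lit(true), lit(2)).as('words))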


2017-06-16


lk_spark