Reply: Re: Re: Is there an operation to create multi record for every element in a RDD?

2017-08-09 Thread luohui20001
Thank you guys, I got my code working like below:

val record75df = sc.parallelize(listForRule75, numPartitions).map(x => x.replace("|", ",")).map(_.split(",")).map(x => Mycaseclass4(x(0).toInt, x(1).toInt, x(2).toFloat, x(3).toInt)).toDF()
val userids = 1 to 1
val uiddf =
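The pipeline above can be sketched in full. The field names of `Mycaseclass4` are hypothetical (the thread only shows the types), and `listForRule75` and `numPartitions` are assumed to be defined earlier in the thread:

```scala
// Hypothetical field names; the thread only reveals the four field types.
case class Mycaseclass4(f1: Int, f2: Int, f3: Float, f4: Int)

// listForRule75 (a collection of "|"-delimited record strings) and
// numPartitions are assumed from earlier in the thread.
val record75df = sc.parallelize(listForRule75, numPartitions)
  .map(_.replace("|", ","))   // normalize the delimiter
  .map(_.split(","))          // split each record into fields
  .map(x => Mycaseclass4(x(0).toInt, x(1).toInt, x(2).toFloat, x(3).toInt))
  .toDF()                     // on Spark 1.6 this needs import sqlContext.implicits._
```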

Re: Re: Is there an operation to create multi record for every element in a RDD?

2017-08-09 Thread Ryan
rdd has a cartesian method

On Wed, Aug 9, 2017 at 5:12 PM, ayan guha wrote:
> If you use join without any condition it becomes a cross join. In sql, it
> looks like
>
> Select a.*, b.* from a join b
>
> On Wed, 9 Aug 2017 at 7:08 pm, wrote:
>
>> Riccardo
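On Spark 1.6, where `Dataset.crossJoin` is not available, `RDD.cartesian` produces the same cross product at the RDD level. A minimal sketch, assuming a live SparkContext `sc` and made-up example data:

```scala
// cartesian pairs every element of the first RDD with every element
// of the second, i.e. an RDD-level cross join.
val userIds = sc.parallelize(1 to 3)
val rules   = sc.parallelize(Seq("ruleA", "ruleB"))
val pairs   = userIds.cartesian(rules)
// pairs holds 3 x 2 = 6 records: (1,"ruleA"), (1,"ruleB"), (2,"ruleA"), ...
```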

Re: Re: Is there an operation to create multi record for every element in a RDD?

2017-08-09 Thread ayan guha
If you use join without any condition it becomes a cross join. In sql, it looks like

Select a.*, b.* from a join b

On Wed, 9 Aug 2017 at 7:08 pm, wrote:
> Riccardo and Ryan
> Thank you for your ideas. It seems that crossjoin is a new dataset api
> after spark2.x.
> my
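The unconditioned join above can be issued through Spark SQL by registering both DataFrames as temp tables. The table and DataFrame names here are illustrative; note that on Spark 2.x an unconditioned join may be rejected unless `spark.sql.crossJoin.enabled` is set to true or explicit CROSS JOIN syntax is used:

```scala
// Illustrative setup; dfA and dfB are assumed pre-existing DataFrames.
dfA.registerTempTable("a")   // on Spark 2.x: createOrReplaceTempView("a")
dfB.registerTempTable("b")

// A join with no ON clause degenerates to a cross join.
val crossed = sqlContext.sql("SELECT a.*, b.* FROM a JOIN b")
```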

Reply: Re: Is there an operation to create multi record for every element in a RDD?

2017-08-09 Thread luohui20001
Riccardo and Ryan,

Thank you for your ideas. It seems that crossJoin is a new Dataset api after spark2.x. My spark version is 1.6.3. Is there a related api to do a crossjoin? Thank you.

Thanks
Best regards!
San.Luo

- Original Message -
From: Riccardo

Re: Is there an operation to create multi record for every element in a RDD?

2017-08-09 Thread Riccardo Ferrari
Depends on your Spark version, have you considered the Dataset api? You can do something like:

val df1 = rdd1.toDF("userid")
val listRDD = sc.parallelize(listForRule77)
val listDF = listRDD.toDF("data")
df1.crossJoin(listDF).orderBy("userid").show(60, truncate = false)

Re: Is there an operation to create multi record for every element in a RDD?

2017-08-09 Thread Ryan
It's just sort of an inner join operation... If the second dataset isn't very large it's ok (btw, you can use flatMap directly instead of map followed by flatMap/flatten), otherwise you can register the second one as an rdd/dataset, and join them on user id.

On Wed, Aug 9, 2017 at 4:29 PM,
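The flatMap suggestion above, sketched for the small-second-dataset case (all names here are illustrative, and `sc` is assumed to be a live SparkContext):

```scala
// Each user id fans out into one record per rule in a single pass;
// flatMap does the expansion directly, with no separate flatten step.
val rules = Seq("ruleA", "ruleB")          // small list, captured in the closure
val users = sc.parallelize(Seq(1, 2, 3))
val expanded = users.flatMap(u => rules.map(r => (u, r)))
// 6 records: (1,"ruleA"), (1,"ruleB"), (2,"ruleA"), ...
```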