Thank you guys, I got my code working as below:

val record75df = sc.parallelize(listForRule75, numPartitions)
  .map(x => x.replace("|", ","))
  .map(_.split(","))
  .map(x => Mycaseclass4(x(0).toInt, x(1).toInt, x(2).toFloat, x(3).toInt))
  .toDF()
val userids = 1 to 10000
val uiddf = sc.parallelize(userids, numPartitions).toDF("userid")
record75df.registerTempTable("b")
uiddf.registerTempTable("a")
val rule75df = sqlContext.sql("select a.*,b.* from a join b")
rule75df.show


--------------------------------

 

Thanks&Best regards!
San.Luo

----- Original Message -----
From: Ryan <ryan.hd....@gmail.com>
To: ayan guha <guha.a...@gmail.com>
Cc: Riccardo Ferrari <ferra...@gmail.com>, luohui20...@sina.com, user <user@spark.apache.org>
Subject: Re: Re: Is there an operation to create multi record for every element in a RDD?
Date: 2017-08-09 17:32

rdd has a cartesian method
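For example, a minimal sketch in a 1.6 spark-shell, assuming the rdd1 and listForRule77 from the original mail below:

// pair every userid with every rule string, then prepend the userid
val listRdd = sc.parallelize(listForRule77)
val rdd2 = rdd1.cartesian(listRdd).map { case (uid, rule) => s"$uid,$rule" }
rdd2.count   // 10000 userids * 29 rules = 290000 records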

On Wed, Aug 9, 2017 at 5:12 PM, ayan guha <guha.a...@gmail.com> wrote:
If you use join without any condition it becomes a cross join. In SQL, it looks like:
Select a.*,b.* from a join b
On Wed, 9 Aug 2017 at 7:08 pm, <luohui20...@sina.com> wrote:
Riccardo and Ryan, thank you for your ideas. It seems that crossJoin is a new Dataset API introduced in Spark 2.x. My Spark version is 1.6.3. Is there an equivalent API to do a cross join? Thank you.


--------------------------------

 

Thanks&Best regards!
San.Luo

----- Original Message -----
From: Riccardo Ferrari <ferra...@gmail.com>
To: Ryan <ryan.hd....@gmail.com>
Cc: luohui20...@sina.com, user <user@spark.apache.org>
Subject: Re: Is there an operation to create multi record for every element in a RDD?
Date: 2017-08-09 16:54

Depending on your Spark version, have you considered the Dataset API?
You can do something like:

val df1 = rdd1.toDF("userid")
val listRDD = sc.parallelize(listForRule77)
val listDF = listRDD.toDF("data")
df1.crossJoin(listDF).orderBy("userid").show(60, truncate = false)

+------+----------------------+
|userid|data                  |
+------+----------------------+
|1     |1,1,100.00|1483891200,|
|1     |1,1,100.00|1483804800,|
...
|1     |1,1,100.00|1488902400,|
|1     |1,1,100.00|1489075200,|
|1     |1,1,100.00|1488470400,|
...
On Wed, Aug 9, 2017 at 10:44 AM, Ryan <ryan.hd....@gmail.com> wrote:
It's just sort of an inner join operation... If the second dataset isn't very
large it's ok (btw, you can use flatMap directly instead of map followed by
flatMap/flatten), otherwise you can register the second one as an RDD/Dataset
and join them on user id.
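A minimal sketch of the flatMap version, assuming the rdd1 and listForRule77 defined in the original mail below (the list is small, so capturing it in the closure is fine):

// flatMap does the map + flatten in one step: for each userid emit
// one record per rule string, prefixed with that userid
val rdd2 = rdd1.flatMap(uid => listForRule77.map(rule => s"$uid,$rule"))
rdd2.count   // 10000 userids * 29 rules = 290000 records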
On Wed, Aug 9, 2017 at 4:29 PM,  <luohui20...@sina.com> wrote:
hello guys:
      I have a simple RDD like:

val userIDs = 1 to 10000
val rdd1 = sc.parallelize(userIDs, 16)   // this rdd has 10000 user ids

      And I have a List[String] like below:

scala> listForRule77
res76: List[String] = List(1,1,100.00|1483286400, 1,1,100.00|1483372800, 
1,1,100.00|1483459200, 1,1,100.00|1483545600, 1,1,100.00|1483632000, 
1,1,100.00|1483718400, 1,1,100.00|1483804800, 1,1,100.00|1483891200, 
1,1,100.00|1483977600, 3,1,200.00|1485878400, 1,1,100.00|1485964800, 
1,1,100.00|1486051200, 1,1,100.00|1488384000, 1,1,100.00|1488470400, 
1,1,100.00|1488556800, 1,1,100.00|1488643200, 1,1,100.00|1488729600, 
1,1,100.00|1488816000, 1,1,100.00|1488902400, 1,1,100.00|1488988800, 
1,1,100.00|1489075200, 1,1,100.00|1489161600, 1,1,100.00|1489248000, 
1,1,100.00|1489334400, 1,1,100.00|1489420800, 1,1,100.00|1489507200, 
1,1,100.00|1489593600, 1,1,100.00|1489680000, 1,1,100.00|1489766400)
scala> listForRule77.length
res77: Int = 29
      I need to create an RDD containing 290000 records: for every userid in
rdd1, I need to create 29 records according to listForRule77, each record
starting with the userid, for example 1(the userid),1,1,100.00|1483286400.
      My idea is like below:
1. write a udf to add the userid to the beginning of every string element of listForRule77.
2. use val rdd2 = rdd1.map{x => List_udf(x)}.flatMap(), the resulting rdd2 may be what I need.
      My question: Are there any problems with my idea? Is there a better way to do this?

--------------------------------

 

Thanks&Best regards!
San.Luo





-- 
Best Regards,
Ayan Guha



