Thank you guys, I got my code working like below:

val record75df = sc.parallelize(listForRule75, numPartitions)
  .map(x => x.replace("|", ","))
  .map(_.split(","))
  .map(x => Mycaseclass4(x(0).toInt, x(1).toInt, x(2).toFloat, x(3).toInt))
  .toDF()
val userids = 1 to 1
val uiddf =
RDD has a cartesian method.
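For reference, a minimal sketch of how RDD.cartesian produces the cross product on Spark 1.6 (listForRule75 is the list from the snippet above; the userid range here is made up for illustration):

```scala
// Sketch only: assumes an existing SparkContext `sc` and the listForRule75
// shown above. RDD.cartesian pairs every element of one RDD with every
// element of another, which is the RDD-level equivalent of a cross join
// in Spark 1.6.
val userids = sc.parallelize(1 to 10)        // hypothetical user ids
val rules   = sc.parallelize(listForRule75)  // the rule list from above
val pairs   = userids.cartesian(rules)       // one (userid, rule) per combination
pairs.take(5).foreach(println)
```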
On Wed, Aug 9, 2017 at 5:12 PM, ayan guha wrote:
> If you use join without any condition it becomes a cross join. In SQL, it
> looks like
>
> Select a.*,b.* from a join b
>
> On Wed, 9 Aug 2017 at 7:08 pm, wrote:
>
>> Riccardo
Riccardo and Ryan,
Thank you for your ideas. It seems that crossJoin is a new Dataset API
added in Spark 2.x. My Spark version is 1.6.3. Is there a comparable API
to do a cross join? Thank you.

Thanks
Best regards!
San.Luo
- Original Message -
From: Riccardo
Depending on your Spark version, have you considered the Dataset API?
You can do something like:
val df1 = rdd1.toDF("userid")
val listRDD = sc.parallelize(listForRule77)
val listDF = listRDD.toDF("data")
df1.crossJoin(listDF).orderBy("userid").show(60, truncate=false)
It's just a sort of inner join operation... If the second dataset isn't very
large it's OK (btw, you can use flatMap directly instead of map followed by
flatMap/flatten); otherwise you can register the second one as an
RDD/Dataset and join them on user id.
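The flatMap point above can be sketched with plain Scala collections (the data here is made up for illustration):

```scala
// Illustrative only: made-up data showing that flatMap is equivalent to
// map followed by flatten.
val lines = List("1|2", "3|4")

// map then flatten: each map step yields a collection, flatten merges them
val viaMapFlatten = lines.map(_.split("\\|").toList).flatten

// flatMap does both in one pass
val viaFlatMap = lines.flatMap(_.split("\\|").toList)

// both produce List("1", "2", "3", "4")
```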
On Wed, Aug 9, 2017 at 4:29 PM,