Re: How to work with a joined rdd in pyspark?

2015-11-30 Thread arnalone
Ahhh, I get it, thanks!! I did not know we could use a "double index": x[0] points to the show, x[1][0] to the channel, and x[1][1] to the views. I feel terribly noob. Thank you all :)
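A minimal sketch of that "double index", assuming the (show, (channel, views)) record shape described in this thread. Plain tuples stand in for RDD elements here so it runs without Spark; in PySpark the same indexing goes inside the lambdas passed to filter() and map().

```python
# One joined record, shaped like the join output shown in this thread:
# (show, (channel, views))
record = (u'PostModern_Cooking', (u'DEF', 1038))

show = record[0]         # x[0]    -> the join key (show name)
channel = record[1][0]   # x[1][0] -> first element of the value pair
views = record[1][1]     # x[1][1] -> second element of the value pair

print(show, channel, views)
```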

Re: How to work with a joined rdd in pyspark?

2015-11-29 Thread arnalone
Thanks for replying so fast! Sorry my question was not clear. My code is:

joined_dataset = show_channel.join(show_views)

For your knowledge, the first lines are:

joined_dataset.take(4)
Out[93]: [(u'PostModern_Cooking', (u'DEF', 1038)), (u'PostModern_Cooking', (u'DEF', 415)), (u'PostModern_Cooking', (u'DEF',
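For reference, a minimal sketch of what RDD.join produces here, mirrored with plain Python lists (the variable names come from the thread; the data is a hypothetical subset of the output shown above). Each key in show_channel is paired with every matching value in show_views.

```python
# Stand-ins for the two RDDs: (show, channel) and (show, views) pairs.
show_channel = [(u'PostModern_Cooking', u'DEF')]
show_views = [(u'PostModern_Cooking', 1038), (u'PostModern_Cooking', 415)]

# join() matches records on the key (the show name) and nests the two
# values into a pair, giving (show, (channel, views)).
joined = [(k1, (channel, views))
          for (k1, channel) in show_channel
          for (k2, views) in show_views
          if k1 == k2]

print(joined)
```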

Re: How to work with a joined rdd in pyspark?

2015-11-29 Thread arnalone
Yes, that's what I am trying to do, but I cannot manage to "point" at the channel field to filter on "ABC" and then, in the map step, keep only shows and views. In Scala you do it with (_._2._1 == "ABC") and (_._1, _._2._2), but I don't find the right syntax in Python to do the same :(
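A possible Python translation of that Scala snippet, assuming the (show, (channel, views)) structure from the thread: Scala's _._2._1 becomes x[1][0] and (_._1, _._2._2) becomes (x[0], x[1][1]). The show names below are made up for illustration, and a plain list stands in for the RDD so the sketch runs without Spark; on a real joined RDD the same lambdas go into .filter() and .map().

```python
# Hypothetical joined data in the (show, (channel, views)) shape.
joined = [(u'Show_A', (u'ABC', 100)), (u'Show_B', (u'DEF', 50))]

# Scala: .filter(_._2._1 == "ABC")  ->  keep records whose channel is ABC
abc_only = filter(lambda x: x[1][0] == u'ABC', joined)

# Scala: .map(x => (x._1, x._2._2))  ->  keep only (show, views)
show_and_views = list(map(lambda x: (x[0], x[1][1]), abc_only))

print(show_and_views)
```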