Hi, I'm new to Spark and I'm trying to compute similarity between users/products. I have a huge table, and the cluster I have can't handle a self join on it.
I'm trying to approximate the self join using a random walk methodology. The table is a bipartite graph with 2 columns.

Idea:
- Pick an element (t1) from the first column at random.
- Pick an element (t2) that t1 is connected to in the graph.
- Look up the elements connected to t2 in the graph and pick one at random, say t3.
- Create an edge between t1 and t3.
- Iterate this on the order of at least n*n times so that the results approximate the self join.

Questions:
- Is Spark a suitable environment to do this?
- I've coded the logic for picking elements at random, but I'm facing an issue when building the graph.
- Should I consider GraphX?

Any help is highly appreciated.

Regards,
Naveen
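
P.S. To make the idea concrete, here is a rough single-machine sketch of the walk in plain Python (not Spark). The function name `sample_similar_pairs`, the toy `edges` list, and the choice to count how often each pair is hit are all just assumptions for illustration; it is not the code I'm running on the cluster, and t2 is picked at random among t1's neighbours, which is one way to read the second step.

```python
import random
from collections import defaultdict


def sample_similar_pairs(edges, num_walks, seed=0):
    """Approximate the self join on the second column with 2-step random walks.

    edges: iterable of (left, right) rows from the bipartite table.
    Returns {(a, b): hits} for pairs of left-side elements reached by a walk.
    """
    rng = random.Random(seed)
    left_to_right = defaultdict(list)  # t1 -> elements it connects to
    right_to_left = defaultdict(list)  # t2 -> elements that connect to it
    for left, right in edges:
        left_to_right[left].append(right)
        right_to_left[right].append(left)

    left_nodes = sorted(left_to_right)
    counts = defaultdict(int)
    for _ in range(num_walks):
        t1 = rng.choice(left_nodes)          # random element of the first column
        t2 = rng.choice(left_to_right[t1])   # random neighbour of t1
        t3 = rng.choice(right_to_left[t2])   # random neighbour of t2
        if t1 != t3:                         # skip walks that return to t1
            counts[(min(t1, t3), max(t1, t3))] += 1
    return dict(counts)


# Hypothetical toy data: users in the first column, products in the second.
edges = [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2"), ("u4", "p3")]
pairs = sample_similar_pairs(edges, num_walks=1000)
```

Here u1/u2 share p1 and u2/u3 share p2, so only those two pairs can show up, and u4 never gets an edge. The hit counts could serve as a rough similarity weight, but that part is my guess rather than something I've validated.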