I have generated a sparse matrix in Python with a size of 4000 x 174000 (saved as a .pkl file). The following is a small part of this matrix:

(0, 45) 1
(0, 413) 1
(0, 445) 1
(0, 107) 4
(0, 80) 2
(0, 352) 1
(0, 157) 1
(0, 191) 1
(0, 315) 1
(0, 395) 4
(0, 282) 3
(0, 184) 1
(0, 403) 1
(0, 169) 1
(0, 267) 1
(0, 148) 1
(0, 449) 1
(0, 241) 1
(0, 303) 1
(0, 364) 1
(0, 257) 1
(0, 372) 1
(0, 73) 1
(0, 64) 1
(0, 427) 1
:
:
(2, 399) 1
(2, 277) 1
(2, 229) 1
(2, 255) 1
(2, 409) 1
(2, 355) 1
(2, 391) 1
(2, 28) 1
(2, 384) 1
(2, 86) 1
(2, 285) 2
(2, 166) 1
(2, 165) 1
(2, 419) 1
(2, 367) 2
(2, 133) 1
(2, 61) 1
(2, 434) 1
(2, 51) 1
(2, 423) 1
(2, 398) 1
(2, 438) 1
(2, 389) 1
(2, 26) 1
(2, 455) 1

I am new to Spark and would like to cluster this matrix with the k-means algorithm. Can anyone explain what kinds of problems I might face? Please note that I do not want to use MLlib and would like to write my own k-means.

Best regards,
.......................................................
Amin Mohebbi
PhD candidate in Software Engineering at university of Malaysia
Tel : +60 18 2040 017
E-Mail : tp025...@ex.apiit.edu.my
         amin_...@me.com
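
P.S. To make the question concrete, here is a minimal sketch of one k-means (Lloyd) iteration over sparse rows. It is plain Python, not Spark code, and it is only an illustration under stated assumptions: each row is held as a dict of {column: value} pairs (mirroring the "(row, col) value" listing above), centroids are dense lists, and the names closest_centroid and kmeans_step are hypothetical. In Spark the assignment loop would presumably become a map over the rows and the per-cluster sums a reduce by cluster index; that part is not shown.

```python
# Illustrative sketch only. Sparse rows are dicts {column: value};
# centroids are dense lists. All function names are hypothetical.


def closest_centroid(row, centroids, centroid_sq_norms):
    """Return the index of the centroid nearest to one sparse row.

    Uses ||x - c||^2 = ||x||^2 - 2*x.c + ||c||^2, so only the
    non-zero entries of the row are visited.
    """
    row_sq_norm = sum(v * v for v in row.values())
    best_index, best_dist = -1, float("inf")
    for k, centroid in enumerate(centroids):
        dot = sum(v * centroid[j] for j, v in row.items())
        dist = row_sq_norm - 2.0 * dot + centroid_sq_norms[k]
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def kmeans_step(rows, centroids):
    """One Lloyd iteration: assign every row, then recompute the centroids."""
    n_features = len(centroids[0])
    sq_norms = [sum(c * c for c in centroid) for centroid in centroids]
    sums = [[0.0] * n_features for _ in centroids]
    counts = [0] * len(centroids)
    for row in rows:
        k = closest_centroid(row, centroids, sq_norms)
        counts[k] += 1
        for j, v in row.items():
            sums[k][j] += v
    new_centroids = []
    for k, centroid in enumerate(centroids):
        if counts[k] == 0:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(list(centroid))
        else:
            new_centroids.append([s / counts[k] for s in sums[k]])
    return new_centroids


# Toy data with 4 columns (the real matrix would have 174000).
toy_rows = [{0: 1, 1: 4}, {0: 2, 1: 3}, {2: 1, 3: 2}, {3: 3}]
toy_centroids = [[1.0, 4.0, 0.0, 0.0], [0.0, 0.0, 0.0, 3.0]]
updated = kmeans_step(toy_rows, toy_centroids)
```

One thing the sketch makes visible: the rows stay sparse, but each centroid is a dense vector with one entry per column, so with 174000 columns the centroids and the per-cluster sums are the large objects that have to be held and combined on every iteration.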