Hi all,
I am trying to implement a Spark MLlib job to find the similarity between
documents (in my case, each document is basically a home address).
I believe I cannot use DIMSUM for my use case, because DIMSUM works well only
on tall-and-skinny matrices, i.e. matrices with many rows and relatively few columns.
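One way to see why the number of columns matters: an all-pairs column similarity (which is what DIMSUM approximates) has to account for every pair of columns, and that count grows quadratically with the number of columns. The small sketch below just does that arithmetic; the function name and the sizes are hypothetical, chosen only for illustration.

```python
from math import comb


def column_pairs(num_columns):
    """Number of distinct column pairs an all-pairs column similarity must consider."""
    return comb(num_columns, 2)


# Hypothetical column counts, for illustration only.
for cols in (1_000, 100_000, 10_000_000):
    print(f"{cols:>12,} columns -> {column_pairs(cols):,} pairs")
```

With documents as columns, every new address adds a column, so the pair count (and the size of the similarity output) grows with the square of the number of addresses.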
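The matrix for my use case looks like this (rows are terms, columns are documents):

                  doc1 (address1)   doc2 (address2)   ...
  san mateo       0.73462           0
  san francisco   ..                ..
  san bruno       ..                ..
  ...

Here m (the number of documents/columns) is going to be huge, as I keep adding
more addresses, while n (the number of rows) is going to be small compared to m.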
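To make the layout concrete, here is a small plain-Python sketch (not Spark code) that builds a term-by-document matrix with terms as rows and documents (addresses) as columns, then computes the exact cosine similarity between every pair of columns. This brute-force result is what DIMSUM estimates by sampling; as far as I understand, RowMatrix.columnSimilarities() without a threshold computes it exactly. The addresses, the whitespace tokenization and the raw counts are hypothetical stand-ins for real data and for weights such as the 0.73462 above.

```python
import math

# Hypothetical toy data: each document is one home address.
docs = {
    "doc1": "100 main st san mateo",
    "doc2": "200 main st san bruno",
    "doc3": "100 main st san mateo ca",
}


def term_doc_matrix(docs):
    """Rows are terms (n), columns are documents (m); entries are raw term counts."""
    names = sorted(docs)
    terms = sorted({t for text in docs.values() for t in text.split()})
    rows = [[docs[d].split().count(t) for d in names] for t in terms]
    return terms, names, rows


def column_cosine(rows, i, j):
    """Exact cosine similarity between columns i and j of the matrix."""
    dot = sum(r[i] * r[j] for r in rows)
    norm_i = math.sqrt(sum(r[i] ** 2 for r in rows))
    norm_j = math.sqrt(sum(r[j] ** 2 for r in rows))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


terms, names, rows = term_doc_matrix(docs)

# All pairs of columns (documents): quadratic in the number of documents.
sims = {
    (names[i], names[j]): column_cosine(rows, i, j)
    for i in range(len(names))
    for j in range(i + 1, len(names))
}

for pair, s in sorted(sims.items()):
    print(pair, round(s, 3))
```

In this toy example doc1 and doc3 (the same street in San Mateo) come out most similar; with real data the number of columns, and therefore of pairs, is what makes the computation expensive.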
I would like to know whether there is a way to leverage DIMSUM for my use
case and, if not, which other algorithm available in Spark MLlib I could try.
Regards,
Satyajit.