Hello,

It seems to me that there is an issue in the MinHashMapper class. In the map method, the loop goes over the elements of the vector. In many cases the instance of the abstract Vector class is a SparseVector, and the iteration is meant to cover only the non-zero values (e.g., a document represented as a sparse vector of words). However, the current implementation iterates over all the elements, including the zero-valued ones, because it uses the vector's default iterator. This can produce meaningless clustering. In addition, in this case I think we should hash the index of the element rather than its value.
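To make the concern concrete, here is a minimal, self-contained Java sketch. It does not use Mahout's classes: the sparse vector is modelled as a plain Map from index to value, and the class name, method names and the linear hash function are hypothetical stand-ins, not MinHashMapper's actual code. The first method mimics the behaviour described above (visit every position, hash the value); the second is the proposed alternative (visit only non-zero entries, hash the index).

```java
import java.util.Map;

// Illustrative only: a stand-in for the logic discussed above, not Mahout code.
final class MinHashSketch {

    // Hypothetical hash function h(x) = (a*x + b) mod p, with p = 2^31 - 1.
    static long hash(int x, long a, long b) {
        final long p = 2147483647L;
        return Math.floorMod(a * x + b, p);
    }

    // Behaviour as described in this report: iterate over every element,
    // zeros included, and hash the element's value.
    static long minHashOverAllValues(double[] dense, long a, long b) {
        long min = Long.MAX_VALUE;
        for (double v : dense) {
            min = Math.min(min, hash((int) v, a, b));
        }
        return min;
    }

    // Proposed behaviour: iterate over non-zero entries only and hash the
    // element's index (e.g. the word id), not its value.
    static long minHashOverNonZeroIndices(Map<Integer, Double> sparse, long a, long b) {
        long min = Long.MAX_VALUE;
        for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
            if (e.getValue() != 0.0) {
                min = Math.min(min, hash(e.getKey(), a, b));
            }
        }
        return min;
    }
}
```

Take two documents with no words in common, {1, 0, 1, 0, 0, 0} and {0, 0, 0, 1, 0, 1}. Both contain only the values 0 and 1, so minHashOverAllValues returns the same result for both under any choice of a and b, and the two documents would be treated as similar. With minHashOverNonZeroIndices the index sets {0, 2} and {3, 5} are hashed instead, and (for example with a = 3, b = 7) the two documents get different min-hash values.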
Can somebody confirm or disprove this?

Thanks,
Best regards,
Elena Smirnova
