Hello,

My name is Arnoldo Muller, and I am a final-year PhD candidate. I am working on similarity search for detecting Open Source license violations (www.furiachan.org). In my spare time, I also develop a similarity search engine (www.obsearch.net).
I am interested in the Apache Hadoop Open Source Student Project "Performance evaluation of existing Locality Sensitive Hashing schemes. Research on new hashing schemes for filesystem namespace partitioning". If nobody is working on it yet, I would like to know more about the project's scope.

Does it make sense to define a distance function so that similar namespaces are grouped together into the same "bucket"? If so, I have three or four metric trees that could be used for the comparison.

Thanks,
Arnoldo Muller
