-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20608/#review41558
-----------------------------------------------------------

Still going through this but here are some initial comments.


datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/HyperplaneLSH.java
<https://reviews.apache.org/r/20608/#comment75031>

    Would be helpful to mention that sRepeat is a param of the UDFs, not found here.


datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java
<https://reviews.apache.org/r/20608/#comment75029>

    If this is a vector, then couldn't it be either a tuple or a bag? The doc says that a vector could be represented sparsely using a bag of position and value tuples.


datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java
<https://reviews.apache.org/r/20608/#comment75030>

    Just to make sure I understand, the vectorBag here can either be a bag of tuples or a bag of bags of tuples, right? For the latter case it is a sparse representation.


datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java
<https://reviews.apache.org/r/20608/#comment75028>

    It would be valuable to validate the input here and throw helpful error messages when invalid input is given. There are a couple of forms vectors can take, and it's helpful if users get errors up front when the script is validated rather than later while the MR jobs are running.


datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java
<https://reviews.apache.org/r/20608/#comment75027>

    We should throw an exception so issues can be found in the frontend before jobs are submitted.


datafu-pig/src/main/java/datafu/pig/hash/lsh/util/DataTypeUtil.java
<https://reviews.apache.org/r/20608/#comment75025>

    Could use firstElement, right?


datafu-pig/src/main/java/datafu/pig/hash/lsh/util/DataTypeUtil.java
<https://reviews.apache.org/r/20608/#comment75026>

    Shouldn't size be exactly 2?


- Matthew Hayes


On April 23, 2014, 1:14 p.m., Casey Stella wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail.
> To reply, visit:
> https://reviews.apache.org/r/20608/
> -----------------------------------------------------------
>
> (Updated April 23, 2014, 1:14 p.m.)
>
>
> Review request for DataFu.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> From DATAFU-37: Create a set of UDFs to implement Locality Sensitive Hashing
> in support of finding k-near neighbors. Initially, hashes associated with L1,
> L2 and Cosine similarity should be supported.
>
>
> Diffs
> -----
>
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/CosineDistanceHash.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/L1PStableHash.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/L2PStableHash.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/LSHFamily.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/LSHFunc.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/RepeatingLSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/HyperplaneLSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/LSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/LSHCreator.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/Sampler.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/Cosine.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/L1.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/L2.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/AbstractStableDistributionFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L1LSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L2LSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/util/DataTypeUtil.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/util/package-info.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHPigTest.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHTest.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/20608/diff/
>
>
> Testing
> -------
>
> 2 unit tests. One pigunit for the UDFs and one regular JUnit test to test
> functionality.
>
>
> Thanks,
>
> Casey Stella
>
>
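
To make the validation comments on MetricUDF.java and DataTypeUtil.java concrete, here is a rough, Pig-free sketch of the kind of up-front shape check being asked for. Everything in it is hypothetical: VectorShapeCheck, Kind, Field and isSparse are stand-ins for illustration, not classes from the patch or from Pig. It assumes a vector is either a tuple of numeric fields (dense) or a bag of (position, value) tuples (sparse), and it assumes the "exactly 2" comment refers to those (position, value) entries. In the real UDFs the equivalent check would presumably live wherever the input schema is inspected, so that a bad script fails before any job is submitted.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Pig-free sketch: none of these names come from the patch.
final class VectorShapeCheck {

    // Stand-in for a schema field type (not Pig's DataType constants).
    enum Kind { NUMBER, TUPLE, BAG, OTHER }

    // Stand-in for a schema node: a kind plus the schemas of its children.
    static final class Field {
        final Kind kind;
        final List<Field> children;

        Field(Kind kind, Field... children) {
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Returns true for a sparse vector and false for a dense one;
    // throws IllegalArgumentException with a descriptive message otherwise.
    static boolean isSparse(Field vector) {
        if (vector.kind == Kind.TUPLE) {
            requireAllNumeric(vector, "dense vector");
            return false;
        }
        if (vector.kind == Kind.BAG) {
            if (vector.children.size() != 1
                    || vector.children.get(0).kind != Kind.TUPLE) {
                throw new IllegalArgumentException(
                    "Sparse vector must be a bag of (position, value) tuples");
            }
            Field entry = vector.children.get(0);
            // Exactly 2, not "at least 2": extra fields are rejected too.
            if (entry.children.size() != 2) {
                throw new IllegalArgumentException(
                    "Sparse entry must have exactly 2 fields (position, value), found "
                        + entry.children.size());
            }
            requireAllNumeric(entry, "sparse entry");
            return true;
        }
        throw new IllegalArgumentException(
            "Vector must be a tuple (dense) or a bag (sparse), found " + vector.kind);
    }

    // Rejects empty tuples and tuples containing any non-numeric field.
    private static void requireAllNumeric(Field tuple, String what) {
        if (tuple.children.isEmpty()) {
            throw new IllegalArgumentException("Empty " + what);
        }
        for (Field f : tuple.children) {
            if (f.kind != Kind.NUMBER) {
                throw new IllegalArgumentException(
                    "Non-numeric field in " + what + ": " + f.kind);
            }
        }
    }

    public static void main(String[] args) {
        Field num = new Field(Kind.NUMBER);
        Field dense = new Field(Kind.TUPLE, num, num, num);
        Field sparse = new Field(Kind.BAG, new Field(Kind.TUPLE, num, num));
        System.out.println("dense is sparse?  " + isSparse(dense));
        System.out.println("sparse is sparse? " + isSparse(sparse));
    }
}
```

The point of the sketch is only the ordering: decide which of the two vector forms was passed, then fail with a message that names the expected form, rather than letting a cast fail inside a running job.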