----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20608/#review42412 -----------------------------------------------------------
It looks like most of my feedback was addressed. A couple remaining items: 1) The UDFs still require a seed. Should this seed be optional as I brought up earlier? 2) LSHFunc doesn't validate the input schema in any way. Can you go through the issues and mark as fixed the ones that have been addressed? I think it was all but those two. datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java <https://reviews.apache.org/r/20608/#comment76181> Do you think it'd be helpful to add a method to DataTypeUtil that validates a schema is a valid vector representation? For the tuple case we can validate the size matches the dimension from the constructor. For the bag case we can at least verify that the inner tuple is a pair. Maybe we can file a follow up JIRA to do more schema verification. datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHPigTest.java <https://reviews.apache.org/r/20608/#comment76182> You should fix your IDE. The indentation is really screwy. - Matthew Hayes On May 5, 2014, 8:38 p.m., Casey Stella wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/20608/ > ----------------------------------------------------------- > > (Updated May 5, 2014, 8:38 p.m.) > > > Review request for DataFu. > > > Repository: datafu > > > Description > ------- > > From DATAFU-37: Create a set of UDFs to implement Locality Sensitive Hashing > in support of finding k-near neighbors. Initially, hashes associated with L1, > L2 and Cosine similarity should be supported. > > > Diffs > ----- > > datafu-pig/src/main/java/datafu/pig/hash/lsh/CosineDistanceHash.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/L1PStableHash.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/L2PStableHash.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/LSHFamily.java PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/LSHFunc.java PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/RepeatingLSH.java PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/HyperplaneLSH.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/package-info.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/LSH.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/LSHCreator.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/Sampler.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/package-info.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/Cosine.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/L1.java PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/L2.java PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/package-info.java > PRE-CREATION > > datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/AbstractStableDistributionFunction.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L1LSH.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L2LSH.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/package-info.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/package-info.java PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/util/DataTypeUtil.java > PRE-CREATION > datafu-pig/src/main/java/datafu/pig/hash/lsh/util/package-info.java > PRE-CREATION > datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHPigTest.java > PRE-CREATION > datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHTest.java PRE-CREATION > > Diff: https://reviews.apache.org/r/20608/diff/ > > > Testing > ------- > > 2 unit tests. One pigunit for the UDFs and one regular JUnit test to test > functionality. > > > Thanks, > > Casey Stella > >