-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20608/#review41559
-----------------------------------------------------------


This is pretty cool, nicely done!  I still need to read the paper to understand 
the theory behind it but the code looks good overall.


datafu-pig/src/main/java/datafu/pig/hash/lsh/CosineDistanceHash.java
<https://reviews.apache.org/r/20608/#comment75043>

    In practice do you think we probably don't want to provide a seed?  It 
seems like if a seed is used every mapper instance will get the same RNG 
initialized with the same seed and therefore generate the same random numbers 
to initialize the hash functions.  Should we make the seed optional?  I can see 
it being useful for unit testing though.



datafu-pig/src/main/java/datafu/pig/hash/lsh/CosineDistanceHash.java
<https://reviews.apache.org/r/20608/#comment75033>

    Is it possible that a point could appear in more than one family by chance? 
 Would we need to apply distinct to be safe? This question applies to the other 
examples as well.



datafu-pig/src/main/java/datafu/pig/hash/lsh/L1PStableHash.java
<https://reviews.apache.org/r/20608/#comment75042>

    doc: reword this sentence



datafu-pig/src/main/java/datafu/pig/hash/lsh/L1PStableHash.java
<https://reviews.apache.org/r/20608/#comment75044>

    typo: L1, not Cosine



datafu-pig/src/main/java/datafu/pig/hash/lsh/L2PStableHash.java
<https://reviews.apache.org/r/20608/#comment75045>

    typo: L1 => L2



datafu-pig/src/main/java/datafu/pig/hash/lsh/L2PStableHash.java
<https://reviews.apache.org/r/20608/#comment75046>

    reword this too



datafu-pig/src/main/java/datafu/pig/hash/lsh/L2PStableHash.java
<https://reviews.apache.org/r/20608/#comment75047>

    typo: L2StableHash



datafu-pig/src/main/java/datafu/pig/hash/lsh/L2PStableHash.java
<https://reviews.apache.org/r/20608/#comment75048>

    typo: should be L2 distance



datafu-pig/src/main/java/datafu/pig/hash/lsh/LSHFunc.java
<https://reviews.apache.org/r/20608/#comment75032>

    would be good to validate the input schema



datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/AbstractStableDistributionFunction.java
<https://reviews.apache.org/r/20608/#comment75036>

    typo: with a => are



datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/AbstractStableDistributionFunction.java
<https://reviews.apache.org/r/20608/#comment75037>

    typo: of for



datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L2LSH.java
<https://reviews.apache.org/r/20608/#comment75038>

    typo: L2 and 2-stable



datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L2LSH.java
<https://reviews.apache.org/r/20608/#comment75040>

    typo: 2-stable



datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L2LSH.java
<https://reviews.apache.org/r/20608/#comment75041>

    typo: gaussian



datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHPigTest.java
<https://reviews.apache.org/r/20608/#comment75049>

    some of the indentation could be cleaned up here


- Matthew Hayes


On April 23, 2014, 1:14 p.m., Casey Stella wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20608/
> -----------------------------------------------------------
> 
> (Updated April 23, 2014, 1:14 p.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> From DATAFU-37: Create a set of UDFs to implement Locality Sensitive Hashing 
> in support of finding k-near neighbors. Initially, hashes associated with L1, 
> L2 and Cosine similarity should be supported.
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/CosineDistanceHash.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/L1PStableHash.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/L2PStableHash.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/LSHFamily.java PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/LSHFunc.java PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/RepeatingLSH.java PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/HyperplaneLSH.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/package-info.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/LSH.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/LSHCreator.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/Sampler.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/package-info.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/Cosine.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/L1.java PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/L2.java PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/package-info.java 
> PRE-CREATION 
>   
> datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/AbstractStableDistributionFunction.java
>  PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L1LSH.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L2LSH.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/package-info.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/package-info.java PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/util/DataTypeUtil.java 
> PRE-CREATION 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/util/package-info.java 
> PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHPigTest.java 
> PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/20608/diff/
> 
> 
> Testing
> -------
> 
> 2 unit tests.  One pigunit for the UDFs and one regular JUnit test to test 
> functionality.
> 
> 
> Thanks,
> 
> Casey Stella
> 
>

Reply via email to