-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20608/#review41558
-----------------------------------------------------------


Still going through this but here are some initial comments.


datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/HyperplaneLSH.java
<https://reviews.apache.org/r/20608/#comment75031>

    It would be helpful to mention that sRepeat is a parameter of the UDFs and is 
not found here.
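
    For context on what HyperplaneLSH documents, here is a minimal sketch of the random-hyperplane technique the class is named after. Each hash bit is the sign of the vector's dot product with a random Gaussian normal vector. `HyperplaneSketch` and its constructor arguments are hypothetical and are not the API in the patch. The number of hyperplanes `k` is supplied by the caller, much as sRepeat is supplied by the UDFs rather than by this class.

```java
import java.util.Random;

// Hypothetical sketch of random-hyperplane LSH for cosine similarity.
// Names and constructor arguments are illustrative, not the HyperplaneLSH API.
final class HyperplaneSketch {
  private final double[][] normals; // k random hyperplane normals, each of length dim

  HyperplaneSketch(int dim, int k, long seed) {
    if (dim < 1 || k < 1 || k > 64) {
      throw new IllegalArgumentException("Need dim >= 1 and 1 <= k <= 64 (bits must fit in a long)");
    }
    Random rng = new Random(seed);
    normals = new double[k][dim];
    for (int i = 0; i < k; i++) {
      for (int j = 0; j < dim; j++) {
        // i.i.d. Gaussian components give a uniformly random hyperplane direction
        normals[i][j] = rng.nextGaussian();
      }
    }
  }

  // Bit i is set when v lies on the non-negative side of hyperplane i.
  long hash(double[] v) {
    if (v.length != normals[0].length) {
      throw new IllegalArgumentException(
          "Expected a vector of length " + normals[0].length + " but got " + v.length);
    }
    long bits = 0L;
    for (int i = 0; i < normals.length; i++) {
      double dot = 0.0;
      for (int j = 0; j < v.length; j++) {
        dot += normals[i][j] * v[j];
      }
      if (dot >= 0.0) {
        bits |= 1L << i;
      }
    }
    return bits;
  }
}
```

Only the sign of each dot product matters, so the hash depends on a vector's direction and not its length. Two vectors separated by angle theta agree on any single bit with probability 1 - theta/pi.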



datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java
<https://reviews.apache.org/r/20608/#comment75029>

    If this is a vector, then couldn't it be either a tuple or a bag?  The doc 
says that a vector could be represented sparsely using a bag of position and 
value tuples.
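
    To make the two forms concrete, here is a minimal sketch that expands a sparse vector, given as (position, value) pairs, into the dense form. It is modeled with plain Java collections instead of Pig's Tuple/DataBag types. `VectorForms`, `entry`, and `toDense` are hypothetical names used only for illustration.

```java
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, modeled with plain Java collections instead of Pig's
// Tuple/DataBag: expands a sparse vector of (position, value) pairs into a dense array.
final class VectorForms {
  static Map.Entry<Integer, Double> entry(int position, double value) {
    return new AbstractMap.SimpleEntry<>(position, value);
  }

  static double[] toDense(List<Map.Entry<Integer, Double>> entries, int dim) {
    double[] dense = new double[dim]; // positions that are not listed stay 0.0
    for (Map.Entry<Integer, Double> e : entries) {
      int pos = e.getKey();
      if (pos < 0 || pos >= dim) {
        throw new IllegalArgumentException(
            "Sparse position " + pos + " is outside [0, " + dim + ")");
      }
      dense[pos] = e.getValue(); // a repeated position overwrites the earlier value
    }
    return dense;
  }
}
```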



datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java
<https://reviews.apache.org/r/20608/#comment75030>

    Just to make sure I understand, the vectorBag here can be either a bag of 
tuples or a bag of bags of tuples, right?  In the latter case it is a sparse 
representation.



datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java
<https://reviews.apache.org/r/20608/#comment75028>

    It would be valuable to validate the input here and throw helpful error 
messages when invalid input is given.  There are a couple of forms vectors can 
take, and it's helpful if users get errors up front when the script is validated 
rather than later while the MR jobs are running.
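
    The following minimal sketch shows the kind of up-front validation suggested here, covering only the checking logic. In Pig, such checks would usually live in the UDF's `outputSchema(Schema)` override, which Pig calls on the front end while building the plan. They can report problems by throwing an unchecked exception such as IllegalArgumentException. The `FieldType` enum is a hypothetical simplification of Pig's schema types, and the class and method names are illustrative.

```java
import java.util.List;

// Hypothetical front-end checks over a simplified type model; a real Pig UDF would
// inspect org.apache.pig.impl.logicalLayer.schema.Schema instead of this enum.
final class VectorSchemaCheck {
  enum FieldType { INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY, TUPLE, BAG }

  private static boolean isNumeric(FieldType t) {
    return t == FieldType.INTEGER || t == FieldType.LONG
        || t == FieldType.FLOAT || t == FieldType.DOUBLE;
  }

  // Dense form: a tuple whose fields are all numeric.
  static void checkDense(List<FieldType> fields) {
    if (fields.isEmpty()) {
      throw new IllegalArgumentException("A dense vector tuple needs at least one field");
    }
    for (int i = 0; i < fields.size(); i++) {
      if (!isNumeric(fields.get(i))) {
        throw new IllegalArgumentException(
            "Dense vector field " + i + " must be numeric, but was " + fields.get(i));
      }
    }
  }

  // Sparse form: every bag entry is a (position, value) tuple with exactly 2 fields.
  static void checkSparseEntry(List<FieldType> entry) {
    if (entry.size() != 2) {
      throw new IllegalArgumentException("A sparse vector entry must be a (position, value) "
          + "tuple with exactly 2 fields, but had " + entry.size());
    }
    if (entry.get(0) != FieldType.INTEGER && entry.get(0) != FieldType.LONG) {
      throw new IllegalArgumentException(
          "Sparse vector position must be int or long, but was " + entry.get(0));
    }
    if (!isNumeric(entry.get(1))) {
      throw new IllegalArgumentException(
          "Sparse vector value must be numeric, but was " + entry.get(1));
    }
  }
}
```

Failing on the schema means a malformed script is rejected before any MapReduce job is submitted.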



datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java
<https://reviews.apache.org/r/20608/#comment75027>

    We should throw an exception so issues can be found in the frontend before 
jobs are submitted.



datafu-pig/src/main/java/datafu/pig/hash/lsh/util/DataTypeUtil.java
<https://reviews.apache.org/r/20608/#comment75025>

    Could use firstElement, right?



datafu-pig/src/main/java/datafu/pig/hash/lsh/util/DataTypeUtil.java
<https://reviews.apache.org/r/20608/#comment75026>

    Shouldn't size be exactly 2?


- Matthew Hayes


On April 23, 2014, 1:14 p.m., Casey Stella wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20608/
> -----------------------------------------------------------
> 
> (Updated April 23, 2014, 1:14 p.m.)
> 
> 
> Review request for DataFu.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> From DATAFU-37: Create a set of UDFs to implement Locality Sensitive Hashing 
> in support of finding k-nearest neighbors. Initially, hashes associated with 
> L1, L2 and Cosine similarity should be supported.
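
The L1 and L2 hashes mentioned above are presumably p-stable schemes, since the patch includes a p_stable package. The sketch below shows the standard p-stable construction from the LSH literature. It projects onto a random vector whose components come from a p-stable distribution, shifts the result by a random offset, and buckets it by width w. Whether L1PStableHash and L2PStableHash use exactly this form is an assumption. `PStableSketch`, `forL1`, and `w` are hypothetical names.

```java
import java.util.Random;

// Hypothetical sketch of one p-stable LSH function h(v) = floor((a . v + b) / w).
// Cauchy components (1-stable) suit L1 distance; Gaussian components (2-stable) suit L2.
final class PStableSketch {
  private final double[] a; // random projection direction
  private final double b;   // random offset, uniform in [0, w)
  private final double w;   // bucket width

  PStableSketch(int dim, double w, boolean forL1, long seed) {
    if (dim < 1 || !(w > 0.0)) {
      throw new IllegalArgumentException("Need dim >= 1 and a positive bucket width w");
    }
    Random rng = new Random(seed);
    this.a = new double[dim];
    for (int j = 0; j < dim; j++) {
      a[j] = forL1
          ? Math.tan(Math.PI * (rng.nextDouble() - 0.5)) // standard Cauchy via inverse CDF
          : rng.nextGaussian();                          // standard normal
    }
    this.w = w;
    this.b = rng.nextDouble() * w;
  }

  long hash(double[] v) {
    if (v.length != a.length) {
      throw new IllegalArgumentException(
          "Expected a vector of length " + a.length + " but got " + v.length);
    }
    double dot = 0.0;
    for (int j = 0; j < v.length; j++) {
      dot += a[j] * v[j];
    }
    return (long) Math.floor((dot + b) / w);
  }
}
```

Points that are close under the target distance land in the same bucket with higher probability. In practice, several such functions are concatenated and repeated to tune the collision probabilities.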
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/CosineDistanceHash.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/L1PStableHash.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/L2PStableHash.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/LSHFamily.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/LSHFunc.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/RepeatingLSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/HyperplaneLSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/cosine/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/LSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/LSHCreator.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/Sampler.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/interfaces/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/Cosine.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/L1.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/L2.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/MetricUDF.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/metric/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/AbstractStableDistributionFunction.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L1LSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/L2LSH.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/p_stable/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/package-info.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/util/DataTypeUtil.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/lsh/util/package-info.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHPigTest.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/hash/lsh/LSHTest.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/20608/diff/
> 
> 
> Testing
> -------
> 
> Two unit tests: one PigUnit test for the UDFs and one regular JUnit test for 
> functionality.
> 
> 
> Thanks,
> 
> Casey Stella
> 
>
