> On May 19, 2014, 11:56 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/hash/Hasher.java, line 119
> > <https://reviews.apache.org/r/21618/diff/2/?file=584769#file584769line119>
> >
> >     I think it's important that each backend task uses the same seed when 
> > the 'rand' option is used.  Otherwise the same data will hash to different 
> > values.  You can ensure this by setting the seed in the front end in the 
> > UDF context.  We have a ContextualEvalFunc that makes this easier.  You can 
> > derive from this, then in getOutputSchema you can call 
> > getInstanceProperties() and put the seed value into the map.  To read the
> > seed back, call getInstanceProperties() within the exec method and lazily
> > load the seed from it. See EmptyBagToNullFields for an example using this
> > pattern.
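A minimal sketch of the pattern described above, in plain Java rather than against Pig so that it runs on its own. Several things here are assumptions for illustration: a `java.util.Properties` object stands in for the map that `getInstanceProperties()` returns, the `SEED_KEY` name is hypothetical, and `Objects.hash` is a stand-in seeded hash, not what `Hasher` actually computes. Each simulated task gets its own copy of the properties. This roughly mirrors how each backend task receives its own deserialized copy of the UDF context.

```java
import java.util.Objects;
import java.util.Properties;
import java.util.Random;

/**
 * Sketch of "choose the seed once on the front end, read it in every task".
 * java.util.Properties stands in for the per-instance map exposed by
 * ContextualEvalFunc.getInstanceProperties() (an assumption for illustration).
 */
public class SeedSharingSketch {
    // Hypothetical property key; not taken from the actual patch.
    static final String SEED_KEY = "seed";

    /** Front end (cf. getOutputSchema): pick the seed exactly once. */
    static void chooseSeed(Properties instanceProps, Random rng) {
        if (!instanceProps.containsKey(SEED_KEY)) {
            instanceProps.setProperty(SEED_KEY, Long.toString(rng.nextLong()));
        }
    }

    /** One simulated backend task that lazily loads the seed on first use. */
    static class Task {
        private final Properties instanceProps;
        private Long seed; // null until the first exec() call

        Task(Properties instanceProps) {
            this.instanceProps = instanceProps;
        }

        int exec(String value) {
            if (seed == null) {
                seed = Long.parseLong(instanceProps.getProperty(SEED_KEY));
            }
            // Stand-in seeded hash; NOT the algorithm Hasher uses.
            return Objects.hash(seed, value);
        }
    }

    public static void main(String[] args) {
        Properties frontEnd = new Properties();
        chooseSeed(frontEnd, new Random());
        // Each task works from its own copy, as if deserialized separately.
        Task a = new Task((Properties) frontEnd.clone());
        Task b = new Task((Properties) frontEnd.clone());
        System.out.println(a.exec("hello") == b.exec("hello")); // prints true
    }
}
```

The key design choice is that the random generator is only consulted on the front end. If each task drew its own seed, the same input would hash to different values from task to task, which is the problem the review comment points out.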

Awesome, I was wondering how to do this. One of the best things about 
submitting patches back to open-source projects is getting to learn stuff...

Will make the fixes and re-submit.


- Philip (flip)


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21618/#review43444
-----------------------------------------------------------


On May 19, 2014, 11:12 p.m., Philip (flip) Kromer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21618/
> -----------------------------------------------------------
> 
> (Updated May 19, 2014, 11:12 p.m.)
> 
> 
> Review request for DataFu and Matthew Hayes.
> 
> 
> Bugs: DATAFU-47
>     https://issues.apache.org/jira/browse/DATAFU-47
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> Accompanies DATAFU-47 (https://issues.apache.org/jira/browse/DATAFU-47). Make
> sure to apply the patch from DATAFU-46 first.
> 
> Questions for reviewers:
> 
> * If we upgrade Guava, we'd get SipHash-2-4 (a fast, cryptographically strong
> keyed hash), plus CRC32 and Adler-32 (occasionally useful checksums). I can
> put the upgrade in as a separate patch. Should we upgrade?
> * This UDF provides the same hashes as the MD5 and SHA UDFs. Should those be
> deprecated in favor of this one? I can add the binhex functionality so that
> nothing is lost.
> * If there's a standard way to inject a fixed random number generator for
> the tests, please advise.
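On the last question, one common approach is constructor injection. The thread does not confirm this as a DataFu convention. The code that draws the seed accepts a `java.util.Random`: production uses a default source, and tests pass a `Random` built from a fixed seed. The class name `RandomSeedSource` is hypothetical. Pig instantiates UDFs from string constructor arguments, so the `Random` constructor is left package-private and is meant only for unit tests.

```java
import java.security.SecureRandom;
import java.util.Random;

/**
 * Hypothetical illustration of injecting the random source.
 * Production code uses the no-arg constructor; tests pass a Random with a
 * fixed seed so the drawn seed (and every hash derived from it) is reproducible.
 */
public class RandomSeedSource {
    private final Random rng;

    /** Production: an unpredictable source (SecureRandom extends Random). */
    public RandomSeedSource() {
        this(new SecureRandom());
    }

    /** Tests: inject any Random, e.g. new Random(42L). Package-private on purpose. */
    RandomSeedSource(Random rng) {
        this.rng = rng;
    }

    /** Draw the seed that would be stored for the backend tasks. */
    public long nextSeed() {
        return rng.nextLong();
    }

    public static void main(String[] args) {
        // Two sources built from the same fixed seed produce the same value,
        // so a test can compare against a known expected hash.
        RandomSeedSource a = new RandomSeedSource(new Random(42L));
        RandomSeedSource b = new RandomSeedSource(new Random(42L));
        System.out.println(a.nextSeed() == b.nextSeed()); // prints true
    }
}
```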
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/java/datafu/pig/hash/Hasher.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/hash/HashTests.java 7ff8fb9 
> 
> Diff: https://reviews.apache.org/r/21618/diff/
> 
> 
> Testing
> -------
> 
>  ./gradlew :datafu-pig:test -Dtest.single=HashTests 
> 
> 
> Thanks,
> 
> Philip (flip) Kromer
> 
>
