> On May 19, 2014, 11:56 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/hash/Hasher.java, line 119
> > <https://reviews.apache.org/r/21618/diff/2/?file=584769#file584769line119>
> >
> > I think it's important that each backend task uses the same seed when
> > the 'rand' option is used. Otherwise the same data will hash to different
> > values. You can ensure this by setting the seed in the front end in the
> > UDF context. We have a ContextualEvalFunc that makes this easier. You can
> > derive from this, then in getOutputSchema you can call
> > getInstanceProperties() and put the seed value into the map. To get the
> > seed, within the exec method you can call getInstanceProperties() and lazy
> > load the seed from it. See EmptyBagToNullFields for an example using this
> > pattern.
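
The pattern described in the comment above can be sketched roughly as follows. This is not the DataFu code: a plain java.util.Properties object stands in for whatever getInstanceProperties() returns, and the class name, the "seed" key, the method names, and the toy hash are all hypothetical, chosen only to show the idea of picking the seed once on the front end and lazily loading it in every backend task.

```java
import java.util.Properties;
import java.util.Random;

// Sketch only: Properties stands in for the per-instance UDF context map.
final class SeedPropagationSketch {
    // Hypothetical key name under which the seed is stored.
    static final String SEED_KEY = "seed";

    // Front-end step (the comment suggests doing this in getOutputSchema):
    // choose the random seed once and record it, never overwriting it.
    static void storeSeedOnFrontEnd(Properties instanceProperties) {
        if (instanceProperties.getProperty(SEED_KEY) == null) {
            int seed = new Random().nextInt();
            instanceProperties.setProperty(SEED_KEY, Integer.toString(seed));
        }
    }

    // Stand-in for one backend task running the UDF's exec method.
    static final class BackendTask {
        private final Properties instanceProperties;
        private Integer seed; // lazily loaded on first use

        BackendTask(Properties instanceProperties) {
            this.instanceProperties = instanceProperties;
        }

        int hash(String value) {
            if (seed == null) {
                // Lazy load: read the seed the front end stored.
                seed = Integer.valueOf(instanceProperties.getProperty(SEED_KEY));
            }
            // Toy seeded hash, not a real hash function.
            return 31 * seed + value.hashCode();
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        storeSeedOnFrontEnd(props);
        BackendTask a = new BackendTask(props);
        BackendTask b = new BackendTask(props);
        // Both tasks read the same stored seed, so their hashes agree.
        System.out.println(a.hash("datafu") == b.hash("datafu"));
    }
}
```

If each task instead called new Random() on its own, the two tasks would almost certainly disagree on the hash of the same input, which is the problem the comment points out.
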
Awesome, I was wondering how to do this. One of the best things about submitting patches back to open-source projects is getting to learn stuff... Will make the fixes and re-submit.


- Philip (flip)


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21618/#review43444
-----------------------------------------------------------


On May 19, 2014, 11:12 p.m., Philip (flip) Kromer wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21618/
> -----------------------------------------------------------
>
> (Updated May 19, 2014, 11:12 p.m.)
>
>
> Review request for DataFu and Matthew Hayes.
>
>
> Bugs: DATAFU-47
>     https://issues.apache.org/jira/browse/DATAFU-47
>
>
> Repository: datafu
>
>
> Description
> -------
>
> Accompanies DATAFU-47 https://issues.apache.org/jira/browse/DATAFU-47 -- make
> sure to apply the patch from DATAFU-46 too first
>
> Questions for reviewers:
>
> * If we upgrade Guava, we'd get sip24 (a fast cryptographically secure hash),
>   crc32 and adler32 (occasionally useful checksums). I can put the update in as
>   another patch. Should we upgrade?
> * This UDF provides the same hashes as MD5 and SHA udfs. Should those be
>   deprecated in favor of this? I can add the binhex functionality so that
>   nothing is lost.
> * If there's a standard way to do the dependency injection of a fixed random
>   number generator for the tests please advise.
>
>
> Diffs
> -----
>
>   datafu-pig/src/main/java/datafu/pig/hash/Hasher.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/hash/HashTests.java 7ff8fb9
>
> Diff: https://reviews.apache.org/r/21618/diff/
>
>
> Testing
> -------
>
> ./gradlew :datafu-pig:test -Dtest.single=HashTests
>
>
> Thanks,
>
> Philip (flip) Kromer
>
>