> On May 19, 2014, 11:56 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/hash/Hasher.java, line 77
> > <https://reviews.apache.org/r/21618/diff/2/?file=584769#file584769line77>
> >
> > Maybe a cleaner way to do this is to create a protected getRandom()
> > method that you derive from and override. You can create this test UDF
> > under the test directory and it should be usable within the pig script.
> > This would actually be a good pattern for us to follow elsewhere too. We
> > have a JIRA to improve testing of UDFs that rely on randomness.
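
For reference, the override pattern suggested above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the patch: SeededHasher, SeededHasherForTesting, the hash arithmetic, and the fixed seed value are all hypothetical stand-ins, and the real UDF would extend a Pig/DataFu base class instead.

```java
import java.util.Random;

// Hypothetical stand-in for a UDF that salts its hash with a random seed.
class SeededHasher {
    private Integer seed; // lazily initialized on first call to hash()

    // Subclasses override this to control the source of randomness.
    protected Random getRandom() {
        return new Random();
    }

    public int hash(String input) {
        if (seed == null) {
            seed = getRandom().nextInt();
        }
        // Toy mixing step (an assumption for illustration only).
        return 31 * seed + input.hashCode();
    }
}

// Test-only subclass: a fixed seed makes the output reproducible.
class SeededHasherForTesting extends SeededHasher {
    @Override
    protected Random getRandom() {
        return new Random(69L); // arbitrary fixed seed, chosen for the sketch
    }
}
```

With this shape, a test can instantiate the ForTesting subclass and compare results across runs, while production code keeps the unseeded generator.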

I added a trivial subclass in a separate file in that directory.


> On May 19, 2014, 11:56 p.m., Matthew Hayes wrote:
> > datafu-pig/src/main/java/datafu/pig/hash/Hasher.java, line 119
> > <https://reviews.apache.org/r/21618/diff/2/?file=584769#file584769line119>
> >
> > I think it's important that each backend task uses the same seed when
> > the 'rand' option is used. Otherwise the same data will hash to different
> > values. You can ensure this by setting the seed in the front end in the
> > UDF context. We have a ContextualEvalFunc that makes this easier. You can
> > derive from this, then in getOutputSchema you can call
> > getInstanceProperties() and put the seed value into the map. To get the
> > seed, within the exec method you can call getInstanceProperties() and lazy
> > load the seed from it. See EmptyBagToNullFields for an example using this
> > pattern.
>
> Philip (flip) Kromer wrote:
>     Awesome, I was wondering how to do this. One of the best things about
>     submitting patches back to open-source projects is getting to learn stuff...
>
>     Will make the fixes and re-submit.

I ended up having SimpleEvalFunc extend ContextualEvalFunc, as per DATAFU-50.


- Philip (flip)


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21618/#review43444
-----------------------------------------------------------


On May 20, 2014, 9:19 a.m., Philip (flip) Kromer wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21618/
> -----------------------------------------------------------
>
> (Updated May 20, 2014, 9:19 a.m.)
>
>
> Review request for DataFu and Matthew Hayes.
>
>
> Bugs: DATAFU-47
>     https://issues.apache.org/jira/browse/DATAFU-47
>
>
> Repository: datafu
>
>
> Description
> -------
>
> Accompanies DATAFU-47 https://issues.apache.org/jira/browse/DATAFU-47 -- make
> sure to apply the patch from DATAFU-46 too first
>
> Questions for reviewers:
>
> * If we upgrade Guava, we'd get sip24 (a fast cryptographically secure hash),
>   crc32 and adler32 (occasionally useful checksums). I can put the update in as
>   another patch. Should we upgrade?
> * This UDF provides the same hashes as MD5 and SHA udfs. Should those be
>   deprecated in favor of this? I can add the binhex functionality so that
>   nothing is lost.
> * If there's a standard way to do the dependency injection of a fixed random
>   number generator for the tests please advise.
>
>
> Diffs
> -----
>
>   datafu-pig/src/main/java/datafu/pig/hash/Hasher.java PRE-CREATION
>   datafu-pig/src/main/java/datafu/pig/hash/HasherRand.java PRE-CREATION
>   datafu-pig/src/test/java/datafu/test/pig/hash/HashTests.java 7ff8fb9
>   datafu-pig/src/test/java/datafu/test/pig/hash/HasherRandForTesting.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/21618/diff/
>
>
> Testing
> -------
>
> ./gradlew :datafu-pig:test -Dtest.single=HashTests
>
>
> Thanks,
>
> Philip (flip) Kromer
>