Josh,
If you look at the current piece of code, then yes, it can be. But in general
I want it to work on a PCollection; this was just a sample testbed where
I was playing with it.
If it works on a PCollection it will be more useful. I am thinking
of an aggregation function that can do this.
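
A minimal sketch of why an aggregation fits here, in plain Java rather than Crunch: two Bloom filters built with the same size and hash count can be combined with a bitwise OR, so partial filters built over parts of a collection can be merged into one. The class name MergeableBloomFilter, its sizing parameters, and the hashing scheme are assumptions for illustration, not part of Crunch or of the attached test case.

```java
import java.util.BitSet;

// Hypothetical helper, not Crunch API: a small Bloom filter whose
// partial copies can be merged, as an aggregation step would need.
final class MergeableBloomFilter {
    private final int numBits;
    private final int numHashes;
    private final BitSet bits;

    MergeableBloomFilter(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(String value, int i) {
        int h1 = value.hashCode();
        int h2 = (Integer.rotateLeft(h1, 15) * 0x9E3779B9) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(value, i));
        }
    }

    // May return a false positive, never a false negative.
    boolean mightContain(String value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(value, i))) {
                return false;
            }
        }
        return true;
    }

    // Merging is a bitwise OR; both filters must share size and hash count.
    MergeableBloomFilter merge(MergeableBloomFilter other) {
        if (numBits != other.numBits || numHashes != other.numHashes) {
            throw new IllegalArgumentException("incompatible filters");
        }
        MergeableBloomFilter merged = new MergeableBloomFilter(numBits, numHashes);
        merged.bits.or(bits);
        merged.bits.or(other.bits);
        return merged;
    }
}
```

Because the merge is associative and commutative, the order in which partial filters are combined does not matter, which is the property a distributed aggregation relies on.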
Also, what you said about building filters for a bunch of files or folders
looks like an interesting use case to me. I can add something along the lines of
piggybank and put it there. :)
regards
Rahul
On 20-08-2012 20:29, Josh Wills wrote:
Hey Rahul,
Very cool use case. A thought: isn't the name of the file that
contains the bloom filter a better key than the boolean? That way, I
could point the input at an entire directory of files and have it
build bloom filters for all of them for me.
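
A rough sketch of the file-name-as-key idea, using only the JDK rather than Crunch: each regular file in a directory gets its own Bloom filter, stored in a map keyed by the file's name. The class name PerFileBloomFilters, the constants, and the hashing scheme are assumptions for illustration; a Crunch version would presumably emit (file name, filter) pairs instead of filling a local map.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not Crunch API: one Bloom filter per input file.
final class PerFileBloomFilters {
    static final int NUM_BITS = 1 << 12;  // assumed filter size
    static final int NUM_HASHES = 3;      // assumed number of hash functions

    // Derive the i-th bit position from two base hashes (double hashing).
    static int position(String value, int i) {
        int h1 = value.hashCode();
        int h2 = (Integer.rotateLeft(h1, 15) * 0x9E3779B9) | 1;
        return Math.floorMod(h1 + i * h2, NUM_BITS);
    }

    // Build a filter over the lines of every regular file in dir,
    // keyed by file name.
    static Map<String, BitSet> build(Path dir) throws IOException {
        Map<String, BitSet> filters = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                BitSet bits = new BitSet(NUM_BITS);
                for (String line : Files.readAllLines(file)) {
                    for (int i = 0; i < NUM_HASHES; i++) {
                        bits.set(position(line, i));
                    }
                }
                filters.put(file.getFileName().toString(), bits);
            }
        }
        return filters;
    }

    // May return a false positive, never a false negative.
    static boolean mightContain(BitSet bits, String value) {
        for (int i = 0; i < NUM_HASHES; i++) {
            if (!bits.get(position(value, i))) {
                return false;
            }
        }
        return true;
    }
}
```

With the file name as the key, pointing the input at a directory yields one independent filter per file, so a lookup can be restricted to the filter for a specific file.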
It seems useful to me in general, but I'm not quite sure where to put
it-- it's more useful than an example, but not such a common use case
that we would put it in core. We need something like the equivalent of
Pig's piggybank.
J
On Mon, Aug 20, 2012 at 12:58 AM, Rahul <[email protected]> wrote:
Hi,
Today I tried to create BloomFilters using Crunch; the test case is
attached. I do not know whether there is a better way of accomplishing
this.
I think APIs to create and load BloomFilters could be a good addition to Crunch's
existing set. If people think they should be added, I can prepare a patch
for them.
regards,
Rahul