The only solution I can think of is to create a counter in Hadoop that is incremented each time a mapper lets a record through. As soon as the counter reaches a preselected value, the mappers simply discard any additional input they receive.
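A minimal sketch of what I mean (class name, counter names and the config key are made up for illustration; also note that during execution a Hadoop counter's getValue() only reflects the current task's own count, not a live global total, so in practice you would set the limit to something like N / number_of_mappers per task):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Passes through at most 'limit' records per map task, counting them with a
// Hadoop counter; everything beyond the limit is silently discarded.
public class LimitedSampleMapper
        extends Mapper<LongWritable, Text, LongWritable, Text> {

    // Hypothetical config key; set it to roughly N / number_of_mappers.
    public static final String LIMIT_KEY = "sample.records.per.mapper";

    private long limit;

    @Override
    protected void setup(Context context) {
        limit = context.getConfiguration().getLong(LIMIT_KEY, 1000);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // getValue() here only sees this task's own count, not other mappers'.
        long emitted = context.getCounter("sampling", "emitted").getValue();
        if (emitted < limit) {
            context.write(key, value);
            context.getCounter("sampling", "emitted").increment(1);
        }
        // Otherwise drop the record.
    }
}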
Note that this will not be random at all... yet it's the best I can come up with right now.

HTH

On Mon, Jun 27, 2011 at 09:11, Jeff Zhang <zjf...@gmail.com> wrote:
>
> Hi all,
> I'd like to select N random records from a large amount of data using
> hadoop, just wondering how I can achieve this? Currently my idea is to let
> each mapper task select N / mapper_number records. Does anyone have such
> experience?
>
> --
> Best Regards
>
> Jeff Zhang

--
Best regards / Met vriendelijke groeten,

Niels Basjes