Re: How to select random n records using mapreduce ?

Anthony Urso Mon, 27 Jun 2011 13:11:27 -0700

On Mon, Jun 27, 2011 at 12:11 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> Hi all,
> I'd like to select N random records from a large amount of data using
> Hadoop, and I wonder how I can achieve this. My current idea is to let
> each mapper task select N / mapper_number records. Does anyone have
> experience with this?


I've done this before, and it will work fine as long as all of your
splits have identical numbers of records.
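
To illustrate the condition about split sizes, here is a minimal sketch that
simulates the per-mapper approach in plain Python rather than as a real
Hadoop job. The names sample_per_split and inclusion_probability are
hypothetical, each "split" is just an in-memory list, and N is assumed to
divide evenly by the number of mappers.

```python
import random


def sample_per_split(splits, n, seed=None):
    """Simulate each mapper picking n // len(splits) records from its split.

    Assumption: n divides evenly by the number of splits; any remainder
    is simply dropped in this sketch.
    """
    rng = random.Random(seed)
    quota = n // len(splits)
    sample = []
    for split in splits:
        # Each simulated mapper samples its own split without replacement.
        sample.extend(rng.sample(split, min(quota, len(split))))
    return sample


def inclusion_probability(split_size, quota):
    """Chance that one particular record in a split of this size is picked."""
    return min(quota, split_size) / split_size


# Two splits of 100 records each: every record has the same 5% chance.
equal_splits = [list(range(0, 100)), list(range(100, 200))]
picked = sample_per_split(equal_splits, 10, seed=1)

# Splits of 20 and 180 records with the same quota of 5 per mapper:
# records in the small split are picked far more often.
p_small = inclusion_probability(20, 5)
p_large = inclusion_probability(180, 5)
```

With equal splits every record is selected with probability 5/100. With
splits of 20 and 180 records the same quota gives 5/20 versus 5/180, so the
overall sample is no longer uniform over the whole data set.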
