Re: I keep getting multiple values for unique reduce keys

Rick Ross Sun, 04 Sep 2011 22:15:48 -0700

Thanks, but unless I misread you, that didn't do it.     Naturally the object 
that I am creating just has a couple of ArrayLists to gather up Name and Type 
objects.


I suspect I need to extend ArrayWritable instead.   I'll try that next.  

Cheers.

R

On Sep 4, 2011, at 9:37 PM, Sudharsan Sampath wrote:

> Hi,
> 
> I suspect it has something to do with your custom Writable. Do you have a 
> clear method on your container? If so, call it each time before the object 
> is populated, so that it does not retain previous values when the object is 
> reused during the ser-de process.
> 
> Thanks
> Sudhan S
> 
> 
> 
> On Mon, Sep 5, 2011 at 6:11 AM, Rick Ross <r...@semanticresearch.com> wrote:
> Hi all,
> 
> I have ensured that my mapper produces a unique key for every value it writes 
> and, furthermore, that each map() call writes only one value. Note that the 
> value is a custom class for which I implement the Writable interface methods.
> 
> I realize that it isn't very real world to have (well, want) no combining 
> done prior to reducing, but I'm still getting my feet wet.
> 
> When the reducer runs, I expected to see one reduce() call for every map() 
> call, and I do. However, the value I get is a composite of the values from 
> all the reduce() calls that came before it.
> 
> So, for example, the mapper gets data like this :
> 
>   ID,     Name,          Type,          Other stuff...
>   A000,   Cream,         Group,         ...
>   B231,   Led Zeppelin,  Group,         ...
>   A044,   Liberace,      Individual,    ...
> 
> 
> ID is the external key from the source data and is guaranteed to be unique.
> 
> When I map it, I create a container for the row data and output that 
> container with all the data from that row only and use the ID field as a key.
> 
> Since the key is always unique I expected the sort/shuffle step to never 
> coalesce any two values.    So I expected my reduce() method to be called 
> once per mapped input row, and it is.
> 
> The problem is, as each row is processed, the reducer sees a set of 
> cumulative value data instead of a container with a row of data in it.  So 
> the 'value' parameter to reduce always has the information from previous 
> reduce steps.
> 
> For example, given the data above :
> 
> 1st Reducer Call :
>   Key = A000
>   Value =
>       Container :
>          (object 1) : Name = Cream, Type = Group, MBID = A000, ...
> 
> 2nd Reducer Call :
>   Key = B231
>   Value =
>       Container :
>          (object 1) : Name = Led Zeppelin, Type = Group, MBID = B231, ...
>          (object 2) : Name = Cream, Type = Group, MBID = A000, ...
> 
> So the second reduce call has data in it from the first reduce call.   Very 
> strange!   At a guess I would say the reducer is re-using the object when it 
> reads the objects back from the mapping step.  I dunno..
> 
> If anyone has any ideas, I'm open to suggestions. I'm running 0.20.2-cdh3u0.
> 
> Thanks!
> 
> R
> 
> 
> 
>
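
For anyone who finds this thread later, here is a minimal sketch of the object-reuse effect discussed above and of the "clear before populating" suggestion. It is not the original poster's code. RowContainer, its clearOnRead flag, and readTwice are hypothetical names made up for illustration, and the class deliberately avoids any Hadoop dependency so it runs on its own. In a real job the container would implement org.apache.hadoop.io.Writable, which declares the same write(DataOutput) and readFields(DataInput) methods, and would need a no-argument constructor. The sketch assumes the framework deserializes successive records into one reused instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class RowContainerDemo {

    // Hypothetical stand-in for the custom value class from the thread.
    static class RowContainer {
        final List<String> names = new ArrayList<>();
        final List<String> types = new ArrayList<>();
        // Demo-only switch: true applies the suggested fix, false shows the bug.
        private final boolean clearOnRead;

        RowContainer(boolean clearOnRead) {
            this.clearOnRead = clearOnRead;
        }

        void add(String name, String type) {
            names.add(name);
            types.add(type);
        }

        // Same shape as Writable.write: a count followed by the entries.
        public void write(DataOutput out) throws IOException {
            out.writeInt(names.size());
            for (int i = 0; i < names.size(); i++) {
                out.writeUTF(names.get(i));
                out.writeUTF(types.get(i));
            }
        }

        // Same shape as Writable.readFields. Without the clear() calls, the
        // lists still hold whatever the previous record put there.
        public void readFields(DataInput in) throws IOException {
            if (clearOnRead) {
                names.clear();
                types.clear();
            }
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                names.add(in.readUTF());
                types.add(in.readUTF());
            }
        }
    }

    // Serialize a one-row container, as a map() call would emit it.
    static byte[] serialize(String name, String type) throws IOException {
        RowContainer row = new RowContainer(true);
        row.add(name, type);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        row.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    // Read two records into ONE reused instance and return the names it
    // holds after the second read.
    static List<String> readTwice(boolean clearOnRead) {
        try {
            RowContainer reused = new RowContainer(clearOnRead);
            reused.readFields(new DataInputStream(
                    new ByteArrayInputStream(serialize("Cream", "Group"))));
            reused.readFields(new DataInputStream(
                    new ByteArrayInputStream(serialize("Led Zeppelin", "Group"))));
            return new ArrayList<>(reused.names);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTwice(false)); // prints [Cream, Led Zeppelin]
        System.out.println(readTwice(true));  // prints [Led Zeppelin]
    }
}
```

With clearOnRead set to false the second read still contains the first row, which resembles the cumulative values described in the original post (the order of the stale entries depends on how the container stores them). With it set to true each read starts from an empty container.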
