Rachit Soni created CRUNCH-373:
----------------------------------

             Summary: Problem while Performing MapSide join with 
ImmutableBytesWritable/Text
                 Key: CRUNCH-373
                 URL: https://issues.apache.org/jira/browse/CRUNCH-373
             Project: Crunch
          Issue Type: Bug
          Components: Core
            Reporter: Rachit Soni
            Assignee: Josh Wills


I have been having issues performing a map-side join with ImmutableBytesWritable 
as the join key: the map created in the initialize method of MapSideJoinDoFn [1] 
always ends up with only one value. With the same data set, a reduce-side join 
works fine and gives me the correct result.

Additionally, I am making sure the map can be loaded in memory.

The results in the two cases above are different. I dug into the code where the 
map-side join is performed in MapSideJoinDoFn [1]: when the right side is loaded 
into memory and converted to a map [2], all the keys get overwritten with the 
last key that is put into the map. It seems to reference the same memory 
location every time rather than cloning the key properly. This only happens 
when I use ImmutableBytesWritable/Text; anything else works perfectly fine.
 
It looks like SeqFileReaderFactory (which I believe implements the PTable under 
the hood for Writables) does indeed reuse keys/values [3] in much the same way 
reducers do. So I think this code [4] needs to clone the keys/values rather 
than just store them in a map.
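
To illustrate the suspected cause without any Hadoop or Crunch dependency, here 
is a minimal sketch in plain Java. MutableKey is a hypothetical stand-in for a 
mutable Writable such as Text, and load() only imitates a reader that hands out 
the same key instance for every record. None of these names come from Crunch, 
and this is not the code in MapsideJoinStrategy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReusedKeyDemo {

    // Hypothetical stand-in for a mutable Writable key (e.g. Text).
    static final class MutableKey {
        byte[] bytes = new byte[0];

        void set(String s) {
            bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        // Deep copy, so the map entry no longer shares state with the reader.
        MutableKey copy() {
            MutableKey k = new MutableKey();
            k.bytes = bytes.clone();
            return k;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && Arrays.equals(bytes, ((MutableKey) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    static MutableKey keyOf(String s) {
        MutableKey k = new MutableKey();
        k.set(s);
        return k;
    }

    // Imitates loading the right side of the join into an in-memory map.
    // A single key instance is reused for every row, as a reusing reader would do.
    static Map<MutableKey, String> load(List<String> rows, boolean copyKeys) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey reused = new MutableKey();
        for (String row : rows) {
            reused.set(row); // overwrites the contents of the one shared instance
            MutableKey key = copyKeys ? reused.copy() : reused;
            map.put(key, "value-for-" + row);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        Map<MutableKey, String> shared = load(rows, false);
        Map<MutableKey, String> copied = load(rows, true);

        // Every stored key is the same object, so all of them show the last row.
        System.out.println("keys without copying: " + shared.keySet());
        System.out.println("lookup of a without copying: " + shared.get(keyOf("a")));
        System.out.println("lookup of a with copying: " + copied.get(keyOf("a")));
    }
}
```

Without the copy, every entry in the map points at the same key object, so 
lookups for the earlier rows fail; with the copy, each row can be found. In 
Crunch itself the equivalent step would presumably be a deep copy of each 
key/value before it is stored in the map.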
 
Also, I am attaching a test which I wrote to reproduce the issue. 

[1] 
https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L131
 
[2] 
https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153

[3] 
https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileReaderFactory.java#L88

[4] 
https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153



--
This message was sent by Atlassian JIRA
(v6.2#6252)
