[ 
https://issues.apache.org/jira/browse/CRUNCH-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid resolved CRUNCH-373.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.3
                   0.10.0
         Assignee: Gabriel Reid  (was: Josh Wills)

Squashed into a single commit and pushed to master and 0.8 branch.

> Problem while Performing MapSide join with ImmutableBytesWritable/Text
> ----------------------------------------------------------------------
>
>                 Key: CRUNCH-373
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-373
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Rachit Soni
>            Assignee: Gabriel Reid
>             Fix For: 0.10.0, 0.8.3
>
>         Attachments: CRUNCH-371_test.patch, CRUNCH-373b.patch, 
> CRUNCH-373c.patch, CrunchHBaseIT.java
>
>
> I have been having issues performing a map-side join with 
> ImmutableBytesWritable as the join key: the map created in the initialize 
> method of MapSideJoinDoFn [1] always ends up with only one value. With the 
> same data set, a reduce-side join works fine and gives the correct result.
> Additionally, I am making sure the map can be loaded in memory.
> The results of the two joins differ. When I dug into the code where the 
> map-side join is performed in MapSideJoinDoFn [1], I found that when the 
> right side is loaded into memory and converted to a map [2], all the keys 
> get overwritten with the last key that is put into the map. It seems to be 
> referencing the same memory location every time instead of cloning the key 
> properly. This only happens when I use ImmutableBytesWritable/Text; any 
> other key type works fine.
>  
> It looks like SeqFileReaderFactory (which I believe implements the PTable 
> under the hood for writables) does indeed reuse keys/values [3] in much the 
> same way reducers do. So I think this code [4] needs to clone the 
> keys/values rather than just store them in a map.
>  
> Also, I am attaching a test which I wrote to reproduce the issue.
>
> [1] https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L131
> [2] https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
> [3] https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileReaderFactory.java#L88
> [4] https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
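
The key-reuse problem described in the quoted report can be illustrated without Hadoop or Crunch. The sketch below is not the committed fix. MutableKey, load and countFindable are hypothetical names standing in for a reusable Writable key (such as Text) and for the code that fills the in-memory join map. It assumes a reader that hands back the same key instance for every record, as the report says SeqFileReaderFactory does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyReuseDemo {

    /** Hypothetical stand-in for a mutable, reusable key such as Text. */
    static final class MutableKey {
        private byte[] bytes = new byte[0];

        void set(String s) {
            bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        /** Returns an independent copy that later set() calls cannot affect. */
        MutableKey copy() {
            MutableKey k = new MutableKey();
            k.bytes = bytes.clone();
            return k;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MutableKey && Arrays.equals(bytes, ((MutableKey) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    /** Fills a map from a simulated reader that overwrites one key instance per record. */
    static Map<MutableKey, String> load(List<String> names, boolean copyKeys) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey reused = new MutableKey(); // single instance, as assumed for the reader
        for (String name : names) {
            reused.set(name);
            // Without the copy, every entry in the map shares the same key object.
            map.put(copyKeys ? reused.copy() : reused, "value-of-" + name);
        }
        return map;
    }

    /** Counts how many of the original keys can still be looked up in the map. */
    static int countFindable(Map<MutableKey, String> map, List<String> names) {
        int found = 0;
        for (String name : names) {
            MutableKey probe = new MutableKey();
            probe.set(name);
            if (map.containsKey(probe)) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        System.out.println("without copy: " + countFindable(load(names, false), names));
        System.out.println("with copy:    " + countFindable(load(names, true), names));
    }
}
```

On a typical OpenJDK HashMap, the run without copying leaves only the last key ("c") findable, because every stored entry points at the one shared key object. Copying each key before the put keeps all three keys findable.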



--
This message was sent by Atlassian JIRA
(v6.2#6252)
