[ https://issues.apache.org/jira/browse/CRUNCH-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid resolved CRUNCH-373.
---------------------------------
Resolution: Fixed
Fix Version/s: 0.8.3, 0.10.0
Assignee: Gabriel Reid (was: Josh Wills)
Squashed into a single commit and pushed to master and 0.8 branch.
> Problem while Performing MapSide join with ImmutableBytesWritable/Text
> ----------------------------------------------------------------------
>
> Key: CRUNCH-373
> URL: https://issues.apache.org/jira/browse/CRUNCH-373
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.9.0, 0.8.2
> Reporter: Rachit Soni
> Assignee: Gabriel Reid
> Fix For: 0.10.0, 0.8.3
>
> Attachments: CRUNCH-371_test.patch, CRUNCH-373b.patch,
> CRUNCH-373c.patch, CrunchHBaseIT.java
>
>
> I have been having issues performing a map-side join with
> ImmutableBytesWritable as the join key: the map created in the initialize
> method of MapSideJoinDoFn [1] always contains only one value. With the same
> data set, a reduce-side join works correctly and gives the expected result.
> I have also made sure that the map can be loaded in memory.
> The results in the two cases are different. Digging into the code where the
> map-side join is performed in MapSideJoinDoFn [1], I found that when the
> right side is loaded into memory and converted to a map [2], all the keys are
> overwritten with the last key that was put into the map. It seems that the
> same memory location is being referenced every time and the key is not being
> cloned properly. This only happens when I use ImmutableBytesWritable/Text;
> anything else works perfectly fine.
>
> It looks like SeqFileReaderFactory (which I believe implements the PTable
> under the hood for writables) does indeed reuse keys/values [3] in much the
> same way reducers do. So I think this code [4] needs to clone the
> keys/values rather than just store them in a map.
>
> Also, I am attaching a test which I wrote to reproduce the issue.
>
> [1]
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L131
> [2]
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
> [3]
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileReaderFactory.java#L88
> [4]
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
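
The object-reuse problem described in the issue can be illustrated without
Hadoop or Crunch on the classpath. The following is only a minimal sketch
under stated assumptions: MutableKey is a hypothetical stand-in for a mutable
writable such as Text or ImmutableBytesWritable, ReusingReader is a
hypothetical reader that hands back the same key instance for every record,
and a plain HashMap stands in for the in-memory join map. None of these names
come from Crunch, and the actual fix committed for CRUNCH-373 may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class KeyReuseSketch {

    /** Hypothetical stand-in for a mutable writable key (not a Hadoop class). */
    static final class MutableKey {
        private String value;
        MutableKey(String value) { this.value = value; }
        void set(String value) { this.value = value; }
        MutableKey copy() { return new MutableKey(value); }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
        @Override public String toString() { return value; }
    }

    /** Hypothetical reader that reuses one key instance for every record. */
    static final class ReusingReader implements Iterator<MutableKey> {
        private final Iterator<String> raw;
        private final MutableKey shared = new MutableKey("");
        ReusingReader(List<String> rawKeys) { this.raw = rawKeys.iterator(); }
        @Override public boolean hasNext() { return raw.hasNext(); }
        @Override public MutableKey next() {
            shared.set(raw.next()); // overwrites the previously returned key
            return shared;
        }
    }

    /** Stores the reader's key object directly: every entry shares one key. */
    static Map<MutableKey, Integer> loadWithoutCopy(List<String> rawKeys) {
        Map<MutableKey, Integer> map = new HashMap<>();
        Iterator<MutableKey> reader = new ReusingReader(rawKeys);
        int row = 0;
        while (reader.hasNext()) {
            map.put(reader.next(), row++);
        }
        return map;
    }

    /** Copies each key before storing it, so later reads cannot change it. */
    static Map<MutableKey, Integer> loadWithCopy(List<String> rawKeys) {
        Map<MutableKey, Integer> map = new HashMap<>();
        Iterator<MutableKey> reader = new ReusingReader(rawKeys);
        int row = 0;
        while (reader.hasNext()) {
            map.put(reader.next().copy(), row++);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("a", "b", "c");
        Map<MutableKey, Integer> broken = loadWithoutCopy(raw);
        Map<MutableKey, Integer> fixed = loadWithCopy(raw);
        // Without copying, the stored key now reads "c", so "a" is not found.
        System.out.println("without copy, a -> " + broken.get(new MutableKey("a")));
        // With copying, each key keeps the value it had when it was stored.
        System.out.println("with copy, a -> " + fixed.get(new MutableKey("a")));
    }
}
```

In this sketch the first lookup prints null and the second prints 0. The only
difference between the two loaders is the copy() call, which corresponds to
the cloning step the reporter suggests.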