[ https://issues.apache.org/jira/browse/CRUNCH-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957274#comment-13957274 ]
Micah Whitacre commented on CRUNCH-373:
---------------------------------------
The patch passes the test for me. I also tried tweaking it to use detached
values when the PType is an instance of PTableType, but I ended up hitting an
exception saying the PType wasn't fully initialized when creating the detached
value.
> Problem while Performing MapSide join with ImmutableBytesWritable/Text
> ----------------------------------------------------------------------
>
> Key: CRUNCH-373
> URL: https://issues.apache.org/jira/browse/CRUNCH-373
> Project: Crunch
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.9.0, 0.8.2
> Reporter: Rachit Soni
> Assignee: Josh Wills
> Attachments: CRUNCH-371_test.patch, CRUNCH-373b.patch,
> CrunchHBaseIT.java
>
>
> I have been having issues performing a map-side join with
> ImmutableBytesWritable as the join key: the map created in the initialize
> method of MapSideJoinDoFn [1] always ends up with only 1 value. With the same
> set of data, a reduce-side join works perfectly fine and gives me the correct
> result. Additionally, I am making sure the map can be loaded in memory.
> The results in the two cases are different. I dug into the code where the
> map-side join is performed in MapSideJoinDoFn [1]: when the right side is
> taken into memory and converted to a map [2], all the keys get overwritten
> with the last key that is put into the map. It seems to be referencing the
> same memory location every time and is not cloning it properly. This only
> happens when I use ImmutableBytesWritable/Text; anything else works
> perfectly fine.
>
> It looks like SeqFileReaderFactory (which I believe implements the PTable
> under the hood for writables) does indeed reuse keys/values [3] in much the
> same way reducers do. So I think this code [4] needs to clone the
> keys/values rather than just store them in a map.
>
> Also, I am attaching a test which I wrote to reproduce the issue.
>
> [1]
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L131
> [2]
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
> [3]
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileReaderFactory.java#L88
> [4]
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
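
For anyone following along, the failure mode described in the report can be
reproduced without Hadoop or Crunch. The sketch below is only an illustration,
not the Crunch code or the attached patch: MutableKey and reusingReader are
hypothetical stand-ins for a mutable Writable (such as Text) and for a reader
that refills one shared instance per record. It retains every key it reads,
first by reference and then by copy, and counts how many distinct values
survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

class ReusedKeyDemo {

  /** Hypothetical stand-in for a mutable Writable such as Text. */
  static final class MutableKey {
    private String value = "";

    void set(String v) { value = v; }

    String get() { return value; }

    /** Returns an independent copy that later set() calls cannot affect. */
    MutableKey copy() {
      MutableKey k = new MutableKey();
      k.set(value);
      return k;
    }
  }

  /** Hypothetical reader that hands back the same instance for every record. */
  static Iterable<MutableKey> reusingReader(List<String> records) {
    return () -> new Iterator<MutableKey>() {
      private final MutableKey shared = new MutableKey();
      private int pos = 0;

      @Override
      public boolean hasNext() { return pos < records.size(); }

      @Override
      public MutableKey next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        shared.set(records.get(pos++)); // overwrites the previous record in place
        return shared;
      }
    };
  }

  /** Retains every key read, optionally copying it, and counts distinct values. */
  static int distinctRetained(List<String> records, boolean copy) {
    List<MutableKey> retained = new ArrayList<>();
    for (MutableKey k : reusingReader(records)) {
      retained.add(copy ? k.copy() : k);
    }
    Set<String> distinct = new HashSet<>();
    for (MutableKey k : retained) {
      distinct.add(k.get());
    }
    return distinct.size();
  }

  public static void main(String[] args) {
    List<String> records = Arrays.asList("row1", "row2", "row3");
    System.out.println("without copy: " + distinctRetained(records, false));
    System.out.println("with copy:    " + distinctRetained(records, true));
  }
}
```

Without the copy, every retained reference points at the one shared object, so
all of them show the last record read; with the copy, each record keeps its own
value. The copy() call here plays roughly the role that cloning (or the
detached values mentioned in the comment above) would play in the join code.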