[ 
https://issues.apache.org/jira/browse/CRUNCH-373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957274#comment-13957274
 ] 

Micah Whitacre commented on CRUNCH-373:
---------------------------------------

The patch passes the test for me. I tried tweaking it to use detached values
when the PType was an instance of PTableType, but I ended up hitting an
exception saying the PType wasn't fully initialized when creating the detached
value.
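The exception points at an ordering constraint: a type has to be initialized before it can hand out detached (deep-copied) values. The sketch below models only that constraint in plain Java, under stated assumptions. `DetachingType`, `initialize()` and `detach()` are hypothetical names, not Crunch's PType API, and the copy function is supplied by the caller.

```java
import java.util.function.UnaryOperator;

// Hypothetical model of a type descriptor that can only produce detached
// (deep-copied) values once it has been initialized. This is NOT Crunch's
// PType implementation; all names here are illustrative.
class DetachingType<T> {
    private final UnaryOperator<T> copier;
    private boolean initialized = false;

    DetachingType(UnaryOperator<T> copier) {
        this.copier = copier;
    }

    // Stands in for whatever setup (e.g. reading the job configuration)
    // a real type would need before it is able to copy values.
    void initialize() {
        initialized = true;
    }

    // Returns an independent copy of the value, so later in-place reuse of
    // the original object by a reader cannot change what was stored.
    T detach(T value) {
        if (!initialized) {
            throw new IllegalStateException(
                "type not initialized; call initialize() before detach()");
        }
        return copier.apply(value);
    }
}
```

Under this model, code that builds the in-memory join map would have to call `initialize()` before detaching the first record read from the right side.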

> Problem while Performing MapSide join with ImmutableBytesWritable/Text
> ----------------------------------------------------------------------
>
>                 Key: CRUNCH-373
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-373
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Rachit Soni
>            Assignee: Josh Wills
>         Attachments: CRUNCH-371_test.patch, CRUNCH-373b.patch, CrunchHBaseIT.java
>
>
> I have been having issues performing a map-side join with ImmutableBytesWritable
> as the join key: the map created in the initialize method of MapSideJoinDoFn [1]
> always contains only one value. With the same data, a reduce-side join works
> perfectly and gives the correct result, and I have made sure the right side
> fits in memory, yet the results of the two joins differ.
> When I dug into the code where the map-side join is performed in
> MapSideJoinDoFn [1], I found that when the right side is loaded into memory and
> converted to a map [2], every key gets overwritten with the last key put into
> the map. It seems the same object (the same memory location) is referenced
> every time instead of being cloned. This only happens with
> ImmutableBytesWritable/Text; every other type works perfectly.
>  
> It looks like SeqFileReaderFactory (which I believe backs the PTable under the
> hood for writables) does indeed reuse keys/values [3] in much the same way
> reducers do. So I think this code [4] needs to clone the keys/values rather
> than just store them in a map.
>  
> Also, I am attaching a test which I wrote to reproduce the issue. 
> [1] https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L131
> [2] https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
> [3] https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileReaderFactory.java#L88
> [4] https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
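The aliasing described in the report can be reproduced without Hadoop. This is a minimal plain-Java sketch, assuming a reader that hands back one shared, mutable key object for every record, the way SeqFileReaderFactory [3] is described as doing. `ReusedKeyDemo`, `MutableKey` and `reusingReader` are hypothetical stand-ins for ImmutableBytesWritable/Text and the sequence-file reader, not Crunch code. Storing the references leaves every entry reading as the last key, while copying each key before storing it keeps all of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the reported bug; not Crunch or Hadoop code.
class ReusedKeyDemo {
    // Hypothetical stand-in for a mutable Writable such as Text or
    // ImmutableBytesWritable: its contents are overwritten in place.
    static final class MutableKey {
        private String contents = "";

        void set(String s) { contents = s; }
        String get() { return contents; }

        // Deep copy: the "clone" step the report says is missing.
        MutableKey copy() {
            MutableKey k = new MutableKey();
            k.set(contents);
            return k;
        }

        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).contents.equals(contents);
        }

        @Override public int hashCode() { return contents.hashCode(); }
    }

    // Simulates a reader that returns the SAME key instance for every
    // record, overwriting its contents before each call to next() returns.
    static Iterator<MutableKey> reusingReader(List<String> records) {
        MutableKey shared = new MutableKey();
        Iterator<String> it = records.iterator();
        return new Iterator<MutableKey>() {
            public boolean hasNext() { return it.hasNext(); }
            public MutableKey next() { shared.set(it.next()); return shared; }
        };
    }

    // Buggy: keeps the references handed out by the reader, so every
    // stored entry is the same object and reads as the last record.
    static List<MutableKey> collectReferences(List<String> records) {
        List<MutableKey> out = new ArrayList<>();
        Iterator<MutableKey> r = reusingReader(records);
        while (r.hasNext()) out.add(r.next());
        return out;
    }

    // Fixed: copies each key before storing it, so the map owns its keys.
    static Map<MutableKey, Integer> buildMapWithCopies(List<String> records) {
        Map<MutableKey, Integer> map = new HashMap<>();
        Iterator<MutableKey> r = reusingReader(records);
        int row = 0;
        while (r.hasNext()) map.put(r.next().copy(), row++);
        return map;
    }
}
```

With the records a, b, c, `collectReferences` returns three references to one object that all read "c", whereas `buildMapWithCopies` ends up with three distinct keys. Values read through a reused object would need the same copy before being stored.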



--
This message was sent by Atlassian JIRA
(v6.2#6252)
