[ 
https://issues.apache.org/jira/browse/CRUNCH-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated CRUNCH-373:
--------------------------------

    Attachment: CRUNCH-373c.patch

Yes, it's possible to handle this using PType#getDetachedValue. Here's a patch 
that requires a smaller change (in MapsideJoin itself).

[~mkwhitacre] The problem you ran into with the PType not being properly 
initialized just means you need to call PType#initialize in the DoFn#initialize 
of the DoFn where it's being used. There is a strict check everywhere that 
PTypes are initialized before they are used, although initialization usually 
isn't really necessary (the check is there just to make sure it isn't 
forgotten in the cases where it really is necessary).
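
As a rough illustration of that initialize-before-use pattern, here is a 
minimal sketch in plain Java. BytesType and CopyFn are hypothetical stand-ins 
for a PType and a DoFn, not the Crunch classes, and the exception type is an 
assumption. The point is only that the function's own initialize hook is where 
the type gets initialized, after which detached copies can be taken.

```java
import java.util.Arrays;

final class InitCheckDemo {
  /** Hypothetical stand-in for a type handler; this is not the Crunch PType API. */
  static final class BytesType {
    private boolean initialized = false;

    void initialize() {
      initialized = true;
    }

    /** Returns an independent copy, but only once the type has been initialized. */
    byte[] getDetachedValue(byte[] value) {
      if (!initialized) {
        // Strict check: fail loudly instead of silently skipping setup.
        throw new IllegalStateException("type used before initialize()");
      }
      return value.clone();
    }
  }

  /** Hypothetical stand-in for a function object with a setup hook. */
  static final class CopyFn {
    private final BytesType type = new BytesType();

    /** Setup hook: initialize every type this function will use. */
    void initialize() {
      type.initialize();
    }

    byte[] process(byte[] input) {
      return type.getDetachedValue(input);
    }
  }

  public static void main(String[] args) {
    CopyFn fn = new CopyFn();
    fn.initialize();

    byte[] buffer = {1, 2, 3};
    byte[] copy = fn.process(buffer);
    buffer[0] = 9; // later reuse of the buffer does not affect the copy
    System.out.println(Arrays.toString(copy));
  }
}
```

Calling process before initialize in this sketch throws, which mirrors the 
purpose of the strict check: the omission shows up immediately rather than 
only in the cases where initialization matters.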

> Problem while Performing MapSide join with ImmutableBytesWritable/Text
> ----------------------------------------------------------------------
>
>                 Key: CRUNCH-373
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-373
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Rachit Soni
>            Assignee: Josh Wills
>         Attachments: CRUNCH-371_test.patch, CRUNCH-373b.patch, 
> CRUNCH-373c.patch, CrunchHBaseIT.java
>
>
> I have been having issues performing a MapSide join with 
> ImmutableBytesWritable as the join key: the map created in the initialize 
> method of MapSideJoinDoFn [1] always has only 1 value. With the same set of 
> data, a reduce-side join works perfectly fine and gives me the correct result.
> Additionally, I am making sure the map can be loaded in memory.
> The results in the two cases above are different. When I dug into the code 
> where the map-side join is performed in MapSideDoFn [1], I found that when the 
> right side is read into memory and converted to a map [2], all the keys get 
> overwritten with the last key that is put into the map. It seems to be 
> referencing the same memory location every time and not cloning it properly. 
> This only happens when I use ImmutableBytesWritable/Text; anything except 
> ImmutableBytesWritable/Text works perfectly fine.
>  
> It looks like SeqFileReaderFactory (which I believe implements the PTable 
> under the hood for writables) does indeed reuse keys/values [3] in much the 
> same way reducers do. So, I think this code [4] needs to clone the 
> keys/values rather than just store them in a map.
>  
> Also, I am attaching a test which I wrote to reproduce the issue. 
> [1] 
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L131
>  
> [2] 
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
> [3] 
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/seq/SeqFileReaderFactory.java#L88
> [4] 
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/lib/join/MapsideJoinStrategy.java#L153
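
To make the suspected cause concrete, here is a small plain-Java sketch of the 
reuse problem described above. MutableKey and load are hypothetical stand-ins 
(no Crunch or Hadoop classes are involved). The loop reuses one key instance 
the way a reader might. Storing that instance directly leaves every map entry 
pointing at the same object, while storing a copy keeps the keys distinct.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReuseDemo {
  /** Hypothetical stand-in for a mutable, reusable key object. */
  static final class MutableKey {
    private byte[] bytes = new byte[0];

    void set(byte[] b) {
      bytes = b.clone();
    }

    /** Returns an independent copy that later set() calls cannot change. */
    MutableKey copy() {
      MutableKey k = new MutableKey();
      k.set(bytes);
      return k;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof MutableKey && Arrays.equals(bytes, ((MutableKey) o).bytes);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(bytes);
    }
  }

  /** Convenience factory for lookup keys. */
  static MutableKey key(String s) {
    MutableKey k = new MutableKey();
    k.set(s.getBytes(StandardCharsets.UTF_8));
    return k;
  }

  /** Loads (key, value) rows into a map, reusing one key instance per row. */
  static Map<MutableKey, List<String>> load(String[][] rows, boolean detach) {
    Map<MutableKey, List<String>> map = new HashMap<>();
    MutableKey reused = new MutableKey(); // same instance for every row
    for (String[] row : rows) {
      reused.set(row[0].getBytes(StandardCharsets.UTF_8));
      // Without detaching, the map stores a reference to the shared instance.
      MutableKey k = detach ? reused.copy() : reused;
      map.computeIfAbsent(k, unused -> new ArrayList<>()).add(row[1]);
    }
    return map;
  }

  public static void main(String[] args) {
    String[][] rows = {{"a", "1"}, {"b", "2"}, {"c", "3"}};
    System.out.println(load(rows, false).get(key("a")));
    System.out.println(load(rows, true).get(key("a")));
  }
}
```

With detach set to false, a lookup for an earlier key such as "a" finds 
nothing, because the one shared key object now holds the last value read. With 
detach set to true, all three keys resolve to their own values.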



--
This message was sent by Atlassian JIRA
(v6.2#6252)