[ 
https://issues.apache.org/jira/browse/CRUNCH-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908080#comment-13908080
 ] 

Gabriel Reid commented on CRUNCH-350:
-------------------------------------

[~jwills] Any idea how this bug was getting triggered? The BloomFilter in 
question is only created in the initialize() method, so the occurrence of this 
bug means that initialize() is being called more than once on the same DoFn. 
The DoFn in question is created within the BloomFilterJoinStrategy#join method, 
so there's no way that it could be reused by some other code as far as I can 
see.

The BloomFilter should definitely be transient, but it feels to me like this 
could be a sign of another issue that should be looked into (or it just means 
that I'm confused about something). I think there are other places in the code 
where non-serializable fields are not marked as transient, on the assumption 
that initialize() will only be called once, and my feeling is that initialize() 
probably shouldn't be called multiple times on the same DoFn.
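
To illustrate the point about transient fields, here is a minimal plain-Java 
sketch. It is not Crunch code: FakeFilter, SketchFn, NonTransientFn, 
TransientFn and TransientFieldSketch are hypothetical stand-ins for the 
BloomFilter and the DoFn. It shows that a non-transient, non-serializable 
field is harmless while it is still null, and only breaks Java serialization 
once initialize() has populated it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a non-serializable helper such as a Bloom filter.
class FakeFilter {
    // Deliberately does NOT implement Serializable.
}

// Hypothetical stand-in for a DoFn: serializable, with an initialize() hook.
abstract class SketchFn implements Serializable {
    abstract void initialize();
}

// The problematic shape: the field is populated in initialize() but not transient.
class NonTransientFn extends SketchFn {
    private FakeFilter filter;

    @Override
    void initialize() {
        filter = new FakeFilter();
    }
}

// The fixed shape: the field is transient, so serialization skips it.
class TransientFn extends SketchFn {
    private transient FakeFilter filter;

    @Override
    void initialize() {
        filter = new FakeFilter();
    }
}

class TransientFieldSketch {
    // Returns true if Java serialization of the object succeeds.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        SketchFn broken = new NonTransientFn();
        // Field is still null, so serialization succeeds.
        System.out.println(canSerialize(broken));
        broken.initialize();
        // Field now holds a non-serializable object, so serialization fails.
        System.out.println(canSerialize(broken));

        SketchFn fixed = new TransientFn();
        fixed.initialize();
        // Transient field is skipped, so serialization succeeds.
        System.out.println(canSerialize(fixed));
    }
}
```

In this sketch the failure only appears when the object is serialized after 
initialize() has run, which is why the order of serialization and 
initialization matters for any field that relies on not being transient.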

> Non-serializable BloomFilter field in BloomFilterJoinStrategy should be 
> marked transient
> ----------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-350
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-350
>             Project: Crunch
>          Issue Type: Bug
>          Components: MapReduce Patterns
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Josh Wills
>             Fix For: 0.10.0, 0.8.3
>
>         Attachments: CRUNCH-350.patch
>
>
> Got a notice from the user mailing list that the BloomFilterJoinStrategy was 
> throwing a NotSerializableException. I took a look at the code and noticed a 
> DoFn field that should be marked as transient.


