[ https://issues.apache.org/jira/browse/CRUNCH-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908080#comment-13908080 ]
Gabriel Reid commented on CRUNCH-350:
-------------------------------------
[~jwills] Any idea how this bug was getting triggered? The BloomFilter in
question is only created in the initialize() method, so the occurrence of this
bug means that initialize() is being called more than once on the same DoFn.
The DoFn in question is created within the BloomFilterJoinStrategy#join method,
so there's no way that it could be reused by some other code as far as I can
see.

The BloomFilter should definitely be transient, but it feels to me like this
could be a sign of another issue that should be looked into (or it just means
that I'm confused about something). I think there are other places in the code
where non-serializable fields are not marked as transient, on the assumption
that initialize() will only be called once, and I have the feeling that
initialize() probably shouldn't be called multiple times on the same DoFn.
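
To make the reasoning above concrete, here is a minimal sketch in plain Java. It does not use the Crunch API: FakeBloomFilter, BadFn, GoodFn and canSerialize are hypothetical stand-ins invented for illustration, and initialize() here only mimics the role of DoFn#initialize. It shows that a non-transient, non-serializable field only breaks serialization once initialize() has populated it, and that marking the field transient avoids the failure either way.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class TransientFieldSketch {

    // Hypothetical stand-in for a filter type that is NOT Serializable.
    static class FakeBloomFilter {
    }

    // Mimics the reported problem: the field is not marked transient.
    static class BadFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private FakeBloomFilter filter; // null until initialize() runs

        void initialize() {
            filter = new FakeBloomFilter();
        }
    }

    // Mimics the proposed fix: the field is transient and rebuilt in initialize().
    static class GoodFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient FakeBloomFilter filter;

        void initialize() {
            filter = new FakeBloomFilter();
        }
    }

    // Returns true if Java serialization accepts the object,
    // false if it fails with NotSerializableException.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BadFn bad = new BadFn();
        System.out.println("bad, before initialize: " + canSerialize(bad));
        bad.initialize();
        System.out.println("bad, after initialize:  " + canSerialize(bad));

        GoodFn good = new GoodFn();
        good.initialize();
        System.out.println("good, after initialize: " + canSerialize(good));
    }
}
```

In this sketch the uninitialized BadFn serializes without complaint because its field is still null. The failure only appears once the instance has been initialized and is then serialized again, which is why the exception hints that initialize() ran earlier in the object's lifetime than expected.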
> Non-serializable BloomFilter field in BloomFilterJoinStrategy should be
> marked transient
> ----------------------------------------------------------------------------------------
>
> Key: CRUNCH-350
> URL: https://issues.apache.org/jira/browse/CRUNCH-350
> Project: Crunch
> Issue Type: Bug
> Components: MapReduce Patterns
> Affects Versions: 0.9.0, 0.8.2
> Reporter: Josh Wills
> Fix For: 0.10.0, 0.8.3
>
> Attachments: CRUNCH-350.patch
>
>
> Got a notice from the user mailing list that the BloomFilterJoinStrategy was
> throwing a NotSerializableException. I took a look at the code and noticed a
> DoFn field that should be marked as transient.