[ https://issues.apache.org/jira/browse/CRUNCH-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriel Reid updated CRUNCH-71:
-------------------------------
Attachment: CRUNCH-71.patch
Patch to resolve the issue attached.
I'm not 100% happy with this solution: I would prefer that the PType be
supplied to the DoFn at runtime, rather than the DoFn being responsible for
calling PType#initialize.
However, that approach could add a lot of extra work to job setup, as there
is not a 1-to-1 relationship between DoFns and PTypes.
Since object reuse issues are fairly isolated in MR contexts (joins are the
main place where I see them occurring), this fix feels OK to me for now. Any
objections to this patch?
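
To make the trade-off above concrete, here is a minimal sketch of the "DoFn
calls PType#initialize" arrangement. It is not the attached patch and does not
use the real Crunch classes: SketchPType, SketchJoinFn and JoinFixSketch are
hypothetical stand-ins, and the assumption that initialize runs once per task
before any record is processed is mine.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for illustration only; NOT the real Crunch classes.
class SketchPType implements Serializable {
    private static final long serialVersionUID = 1L;
    // Built in initialize(); null on a freshly deserialized instance.
    private transient UnaryOperator<StringBuilder> copier;

    void initialize() {
        copier = value -> new StringBuilder(value);
    }

    // Deep-copies the value so the caller can safely hold on to it.
    StringBuilder getDetachedValue(StringBuilder value) {
        return copier.apply(value);
    }
}

class SketchJoinFn implements Serializable {
    private static final long serialVersionUID = 1L;
    private final SketchPType ptype;
    private transient List<StringBuilder> held;

    SketchJoinFn(SketchPType ptype) {
        this.ptype = ptype;
    }

    // Assumed to run once per task, before any record is processed.
    void initialize() {
        ptype.initialize(); // the DoFn takes responsibility for its PType
        held = new ArrayList<>();
    }

    void process(StringBuilder reusedValue) {
        held.add(ptype.getDetachedValue(reusedValue));
    }

    List<String> heldValues() {
        List<String> out = new ArrayList<>();
        for (StringBuilder sb : held) {
            out.add(sb.toString());
        }
        return out;
    }
}

class JoinFixSketch {
    static List<String> run() {
        SketchJoinFn fn = new SketchJoinFn(new SketchPType());
        fn.initialize();
        StringBuilder reused = new StringBuilder(); // one object reused per record
        for (String s : new String[] {"a", "b"}) {
            reused.setLength(0);
            reused.append(s);
            fn.process(reused);
        }
        return fn.heldValues();
    }
}
```

Because each held value is a detached copy, the function still sees "a" and
"b" after the loop; holding the reused object itself would leave two
references to the same buffer. The cost of this arrangement is that every DoFn
holding a PType has to remember to initialize it.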
> PType mapping functions are not initialized before being used for deep copying
> ------------------------------------------------------------------------------
>
> Key: CRUNCH-71
> URL: https://issues.apache.org/jira/browse/CRUNCH-71
> Project: Crunch
> Issue Type: Bug
> Affects Versions: 0.3.0
> Reporter: Gabriel Reid
> Assignee: Gabriel Reid
> Attachments: CRUNCH-71.patch
>
>
> The PType#getDetachedValue method performs a deep copy (if needed) in order
> to allow DoFns to hold on to values that have been passed through them (for
> example, in join functions).
> The WritablePType class uses the built-in input and output MapFns in the
> PType to handle this deep copying, but the input and output MapFns don't get
> initialized (i.e. initialize isn't called on them) after they are
> deserialized along with the DoFn that is using them. In some rare cases (at
> least for tuples), this can result in NullPointerExceptions or other
> unexpected failures.
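
The failure mode described above can be reproduced in isolation. The sketch
below assumes plain Java serialization and a function that builds its working
state in initialize; SketchTupleMapFn and DeserializationSketch are
hypothetical names, not Crunch classes, and the transient buffer is only a
stand-in for whatever state a real MapFn sets up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a MapFn whose working state is built in initialize().
class SketchTupleMapFn implements Serializable {
    private static final long serialVersionUID = 1L;
    // transient: skipped by Java serialization, so it is null after deserialization.
    private transient List<String> buffer;

    void initialize() {
        buffer = new ArrayList<>();
    }

    List<String> map(List<String> input) {
        buffer.clear(); // throws NullPointerException if initialize() was never called
        buffer.addAll(input);
        return new ArrayList<>(buffer);
    }
}

class DeserializationSketch {
    // Serializes and deserializes an object in memory, as if shipping it to a task.
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Initializing before shipping does not help: the transient state is lost.
    static boolean failsBeforeInitialize() {
        SketchTupleMapFn fn = new SketchTupleMapFn();
        fn.initialize();
        SketchTupleMapFn shipped = roundTrip(fn);
        try {
            shipped.map(Arrays.asList("a", "b"));
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Calling initialize() on the deserialized copy restores the state.
    static List<String> worksAfterInitialize() {
        SketchTupleMapFn shipped = roundTrip(new SketchTupleMapFn());
        shipped.initialize();
        return shipped.map(Arrays.asList("a", "b"));
    }
}
```

The point of the sketch is only that initialization has to happen on the
deserialized instance; whoever owns that instance at runtime has to make the
call.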