[ 
https://issues.apache.org/jira/browse/CRUNCH-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471333#comment-13471333
 ] 

Gabriel Reid commented on CRUNCH-90:
------------------------------------

I've just discovered that this patch breaks the PageRankClassTest in scrunch. 
Unfortunately, I only ran the crunch integration tests (instead of the full 
suite) before committing it. The exception being thrown is as follows:

java.lang.IllegalArgumentException: Can not set final [Ljava.lang.String; field 
org.apache.crunch.scrunch.PageRankData.urls to 
org.apache.avro.generic.GenericData$Array

It appears that the deep copying runs into a problem when reading a 
PageRankData object back in using reflection-based serialization. The exception 
suggests that the reader produces a GenericData$Array for the urls field, which 
is declared as a String array. I'm not sure why this is only surfacing now; I 
would have expected the existing serialization logic to hit the same problem 
even without deep copying. This appears to be a bug in Avro (similar to 
AVRO-1046).
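
As a rough illustration of the failure mode in the exception above, here is a 
minimal sketch in plain Java. It is not the Avro or scrunch code: the Page 
class is a hypothetical stand-in for PageRankData, and an ArrayList stands in 
for GenericData$Array. It only shows that reflectively assigning a non-array 
value to a final String[] field is rejected with an IllegalArgumentException.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;

public class ReflectiveSetDemo {

    // Hypothetical stand-in for PageRankData: a final String[] field.
    public static class Page {
        public final String[] urls;

        public Page(String[] urls) {
            this.urls = urls;
        }
    }

    // Tries to assign `value` to Page.urls via reflection.
    // Returns true on success, false if the value's type is rejected.
    public static boolean trySet(Page page, Object value) {
        try {
            Field field = Page.class.getDeclaredField("urls");
            field.setAccessible(true); // allows writing the final instance field
            field.set(page, value);
            return true;
        } catch (IllegalArgumentException e) {
            // Raised when the value is not an instance of the field's type.
            System.out.println(e.getMessage());
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Page page = new Page(new String[] {"a"});
        // A String[] matches the declared type, so this assignment is accepted.
        System.out.println(trySet(page, new String[] {"b"}));
        // An ArrayList (assumed analogue of GenericData$Array) is rejected.
        System.out.println(trySet(page, new ArrayList<String>()));
    }
}
```

If this is what is happening, a workaround on the scrunch side would be to 
declare the field with a type that the reflection-based reader can populate 
directly.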

[~jwills] I'm totally clueless when it comes to Scala -- any chance you could 
take a look and possibly try changing the type of the urls field to something 
that Avro can deal with, or give me any pointers on what you think might be 
going on here? I definitely want to report this to the Avro JIRA as well if it 
is indeed an Avro bug, but it would be good to work around it for now.
                
> Object reuse is not accounted for in mapper fusion
> --------------------------------------------------
>
>                 Key: CRUNCH-90
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-90
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>             Fix For: 0.4.0
>
>         Attachments: CRUNCH-90.patch
>
>
> When multiple DoFns are run over the same output (i.e. in the case of mapper 
> fusion), the same value object is passed to multiple underlying DoFns. If the 
> state of that value object is changed by one DoFn, other DoFns are called 
> with the updated object.
> This is a situation that can happen quite easily when the input of a DoFn is 
> simply updated and then emitted. In general, this bug will only affect values 
> whose type is the same as the underlying serialization type.
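
The object-reuse problem described above can be sketched in plain Java without 
any Crunch classes. Everything here is hypothetical: Counter is an 
illustrative mutable value type, and increment and read stand in for two DoFns 
fused over the same input. The second variant shows the general idea of giving 
each function its own copy; it is not the code from CRUNCH-90.patch.

```java
public class ObjectReuseDemo {

    // Hypothetical mutable value type shared between functions.
    public static class Counter {
        public int value;

        public Counter(int value) {
            this.value = value;
        }

        public Counter copy() {
            return new Counter(value);
        }
    }

    // Stand-in for a DoFn that updates its input in place and emits it.
    static Counter increment(Counter c) {
        c.value += 1;
        return c;
    }

    // Stand-in for a second DoFn that only reads its input.
    static int read(Counter c) {
        return c.value;
    }

    // Both functions receive the same instance, so the second one
    // observes the mutation made by the first.
    public static int runShared(int start) {
        Counter input = new Counter(start);
        increment(input);
        return read(input);
    }

    // Each function receives its own copy, so the second one
    // sees the original value.
    public static int runWithCopies(int start) {
        Counter input = new Counter(start);
        increment(input.copy());
        return read(input.copy());
    }

    public static void main(String[] args) {
        System.out.println(runShared(1));     // prints 2
        System.out.println(runWithCopies(1)); // prints 1
    }
}
```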
