[
https://issues.apache.org/jira/browse/PIG-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-3789:
----------------------------
Attachment: PIG-3789-2.patch
The problem is in POValueInputTez. When we read tuples from the edge, they are
produced by BinSedesTuple.readFields. The tuple is reused, and mFields is
cleared and rebuilt for every tuple. When the streaming operation runs
asynchronously, a tuple saved to binaryInputQueue therefore keeps changing.
I checked all the other TezLoad operators and they seem fine: POShuffleTezLoad
already makes a copy (Packager.getValueTuple), and POSimpleTezLoad relies on
the loader to create a new tuple. The other TezLoad operators do not send
input tuples to binaryInputQueue.
Patch attached.
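To illustrate the aliasing problem described above, here is a minimal,
self-contained sketch. It does not use any Pig or Tez classes: ReusingReader,
enqueue, and the list-based "tuple" are hypothetical stand-ins for a reader
that reuses one tuple object and for binaryInputQueue. It is not the code in
the patch, only a model of why copying before enqueueing helps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TupleReuseSketch {

    /** Hypothetical stand-in for a reader that reuses a single tuple object. */
    static class ReusingReader {
        private final List<Object> tuple = new ArrayList<>();

        /** Clears and refills the same list on every call, then returns it. */
        List<Object> read(Object... fields) {
            tuple.clear();
            for (Object f : fields) {
                tuple.add(f);
            }
            return tuple;
        }
    }

    /**
     * Reads three "tuples" and puts them on a queue, either by reference or
     * as a defensive copy, and returns a snapshot of the queue contents.
     */
    public static List<List<Object>> enqueue(boolean copy) {
        ReusingReader reader = new ReusingReader();
        BlockingQueue<List<Object>> queue = new ArrayBlockingQueue<>(4);
        for (int i = 1; i <= 3; i++) {
            List<Object> t = reader.read(i, "row" + i);
            // Without the copy, every queue entry aliases the same list.
            queue.add(copy ? new ArrayList<>(t) : t);
        }
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        // By reference: all entries show the last tuple read.
        System.out.println(enqueue(false)); // [[3, row3], [3, row3], [3, row3]]
        // With a copy: each entry keeps the values it had when enqueued.
        System.out.println(enqueue(true));  // [[1, row1], [2, row2], [3, row3]]
    }
}
```

In this model the consumer of the queue only sees stable values when the
producer copies the tuple before handing it over, which matches the behavior
reported for binaryInputQueue.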
> tuple in POStream binaryInputQueue keep changing
> ------------------------------------------------
>
> Key: PIG-3789
> URL: https://issues.apache.org/jira/browse/PIG-3789
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: tez-branch
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3789-1.patch, PIG-3789-2.patch
>
>
> Similar to the comments in POSimpleTezLoad:
> {code}
> /**
> * Previously, we reused the same Result object for all results, but we found
> * certain operators (e.g. POStream) save references to the Result object and
> * expect it to be constant.
> */
> {code}
> Tuples put into binaryInputQueue have changed by the time they are actually
> processed. Not exactly sure why, but making a copy of the tuple solves the issue.