[ https://issues.apache.org/jira/browse/PIG-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-4691:
------------------------------------
Resolution: Fixed
Hadoop Flags: Reviewed
Release Note:
Union optimization (pig.tez.opt.union=true) in Tez uses vertex groups to store
output from different vertices into one final output location. If a StoreFunc's
OutputCommitter does not honor mapreduce.output.basename, or has other issues
with multiple vertices writing to the destination location at the same time,
you can disable union optimization just for that StoreFunc by adding it to
pig.tez.opt.union.unsupported.storefuncs (see PIG-4649). Instead of a blacklist,
you can also specify a whitelist of StoreFuncs that are known to work with
multiple vertices writing to the same location.
#pig.tez.opt.union.unsupported.storefuncs=org.apache.hcatalog.pig.HCatStorer,org.apache.hive.hcatalog.pig.HCatStorer
#pig.tez.opt.union.supported.storefuncs=
Status: Resolved (was: Patch Available)
Committed to trunk. Thanks for the review, Daniel.
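The release note describes two comma-separated lists. The sketch below shows one plausible way such a check could combine them. It assumes that a blacklisted StoreFunc always disables the optimization and that, once a whitelist is set, only listed StoreFuncs qualify. The class and method names are hypothetical; this is not Pig's actual UnionOptimizer code, and com.example.MyStorer is a made-up user StoreFunc.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of whitelist/blacklist handling; NOT Pig's real implementation.
public class UnionStoreFuncCheck {

    // Parses a comma-separated property value such as
    // "org.apache.hcatalog.pig.HCatStorer,org.apache.hive.hcatalog.pig.HCatStorer".
    // A null or blank value is treated as "list not configured".
    static Set<String> parseList(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> names = new HashSet<>();
        for (String name : value.split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    // Returns true if union optimization may be applied, given the class names of
    // the StoreFuncs that would write into the shared output location.
    // Assumption: blacklist always wins; a non-empty whitelist admits only its entries.
    static boolean isUnionOptimizable(List<String> storeFuncs,
                                      String supported, String unsupported) {
        Set<String> whitelist = parseList(supported);
        Set<String> blacklist = parseList(unsupported);
        for (String storeFunc : storeFuncs) {
            if (blacklist.contains(storeFunc)) {
                return false; // known not to tolerate parallel writers
            }
            if (!whitelist.isEmpty() && !whitelist.contains(storeFunc)) {
                return false; // whitelist configured, but this StoreFunc is not on it
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String blacklist =
            "org.apache.hcatalog.pig.HCatStorer,org.apache.hive.hcatalog.pig.HCatStorer";
        // Blacklisted StoreFunc: optimization disabled
        System.out.println(isUnionOptimizable(
            Arrays.asList("org.apache.hive.hcatalog.pig.HCatStorer"), null, blacklist));
        // Not blacklisted, no whitelist configured: optimization allowed
        System.out.println(isUnionOptimizable(
            Arrays.asList("org.apache.pig.builtin.PigStorage"), null, blacklist));
        // Whitelist configured and the StoreFunc is not on it: optimization disabled
        System.out.println(isUnionOptimizable(
            Arrays.asList("com.example.MyStorer"), "org.apache.pig.builtin.PigStorage", null));
    }
}
```

Running main prints false, true, false. Under these assumptions a whitelist is the safer default when many custom StoreFuncs exist, since an unknown StoreFunc falls back to the unoptimized plan.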
> [Pig on Tez] Support for whitelisting storefuncs for union optimization
> -----------------------------------------------------------------------
>
> Key: PIG-4691
> URL: https://issues.apache.org/jira/browse/PIG-4691
> Project: Pig
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4691-1.patch
>
>
> PIG-4649 added support for blacklisting some StoreFuncs when applying the
> union+store vertex group optimization, because HCatStorer was not honoring
> mapreduce.output.basename and was hardcoding part file names. We found that
> some of our users' StoreFuncs also do that and ended up with partial results.
> So it would be good to also have a whitelist option, where you can list
> StoreFuncs that do not mess with mapreduce.output.basename.
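To illustrate why ignoring mapreduce.output.basename can produce partial results, the following simplified simulation models the shared output directory as an in-memory map from file name to contents. No Hadoop, Tez, or Pig classes are involved. The vertex basenames ("part-v001", "part-v002") and the "-r-NNNNN" suffix are illustrative assumptions, not the exact names Pig on Tez generates.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation: two vertices of a union write into one output directory.
public class SharedOutputSimulation {

    interface Namer {
        String name(String basename, int task);
    }

    // A well-behaved StoreFunc: prefixes files with the per-vertex basename
    // (standing in for mapreduce.output.basename), so vertices never collide.
    static String honoringBasename(String basename, int task) {
        return String.format("%s-r-%05d", basename, task);
    }

    // A misbehaving StoreFunc: ignores the basename and hardcodes part file names,
    // so task N of every vertex produces the same file name.
    static String hardcoded(String basename, int task) {
        return String.format("part-r-%05d", task);
    }

    // Each of two vertices runs `tasks` tasks that write into the same directory.
    static Map<String, String> run(Namer namer, int tasks) {
        Map<String, String> dir = new TreeMap<>();
        for (String vertex : new String[] {"part-v001", "part-v002"}) {
            for (int t = 0; t < tasks; t++) {
                // put() replaces any existing file with the same name: data loss
                dir.put(namer.name(vertex, t), vertex + " task " + t);
            }
        }
        return dir;
    }

    public static void main(String[] args) {
        // Honoring the basename keeps all 4 files; hardcoding keeps only 2.
        System.out.println(run(SharedOutputSimulation::honoringBasename, 2).keySet());
        System.out.println(run(SharedOutputSimulation::hardcoded, 2).keySet());
    }
}
```

In the hardcoded case the second vertex silently overwrites the first vertex's files, which mirrors the "partial results" symptom described above, although the real failure inside a Tez vertex group commit may differ in detail.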
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)