[
https://issues.apache.org/jira/browse/PIG-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204670#comment-13204670
]
Thomas Weise commented on PIG-2508:
-----------------------------------
All unit tests pass with the .4 patch.
@Daniel: The key is to set properties through an instance of JobConf, as this is
the only way to apply the deprecation logic. The interface contracts of
java.util.Properties and the JobConf variant of Configuration are not
compatible: what we put into a JobConf is not necessarily what we get back for
a given key. A cleaner solution could be to change all related code to work
with Configuration instead of Properties and use a JobConf instance as the
prototype.
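The contract mismatch can be sketched in plain Java, without any Hadoop dependency (the key names are hypothetical, taken from the description below): a java.util.Properties object returns exactly what was stored under exactly that key, whereas a deprecation-aware store such as JobConf may answer a lookup for one alias with the value recorded under the other.

```java
import java.util.Properties;

// Minimal sketch of the exact-key contract of java.util.Properties.
// The key names "old.config.name" / "new.config.name" are hypothetical.
public class ContractMismatchDemo {
    static Properties props() {
        Properties p = new Properties();
        p.setProperty("old.config.name", "some.value");
        return p;
    }

    public static void main(String[] args) {
        Properties p = props();
        // Exact-key contract: the replacement name is simply unknown here,
        // so no deprecation mapping can ever be applied by Properties itself.
        System.out.println(p.getProperty("old.config.name")); // some.value
        System.out.println(p.getProperty("new.config.name")); // null
    }
}
```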
> PIG can unpredictably ignore deprecated Hadoop config options
> -------------------------------------------------------------
>
> Key: PIG-2508
> URL: https://issues.apache.org/jira/browse/PIG-2508
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.2, 0.10
> Reporter: Anupam Seth
> Assignee: Thomas Weise
> Priority: Blocker
> Fix For: 0.10, 0.9.3
>
> Attachments: PIG-2508.3.patch, PIG-2508.4.patch, PIG-2508.patch
>
>
> When deprecated config options are passed to a Pig job, it can unpredictably
> ignore them and override them with values provided in the defaults due to a
> "race condition"-like issue.
> This problem was first noticed as part of MAPREDUCE-3665, which was re-filed
> as HADOOP-7993 so as for it to fall in the right component bucket of the code
> being fixed. This JIRA fixed the bug on the Hadoop side of the code that
> caused older deprecated config options to be ignored when they were also
> specified in the defaults xml file with the newer config name or vice versa.
> However, the problem seemed to persist with Pig jobs and HADOOP-8021 was
> filed to address the issue.
> A careful step-by-step execution of the code in a debugger reveals a second
> overlapping bug in the way Pig deals with the configs.
> Not sure how / why this was not seen earlier, but the code in
> HExecutionEngine.java#recomputeProperties currently mashes together the
> default Hadoop configs and the user-specified properties into a Properties
> object. Given that Properties stores entries in a Hashtable, if we have a
> config called "old.config.name" which is now deprecated and replaced by
> "new.config.name", and one name is specified in the defaults and the other by
> the user, we get a strange condition in which the repopulated Properties
> object contains [in an unpredictable ordering] the following:
> {code}
> config1.name=config1.value
> config2.name=config2.value
> ...
> old.config.name=old.config.value
> ...
> new.config.name=new.config.value
> ...
> configx.name=configx.value
> {code}
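Why the ordering is unpredictable can be shown with a short, self-contained sketch (plain Java, not Pig code; the key names are the hypothetical ones above): java.util.Properties extends Hashtable, so key iteration follows hash buckets rather than insertion order, and whichever alias happens to iterate last "wins" when the entries are replayed one by one into a deprecation-aware Configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch: Properties iteration order reflects hashing, not the order in
// which the default and user values were merged in.
public class PropertiesOrderDemo {
    static List<String> iterationOrder() {
        Properties merged = new Properties();
        merged.setProperty("new.config.name", "default.value"); // from the defaults
        merged.setProperty("old.config.name", "user.value");    // from the user
        List<String> order = new ArrayList<>();
        for (Object key : merged.keySet()) {
            order.add((String) key);
        }
        return order;
    }

    public static void main(String[] args) {
        // Neither position in this list encodes which value the user
        // actually intended to win.
        System.out.println(PropertiesOrderDemo.iterationOrder());
    }
}
```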
> When this Properties object gets converted into a Configuration object by the
> ConfigurationUtil#toConfiguration() routine, the deprecation kicks in and
> tries to resolve all old configs. Because the ordering is not guaranteed (and
> because, in the case of compress, the hash function consistently yields the
> new config loaded from the defaults after the old one), the user-specified
> config is ignored in favor of the default config (which, from the point of
> view of the Hadoop Configuration object, is expected standard behavior: a
> later specification of a config value replaces an earlier one).
> The fix for this is probably straightforward, but will require a rewrite of
> a chunk of code in HExecutionEngine.java. Instead of mashing together a
> JobConf object and a Properties object into a Configuration object that is
> finally re-converted into a JobConf object, the code simply needs to
> consistently and correctly populate a JobConf / Configuration object that can
> handle deprecation instead of a "dumb" Java Properties object.
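The proposed direction can be sketched with a small stand-in class (illustrative only, not Hadoop API or the actual Pig patch): write the defaults first and the user settings second into an object that resolves deprecated aliases to one canonical key on every write, so the user value wins deterministically regardless of hash order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hedged stand-in for a deprecation-aware Configuration/JobConf. The
// class name, method names, and config keys are illustrative.
public class DeprecationAwareConf {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, String> aliasToCanonical = new HashMap<>();

    void addDeprecation(String oldKey, String newKey) {
        aliasToCanonical.put(oldKey, newKey);
    }

    void set(String key, String value) {
        // Resolve deprecated names on write, so old and new aliases share
        // one canonical slot and "last write wins" is well-defined.
        values.put(aliasToCanonical.getOrDefault(key, key), value);
    }

    String get(String key) {
        return values.get(aliasToCanonical.getOrDefault(key, key));
    }

    void setAll(Properties props) {
        for (String name : props.stringPropertyNames()) {
            set(name, props.getProperty(name));
        }
    }

    public static void main(String[] args) {
        DeprecationAwareConf conf = new DeprecationAwareConf();
        conf.addDeprecation("old.config.name", "new.config.name");

        Properties defaults = new Properties();
        defaults.setProperty("new.config.name", "default.value");
        Properties user = new Properties();
        user.setProperty("old.config.name", "user.value");

        conf.setAll(defaults); // defaults first
        conf.setAll(user);     // user settings last, so the user value wins
        System.out.println(conf.get("new.config.name")); // user.value
    }
}
```

The essential design choice is that the merge happens in two ordered passes into one alias-resolving store, rather than flattening everything into a single Properties table first.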
> We recently saw another potential occurrence of this bug, where Pig seems to
> honor only the mapreduce.job.queuename parameter for specifying the queue
> name and ignores the mapred.job.queue.name parameter.
> Since this can break a lot of existing jobs that run fine on 0.20, marking
> this as a blocker.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira