[ https://issues.apache.org/jira/browse/PIG-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202924#comment-13202924 ]
Anupam Seth commented on PIG-2508:
----------------------------------

Tested Thomas' new patch on a 10-node cluster and see the following:

On 0.23:
========

In local mode while setting configuration with deprecated name in script:
-------------------------------------------------------------------------
Fails with Kerberos exception as follows
{code}
2012-02-07 22:00:08,696 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/<user>/pig_1328652008690.log
2012-02-07 22:00:09,010 [main] WARN org.apache.hadoop.conf.Configuration - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2012-02-07 22:00:09,011 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:00:09,011 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:00:09,011 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:00:09,011 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:00:09,011 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2012-02-07 22:00:09,404 [main] WARN org.apache.hadoop.conf.Configuration - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2012-02-07 22:00:09,405 [main] WARN org.apache.hadoop.conf.Configuration - mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
2012-02-07 22:00:10,386 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-02-07 22:00:10,540 [main] WARN org.apache.hadoop.conf.Configuration - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2012-02-07 22:00:10,553 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000: <file script2-hadoop.pig, line 7, column 0> Output Location Validation Failed for: 'file:///homes/ghimport/script2-hadoop-results More info to follow:
Can't get Master Kerberos principal for use as renewer
Details at logfile: /homes/<user>/pig_1328652008690.log
{code}

Contents of pig log file
{code}
Pig Stack Trace
---------------
ERROR 6000: <file script2-hadoop.pig, line 7, column 0> Output Location Validation Failed for: 'file:///homes/<user>/script2-hadoop-results More info to follow:
Can't get Master Kerberos principal for use as renewer
org.apache.pig.impl.plan.VisitorException: ERROR 6000: <file script2-hadoop.pig, line 7, column 0> Output Location Validation Failed for: 'file:///homes/<user>/script2-hadoop-results More info to follow:
Can't get Master Kerberos principal for use as renewer
	at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:95)
	at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
	at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
	at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
	at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
	at org.apache.pig.PigServer.compilePp(PigServer.java:1360)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1297)
	at org.apache.pig.PigServer.execute(PigServer.java:1289)
	at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:130)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:191)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
	at org.apache.pig.Main.run(Main.java:561)
	at org.apache.pig.Main.main(Main.java:111)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
Caused by: java.io.IOException: Can't get Master Kerberos principal for use as renewer
	at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:104)
	at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:87)
	at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
	at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:80)
	... 23 more
================================================================================
{code}

In cluster mode while setting configuration with deprecated name in script:
---------------------------------------------------------------------------
Passes

In local mode while setting configuration with new name in script:
------------------------------------------------------------------
Same issue as with local mode above

In cluster mode while setting configuration with new name in script:
--------------------------------------------------------------------
Fails as below
{code}
2012-02-07 22:08:27,164 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/<user>/pig_1328652507159.log
2012-02-07 22:08:27,663 [main] WARN org.apache.hadoop.conf.Configuration - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2012-02-07 22:08:27,665 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:08:27,665 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:08:27,665 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:08:27,665 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:08:27,665 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://<host>
2012-02-07 22:08:32,052 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-02-07 22:08:32,349 [main] WARN org.apache.hadoop.conf.Configuration - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2012-02-07 22:08:32,360 [main] INFO org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 28 for <user> on <host>
2012-02-07 22:08:32,360 [main] INFO org.apache.hadoop.mapreduce.security.TokenCache - Got dt for hdfs://<host>
2012-02-07 22:08:32,787 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-02-07 22:08:32,964 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-02-07 22:08:32,964 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-02-07 22:08:33,919 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-02-07 22:08:33,963 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2012-02-07 22:08:33,963 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-02-07 22:08:33,963 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2012-02-07 22:08:33,963 [main] WARN org.apache.hadoop.conf.Configuration - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2012-02-07 22:08:33,964 [main] WARN org.apache.hadoop.conf.Configuration - mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
2012-02-07 22:08:33,974 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 0: 'mapred.output.compress' is set but no value is specified for 'mapred.output.compression.codec'.
Details at logfile: /homes/<user>/pig_1328652507159.log
{code}

Contents of log file:
{code}
Pig Stack Trace
---------------
ERROR 0: 'mapred.output.compress' is set but no value is specified for 'mapred.output.compression.codec'.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 0: 'mapred.output.compress' is set but no value is specified for 'mapred.output.compression.codec'.
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:365)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:150)
	at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
	at org.apache.pig.PigServer.execute(PigServer.java:1289)
	at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:130)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:191)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
	at org.apache.pig.Main.run(Main.java:561)
	at org.apache.pig.Main.main(Main.java:111)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
================================================================================
{code}

In local mode while setting configuration with deprecated name from cmd line:
-----------------------------------------------------------------------------
Same issue as with local mode above

In cluster mode while setting configuration with deprecated name from cmd line:
-------------------------------------------------------------------------------
Passes

In local mode while setting configuration with new name from cmd line:
----------------------------------------------------------------------
Same issue as with local mode above

In cluster mode while setting configuration with new name from cmd line:
------------------------------------------------------------------------
Passes

On 0.20.2xx:
============
Cannot get it to work at all (tried removing my ivy2 directory, doing ant clean, and then re-compiling the tarball for 0.20 - still, it smells like I have 0.23 libs being referenced somewhere!)
{code}
2012-02-07 23:08:46,237 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/ghimport/pig_1328656126229.log
2012-02-07 23:08:46,716 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2012-02-07 23:08:47,140 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
Details at logfile: /homes/<user>/pig_1328656126229.log
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil
	at org.apache.pig.Main.run(Main.java:593)
	at org.apache.pig.Main.main(Main.java:111)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}

Contents of pig log file:
{code}
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
	at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:54)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
	at org.apache.pig.Main.run(Main.java:561)
	at org.apache.pig.Main.main(Main.java:111)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.JobContextImpl
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	... 9 more
================================================================================
{code}

> PIG can unpredictably ignore deprecated Hadoop config options
> -------------------------------------------------------------
>
> Key: PIG-2508
> URL: https://issues.apache.org/jira/browse/PIG-2508
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.2, 0.10
> Reporter: Anupam Seth
> Assignee: Thomas Weise
> Priority: Blocker
> Fix For: 0.10, 0.9.3
>
> Attachments: PIG-2508.3.patch, PIG-2508.patch
>
>
> When deprecated config options are passed to a Pig job, it can unpredictably
> ignore them and override them with values provided in the defaults due to a
> "race condition"-like issue.
> This problem was first noticed as part of MAPREDUCE-3665, which was re-filed
> as HADOOP-7993 so that it would fall in the right component bucket for the
> code being fixed. That JIRA fixed the Hadoop-side bug that caused older,
> deprecated config options to be ignored when they were also specified in the
> defaults XML file under the newer config name, or vice versa.
> However, the problem seemed to persist with Pig jobs, and HADOOP-8021 was
> filed to address the issue.
> A careful step-by-step execution of the code in a debugger reveals a second,
> overlapping bug caused by the way Pig deals with the configs.
> Not sure how / why this was not seen earlier, but the code in
> HExecutionEngine.java#recomputeProperties currently mashes together the
> default Hadoop configs and the user-specified properties into a Properties
> object. Properties is backed by a Hashtable, so its iteration order is not
> defined. If a config called "old.config.name" is deprecated and replaced by
> "new.config.name", and one name is specified in the defaults and the other
> by the user, the repopulated Properties object holds both entries, in an
> unpredictable order:
> {code}
> config1.name=config1.value
> config2.name=config2.value
> ...
> old.config.name=old.config.value
> ...
> new.config.name=new.config.value
> ...
> configx.name=configx.value
> {code}
> When this Properties object is converted into a Configuration object by
> ConfigurationUtil#toConfiguration(), deprecation handling kicks in and maps
> each old config name to its new one. Because the ordering is not guaranteed
> (and because, in the case of compress, the hash ordering consistently places
> the new config loaded from the defaults after the old one), the
> user-specified config is ignored in favor of the default. From the Hadoop
> Configuration object's point of view this is expected behavior: a later
> value for a key replaces an earlier one.
> The fix for this is probably straightforward, but will require rewriting a
> chunk of code in HExecutionEngine.java. Instead of mashing together a
> JobConf object and a Properties object into a Configuration object that is
> finally re-converted into a JobConf object, the code needs to consistently
> populate a JobConf / Configuration object, which handles deprecation,
> rather than a "dumb" Java Properties object.
> We recently saw another potential occurrence of this bug, where Pig seems to
> honor only the mapreduce.job.queuename parameter for specifying the queue
> name and to ignore mapred.job.queue.name.
> Since this can break a lot of existing jobs that run fine on 0.20, marking
> this as a blocker.
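The ordering problem in the description can be reproduced without Hadoop. The Java sketch below is a hypothetical, simplified model, not Pig's or Hadoop's code: `DeprecationOrderDemo`, `MiniConf`, `replay` and `layered` are made-up names, and `MiniConf` assumes only the one Hadoop behavior the description relies on, namely that a value set under a deprecated name is stored under its replacement, so the last write wins. `replay` plays back merged entries in whatever order it is given (standing in for Hashtable iteration); `layered` applies all defaults first and then all user settings, which is one way to get the deterministic outcome the proposed fix asks for. The config values and the queue name are illustrative only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeprecationOrderDemo {

    // Hypothetical, simplified deprecation table (old name -> new name),
    // using name pairs taken from the logs and description above.
    static final Map<String, String> DEPRECATED = Map.of(
            "mapred.output.compress", "mapreduce.output.fileoutputformat.compress",
            "mapred.job.queue.name", "mapreduce.job.queuename");

    // Minimal model of a deprecation-aware configuration: a value set under
    // either name is stored under the new name, so the last write wins.
    static final class MiniConf {
        private final Map<String, String> values = new HashMap<>();

        void set(String name, String value) {
            values.put(DEPRECATED.getOrDefault(name, name), value);
        }

        String get(String name) {
            return values.get(DEPRECATED.getOrDefault(name, name));
        }
    }

    // Buggy pattern: defaults and user settings are flattened into one
    // collection and replayed in its iteration order. The order is passed in
    // explicitly here to make the dependence on it visible.
    static MiniConf replay(List<Map.Entry<String, String>> entries) {
        MiniConf conf = new MiniConf();
        for (Map.Entry<String, String> e : entries) {
            conf.set(e.getKey(), e.getValue());
        }
        return conf;
    }

    // Layered pattern: apply every default first, then every user setting,
    // so a user value wins whether it uses the old name or the new one.
    static MiniConf layered(Map<String, String> defaults, Map<String, String> user) {
        MiniConf conf = new MiniConf();
        defaults.forEach(conf::set);
        user.forEach(conf::set);
        return conf;
    }

    public static void main(String[] args) {
        Map.Entry<String, String> def =
                Map.entry("mapreduce.output.fileoutputformat.compress", "false");
        Map.Entry<String, String> usr = Map.entry("mapred.output.compress", "true");

        // The user's old-name entry happens to come first: the default clobbers it.
        System.out.println(replay(List.of(usr, def)).get("mapred.output.compress")); // false
        // Opposite order: the user value survives.
        System.out.println(replay(List.of(def, usr)).get("mapred.output.compress")); // true

        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("mapreduce.output.fileoutputformat.compress", "false");
        defaults.put("mapreduce.job.queuename", "default");
        Map<String, String> user = new LinkedHashMap<>();
        user.put("mapred.output.compress", "true");
        user.put("mapred.job.queue.name", "myqueue");

        MiniConf conf = layered(defaults, user);
        System.out.println(conf.get("mapred.output.compress"));  // true
        System.out.println(conf.get("mapreduce.job.queuename")); // myqueue
    }
}
```

In this model, `layered` only guarantees that user settings beat defaults; if a user sets both the old and the new name, whichever is applied last still wins.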