[
https://issues.apache.org/jira/browse/IMPALA-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562298#comment-16562298
]
ASF subversion and git services commented on IMPALA-7170:
---------------------------------------------------------
Commit 10a67509f283e1434aaed0f5d5e03937d3b76aa9 in impala's branch
refs/heads/master from [~twmarshall]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=10a6750 ]
IMPALA-7170: Update data_generator.py for Hadoop 3
After the move to Hadoop 3, data_generator.py was broken. The issue
seems to be that we rely on additional jars not in the classpath. The
solution is to pass the location of these jars into the 'hadoop'
command using the '-libjars' parameter.
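
As a rough illustration of the approach described above (not the actual patch), the sketch below assembles a 'hadoop jar' streaming command that includes '-libjars'. The helper name build_streaming_cmd and all jar and HDFS paths in the example call are hypothetical. The only assumption about Hadoop itself is that '-libjars' is a generic option taking a comma-separated list of jars, placed with the other generic options ('-D', '-files') ahead of the streaming options.

```python
import shlex


def build_streaming_cmd(streaming_jar, lib_jars, files, input_path,
                        output_path, mapper, reducer, reducer_count):
    """Return a 'hadoop jar' command line for a streaming job.

    Generic options (-D, -libjars, -files) are emitted before the
    streaming-specific options (-input, -output, -mapper, -reducer).
    """
    parts = [
        "hadoop", "jar", streaming_jar,
        "-D", "mapred.reduce.tasks=%d" % reducer_count,
        "-D", "stream.num.map.output.key.fields=2",
    ]
    if lib_jars:
        # -libjars takes one comma-separated list of jar paths.
        parts += ["-libjars", ",".join(lib_jars)]
    parts += [
        "-files", ",".join(files),
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
    # Quote each argument so the result is safe to pass to a shell.
    return " ".join(shlex.quote(p) for p in parts)


# Example call; every path below is a placeholder, not taken from the patch.
example_cmd = build_streaming_cmd(
    streaming_jar="/path/to/hadoop-streaming.jar",
    lib_jars=["/path/to/extra-a.jar", "/path/to/extra-b.jar"],
    files=["tests/comparison/data_generator_mapper.py",
           "tests/comparison/data_generator_reducer.py"],
    input_path="/tmp/mr_input",
    output_path="/tmp/mr_output",
    mapper="data_generator_mapper.py",
    reducer="data_generator_reducer.py",
    reducer_count=36,
)
print(example_cmd)
```

Which jars actually need to be listed depends on the Hadoop distribution in use, so the list is left as a parameter here rather than hard-coded.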
This patch also updates tests/comparison/README to add instructions
for dealing with Yarn, since during the move to Hadoop 3 we switched
to no longer running Yarn as part of the minicluster by default.
Change-Id: I47b7d663174dbd38a5d9c98f1a88f0ebab726d5a
Reviewed-on: http://gerrit.cloudera.org:8080/11041
Reviewed-by: Thomas Marshall <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> "tests/comparison/data_generator.py populate" is broken
> -------------------------------------------------------
>
> Key: IMPALA-7170
> URL: https://issues.apache.org/jira/browse/IMPALA-7170
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 3.0
> Reporter: Tianyi Wang
> Assignee: Thomas Tauber-Marshall
> Priority: Major
> Fix For: Impala 3.1.0
>
>
> tests/comparison in Impala 3.x is broken, presumably by the switch to Hadoop 3.
> First, to run the tests in Impala 3.x, the mini-cluster needs to be started
> with YARN, which is not documented anywhere.
> Then, data_generator.py exits with the following error:
> {noformat}
> 2018-04-23 23:15:46,065 INFO:db_connection[752]:Dropping database randomness
> 2018-04-23 23:15:46,095 INFO:db_connection[234]:Creating database randomness
> 2018-04-23 23:15:52,390 INFO:data_generator[235]:Starting MR job to generate
> data for randomness
> Traceback (most recent call last):
> File "tests/comparison/data_generator.py", line 339, in <module>
> populator.populate_db(args.table_count, postgresql_conn=postgresql_conn)
> File "tests/comparison/data_generator.py", line 134, in populate_db
> self._run_data_generator_mr_job([g for _, g in table_and_generators],
> self.db_name)
> File "tests/comparison/data_generator.py", line 244, in
> _run_data_generator_mr_job
> % (reducer_count, ','.join(files), mapper_input_file, hdfs_output_dir))
> File "/home/impdev/projects/impala/tests/comparison/cluster.py", line 476,
> in run_mr_job
> stderr=subprocess.STDOUT, env=env)
> File "/home/impdev/projects/impala/tests/util/shell_util.py", line 113, in
> shell
> "\ncmd: %s\nstdout: %s\nstderr: %s") % (retcode, cmd, output, err))
> Exception: Command returned non-zero exit code: 1
> cmd: set -euo pipefail
> hadoop jar
> /home/impdev/projects/impala/toolchain/cdh_components/hadoop-3.0.0-cdh6.x-SNAPSHOT/share/hadoop/tools/lib/hadoop-streaming-3.0.0-cdh6.x-SNAPSHOT.jar
> -D mapred.reduce.tasks=36 \
> -D stream.num.map.output.key.fields=2 \
> -files
> tests/comparison/common.py,tests/comparison/db_types.py,tests/comparison/data_generator_mapred_common.py,tests/comparison/data_generator_mapper.py,tests/comparison/data_generator_reducer.py,tests/comparison/random_val_generator.py
> \
> -input /tmp/data_gen_randomness_mr_input_1524525348 \
> -output /tmp/data_gen_randomness_mr_output_1524525348 \
> -mapper data_generator_mapper.py \
> -reducer data_generator_reducer.py
> stdout: packageJobJar: []
> [/home/impdev/projects/impala/toolchain/cdh_components/hadoop-3.0.0-cdh6.x-SNAPSHOT/share/hadoop/tools/lib/hadoop-streaming-3.0.0-cdh6.x-SNAPSHOT.jar]
> /tmp/streamjob2990195923122538287.jar tmpDir=null
> 18/04/23 23:15:53 INFO client.RMProxy: Connecting to ResourceManager at
> /0.0.0.0:8032
> 18/04/23 23:15:53 INFO client.RMProxy: Connecting to ResourceManager at
> /0.0.0.0:8032
> 18/04/23 23:15:54 INFO mapreduce.JobResourceUploader: Disabling Erasure
> Coding for path:
> /tmp/hadoop-yarn/staging/impdev/.staging/job_1524519161700_0002
> 18/04/23 23:15:54 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 18/04/23 23:15:54 INFO lzo.LzoCodec: Successfully loaded & initialized
> native-lzo library [hadoop-lzo rev 2b3bd7731ff3ef5d8585a004b90696630e5cea96]
> 18/04/23 23:15:54 INFO mapred.FileInputFormat: Total input files to process :
> 1
> 18/04/23 23:15:54 INFO mapreduce.JobSubmitter: number of splits:2
> 18/04/23 23:15:54 INFO Configuration.deprecation: mapred.reduce.tasks is
> deprecated. Instead, use mapreduce.job.reduces
> 18/04/23 23:15:54 INFO Configuration.deprecation:
> yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead,
> use yarn.system-metrics-publisher.enabled
> 18/04/23 23:15:54 INFO mapreduce.JobSubmitter: Submitting tokens for job:
> job_1524519161700_0002
> 18/04/23 23:15:54 INFO mapreduce.JobSubmitter: Executing with tokens: []
> 18/04/23 23:15:54 INFO conf.Configuration: resource-types.xml not found
> 18/04/23 23:15:54 INFO resource.ResourceUtils: Unable to find
> 'resource-types.xml'.
> 18/04/23 23:15:54 INFO impl.YarnClientImpl: Submitted application
> application_1524519161700_0002
> 18/04/23 23:15:54 INFO mapreduce.Job: The url to track the job:
> http://c37e0835e988:8088/proxy/application_1524519161700_0002/
> 18/04/23 23:15:54 INFO mapreduce.Job: Running job: job_1524519161700_0002
> 18/04/23 23:16:00 INFO mapreduce.Job: Job job_1524519161700_0002 running in
> uber mode : false
> 18/04/23 23:16:00 INFO mapreduce.Job: map 0% reduce 0%
> 18/04/23 23:16:06 INFO mapreduce.Job: Job job_1524519161700_0002 failed with
> state FAILED due to: Application application_1524519161700_0002 failed 2
> times due to AM Container for appattempt_1524519161700_0002_000002 exited
> with exitCode: 255
> Failing this attempt.Diagnostics: [2018-04-23 23:16:06.473]Exception from
> container-launch.
> Container id: container_1524519161700_0002_02_000001
> Exit code: 255
> [2018-04-23 23:16:06.475]Container exited with a non-zero exit code 255.
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering
> org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider
> class
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a
> provider class
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as
> a root resource class
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25
> AM'
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
> getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver
> to GuiceManagedComponentProvider with the scope "Singleton"
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
> getComponentProvider
> INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to
> GuiceManagedComponentProvider with the scope "Singleton"
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
> getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to
> GuiceManagedComponentProvider with the scope "PerRequest"
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more
> info.
> [2018-04-23 23:16:06.476]Container exited with a non-zero exit code 255.
> Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering
> org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider
> class
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a
> provider class
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as
> a root resource class
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25
> AM'
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
> getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver
> to GuiceManagedComponentProvider with the scope "Singleton"
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
> getComponentProvider
> INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to
> GuiceManagedComponentProvider with the scope "Singleton"
> Apr 23, 2018 11:16:03 PM
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory
> getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to
> GuiceManagedComponentProvider with the scope "PerRequest"
> log4j:WARN No appenders could be found for logger
> (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more
> info.
> For more detailed output, check the application tracking page:
> http://localhost:8088/cluster/app/application_1524519161700_0002 Then click
> on links to logs of each attempt.
> . Failing the application.
> 18/04/23 23:16:06 INFO mapreduce.Job: Counters: 0
> 18/04/23 23:16:06 ERROR streaming.StreamJob: Job not successful!
> Streaming Command Failed!
> {noformat}