How about attaching 'strace' to the catalogd startup and seeing where it crashes (if it's reproducible on demand)? Maybe others have better ideas.
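For reference, something along these lines. This is only a sketch: the catalogd binary path and the log directory below are guesses for a stock Impala dev checkout, not verified against your setup.

```shell
# Sketch only: catalogd's location under the build tree is an assumption
# for a default Impala dev checkout; adjust to where your build puts it.
CATALOGD="${IMPALA_HOME:-$HOME/Impala}/be/build/latest/catalog/catalogd"

# Run the daemon by hand under strace instead of via start-impala-cluster.py,
# following forked children (-f) and timestamping each syscall (-tt):
strace -f -tt -o /tmp/catalogd.strace "$CATALOGD" -log_dir=/tmp/catalogd-logs

# After it dies, the last few syscalls usually name the culprit
# (EACCES on a log file, EADDRINUSE on a port, ENOSPC, ...):
tail -n 40 /tmp/catalogd.strace
```

If the crash happens before glog is initialized, the strace tail is often the only evidence you get.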
On Sat, Jul 29, 2017 at 3:14 PM, Jim Apple <[email protected]> wrote:
> To be specific about "no error message": the logs written in the logs
> directory near the time of the crash are nearly identical to those of a
> process that got much further on a machine with a configuration that I do
> not know how to reproduce. The one that ended earlier has output like:
>
> Creating /test-warehouse HDFS directory (logging to
> /home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
> OK (Took: 0 min 2 sec)
> Derived params for create-load-data.sh:
> EXPLORATION_STRATEGY=exhaustive
> SKIP_METADATA_LOAD=0
> SKIP_SNAPSHOT_LOAD=0
> SNAPSHOT_FILE=
> CM_HOST=
> REMOTE_LOAD=
> Starting Impala cluster (logging to
> /home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
> FAILED (Took: 0 min 11 sec)
> '/home/ubuntu/Impala/bin/start-impala-cluster.py
> --log_dir=/home/ubuntu/Impala/logs/data_loading -s 3' failed. Tail of log:
> Log for command '/home/ubuntu/Impala/bin/start-impala-cluster.py
> --log_dir=/home/ubuntu/Impala/logs/data_loading -s 3'
> Starting State Store logging to
> /home/ubuntu/Impala/logs/data_loading/statestored.INFO
> Starting Catalog Service logging to
> /home/ubuntu/Impala/logs/data_loading/catalogd.INFO
> Error starting cluster: Unable to start catalogd. Check log or file
> permissions for more details.
> Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
> LOAD_DATA_ARGS=""
> + cleanup
> + rm -rf /tmp/tmp.HVkbPNl08R
>
>
> The one that got further in the process (and I think may be dying due to a
> spurious out-of-disk failure that I am putting on the back-burner for the
> moment) has the following output:
>
> Creating /test-warehouse HDFS directory (logging to
> /home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
> OK (Took: 0 min 2 sec)
> Derived params for create-load-data.sh:
> EXPLORATION_STRATEGY=exhaustive
> SKIP_METADATA_LOAD=0
> SKIP_SNAPSHOT_LOAD=0
> SNAPSHOT_FILE=
> CM_HOST=
> REMOTE_LOAD=
> Starting Impala cluster (logging to
> /home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
> OK (Took: 0 min 11 sec)
> Setting up HDFS environment (logging to
> /home/ubuntu/Impala/logs/data_loading/setup-hdfs-env.log)...
> OK (Took: 0 min 8 sec)
> Loading custom schemas (logging to
> /home/ubuntu/Impala/logs/data_loading/load-custom-schemas.log)...
> OK (Took: 0 min 35 sec)
> Loading functional-query data (logging to
> /home/ubuntu/Impala/logs/data_loading/load-functional-query.log)...
> OK (Took: 37 min 14 sec)
> Loading TPC-H data (logging to
> /home/ubuntu/Impala/logs/data_loading/load-tpch.log)...
> OK (Took: 14 min 11 sec)
> Loading nested data (logging to
> /home/ubuntu/Impala/logs/data_loading/load-nested.log)...
> OK (Took: 3 min 41 sec)
> Loading TPC-DS data (logging to
> /home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
> FAILED (Took: 5 min 50 sec)
> 'load-data tpcds core' failed. Tail of log:
> ss_net_paid_inc_tax,
> ss_net_profit,
> ss_sold_date_sk
> from store_sales_unpartitioned
> WHERE ss_sold_date_sk < 2451272
> distribute by ss_sold_date_sk
> INFO : Query ID = ubuntu_20170729150909_583df9cf-e54b-44bf-a104-ef5e690cfa0d
> INFO : Total jobs = 1
> INFO : Launching Job 1 out of 1
> INFO : Starting task [Stage-1:MAPRED] in serial mode
> INFO : Number of reduce tasks not specified. Estimated from input data
> size: 2
> INFO : In order to change the average load for a reducer (in bytes):
> INFO : set hive.exec.reducers.bytes.per.reducer=<number>
> INFO : In order to limit the maximum number of reducers:
> INFO : set hive.exec.reducers.max=<number>
> INFO : In order to set a constant number of reducers:
> INFO : set mapreduce.job.reduces=<number>
> INFO : number of splits:2
> INFO : Submitting tokens for job: job_local1041198115_0826
> INFO : The url to track the job: http://localhost:8080/
> INFO : Job running in-process (local Hadoop)
> INFO : 2017-07-29 15:09:25,495 Stage-1 map = 0%, reduce = 0%
> INFO : 2017-07-29 15:09:32,498 Stage-1 map = 100%, reduce = 0%
> ERROR : Ended Job = job_local1041198115_0826 with errors
> ERROR : FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> INFO : MapReduce Jobs Launched:
> INFO : Stage-Stage-1: HDFS Read: 17615502357 HDFS Write: 12907849658 FAIL
> INFO : Total MapReduce CPU Time Spent: 0 msec
> INFO : Completed executing
> command(queryId=ubuntu_20170729150909_583df9cf-e54b-44bf-a104-ef5e690cfa0d);
> Time taken: 18.314 seconds
> Error: Error while processing statement: FAILED: Execution Error, return
> code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> (state=08S01,code=2)
> java.sql.SQLException: Error while processing statement: FAILED: Execution
> Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:292)
> at org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)
> at org.apache.hive.beeline.Commands.execute(Commands.java:1203)
> at org.apache.hive.beeline.Commands.sql(Commands.java:1117)
> at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1176)
> at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010)
> at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:987)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:914)
> at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>
> Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
> Error executing file from Hive: load-tpcds-core-hive-generated.sql
> Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
> LOAD_DATA_ARGS=""
> + cleanup
> + rm -rf /tmp/tmp.Yfeh8QGfi1
>
>
>
>
> On Sat, Jul 29, 2017 at 12:47 AM, Jim Apple <[email protected]> wrote:
>
> > I'm seeing https://issues.apache.org/jira/browse/IMPALA-5700 when trying
> > to bootstrap a new development environment on an EC2 machine with Ubuntu
> > 14.04, 250GB of free disk space and over 60GB of free memory. I've seen
> > this with and without the -so flag.
> >
> > I'm running the below script, which I thought was the canonical way to
> > bootstrap a development environment. When catalog doesn't start, I don't
> > see anything amiss in any of the logs. I was thinking that maybe a port
> > is closed that should be open? I only have port 22 open in my ec2 config.
> >
> > Has anyone else fixed a problem like this before?
> >
> > #!/bin/bash -eux
> >
> > IMPALA_REPO_URL=https://git-wip-us.apache.org/repos/asf/incubator-impala.git
> > IMPALA_REPO_BRANCH=master
> >
> > sudo apt-get install --yes git
> >
> > sudo apt-get install --yes openjdk-7-jdk
> >
> > # JAVA_HOME needed by chef scripts
> > export JAVA_HOME="/usr/lib/jvm/$(ls -tr /usr/lib/jvm/ | tail -1)"
> > $JAVA_HOME/bin/javac -version
> >
> > # TODO: check that df . is large enough.
> > df -h .
> >
> > IMPALA_LOCATION=Impala
> >
> > cd "/home/$(whoami)"
> >
> > git clone "${IMPALA_REPO_URL}" "${IMPALA_LOCATION}"
> > cd "${IMPALA_LOCATION}"
> > git checkout "${IMPALA_REPO_BRANCH}"
> > GIT_LOG_FILE=$(mktemp)
> > git log --pretty=oneline >"${GIT_LOG_FILE}"
> > head "${GIT_LOG_FILE}"
> >
> > ./bin/bootstrap_development.sh
> >
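On the port theory: the EC2 security group only filters traffic coming from outside the instance, so having only port 22 open shouldn't break loopback connections between the daemons. A quicker thing to rule out is something already squatting on a daemon's port. A rough check; the port numbers here are the stock Impala defaults (impalad 22000, statestored 24000, catalogd 26000) and are an assumption about this setup:

```shell
# Rough check: is anything already listening on the daemons' default
# service ports before start-impala-cluster.py runs?
# Ports are stock Impala defaults -- an assumption, adjust if overridden.
for port in 22000 24000 26000; do
  if ss -ltn | grep -q ":$port\b"; then
    echo "port $port is already in use"
  else
    echo "port $port looks free"
  fi
done
```

If one is taken, `ss -ltnp` (run as root) will name the process holding it.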
