Hi Tim,
Thanks for your reply! I'll try these scripts later.

One more question: is the latest Impala compatible with the components in CDH-5.7.3, for example Hadoop-2.6.0 and Hive-1.1.0? We use the old cdh-5.7.3-release version only because we are concerned about incompatibility.

Thanks
----
Quanlong

At 2017-06-01 21:31:17, "Tim Armstrong" <[email protected]> wrote:
>Hi Quanlong,
>  It looks like you're missing the TPC-H data. In older versions of Impala
>you had to generate the data manually and put it in that directory. We've
>automated that in more recent versions (I think probably since a year ago).
>If you can switch to a newer version, then this will just work. Data
>loading is a lot more reliable now.
>
>Otherwise this is the script that generates the data. You can probably copy
>this script to your repository and run it by hand:
>
>https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpch/preload
>
>You will also need to do the same for TPC-DS:
>https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpcds/preload
>
>
>Cheers,
>Tim
>
>On Thu, Jun 1, 2017 at 12:54 AM, 黄权隆 <[email protected]> wrote:
>
>> Hi friends,
>>
>>
>> I'm trying to run the Impala tests, following the wiki page 'How to
>> load and run Impala tests'.
>> Although I just want to run some end-to-end tests, I know I should load
>> the test data first. So I ran:
>>
>> ./buildall.sh -noclean -testdata
>>
>> It loaded the functional test data successfully, but failed to load the
>> TPC-H data set. Here are some related logs:
>>
>>
>> /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/target
>> SUCCESS, data generated into /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/target
>> Loading Hive Builtins (logging to load-hive-builtins.log)... OK
>> Generating HBase data (logging to create-hbase.log)... OK
>> Creating /test-warehouse HDFS directory (logging to create-test-warehouse-dir.log)... OK
>> Starting Impala cluster (logging to start-impala-cluster.log)... OK
>> Setting up HDFS environment (logging to setup-hdfs-env.log)... OK
>> Loading custom schemas (logging to load-custom-schemas.log)... OK
>> Loading functional-query data (logging to load-functional-query.log)... OK
>> Loading TPC-H data (logging to load-tpch.log)... FAILED
>> 'load-data tpch core' failed. Tail of log:
>> Log for command 'load-data tpch core'
>> Loading workload 'tpch' Using exploration strategy 'core'. Logging to /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/cluster_logs/data_loading/data-load-tpch-core.log
>> Error loading data. The end of the log file is:
>>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>>     at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:23 Invalid path ''/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/impala-data/tpch/lineitem'': No files matching path file:/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/impala-data/tpch/lineitem
>>     at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.applyConstraints(LoadSemanticAnalyzer.java:139)
>>     at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:230)
>>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:445)
>>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
>>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
>>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1176)
>>     at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:134)
>>     ... 26 more
>>
>>
>> Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
>> Error executing file from Hive: load-tpch-core-hive-generated.sql
>> Error in /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/bin/create-load-data.sh at line 41: while [ -n "$*" ]
>> Error in ./buildall.sh at line 368: ${IMPALA_HOME}/testdata/bin/create-load-data.sh ${CREATE_LOAD_DATA_ARGS} <<< Y
>>
>>
>> I'm using version cdh5.7.3-release. The directory
>> ${IMPALA_HOME}/testdata/impala-data does not exist.
>>
>>
>> Could you tell me how to generate this data set? Or where can I download
>> the snapshot file of test-warehouse so I can skip this step?
>>
>>
>> Thanks
>> ----
>> Quanlong
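
For anyone hitting the same "No files matching path" error: a quick pre-check before running the data load might look like the sketch below. It only tests whether the directory named in the Hive error exists and is non-empty. The helper name has_data_files is made up for illustration and is not part of the Impala scripts, and the sketch assumes IMPALA_HOME points at the source checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Impala): succeeds only if the given
# directory exists and contains at least one entry.
has_data_files() {
  local dir="$1"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Path taken from the Hive error message in this thread.
# Assumption: IMPALA_HOME is set; fall back to the current directory.
tpch_dir="${IMPALA_HOME:-.}/testdata/impala-data/tpch/lineitem"

if has_data_files "$tpch_dir"; then
  echo "TPC-H lineitem data found in $tpch_dir"
else
  echo "No TPC-H data in $tpch_dir; generate it before loading" >&2
fi
```

If the check fails on an older branch such as cdh5.7.3-release, the data has to be generated by hand first, as Tim describes above.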
