Hi Valencia,

I have an update on the TPC-H/TPC-DS test data: I'm looking at automating that part of data generation. I was able to verify that the data is the unmodified output of the TPC-H/TPC-DS data generator utilities (the versions we have in native-toolchain). The only change is that each generated file is moved into a subdirectory.
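For anyone reproducing that last step by hand, here is a rough sketch of what "move each generated file into a subdirectory" could look like. This is not the actual automation: the function name, the `.tbl`/`.dat` suffixes, and the choice to name each subdirectory after the file's stem are assumptions for illustration (the stem-named layout at least matches the `tpch/lineitem` path the load step was looking for earlier in this thread).

```python
import shutil
from pathlib import Path


def move_into_subdirs(data_dir, suffixes=(".tbl", ".dat")):
    """Move each generated file <name><suffix> into <data_dir>/<name>/.

    Hypothetical helper: the suffixes and the per-table directory naming
    are assumptions, not taken from the real data-generation scripts.
    Returns the list of new file paths.
    """
    data_dir = Path(data_dir)
    moved = []
    # sorted() materializes the listing first, so creating directories
    # while looping does not disturb the iteration.
    for path in sorted(data_dir.iterdir()):
        if path.is_file() and path.suffix in suffixes:
            target_dir = data_dir / path.stem
            target_dir.mkdir(exist_ok=True)
            dest = target_dir / path.name
            shutil.move(str(path), str(dest))
            moved.append(dest)
    return moved
```

Pointing it at a directory holding, say, `lineitem.tbl` would leave `lineitem/lineitem.tbl` behind, with the file contents untouched.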
- Tim

On Tue, Jul 19, 2016 at 9:23 PM, Valencia Serrao <[email protected]> wrote:

> Hi Tim,
>
> Thanks for the update.
>
> Regards,
> Valencia
>
> From: Tim Armstrong <[email protected]>
> To: Valencia Serrao/Austin/Contr/IBM@IBMUS
> Cc: Alex Behm <[email protected]>, Casey Ching <[email protected]>,
> [email protected], Manish Patil/Austin/Contr/IBM@IBMUS,
> Nishidha Panpaliya/Austin/Contr/IBM@IBMUS,
> Sudarshan Jagadale/Austin/Contr/IBM@IBMUS
> Date: 07/20/2016 02:35 AM
> Subject: Re: Fw: Issues with generating testdata for Impala
> ------------------------------
>
> Hi Valencia,
> I wasn't able to get a clear answer, but as far as we know it hasn't
> been modified.
>
> - Tim
>
> On Tue, Jul 12, 2016 at 4:59 AM, Valencia Serrao <[email protected]> wrote:
>
> Hi Tim,
>
> Thank you for responding.
>
> Please do let me know if any post-processing was done on the data at
> <https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp>.
>
> Regards,
> Valencia
>
> From: Tim Armstrong <[email protected]>
> To: Valencia Serrao/Austin/Contr/IBM@IBMUS
> Cc: Casey Ching <[email protected]>, Alex Behm <[email protected]>,
> [email protected], Nishidha Panpaliya/Austin/Contr/IBM@IBMUS,
> Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Manish Patil/Austin/Contr/IBM@IBMUS
> Date: 07/08/2016 01:31 AM
> Subject: Re: Fw: Issues with generating testdata for Impala
> ------------------------------
>
> Hi Valencia,
> The data is scale factor 1 for the TPC-H and TPC-DS benchmarks:
>
> <http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp>
>
> I imagine you could reconstruct it using their data generators.
>
> I'm unsure if we modified those data generators at all or did any
> postprocessing. I'm going to check if anyone knows exactly how that data
> was generated originally.
>
> On Wed, Jul 6, 2016 at 10:52 PM, Valencia Serrao <[email protected]> wrote:
>
> Hi Casey/Alex/Tim,
>
> I need to know whether it is possible to generate the tpch and tpcds
> data without using the tars you provided at
> <https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp>.
> Because when I tried to load data without using the tpch and tpcds tars,
> though functional-query data loaded successfully, I got the following error
> during the TPC-H data load step:
>
> Error: Error while compiling statement: FAILED: SemanticException Line 1:23
> Invalid path ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files
> matching path file:/ImpalaPPC/testdata/impala-data/tpch/lineitem
> (state=42000,code=40000)
> org.apache.hive.service.cli.HiveSQLException: Error while compiling
> statement: FAILED: SemanticException Line 1:23 Invalid path
> ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path
> file:/ImpalaPPC/testdata/impala-data/tpch/lineitem
>     at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:235)
>     at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:221)
>     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
>     at org.apache.hive.beeline.Commands.executeInternal(Commands.java:893)
>     at org.apache.hive.beeline.Commands.execute(Commands.java:1079)
>     at org.apache.hive.beeline.Commands.sql(Commands.java:976)
>     at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1085)
>     at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
>     at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:895)
>     at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:837)
>     at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
>     at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: org.apache.hive.service.cli.HiveSQLException: Error while
> compiling statement: FAILED: SemanticException Line 1:23 Invalid path
> ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path
> file:/ImpalaPPC/testdata/impala-data/tpch/lineitem
>
> Regards,
> Valencia
>
> From: Casey Ching <[email protected]>
> To: Alex Behm <[email protected]>, [email protected]
> Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS,
> Nishidha Panpaliya/Austin/Contr/IBM@IBMUS,
> Valencia Serrao/Austin/Contr/IBM@IBMUS
> Date: 05/04/2016 11:51 AM
> Subject: Re: Fw: Issues with generating testdata for Impala
> ------------------------------
>
> Comment inline below
>
> On May 3, 2016 at 11:18:06 PM, Alex Behm ([email protected]) wrote:
> Hi Valencia,
>
> I'm sorry you are having so much trouble with our setup. Let's see what we
> can do.
>
> There was an infra issue with receiving the logs you sent me. The
> email/attachment got rejected on our side. Maybe you can upload the logs
> somewhere so I can grab them?
>
> See more responses inline below.
>
> On Sat, Apr 30, 2016 at 5:01 AM, Valencia Serrao <[email protected]> wrote:
>
> > Hi Alex,
> >
> > I was going deeper through the logs. I have some findings and queries:
> >
> > 1. At the "Invalidating Metadata" step (as mentioned in the mail below), I
> > noticed that it is trying to use Kerberos. Perhaps this is preventing the
> > testdata generation from proceeding, as we are not using Kerberos.
> > I need to know how this can be done without involving Kerberos support?
> >
> Kerberos is certainly not needed to build and run tests.
>
> > 2. I had executed the fe tests despite the incomplete testdata generation,
> > the tests started and surely have failed. Many of these (null pointer
> > exception in AuthorzationTests) have a common cause: "tpch database does
> > not exist."
> > e.g. as shown in
> > .Impala/cluster_logs/query_tests/test-run-workload.log.
> >
> > Does the "tpch" database get created after the current blocker step
> > "Invalidating Metadata"?
>
> Yes, the TPCH database is created and loaded as part of that first phase.
> However, the data files are not yet publicly accessible. Let me work on
> that from my side, and get back to you soon. One way or the other we'll be
> able to provide you with the data.
>
> The data is at
> <https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp>.
> The files are split into 50 MB pieces for git. You can put them back
> together as is done in
> <https://github.com/cloudera/Impala-docker-hub/blob/master/complete/Dockerfile>
>
> > 3. In the fe test console output log, another error is shown:
> > ============================= test session starts ==============================
> > platform linux2 -- Python 2.7.5 -- py-1.4.30 -- pytest-2.7.2
> > rootdir: /work/, inifile:
> > plugins: random, xdist
> > ERROR: file not found:/work/Impala/../Impala-auxiliary-tests/tests/aux_custom_cluster_tests/
> >
> > These are not present/created on my vm. May I know when these get created?
> >
> > 4. Could you also share the total number of fe tests?
>
> I'll privately send you the console output from a successful FE run.
> Hopefully that can help.
>
> Cheers,
>
> Alex
>
> > Looking forward to your reply.
> >
> > Regards,
> > Valencia
> >
> > From: Valencia Serrao/Austin/Contr/IBM
> > To: [email protected], Alex Behm <[email protected]>
> > Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS,
> > Nishidha Panpaliya/Austin/Contr/IBM@IBMUS,
> > Valencia Serrao/Austin/Contr/IBM@IBMUS
> > Date: 04/30/2016 09:05 AM
> > Subject: Fw: Issues with generating testdata for Impala
> > ------------------------------
> >
> > Hi Alex,
> >
> > I've been able to make some progress on testdata generation, however, I
> > still face the following issues:
> >
> > **********************************************************************
> > Invalidating Metadata
> >
> > (load-functional-query-exhaustive-impala-load-generated-parquet-none-none.sql):
> > INSERT OVERWRITE TABLE functional_parquet.alltypes partition (year, month)
> > SELECT id, bool_col, tinyint_col, smallint_col, int_col, bigint_col,
> > float_col, double_col, date_string_col, string_col, timestamp_col, year,
> > month
> > FROM functional.alltypes
> >
> > Data Loading from Impala failed with error: ImpalaBeeswaxException:
> > INNER EXCEPTION: <class 'socket.error'>
> > MESSAGE: [Errno 104] Connection reset by peer
> > Error in /root/nishidha/Impala/testdata/bin/create-load-data.sh at line 41: while [ -n "$*" ]
> > Error in /root/nishidha/Impala/buildall.sh at line 368:
> > ${IMPALA_HOME}/testdata/bin/create-load-data.sh ${CREATE_LOAD_DATA_ARGS} <<< Y
> > **********************************************************************
> >
> > I continued with the fe tests as is. Here is the complete output log.
> > [attachment "fe_test_output.zip" deleted by Valencia Serrao/Austin/Contr/IBM]
> >
> > Cluster logs: [attachment "cluster_logs.7z" deleted by Valencia Serrao/Austin/Contr/IBM]
> >
> > Kindly guide me on the same.
> >
> > Regards,
> > Valencia
> > ----- Forwarded by Valencia Serrao/Austin/Contr/IBM on 04/29/2016 10:57 AM -----
> >
> > From: Sudarshan Jagadale/Austin/Contr/IBM
> > To: Valencia Serrao/Austin/Contr/IBM@IBMUS
> > Date: 04/29/2016 10:49 AM
> > Subject: Fw: Issues with generating testdata for Impala
> > ------------------------------
> >
> > FYI
> > Thanks and Regards
> > Sudarshan Jagadale
> > Power Open Source Solutions
> > ----- Forwarded by Sudarshan Jagadale/Austin/Contr/IBM on 04/29/2016 10:48 AM -----
> >
> > From: Alex Behm <[email protected]>
> > To: [email protected]
> > Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS,
> > Nishidha Panpaliya/Austin/Contr/IBM@IBMUS
> > Date: 04/28/2016 09:34 PM
> > Subject: Re: Issues with generating testdata for Impala
> > ------------------------------
> >
> > Hi Valencia,
> >
> > sorry I did not get the attachment. Would you be able to tar.gz and attach
> > the whole cluster_logs directory?
> >
> > Alex
> >
> > On Thu, Apr 28, 2016 at 6:23 AM, Valencia Serrao <[email protected]> wrote:
> >
> > Hi Alex,
> >
> > I tried building impala again with the following:
> > HDFS CDH 5.7.0
> > (<http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_57.html#topic_3>)
> > HBASE CDH 5.7.0 SNAPSHOT
> > (<http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.7.0.tar.gz>)
> > - this required patching in a fix
> > (<https://issues.apache.org/jira/secure/attachment/12792536/HBASE-15322-branch-1.2.patch>)
> > HIVE CDH 5.8.0 SNAPSHOT
> >
> > With the above combination, I'm able to move past the exception and
> > also have the RegionServer service up and running. However, it now gives
> > the error below:
> >
> > **********************************************************************
> > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
> > CREATE EXTERNAL TABLE IF NOT EXISTS functional.decimal_tbl (
> > d1 DECIMAL,
> > d2 DECIMAL(10, 0),
> > d3 DECIMAL(20, 10),
> > d4 DECIMAL(38, 38),
> > d5 DECIMAL(10, 5))
> > PARTITIONED BY (d6 DECIMAL(9, 0))
> > ROW FORMAT delimited fields terminated by ','
> > STORED AS TEXTFILE
> > LOCATION '/test-warehouse/decimal_tbl'
> >
> > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
> > USE functional
> >
> > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
> > ALTER TABLE decimal_tbl ADD IF NOT EXISTS PARTITION(d6=1)
> >
> > Data Loading from Impala failed with error: ImpalaBeeswaxException:
> > INNER EXCEPTION: <class 'impala._thrift_gen.beeswax.ttypes.BeeswaxException'>
> > MESSAGE:
> > Error: null
> > **********************************************************************
> >
> > Here is the complete log for the same.
> > (See attached file: data-load-functional-exhaustive.log)
> >
> > It would be great if you could guide me on this issue, so I could proceed
> > with the fe tests.
> >
> > Still awaiting a link to the source code of HDFS CDH 5.8.0
> >
> > Regards,
> > Valencia
