Hi Valencia,
  I wasn't able to get a clear answer, but as far as we know the data hasn't
been modified.

- Tim

On Tue, Jul 12, 2016 at 4:59 AM, Valencia Serrao <[email protected]> wrote:

> Hi Tim,
>
> Thank you for responding.
>
> Please do let me know if any post-processing was done on the data at
> https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp
>
> Regards,
> Valencia
>
>
>
> From: Tim Armstrong <[email protected]>
> To: Valencia Serrao/Austin/Contr/IBM@IBMUS
> Cc: Casey Ching <[email protected]>, Alex Behm <[email protected]>,
> [email protected], Nishidha Panpaliya/Austin/Contr/IBM@IBMUS,
> Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Manish
> Patil/Austin/Contr/IBM@IBMUS
> Date: 07/08/2016 01:31 AM
>
> Subject: Re: Fw: Issues with generating testdata for Impala
> ------------------------------
>
>
>
> Hi Valencia,
>   The data is scale factor 1 for the TPC-H and TPC-DS benchmarks:
> http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp
>
> I imagine you could reconstruct it using their data generators.
>
> I'm unsure if we modified those data generators at all or did any
> postprocessing. I'm going to check if anyone knows exactly how that data
> was generated originally.
>
> On Wed, Jul 6, 2016 at 10:52 PM, Valencia Serrao <[email protected]> wrote:
>
>    Hi Casey/Alex/Tim,
>
>    I need to know whether it is possible to generate the tpch and tpcds
>    data without using the tars you provided at
>    https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp
>    When I tried to load data without the tpch and tpcds tars, the
>    functional-query data loaded successfully, but I got the following error
>    during the TPC-H data load step:
>
>    Error: Error while compiling statement: FAILED: SemanticException Line
>    1:23 Invalid path ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No
>    files matching path file:/ImpalaPPC/testdata/impala-data/tpch/lineitem
>    (state=42000,code=40000)
>    org.apache.hive.service.cli.HiveSQLException: Error while compiling
>    statement: FAILED: SemanticException Line 1:23 Invalid path
>    ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path
>    file:/ImpalaPPC/testdata/impala-data/tpch/lineitem
>        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:235)
>        at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:221)
>        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
>        at org.apache.hive.beeline.Commands.executeInternal(Commands.java:893)
>        at org.apache.hive.beeline.Commands.execute(Commands.java:1079)
>        at org.apache.hive.beeline.Commands.sql(Commands.java:976)
>        at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1085)
>        at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
>        at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:895)
>        at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:837)
>        at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
>        at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:606)
>        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>    Caused by: org.apache.hive.service.cli.HiveSQLException: Error while
>    compiling statement: FAILED: SemanticException Line 1:23 Invalid path
>    ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path
>    file:/ImpalaPPC/testdata/impala-data/tpch/lineitem
>
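
The "No files matching path" message says that Hive found nothing under the
lineitem directory when it compiled the load statement. A quick pre-flight
check before running the load step can confirm which table directories are
empty or missing. The following is only a minimal sketch: it assumes each
TPC-H table lives in its own subdirectory next to lineitem (the error shows
only the lineitem path, so the sibling layout is an assumption), and the
function name missing_table_dirs is hypothetical.

```python
from pathlib import Path

# The eight tables defined by the TPC-H schema.
TPCH_TABLES = [
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
]


def missing_table_dirs(data_root, tables=TPCH_TABLES):
    """Return the tables whose directory is absent or contains no files.

    Assumes a <data_root>/<table>/ layout, as suggested by the path
    /ImpalaPPC/testdata/impala-data/tpch/lineitem in the error above.
    """
    root = Path(data_root)
    missing = []
    for table in tables:
        table_dir = root / table
        has_files = table_dir.is_dir() and any(
            p.is_file() for p in table_dir.iterdir()
        )
        if not has_files:
            missing.append(table)
    return missing
```

Calling missing_table_dirs("/ImpalaPPC/testdata/impala-data/tpch") would list
every table that the load step cannot populate yet; an empty list means each
directory holds at least one file.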
>
>    Regards,
>    Valencia
>
>    From: Casey Ching <[email protected]>
>    To: Alex Behm <[email protected]>, [email protected]
>    Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha
>    Panpaliya/Austin/Contr/IBM@IBMUS, Valencia
>    Serrao/Austin/Contr/IBM@IBMUS
>    Date: 05/04/2016 11:51 AM
>    Subject: Re: Fw: Issues with generating testdata for Impala
>    ------------------------------
>
>
>
>
>    Comment inline below
>
>    On May 3, 2016 at 11:18:06 PM, Alex Behm ([email protected]) wrote:
>    Hi Valencia,
>
>                I'm sorry you are having so much trouble with our setup.
>                Let's see what we can do.
>
>                There was an infra issue with receiving the logs you sent me.
>                The email/attachment got rejected on our side. Maybe you can
>                upload the logs somewhere so I can grab them?
>
>                See more responses inline below.
>
>                On Sat, Apr 30, 2016 at 5:01 AM, Valencia Serrao
>                <[email protected]> wrote:
>
>                > Hi Alex,
>                >
>                > I was going deeper through the logs. I have some findings
>                > and queries:
>                >
>                > 1. At the "Invalidating Metadata" step (as mentioned in the
>                > mail below), I noticed that it is trying to use Kerberos.
>                > Perhaps this is preventing the testdata generation from
>                > proceeding, as we are not using Kerberos.
>                > I need to know how this can be done without involving
>                > Kerberos support.
>                >
>                Kerberos is certainly not needed to build and run tests.
>
>                >
>                > 2. I had executed the fe tests despite the incomplete
>                > testdata generation; the tests started and, as expected,
>                > failed. Many of these (null pointer exceptions in
>                > AuthorizationTests) have a common cause: "tpch database
>                > does not exist."
>                > e.g. as shown in
>                > .Impala/cluster_logs/query_tests/test-run-workload.log.
>                >
>                > Does the "tpch" database get created after the current
>                > blocker step "Invalidating Metadata"?
>                >
>
>                Yes, the TPCH database is created and loaded as part of that
>                first phase. However, the data files are not yet publicly
>                accessible. Let me work on that from my side, and get back to
>                you soon. One way or the other we'll be able to provide you
>                with the data.
>
>    The data is at
>    https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp
>    The files are split into 50 MB pieces for git. You can put them back
>    together as is done in
>    https://github.com/cloudera/Impala-docker-hub/blob/master/complete/Dockerfile
>
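
Putting the 50 MB pieces back together amounts to concatenating them in
order. The Dockerfile linked above is the authoritative recipe; what follows
is only a rough sketch of the same idea, assuming the pieces share a common
name prefix and sort correctly by name (as the suffixes produced by the Unix
split tool do). The function name reassemble and the file names in the usage
note are hypothetical, not taken from the repository.

```python
import shutil
from pathlib import Path


def reassemble(pieces_dir, prefix, output_path):
    """Concatenate all files in pieces_dir whose names start with prefix.

    Pieces are joined in lexicographic name order. The naming scheme is
    an assumption; check the Dockerfile for the real piece names.
    Returns the piece names in the order they were written.
    """
    out_resolved = Path(output_path).resolve()
    pieces = sorted(
        p for p in Path(pieces_dir).iterdir()
        if p.is_file()
        and p.name.startswith(prefix)
        and p.resolve() != out_resolved  # never read the output back in
    )
    if not pieces:
        raise FileNotFoundError(
            f"no pieces starting with {prefix!r} in {pieces_dir}"
        )
    with open(output_path, "wb") as out:
        for piece in pieces:
            with open(piece, "rb") as src:
                # Stream in chunks so a large piece is never held in memory.
                shutil.copyfileobj(src, out)
    return [p.name for p in pieces]
```

For example, reassemble("tmp", "tpch.tar.gz.", "tpch.tar.gz") would rebuild a
single archive from pieces named tpch.tar.gz.aa, tpch.tar.gz.ab and so on, if
that is how the pieces are named.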
>                >
>                > 3. In the fe test console output log, another error is shown:
>                > ============================= test session starts ==============================
>                > platform linux2 -- Python 2.7.5 -- py-1.4.30 -- pytest-2.7.2
>                > rootdir: /work/, inifile:
>                > plugins: random, xdist
>                > ERROR: file not found: /work/Impala/../Impala-auxiliary-tests/tests/aux_custom_cluster_tests/
>                >
>                > These are not present/created on my VM. May I know when
>                > these get created?
>                >
>                > 4. Could you also share the total number of fe tests?
>                >
>
>                I'll privately send you the console output from a successful
>                FE run. Hopefully that can help.
>
>                Cheers,
>
>                Alex
>
>                >
>                >
>                > Looking forward to your reply.
>                >
>                > Regards,
>                > Valencia
>                >
>                >
>                > From: Valencia Serrao/Austin/Contr/IBM
>                > To: [email protected], Alex Behm <[email protected]>
>                > Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha
>                > Panpaliya/Austin/Contr/IBM@IBMUS, Valencia
>                > Serrao/Austin/Contr/IBM@IBMUS
>                > Date: 04/30/2016 09:05 AM
>                > Subject: Fw: Issues with generating testdata for Impala
>                > ------------------------------
>                >
>                >
>                >
>                > Hi Alex,
>                >
>                > I've been able to make some progress on testdata generation;
>                > however, I still face the following issues:
>                >
>                > **********************************************************************
>                > Invalidating Metadata
>                >
>                > (load-functional-query-exhaustive-impala-load-generated-parquet-none-none.sql):
>                > INSERT OVERWRITE TABLE functional_parquet.alltypes partition (year, month)
>                > SELECT id, bool_col, tinyint_col, smallint_col, int_col, bigint_col,
>                > float_col, double_col, date_string_col, string_col, timestamp_col, year,
>                > month
>                > FROM functional.alltypes
>                >
>                > Data Loading from Impala failed with error: ImpalaBeeswaxException:
>                > INNER EXCEPTION: <class 'socket.error'>
>                > MESSAGE: [Errno 104] Connection reset by peer
>                > Error in /root/nishidha/Impala/testdata/bin/create-load-data.sh at line 41: while [ -n "$*" ]
>                > Error in /root/nishidha/Impala/buildall.sh at line 368:
>                > ${IMPALA_HOME}/testdata/bin/create-load-data.sh ${CREATE_LOAD_DATA_ARGS} <<< Y
>                >
>                > **********************************************************************
>                >
>                > I continued with the fe tests as is. Here is the complete
>                > output log.
>                > [attachment "fe_test_output.zip" deleted by Valencia
>                > Serrao/Austin/Contr/IBM]
>                >
>                > Cluster logs: [attachment "cluster_logs.7z" deleted by
>                > Valencia Serrao/Austin/Contr/IBM]
>                >
>                > Kindly guide me on the same.
>                >
>                > Regards,
>                > Valencia
>                > ----- Forwarded by Valencia Serrao/Austin/Contr/IBM on
>                > 04/29/2016 10:57 AM -----
>                >
>                > From: Sudarshan Jagadale/Austin/Contr/IBM
>                > To: Valencia Serrao/Austin/Contr/IBM@IBMUS
>                > Date: 04/29/2016 10:49 AM
>                > Subject: Fw: Issues with generating testdata for Impala
>                > ------------------------------
>                >
>                >
>                > FYI
>                > Thanks and Regards
>                > Sudarshan Jagadale
>                > Power Open Source Solutions
>                > ----- Forwarded by Sudarshan Jagadale/Austin/Contr/IBM on
>                > 04/29/2016 10:48 AM -----
>                >
>                > From: Alex Behm <[email protected]>
>                > To: [email protected]
>                > Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha
>                > Panpaliya/Austin/Contr/IBM@IBMUS
>                > Date: 04/28/2016 09:34 PM
>                > Subject: Re: Issues with generating testdata for Impala
>                > ------------------------------
>                >
>                >
>                >
>                > Hi Valencia,
>                >
>                > Sorry, I did not get the attachment. Would you be able to
>                > tar.gz and attach the whole cluster_logs directory?
>                >
>                > Alex
>                >
>                > On Thu, Apr 28, 2016 at 6:23 AM, Valencia Serrao
>                > <[email protected]> wrote:
>                >
>                > Hi Alex,
>                >
>                > I tried building impala again with the following:
>                > HDFS CDH 5.7.0 (
>                > http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_57.html#topic_3
>                > )
>                > HBASE CDH 5.7.0 SNAPSHOT (
>                > http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.7.0.tar.gz
>                > )
>                > - this required patching in a fix (
>                > https://issues.apache.org/jira/secure/attachment/12792536/HBASE-15322-branch-1.2.patch
>                > )
>                > HIVE CDH 5.8.0 SNAPSHOT
>                >
>                > With the above combination, I'm able to move past the
>                > exception and also have the RegionServer service up and
>                > running. However, it now gives the error below:
>                >
>                > **********************************************************************
>                > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
>                > CREATE EXTERNAL TABLE IF NOT EXISTS functional.decimal_tbl (
>                > d1 DECIMAL,
>                > d2 DECIMAL(10, 0),
>                > d3 DECIMAL(20, 10),
>                > d4 DECIMAL(38, 38),
>                > d5 DECIMAL(10, 5))
>                > PARTITIONED BY (d6 DECIMAL(9, 0))
>                > ROW FORMAT delimited fields terminated by ','
>                > STORED AS TEXTFILE
>                > LOCATION '/test-warehouse/decimal_tbl'
>                >
>                > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
>                > USE functional
>                >
>                > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
>                > ALTER TABLE decimal_tbl ADD IF NOT EXISTS PARTITION(d6=1)
>                >
>                > Data Loading from Impala failed with error: ImpalaBeeswaxException:
>                > INNER EXCEPTION: <class 'impala._thrift_gen.beeswax.ttypes.BeeswaxException'>
>                > MESSAGE:
>                > Error: null
>                >
>                > **********************************************************************
>                >
>                > Here is the complete log for the same. (See attached file:
>                > data-load-functional-exhaustive.log)
>                >
>                > It would be great if you could guide me on this issue, so I
>                > can proceed with the fe tests.
>                >
>                > Still awaiting link to the source code of HDFS CDH 5.8.0
>                >
>                > Regards,
>                > Valencia
