Hi Tim,

Thank you for responding.

Please let me know whether any post-processing was done on the data at:
https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp

Regards,
Valencia




From:   Tim Armstrong <[email protected]>
To:     Valencia Serrao/Austin/Contr/IBM@IBMUS
Cc:     Casey Ching <[email protected]>, Alex Behm
            <[email protected]>, [email protected],
            Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
            Jagadale/Austin/Contr/IBM@IBMUS, Manish
            Patil/Austin/Contr/IBM@IBMUS
Date:   07/08/2016 01:31 AM
Subject:        Re: Fw: Issues with generating testdata for Impala



Hi Valencia,
  The data is scale factor 1 for the TPC-H and TPC-DS benchmarks:
http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp


I imagine you could reconstruct it using their data generators.

I'm not sure whether we modified those data generators or did any
post-processing. I'll check whether anyone knows exactly how the data
was originally generated.
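A minimal sketch, assuming the regenerated scale factor 1 data must sit where the load step reads it. The "No files matching path file:/ImpalaPPC/testdata/impala-data/tpch/lineitem" error in the message below is what a missing location looks like. The layout assumed here (one entry per table under <data_root>/tpch/<table>) is inferred only from that error path, not from the Impala load scripts. The helper name missing_tpch_tables is hypothetical.

```python
# Sketch: before loading, check that each TPC-H table has data at the path
# the loader is assumed to read. Assumption (hypothetical): one entry per
# table under <data_root>/tpch/<table>, inferred from the error path
# /ImpalaPPC/testdata/impala-data/tpch/lineitem.
from pathlib import Path

# The eight tables defined by the TPC-H specification.
TPCH_TABLES = ("customer", "lineitem", "nation", "orders",
               "part", "partsupp", "region", "supplier")


def missing_tpch_tables(data_root):
    """Return the tables whose expected path is absent or contains no files."""
    missing = []
    for table in TPCH_TABLES:
        path = Path(data_root) / "tpch" / table
        if path.is_file():
            continue  # a single data file at that path counts as present
        if path.is_dir() and any(p.is_file() for p in path.rglob("*")):
            continue  # a directory with at least one file counts as present
        missing.append(table)
    return missing
```

For example, missing_tpch_tables("/ImpalaPPC/testdata/impala-data") would list the tables that need to be regenerated or moved into place before rerunning the load.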

On Wed, Jul 6, 2016 at 10:52 PM, Valencia Serrao <[email protected]> wrote:
  Hi Casey/Alex/Tim,

  I need to know whether it is possible to generate the TPC-H and TPC-DS data
  without using the tarballs you provided at
  https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp
  When I tried to load data without those tarballs, the functional-query
  data loaded successfully, but I got the following error during the TPC-H
  data load step:

  Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path file:/ImpalaPPC/testdata/impala-data/tpch/lineitem (state=42000,code=40000)
  org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path file:/ImpalaPPC/testdata/impala-data/tpch/lineitem
      at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:235)
      at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:221)
      at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
      at org.apache.hive.beeline.Commands.executeInternal(Commands.java:893)
      at org.apache.hive.beeline.Commands.execute(Commands.java:1079)
      at org.apache.hive.beeline.Commands.sql(Commands.java:976)
      at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1085)
      at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
      at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:895)
      at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:837)
      at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
      at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
  Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path file:/ImpalaPPC/testdata/impala-data/tpch/lineitem


  Regards,
  Valencia


  From: Casey Ching <[email protected]>
  To: Alex Behm <[email protected]>, [email protected]
  Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha
  Panpaliya/Austin/Contr/IBM@IBMUS, Valencia Serrao/Austin/Contr/IBM@IBMUS
  Date: 05/04/2016 11:51 AM
  Subject: Re: Fw: Issues with generating testdata for Impala




  Comment inline below



  On May 3, 2016 at 11:18:06 PM, Alex Behm ([email protected]) wrote:


              Hi Valencia,

              I'm sorry you are having so much trouble with our setup. Let's
              see what we can do.

              There was an infrastructure issue receiving the logs you sent
              me: the email with the attachment was rejected on our side.
              Could you upload the logs somewhere so I can grab them?

              See more responses inline below.

              On Sat, Apr 30, 2016 at 5:01 AM, Valencia Serrao
              <[email protected]> wrote:

              > Hi Alex,
              >
              > I went deeper through the logs and have some findings and
              > questions:
              >
              > 1. At the "Invalidating Metadata" step (mentioned in the mail
              > below), I noticed that it is trying to use Kerberos. Perhaps
              > this is preventing the test data generation from proceeding,
              > as we are not using Kerberos. How can this be done without
              > Kerberos support?
              >
              Kerberos is certainly not needed to build and run tests.

              >
              > 2. I executed the FE tests despite the incomplete test data
              > generation; the tests started and, as expected, failed. Many
              > of the failures (null pointer exceptions in AuthorizationTests)
              > have a common cause: "tpch database does not exist", e.g. as
              > shown in .Impala/cluster_logs/query_tests/test-run-workload.log.
              >
              > Does the "tpch" database get created after the current
              > blocking step, "Invalidating Metadata"?
              >

              Yes, the TPC-H database is created and loaded as part of that
              first phase. However, the data files are not yet publicly
              accessible. Let me work on that on my side and get back to you
              soon. One way or the other, we'll be able to provide you with
              the data.

  The data is at
  https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp
  The files are split into 50 MB pieces for Git. You can put them back
  together as is done in
  https://github.com/cloudera/Impala-docker-hub/blob/master/complete/Dockerfile
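A minimal sketch of the reassembly step, assuming the pieces share a common prefix and that sorting their names restores the original order (true for the default aa, ab, ac, ... suffixes of `split`). The file names are hypothetical and the helper reassemble is illustrative; the Dockerfile linked above remains the reference for the real names and commands.

```python
# Sketch: rejoin a file that was split into fixed-size pieces (for example
# 50 MB pieces stored in Git). Assumption: the piece names sort into their
# original order, as split's default suffixes (aa, ab, ac, ...) do.
import glob
import os
import shutil


def reassemble(pattern, output_path):
    """Concatenate every file matching `pattern`, in sorted name order,
    into `output_path`, and return the list of pieces used."""
    target = os.path.abspath(output_path)
    # Skip the output file itself in case the pattern also matches it.
    parts = [p for p in sorted(glob.glob(pattern))
             if os.path.abspath(p) != target]
    if not parts:
        raise FileNotFoundError(f"no pieces match {pattern!r}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # append the bytes unchanged
    return parts
```

For example, reassemble("tpch.tar.gz.*", "tpch.tar.gz") (hypothetical names) produces a single archive that can then be unpacked with tar. A shell `cat` over the same pieces in the same order is equivalent.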

              >
              > 3. The FE test console output log shows another error:
              > ============================= test session starts ==============================
              > platform linux2 -- Python 2.7.5 -- py-1.4.30 -- pytest-2.7.2
              > rootdir: /work/, inifile:
              > plugins: random, xdist
              > ERROR: file not found:/work/Impala/../Impala-auxiliary-tests/tests/aux_custom_cluster_tests/
              >
              > These directories are not present on my VM. When do they get
              > created?
              >
              > 4. Could you also share the total number of FE tests?
              >

              I'll privately send you the console output from a successful
              FE run.
              Hopefully that can help.

              Cheers,

              Alex

              >
              >
              > Looking forward to your reply.
              >
              > Regards,
              > Valencia
              >
              >
              >
              > From: Valencia Serrao/Austin/Contr/IBM
              > To: [email protected], Alex Behm <
              [email protected]>
              > Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha
              > Panpaliya/Austin/Contr/IBM@IBMUS, Valencia
              Serrao/Austin/Contr/IBM@IBMUS
              > Date: 04/30/2016 09:05 AM
              > Subject: Fw: Issues with generating testdata for Impala
              > ------------------------------
              >
              >
              >
              > Hi Alex,
              >
              > I've made some progress on test data generation; however, I
              > still face the following issues:
              >
              >
              >
              
              > ********************************************************************************
              > Invalidating Metadata
              >
              > (load-functional-query-exhaustive-impala-load-generated-parquet-none-none.sql):
              > INSERT OVERWRITE TABLE functional_parquet.alltypes partition (year, month)
              > SELECT id, bool_col, tinyint_col, smallint_col, int_col, bigint_col,
              > float_col, double_col, date_string_col, string_col, timestamp_col, year,
              > month
              > FROM functional.alltypes
              >
              > Data Loading from Impala failed with error: ImpalaBeeswaxException:
              > INNER EXCEPTION: <class 'socket.error'>
              > MESSAGE: [Errno 104] Connection reset by peer
              > Error in /root/nishidha/Impala/testdata/bin/create-load-data.sh at line 41: while [ -n "$*" ]
              > Error in /root/nishidha/Impala/buildall.sh at line 368: ${IMPALA_HOME}/testdata/bin/create-load-data.sh ${CREATE_LOAD_DATA_ARGS} <<< Y
              > ********************************************************************************

              >
              > I continued with the FE tests as is. Here is the complete
              > output log. [attachment "fe_test_output.zip" deleted by
              > Valencia Serrao/Austin/Contr/IBM]
              >
              > Cluster logs: [attachment "cluster_logs.7z" deleted by
              > Valencia Serrao/Austin/Contr/IBM]
              >
              > Kindly guide me on this.
              >
              > Regards,
              > Valencia
              > ----- Forwarded by Valencia Serrao/Austin/Contr/IBM on 04/29/2016 10:57 AM -----
              >
              > From: Sudarshan Jagadale/Austin/Contr/IBM
              > To: Valencia Serrao/Austin/Contr/IBM@IBMUS
              > Date: 04/29/2016 10:49 AM
              > Subject: Fw: Issues with generating testdata for Impala
              > ------------------------------
              >
              >
              > FYI
              > Thanks and Regards
              > Sudarshan Jagadale
              > Power Open Source Solutions
              > ----- Forwarded by Sudarshan Jagadale/Austin/Contr/IBM on 04/29/2016 10:48 AM -----
              >
              > From: Alex Behm <[email protected]>
              > To: [email protected]
              > Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha
              > Panpaliya/Austin/Contr/IBM@IBMUS
              > Date: 04/28/2016 09:34 PM
              > Subject: Re: Issues with generating testdata for Impala
              > ------------------------------
              >
              >
              >
              > Hi Valencia,
              >
              > Sorry, I did not get the attachment. Would you be able to
              > tar.gz the whole cluster_logs directory and attach it?
              >
              > Alex
              >
              > On Thu, Apr 28, 2016 at 6:23 AM, Valencia Serrao
              > <[email protected]> wrote:
              >
              > Hi Alex,
              >
              > I tried building Impala again with the following:
              > HDFS CDH 5.7.0
              > (http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_57.html#topic_3)
              > HBase CDH 5.7.0 SNAPSHOT
              > (http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.7.0.tar.gz)
              > - this required patching in a fix
              > (https://issues.apache.org/jira/secure/attachment/12792536/HBASE-15322-branch-1.2.patch)
              > Hive CDH 5.8.0 SNAPSHOT
              >
              > With the above combination, I'm able to move past the
              > exception, and the RegionServer service is up and running.
              > However, it now gives the following error:
              >
              >
              >
              
              > ********************************************************************************
              >
              > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
              > CREATE EXTERNAL TABLE IF NOT EXISTS functional.decimal_tbl (
              > d1 DECIMAL,
              > d2 DECIMAL(10, 0),
              > d3 DECIMAL(20, 10),
              > d4 DECIMAL(38, 38),
              > d5 DECIMAL(10, 5))
              > PARTITIONED BY (d6 DECIMAL(9, 0))
              > ROW FORMAT delimited fields terminated by ','
              > STORED AS TEXTFILE
              > LOCATION '/test-warehouse/decimal_tbl'
              >
              > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
              > USE functional
              >
              > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
              > ALTER TABLE decimal_tbl ADD IF NOT EXISTS PARTITION(d6=1)
              >
              > Data Loading from Impala failed with error: ImpalaBeeswaxException:
              > INNER EXCEPTION: <class 'impala._thrift_gen.beeswax.ttypes.BeeswaxException'>
              > MESSAGE:
              > Error: null
              > ********************************************************************************

              >
              > Here is the complete log. (See attached file:
              > data-load-functional-exhaustive.log)
              >
              > It would be great if you could guide me on this issue so I
              > can proceed with the FE tests.
              >
              > I am still awaiting the link to the source code of HDFS CDH 5.8.0.
              >
              > Regards,
              > Valencia