Re: LeaseExpiredException while test data creation on ppc64le

Nishidha Panpaliya Tue, 08 Mar 2016 10:12:12 -0800

Hi Tim,

As you suggested, I disabled codegen and also generated a core dump.


The core dump points to HashUtil::MurmurHash2_64 as the problem. Please
see the attached log file.
(See attached file: hs_err_pid15697.log)

I tested this function on its own in a small test app and it worked. Maybe
the data I gave it was simple enough for it to pass. In Impala, however,
there seems to be an issue with the data or arguments passed to this
function in a particular case. It looks like this function is not called on
machines where SSE is supported, so you might not see this crash on x86. Do
you suspect anything in this function or in its callers? I'm still debugging
this.
If you have any clues, please point me to them so that I can try to nail
down the issue in that direction.
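
One way to widen the standalone test mentioned above is to compare the native function against an independent reference over inputs of every length, so that each tail-byte branch is exercised. The sketch below is only that: it assumes HashUtil::MurmurHash2_64 follows the publicly documented MurmurHash64A algorithm (same multiplier, same shift, little-endian 8-byte reads), which should be checked against the Impala source. The name murmur_hash2_64 and the sample payload are illustrative and not taken from Impala.

```python
MURMUR_PRIME = 0xC6A4A7935BD1E995  # multiplier used by public MurmurHash64A
MURMUR_R = 47                      # shift used by public MurmurHash64A
MASK64 = 0xFFFFFFFFFFFFFFFF        # emulate uint64_t wrap-around


def murmur_hash2_64(data: bytes, seed: int = 0) -> int:
    """Reference MurmurHash64A; assumed (not confirmed) to match Impala."""
    length = len(data)
    h = (seed ^ (length * MURMUR_PRIME)) & MASK64

    # Body: consume the input eight bytes at a time, little-endian.
    n_blocks = length // 8
    for offset in range(0, n_blocks * 8, 8):
        k = int.from_bytes(data[offset:offset + 8], "little")
        k = (k * MURMUR_PRIME) & MASK64
        k ^= k >> MURMUR_R
        k = (k * MURMUR_PRIME) & MASK64
        h ^= k
        h = (h * MURMUR_PRIME) & MASK64

    # Tail: the remaining 1 to 7 bytes, folded in as a little-endian integer.
    tail = data[n_blocks * 8:]
    if tail:
        h ^= int.from_bytes(tail, "little")
        h = (h * MURMUR_PRIME) & MASK64

    # Final mixing steps.
    h ^= h >> MURMUR_R
    h = (h * MURMUR_PRIME) & MASK64
    h ^= h >> MURMUR_R
    return h


if __name__ == "__main__":
    # Illustrative payload: print one hash per input length (0 to 23 bytes)
    # so the values can be diffed against the native test app's output.
    payload = bytes(range(1, 24))
    for n in range(len(payload) + 1):
        print(n, hex(murmur_hash2_64(payload[:n], seed=0)))
```

A pure-Python reference cannot reproduce alignment problems, so if the values agree for all lengths, the next thing to vary in the native test app would be the alignment of the input pointer and the length argument.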

Thanks,
Nishidha



From:   nishidha randad <[email protected]>
To:     Tim Armstrong <[email protected]>
Cc:     Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
            Jagadale/Austin/Contr/IBM@IBMUS,
            [email protected]
Date:   03/07/2016 10:35 PM
Subject:        Re: LeaseExpiredException while test data creation on ppc64le



Thanks a lot, Tim! I checked some of the impalad*.ERROR, impalad*.INFO,
and other logs in cluster_logs/data_loading. A couple of observations:
1. A dependency on SSE3: one error says it is exiting because the hardware
does not support SSE3.
2. One error says to increase num_of_threads_per_disk (something related to
the number of threads; I'm not sure of the exact variable name) while
starting impalad.
3. A few log files report bad_alloc.


I'm analysing all these errors. I'll dig more into this tomorrow and update
you.
One more thing I'd like your help with is estimating how much work I may
have left and what challenges may lie ahead. It would be really great if you
could point these out based on the logs I posted.


Also, about the LLVM 3.7 fixes you made, I was wondering whether you have
completed the upgrade, since you have also started encountering crashes.


Thanks again!


Nishidha


On 7 Mar 2016 21:56, "Tim Armstrong" <[email protected]> wrote:
  Hi Nishidha,
    I started working on our next release cycle towards the end of last
  week, so I've been looking at LLVM 3.7 and have made a bit of progress
  getting it working on Intel. We're trying to get it working so we
  have plenty of time to test it.

  RE the TTransportException error, that is often because of a crash.
  Usually to debug I would first look at the /tmp/impalad.ERROR
  and /tmp/impalad.INFO logs for the cause of the crash. The embedded JVM
  also generates hs_err_pid*.log files with a crash report that can
  sometimes be useful. If that doesn't reveal the cause, then I'd look to
  see if there is a core dump in the Impala directory (I normally run with
  "ulimit -c unlimited" set so that a crash will generate a core file).

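The triage order described above (ERROR and INFO logs first, then the JVM crash report, then a core file) can be sketched as a small helper. This is a rough sketch under stated assumptions: the function names are hypothetical, the "core*" file pattern is a guess at how core files are named on the machine, and looking for hs_err_pid*.log in the Impala directory is an assumption about where the JVM writes them.

```python
import glob
import os
import resource


def core_limit_is_unlimited() -> bool:
    """True when the soft core-size limit matches `ulimit -c unlimited`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft == resource.RLIM_INFINITY


def find_crash_artifacts(log_dir: str, impala_dir: str) -> dict:
    """Collect candidate crash evidence, keyed by kind.

    The glob patterns are assumptions based on the file names
    mentioned in the thread, not a definitive list.
    """
    patterns = {
        "error_logs": os.path.join(log_dir, "impalad*.ERROR"),
        "info_logs": os.path.join(log_dir, "impalad*.INFO"),
        "jvm_crash_reports": os.path.join(impala_dir, "hs_err_pid*.log"),
        "core_files": os.path.join(impala_dir, "core*"),
    }
    return {kind: sorted(glob.glob(p)) for kind, p in patterns.items()}


if __name__ == "__main__":
    print("core limit unlimited:", core_limit_is_unlimited())
    impala_dir = os.environ.get("IMPALA_HOME", ".")
    for kind, paths in find_crash_artifacts("/tmp", impala_dir).items():
        print(kind, paths)
```

If core_files comes back empty after a crash, the core limit of the shell that started the cluster is the first thing to check.
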
  I already fixed a couple of problems with codegen in LLVM 3.7, including
  one crash that was an assertion about struct sizes. I'll be posting the
  patch soon once I've done a bit more testing.

  It might help you make progress if you disable LLVM codegen by default
  during data loading by setting the following environment variable:

  export START_CLUSTER_ARGS='--impalad_args=-default_query_options="disable_codegen=1"'

  You can also start the test cluster with the same arguments or just set
  it in the shell with "set disable_codegen=1".

  ./bin/start-impala-cluster.py --impalad_args=-default_query_options="disable_codegen=1"

  On Mon, Mar 7, 2016 at 5:13 AM, Nishidha Panpaliya <[email protected]>
  wrote:
   Hi Tim,

   Yes, I could fix this SnappyError by building snappy-java for Power and
   adding the native library for Power to the existing
   snappy-java-1.0.4.1.jar used by HBase, Hive, Sentry and Hadoop.
   Test data loading then proceeded further and hit a new exception, shown
   below, which I'm looking into.

   Data Loading from Impala failed with error: ImpalaBeeswaxException:
   INNER EXCEPTION: <class
   'thrift.transport.TTransport.TTransportException'>
   MESSAGE: None

   Also, I've been able to start Impala and try just the following query, as
   given in https://github.com/cloudera/Impala/wiki/How-to-build-Impala-
   impala-shell.sh -q"SELECT version()"

   And regarding the patch of my work, I'm sorry for the delay. Although it
   does not need any CLA to be signed, it is under discussion with our
   IBM legal team, just to ensure we are compliant with the policies.
   I hope to update you on this soon. Could you tell me when you are going
   to start this new release cycle?

   Thanks,
   Nishidha


   From: Tim Armstrong <[email protected]>
   To: nishidha panpaliya <[email protected]>
   Cc: Impala Dev <[email protected]>, Nishidha
   Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
   Jagadale/Austin/Contr/IBM@IBMUS
   Date: 03/05/2016 03:14 AM
   Subject: Re: LeaseExpiredException while test data creation on ppc64le




   It also looks like it got far enough that you should have a bit of data
   loaded - have you been able to start impala and run queries on some of
   those tables?

   We're starting a new release cycle so I'm actually about to focus on
   upgrading our version of LLVM to 3.7 and getting the Intel support
   working. I think we're going to be putting a bit of effort into reducing
   LLVM code generation time: it seems like LLVM 3.7 is slightly slower in
   some cases.

   We should stay in sync; it would be good to make sure that any changes I
   make will work for your PowerPC work too. If you want to share any
   patches (even if you're not formally contributing them) it would be
   helpful for me to understand what you have already done on this path.

   Cheers,
   Tim

   On Fri, Mar 4, 2016 at 1:40 PM, Tim Armstrong <[email protected]>
   wrote:

         Hi Nishidha,
           It looks like Hive is maybe missing the native snappy library: I
         see this in the logs:

         java.lang.Exception: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
             at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
             at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
         Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
             at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
             at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
             at org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
             at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:361)
             at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:394)
             at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:413)



         If you want to try making progress without Hive snappy support, I
         think you could disable some of the file formats by editing
         testdata/workloads/*/*.csv and removing some of the "snap" file
         formats. The Impala test suite generates data in many different
         file formats with different compression settings.
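
The CSV edit suggested above could be scripted roughly as follows. This is a sketch, not a tested procedure: it assumes each data row in those files describes one format combination on a single line and that snappy rows contain the substring "snap"; the row layout in the usage example is illustrative only. The helper names are hypothetical, and the filtered output goes to a separate path so the original file stays untouched.

```python
def drop_snappy_rows(lines):
    """Keep comments and blank lines; drop data rows that mention "snap"."""
    kept = []
    for line in lines:
        stripped = line.strip()
        is_data_row = bool(stripped) and not stripped.startswith("#")
        if is_data_row and "snap" in stripped.lower():
            continue  # skip rows for snappy-compressed formats
        kept.append(line)
    return kept


def filter_workload_csv(src_path, dst_path):
    """Write a copy of src_path without snappy rows; return rows kept."""
    with open(src_path) as src:
        kept = drop_snappy_rows(src.readlines())
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
    return len(kept)


if __name__ == "__main__":
    # Illustrative rows only; check the real files for their exact layout.
    sample = [
        "# generated test vectors\n",
        "file_format: text, compression_codec: none\n",
        "file_format: seq, compression_codec: snap\n",
    ]
    for row in drop_snappy_rows(sample):
        print(row, end="")
```

Diffing the filtered copy against the original before swapping it in makes it easy to confirm that only the intended rows were removed.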


         On Wed, Mar 2, 2016 at 7:08 AM, nishidha panpaliya <
         [email protected]> wrote:
         Hello,

         After building Impala on ppc64le, I'm trying to run all of the Impala
         tests. In the process, I'm getting an error during test data
         creation.
         Command run:
                           ${IMPALA_HOME}/buildall.sh -testdata -format
         Output: attached log (output.txt)

         Also attached are
         cluster_logs/data_loading/data-load-functional-exhaustive.log and
         hive.log.

         I tried setting the parameters below in hive-site.xml, but to no avail.
                           hive.exec.max.dynamic.partitions=100000;
                           hive.exec.max.dynamic.partitions.pernode=100000;
                           hive.exec.parallel=false

         I'll be really thankful if you could provide me some help here.

         Thanks in advance,
         Nishidha


