On Tue, Mar 8, 2016 at 8:26 AM, Nishidha Panpaliya <[email protected]>
wrote:

>
> I tested this function individually in a small test app and it worked.
> Maybe the data given to it was simple enough for it to pass, but in the
> case of Impala there is some issue with the data/arguments passed to this
> function in a particular case. It looks like this function is not called on
> machines where SSE is supported, so on x86 you might not see this crash. Do
> you suspect anything in this function or in the functions calling it? I'm
> still debugging this.
> If you have any clue, please point me to it so that I can try to nail down
> the issue in that direction.
>

Kudu uses the same function on x86 and it works fine. You might try
changing the *reinterpret_cast<uint64_t> stuff to use the UNALIGNED_LOAD64
macro from gutil/port.h and see if it helps, although the comments in that
file say that modern PPC chips can do unaligned loads fine.

-Todd


>
> Thanks,
> Nishidha
>
>
> From: nishidha randad <[email protected]>
> To: Tim Armstrong <[email protected]>
> Cc: Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
> Jagadale/Austin/Contr/IBM@IBMUS, [email protected]
> Date: 03/07/2016 10:35 PM
> Subject: Re: LeaseExpiredException while test data creation on ppc64le
> ------------------------------
>
>
>
> Thanks a lot Tim! I did check some of the impalad*.ERROR, impalad*.INFO,
> and other logs in cluster_logs/data_loading. A couple of observations:
> 1. Dependency on SSE3; the error says it is exiting as the hardware does
> not support SSE3.
> 2. One error says to increase num_of_threads_per_disk (something related
> to the number of threads; not sure about the exact variable name) when
> starting impalad.
> 3. A few log files report bad_alloc.
>
> I'm analysing all these errors. I'll dig more into this tomorrow and
> update you.
> One more thing I wanted your help with is estimating the amount of work I
> may have left and the possible challenges ahead. It would be really great
> if you could gauge that from the logs I had posted.
>
> Also, about the LLVM 3.7 fixes you did, I was wondering if you have
> completed the upgrade, since you have also started encountering crashes.
>
> Thanks again!
>
> Nishidha
>
> On 7 Mar 2016 21:56, "Tim Armstrong" <[email protected]> wrote:
>
>    Hi Nishidha,
>      I started working on our next release cycle towards the end of last
>    week, so I've been looking at LLVM 3.7 and have made a bit of progress
>    getting it working on Intel. We're trying to get it working early so we
>    have plenty of time to test it.
>
>    RE the TTransportException error, that is often because of a crash.
>    Usually to debug I would first look at the /tmp/impalad.ERROR and
>    /tmp/impalad.INFO logs for the cause of the crash. The embedded JVM also
>    generates hs_err_pid*.log files with a crash report that can sometimes be
>    useful. If that doesn't reveal the cause, then I'd look to see if there is
>    a core dump in the Impala directory (I normally run with "ulimit -c
>    unlimited" set so that a crash will generate a core file).
>
>    I already fixed a couple of problems with codegen in LLVM 3.7,
>    including one crash that was an assertion about struct sizes. I'll be
>    posting the patch soon once I've done a bit more testing.
>
>    It might help you make progress if you disable LLVM codegen by default
>    during data loading by setting the following environment variable:
>
>    export START_CLUSTER_ARGS='--impalad_args=-default_query_options="disable_codegen=1"'
>
>    You can also start the test cluster with the same arguments, or just
>    set the option in the shell with "set disable_codegen=1".
>
>    ./bin/start-impala-cluster.py
>    --impalad_args=-default_query_options="disable_codegen=1"
>
>    On Mon, Mar 7, 2016 at 5:13 AM, Nishidha Panpaliya
>    <[email protected]> wrote:
>    Hi Tim,
>
>    Yes, I could fix this snappyError by building snappy-java for Power
>    and adding the native library for power into existing
>    snappy-java-1.0.4.1.jar used by hbase, hive, sentry and hadoop.
>    The test data loading has proceeded further and gave a new exception,
>    shown below, which I'm looking into.
>
>    Data Loading from Impala failed with error: ImpalaBeeswaxException:
>    INNER EXCEPTION: <class
>    'thrift.transport.TTransport.TTransportException'>
>    MESSAGE: None
>
>    Also, I've been able to start Impala and try just the following query,
>    as given in
>    https://github.com/cloudera/Impala/wiki/How-to-build-Impala:
>    impala-shell.sh -q "SELECT version()"
>
>    And regarding the patch of my work, I'm sorry for the delay. Although
>    it does not need any CLA to be signed, it is under discussion with our
>    IBM legal team, just to ensure we are compliant with the policies. I
>    hope to update you on this soon. Could you tell me when you are going
>    to start the new release cycle?
>
>    Thanks,
>    Nishidha
>
>
>    From: Tim Armstrong <[email protected]>
>    To: nishidha panpaliya <[email protected]>
>    Cc: Impala Dev <[email protected]>,
>    Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
>    Jagadale/Austin/Contr/IBM@IBMUS
>    Date: 03/05/2016 03:14 AM
>    Subject: Re: LeaseExpiredException while test data creation on ppc64le
>    ------------------------------
>
>
>
>
>    It also looks like it got far enough that you should have a bit of
>    data loaded - have you been able to start impala and run queries on some of
>    those tables?
>
>    We're starting a new release cycle so I'm actually about to focus on
>    upgrading our version of LLVM to 3.7 and getting the Intel support working.
>    I think we're going to be putting a bit of effort into reducing LLVM code
>    generation time: it seems like LLVM 3.7 is slightly slower in some cases.
>
>    We should stay in sync; it would be good to make sure that any changes
>    I make will work for your PowerPC work too. If you want to share any
>    patches (even if you're not formally contributing them), it would be
>    helpful for me to understand what you have already done on this path.
>
>    Cheers,
>    Tim
>
>    On Fri, Mar 4, 2016 at 1:40 PM, Tim Armstrong
>    <[email protected]> wrote:
>
>          Hi Nishidha,
>            It looks like Hive may be missing the native snappy library;
>          I see this in the logs:
>
>          java.lang.Exception: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>              at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>              at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
>          Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>              at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
>              at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
>              at org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
>              at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:361)
>              at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:394)
>              at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:413)
>
>
>
>          If you want to try making progress without Hive snappy support,
>          I think you could disable some of the file formats by editing
>          testdata/workloads/*/*.csv and removing some of the "snap" file
>          formats. The Impala test suite generates data in many different
>          file formats with different compression settings.
>
>
>          On Wed, Mar 2, 2016 at 7:08 AM, nishidha panpaliya
>          <[email protected]> wrote:
>          Hello,
>
>          After building Impala on ppc64le, I'm trying to run all of
>          Impala's tests. In the process, I'm getting an error during test
>          data creation.
>          Command run:
>             ${IMPALA_HOME}/buildall.sh -testdata -format
>          Output: attached log (output.txt)
>
>          Also attached are the logs
>          cluster_logs/data_loading/data-load-functional-exhaustive.log and
>          hive.log.
>
>          I tried setting the parameters below in hive-site.xml, but to no
>          avail.
>             hive.exec.max.dynamic.partitions=100000
>             hive.exec.max.dynamic.partitions.pernode=100000
>             hive.exec.parallel=false
>
>          I'll be really thankful if you could provide me some help here.
>
>          Thanks in advance,
>          Nishidha
>
>          --
>          You received this message because you are subscribed to the
>          Google Groups "Impala Dev" group.
>          To unsubscribe from this group and stop receiving emails from
>          it, send an email to [email protected].
>
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
