Oh, are you running with a debug build or a release build by the way?
On Tue, Mar 8, 2016 at 8:58 AM, Tim Armstrong <[email protected]>
wrote:
> I don't think we've seen that crash before. It looks like it's
> dereferencing a pointer that is causing the crash. Tracing back through the
> callstack, it looks like somehow the expression below is constructing a
> StringValue that is causing a segfault when dereferenced
> (0x000000ffff9b6aa0).
>
> inline Status
> HdfsParquetTableWriter::BaseColumnWriter::AppendRow(TupleRow* row) {
> ++num_values_;
> void* value = expr_ctx_->GetValue(row); <==
>
> I'm not sure why it would be returning an invalid pointer. It's not a NULL
> pointer and looks possibly valid and 16-byte aligned. If you have a core
> dump it would be interesting to know if that pointer is pointing into
> invalid memory or if something else is going on.
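>
> If you do have a core, something like this in gdb should answer that (a rough
> sketch - adjust the binary path for your build; the address is the one from
> the crash):
>
> gdb be/build/debug/service/impalad core
> (gdb) x/16xb 0x000000ffff9b6aa0   # "Cannot access memory" means it's unmapped
> (gdb) p *(impala::StringValue*) 0x000000ffff9b6aa0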
>
> > Thanks a lot Tim! I did check some of the impalad*.ERROR, impalad*.INFO,
> > and other logs in cluster_logs/data_loading. A couple of observations:
> > 1. Dependency on SSE3, error says exiting as hardware does not support
> > SSE3.
> I think if you were able to compile this is ok. We have some inline
> assembly and SSE3 intrinsics, but you probably had to work around those
> already to build. You could fix this so that the check isn't done if
> running on PowerPC.
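>
> Roughly something like this around the startup check (just a sketch - the
> actual function and enum names in our cpu-info code may differ a bit):
>
> #if defined(__x86_64__) || defined(__i386__)
>   // Only enforce the SSE/SSSE3 requirement when building for x86;
>   // skip the check entirely on PowerPC.
>   if (!CpuInfo::IsSupported(CpuInfo::SSSE3)) {
>     LOG(ERROR) << "CPU does not support the SSSE3 instruction set; exiting.";
>     exit(1);
>   }
> #endif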
> > 2. One error says to increase num_of_threads_per_disk (something related
> > to number of threads, not sure about exact variable name) while starting
> > impalad
> Hmm, this is probably because its detection of local disks fails. This is
> expected to happen if running on a remote filesystem (e.g. a cloud
> filesystem like S3, or some specialised disk hardware like Isilon or DSSD).
> If it's happening with local disks, it's probably because it assumes it's
> running on linux with specific filesystem nodes for devices.
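>
> If the flag it's asking about is num_threads_per_disk (I'm going from memory
> on the exact name), you could also just pass it explicitly when starting the
> cluster, e.g.:
>
> ./bin/start-impala-cluster.py --impalad_args=-num_threads_per_disk=8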
> > 3. A few log files mention bad_alloc
> I think this is the exception that gets thrown when malloc() fails (when
> called via the C++ new operator). I wonder if the system is low on memory?
> How much RAM do you have?
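>
> Checking "free -h" (or /proc/meminfo) on the box while data loading is
> running would show whether it's getting close to exhausting physical memory
> and swap.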
>
> I think how long it will take probably depends on what the end goal is: if
> it's just to get it running, probably a couple more weeks, maybe more if
> there are any particularly tricky bugs. If you want to do performance
> tuning, I feel like there's probably some more work there. I think we
> implicitly depend on certain properties on Intel hardware, e.g. recent
> Intel processors have reasonably fast unaligned loads and stores and in
> some cases we take advantage of that, but I'm not sure if that's also true
> of the processors you're targeting.
>
> On Tue, Mar 8, 2016 at 8:26 AM, Nishidha Panpaliya <[email protected]>
> wrote:
>
>> Hi Tim,
>>
>> As you suggested, I disabled codegen and also generated core dump.
>>
>> The core dump has pointed to HashUtil::MurmurHash2_64 as being problematic.
>> Please see the attached log file.
>> (See attached file: hs_err_pid15697.log)
>>
>> I tested this function individually in a small test app and it worked.
>> Maybe the data given to it was simple enough for it to pass. But in the case
>> of Impala, there is some issue with the data/arguments passed to this function in
>> a particular case. Looks like this function is not called on machines where
>> SSE is supported, so on x86, you might not see this crash. Do you suspect
>> anything in this function or the functions calling this function? I'm still
>> debugging more into this.
>> If you have any clue, please point it out to me so that I can try to nail
>> down the issue in that direction.
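>>
>> One thing I'm going to try, in case the problem is the unaligned 8-byte
>> reads that MurmurHash2_64 does, is replacing the direct pointer casts with
>> memcpy-based loads. Just a sketch of the idea, not the actual Impala code:
>>
>> #include <cstdint>
>> #include <cstring>
>>
>> // Alignment-safe 64-bit load: memcpy compiles down to a single load on
>> // targets that allow unaligned access, and to safe byte-wise code elsewhere.
>> static inline uint64_t LoadUnaligned64(const void* p) {
>>   uint64_t v;
>>   memcpy(&v, p, sizeof(v));
>>   return v;
>> }
>>
>> // In the MurmurHash2_64 loop, instead of
>> //   uint64_t k = *reinterpret_cast<const uint64_t*>(data);
>> // it would become
>> //   uint64_t k = LoadUnaligned64(data);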
>>
>> Thanks,
>> Nishidha
>>
>>
>> From: nishidha randad <[email protected]>
>> To: Tim Armstrong <[email protected]>
>> Cc: Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
>> Jagadale/Austin/Contr/IBM@IBMUS, [email protected]
>> Date: 03/07/2016 10:35 PM
>>
>> Subject: Re: LeaseExpiredException while test data creation on ppc64le
>> ------------------------------
>>
>>
>>
>> Thanks a lot Tim! I did check some of the impalad*.ERROR, impalad*.INFO,
>> and other logs in cluster_logs/data_loading. A couple of observations:
>> 1. Dependency on SSE3, error says exiting as hardware does not support
>> SSE3.
>> 2. One error says to increase num_of_threads_per_disk (something related
>> to number of threads, not sure about exact variable name) while starting
>> impalad
>> 3. A few log files mention bad_alloc
>>
>> I'm analysing all these errors. I'll dig more into this tomorrow and
>> update you.
>> One more thing I wanted your help with is estimating how much work may be
>> left and the possible challenges ahead. It would be really great if you
>> could gauge that from the logs I had posted.
>>
>> Also, about the LLVM 3.7 fixes you did, I was wondering if you have
>> completed the upgrade, since you have also started encountering crashes.
>>
>> Thanks again!
>>
>> Nishidha
>>
>> On 7 Mar 2016 21:56, "Tim Armstrong" <[email protected]> wrote:
>>
>> Hi Nishidha,
>> I started working on our next release cycle towards the end of last
>> week, so I've been looking at LLVM 3.7 and have made a bit of progress
>> getting it working on Intel. We're trying to get it done early so we have
>> plenty of chance to test it.
>>
>> RE the TTransportException error, that is often because of a crash.
>> Usually to debug I would first look at the /tmp/impalad.ERROR and
>> /tmp/impalad.INFO logs for the cause of the crash. The embedded JVM also
>> generates hs_err_pid*.log files with a crash report that can sometimes be
>> useful. If that doesn't reveal the cause, then I'd look to see if there is
>> a core dump in the Impala directory (I normally run with "ulimit -c
>> unlimited" set so that a crash will generate a core file).
>>
>> I already fixed a couple of problems with codegen in LLVM 3.7,
>> including one crash that was an assertion about struct sizes. I'll be
>> posting the patch soon once I've done a bit more testing.
>>
>> It might help you make progress if you disable LLVM codegen by default
>> during data loading by setting the following environment variable:
>>
>> export
>>
>> START_CLUSTER_ARGS='--impalad_args=-default_query_options="disable_codegen=1"'
>>
>> You can also start the test cluster with the same arguments, or just
>> set it in the shell with "set disable_codegen=1".
>>
>> ./bin/start-impala-cluster.py
>> --impalad_args=-default_query_options="disable_codegen=1"
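>>
>> or, once connected, inside an impala-shell session:
>>
>> set disable_codegen=1;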
>>
>> On Mon, Mar 7, 2016 at 5:13 AM, Nishidha Panpaliya <[email protected]> wrote:
>> Hi Tim,
>>
>> Yes, I could fix this snappyError by building snappy-java for Power
>> and adding the native library for power into existing
>> snappy-java-1.0.4.1.jar used by hbase, hive, sentry and hadoop.
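>>
>> For reference, adding the new native library to the existing jar is roughly
>> the following (the path inside the jar follows snappy-java's native-library
>> layout, so adjust it for the actual arch directory name):
>>
>> jar uf snappy-java-1.0.4.1.jar \
>>     org/xerial/snappy/native/Linux/ppc64le/libsnappyjava.so
>>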
>> The test data loading has proceeded further and gave a new exception,
>> which I'm looking into, as below.
>>
>> Data Loading from Impala failed with error: ImpalaBeeswaxException:
>> INNER EXCEPTION: <class
>> 'thrift.transport.TTransport.TTransportException'>
>> MESSAGE: None
>>
>> Also, I've been able to start Impala and try just the following query, as
>> given in https://github.com/cloudera/Impala/wiki/How-to-build-Impala -
>> impala-shell.sh -q "SELECT version()"
>>
>> And regarding the patch of my work, I'm sorry for the delay. Although it
>> does not need any CLA to be signed, it is under discussion with our IBM
>> legal team, just to ensure we are compliant with the policies. Hoping to
>> update you on this soon. Could you tell me when you are going to start
>> with this new release cycle?
>>
>> Thanks,
>> Nishidha
>>
>>
>> From: Tim Armstrong <[email protected]>
>> To: nishidha panpaliya <[email protected]>
>> Cc: Impala Dev <[email protected]>, Nishidha
>> Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan Jagadale/Austin/Contr/IBM@IBMUS
>> Date: 03/05/2016 03:14 AM
>> Subject: Re: LeaseExpiredException while test data creation on ppc64le
>> ------------------------------
>>
>>
>>
>>
>> It also looks like it got far enough that you should have a bit of
>> data loaded - have you been able to start impala and run queries on some
>> of
>> those tables?
>>
>> We're starting a new release cycle so I'm actually about to focus on
>> upgrading our version of LLVM to 3.7 and getting the Intel support
>> working.
>> I think we're going to be putting a bit of effort into reducing LLVM code
>> generation time: it seems like LLVM 3.7 is slightly slower in some cases.
>>
>> We should stay in sync; it would be good to make sure that any
>> changes I make will work for your PowerPC work too. If you want to share
>> any patches (even if you're not formally contributing them) it would be
>> helpful for me to understand what you have already done on this path.
>>
>> Cheers,
>> Tim
>>
>> On Fri, Mar 4, 2016 at 1:40 PM, Tim Armstrong <[email protected]> wrote:
>>
>> Hi Nishidha,
>> It looks like Hive is maybe missing the native snappy
>> library: I see this in the logs:
>>
>> java.lang.Exception: org.xerial.snappy.SnappyError:
>> [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>> at
>>
>> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>> at
>>
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
>> Caused by: org.xerial.snappy.SnappyError:
>> [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>> at
>> org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
>> at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
>> at
>> org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
>> at
>>
>> org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:361)
>> at
>>
>> org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:394)
>> at
>> org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:413)
>>
>>
>>
>> If you want to try making progress without Hive snappy support,
>> I think you could disable some of the file formats by editing
>> testdata/workloads/*/*.csv and removing some of the "snap" file
>> formats.
>> The impala test suite generates data in many different file formats
>> with
>> different compression settings.
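>>
>> For example, to see which test vector entries mention snappy before removing
>> them:
>>
>> grep -n snap testdata/workloads/*/*.csv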
>>
>>
>> On Wed, Mar 2, 2016 at 7:08 AM, nishidha panpaliya <[email protected]> wrote:
>> Hello,
>>
>> After building Impala on ppc64le, I'm trying to run all the
>> tests of Impala. In the process, I'm getting an error during test
>> data creation.
>> Command ran -
>> ${IMPALA_HOME}/buildall.sh -testdata -format
>> Output - Attached log (output.txt)
>>
>> Also attached logs named
>> cluster_logs/data_loading/data-load-functional-exhaustive.log. And
>> hive.log.
>>
>> I tried setting the below parameters in hive-site.xml, but to no avail
>> (an example property entry is shown after the list).
>> hive.exec.max.dynamic.partitions=100000;
>>
>> hive.exec.max.dynamic.partitions.pernode=100000;
>> hive.exec.parallel=false
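>>
>> Each of these is a standard property entry in hive-site.xml, e.g.:
>>
>> <property>
>>   <name>hive.exec.max.dynamic.partitions</name>
>>   <value>100000</value>
>> </property>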
>>
>> I'll be really thankful if you could provide me some help here.
>>
>> Thanks in advance,
>> Nishidha
>>
>>
>>
>>
>>
>