Hi Nishidha, I was able to get LLVM 3.7 functional on x86. I ran into a few
issues with code generation that I suspect you will also run into, so it
might be worth having a look.
I have an initial version of the patch here:
http://gerrit.cloudera.org/#/c/2486/ . It will likely go through some
iteration as we review it, but I don't think it will change too much.

- Tim

On Tue, Mar 8, 2016 at 10:43 AM, Tim Armstrong <[email protected]> wrote:

> We do use MurmurHash in many cases in Impala, e.g. when repartitioning
> data for a large join, so it definitely works fine.
>
> You could try running the hash-benchmark binary that is built with
> Impala; that would exercise the hash function directly.
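>
> If the benchmark doesn't reproduce it, a tiny standalone harness that
> hashes from deliberately misaligned offsets might. This is just a sketch
> (it assumes the MurmurHash2_64(const void*, int, uint64_t) signature in
> be/src/util/hash-util.h):
>
>     #include <cstdint>
>     #include <cstdio>
>     #include <cstring>
>
>     #include "util/hash-util.h"  // impala::HashUtil (assumed include path)
>
>     int main() {
>       char buf[128];
>       memset(buf, 0xAB, sizeof(buf));
>       // Hash at every start offset 0..7 so that some of the 64-bit reads
>       // are guaranteed to be misaligned; a chip that faults on unaligned
>       // loads should trip somewhere in this loop.
>       for (int offset = 0; offset < 8; ++offset) {
>         for (int len = 1; len <= 64; ++len) {
>           uint64_t h =
>               impala::HashUtil::MurmurHash2_64(buf + offset, len, 0);
>           printf("offset=%d len=%d hash=%016llx\n", offset, len,
>                  (unsigned long long) h);
>         }
>       }
>       return 0;
>     }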
>
> On Tue, Mar 8, 2016 at 10:20 AM, Todd Lipcon <[email protected]> wrote:
>
>> On Tue, Mar 8, 2016 at 8:26 AM, Nishidha Panpaliya <[email protected]>
>> wrote:
>>
>>> I tested this function individually in a small test app and it worked.
>>> Maybe the data given to it was simple enough for it to pass, but in
>>> Impala's case there is some issue with the data/arguments passed to
>>> this function in a particular scenario. It looks like this function is
>>> not called on machines where SSE is supported, so on x86 you might not
>>> see this crash.
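>>>
>>> For reference, the dispatch I'm looking at in be/src/util/hash-util.h
>>> is roughly the following (paraphrased from memory, so the exact names
>>> may differ slightly):
>>>
>>>     // HashUtil::Hash takes the CRC path when SSE4.2 is available and
>>>     // only falls back to MurmurHash2_64 otherwise, which would explain
>>>     // why the crash shows up on ppc64le but not on x86.
>>>     static uint32_t Hash(const void* data, int32_t bytes, uint32_t seed) {
>>>       if (LIKELY(CpuInfo::IsSupported(CpuInfo::SSE4_2))) {
>>>         return CrcHash(data, bytes, seed);
>>>       } else {
>>>         return MurmurHash2_64(data, bytes, seed);
>>>       }
>>>     }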
>>>
>>> Do you suspect anything in this function or in the functions that call
>>> it? I'm still debugging this; if you have any clue, please point me in
>>> that direction so I can try to nail down the issue there.
>>
>> Kudu uses the same function on x86 and it works fine. You might try
>> changing the *reinterpret_cast<uint64_t*> loads to use the
>> UNALIGNED_LOAD64 macro from gutil/port.h and see if it helps, although
>> reading the comments in that file, it says that modern PPC chips can do
>> unaligned loads fine.
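>>
>> Roughly, the change would look like the sketch below. UnalignedLoad64 is
>> a stand-in for the real macro (which picks an implementation per
>> platform), and HashBody is just an illustrative Murmur-style inner loop,
>> not the exact Impala code:
>>
>>     #include <cstdint>
>>     #include <cstring>
>>
>>     // Essentially what UNALIGNED_LOAD64 boils down to on targets without
>>     // cheap unaligned access: memcpy into a local, which the compiler
>>     // lowers to a safe load sequence.
>>     inline uint64_t UnalignedLoad64(const void* p) {
>>       uint64_t t;
>>       memcpy(&t, p, sizeof(t));
>>       return t;
>>     }
>>
>>     // Murmur-style inner loop rewritten to avoid dereferencing a cast
>>     // pointer; 'prime' and 'r' are the usual Murmur constants.
>>     inline uint64_t HashBody(const uint8_t* data, int len, uint64_t h,
>>                              uint64_t prime, int r) {
>>       const uint8_t* end = data + (len / 8) * 8;
>>       while (data != end) {
>>         // was: uint64_t k = *reinterpret_cast<const uint64_t*>(data);
>>         uint64_t k = UnalignedLoad64(data);
>>         data += 8;
>>         k *= prime;
>>         k ^= k >> r;
>>         k *= prime;
>>         h ^= k;
>>         h *= prime;
>>       }
>>       return h;
>>     }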
>>
>> -Todd
>>
>>> Thanks,
>>> Nishidha
>>>
>>> From: nishidha randad <[email protected]>
>>> To: Tim Armstrong <[email protected]>
>>> Cc: Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
>>> Jagadale/Austin/Contr/IBM@IBMUS, [email protected]
>>> Date: 03/07/2016 10:35 PM
>>> Subject: Re: LeaseExpiredException while test data creation on ppc64le
>>> ------------------------------
>>>
>>> Thanks a lot, Tim! I did check some of the impalad*.ERROR,
>>> impalad*.INFO, and other logs in cluster_logs/data_loading. A couple of
>>> observations:
>>> 1. A dependency on SSE3: one error says Impala is exiting because the
>>>    hardware does not support SSE3.
>>> 2. Another error says to increase num_of_threads_per_disk (something
>>>    related to the number of threads per disk; I'm not sure of the exact
>>>    flag name) when starting impalad.
>>> 3. A few log files report bad_alloc.
>>>
>>> I'm analysing all these errors. I'll dig more into this tomorrow and
>>> update you. One more thing I'd like your help with is estimating the
>>> amount of work that may be left and the possible challenges ahead; it
>>> would be really great if you could gauge that from the logs I posted.
>>>
>>> Also, about the LLVM 3.7 fixes you did: I was wondering whether you
>>> have completed the upgrade, since you have also started encountering
>>> crashes.
>>>
>>> Thanks again!
>>> Nishidha
>>>
>>> On 7 Mar 2016 21:56, "Tim Armstrong" <[email protected]> wrote:
>>>
>>> Hi Nishidha,
>>> I started working on our next release cycle towards the end of last
>>> week, so I've been looking at LLVM 3.7 and have made a bit of progress
>>> getting it working on Intel. We're trying to get it working early so
>>> we have plenty of time to test it.
>>>
>>> Re the TTransportException error: that is often caused by a crash. To
>>> debug it, I would first look at the /tmp/impalad.ERROR and
>>> /tmp/impalad.INFO logs for the cause of the crash. The embedded JVM
>>> also generates hs_err_pid*.log files with a crash report that can
>>> sometimes be useful. If that doesn't reveal the cause, I'd look for a
>>> core dump in the Impala directory (I normally run with "ulimit -c
>>> unlimited" set so that a crash will generate a core file).
>>>
>>> I already fixed a couple of problems with codegen in LLVM 3.7,
>>> including one crash that was an assertion about struct sizes. I'll be
>>> posting the patch soon, once I've done a bit more testing.
>>>
>>> It might help you make progress if you disable LLVM codegen by default
>>> during data loading, by setting the following environment variable:
>>>
>>> export START_CLUSTER_ARGS='--impalad_args=-default_query_options="disable_codegen=1"'
>>>
>>> You can also start the test cluster with the same arguments, or just
>>> set it in the shell with "set disable_codegen=1":
>>>
>>> ./bin/start-impala-cluster.py --impalad_args=-default_query_options="disable_codegen=1"
>>>
>>> On Mon, Mar 7, 2016 at 5:13 AM, Nishidha Panpaliya
>>> <[email protected]> wrote:
>>>
>>> Hi Tim,
>>>
>>> Yes, I was able to fix the SnappyError by building snappy-java for
>>> Power and adding the Power native library into the existing
>>> snappy-java-1.0.4.1.jar used by HBase, Hive, Sentry and Hadoop.
>>> Test data loading then proceeded further and hit a new exception,
>>> which I'm looking into:
>>>
>>> Data Loading from Impala failed with error: ImpalaBeeswaxException:
>>> INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
>>> MESSAGE: None
>>>
>>> Also, I've been able to start Impala and run just the following query,
>>> as given in https://github.com/cloudera/Impala/wiki/How-to-build-Impala:
>>>
>>> impala-shell.sh -q "SELECT version()"
>>>
>>> Regarding the patch for my work, I'm sorry for the delay. Although it
>>> does not need a CLA to be signed, it is under discussion with our IBM
>>> legal team, just to ensure we are compliant with our policies. I hope
>>> to update you on this soon. Could you tell me when you are going to
>>> start the new release cycle?
>>>
>>> Thanks,
>>> Nishidha
>>>
>>> From: Tim Armstrong <[email protected]>
>>> To: nishidha panpaliya <[email protected]>
>>> Cc: Impala Dev <[email protected]>, Nishidha
>>> Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
>>> Jagadale/Austin/Contr/IBM@IBMUS
>>> Date: 03/05/2016 03:14 AM
>>> Subject: Re: LeaseExpiredException while test data creation on ppc64le
>>> ------------------------------
>>>
>>> It also looks like it got far enough that you should have a bit of
>>> data loaded - have you been able to start Impala and run queries on
>>> some of those tables?
>>>
>>> We're starting a new release cycle, so I'm actually about to focus on
>>> upgrading our version of LLVM to 3.7 and getting the Intel support
>>> working. I think we're going to put a bit of effort into reducing LLVM
>>> code generation time: it seems like LLVM 3.7 is slightly slower in
>>> some cases.
>>>
>>> We should stay in sync; it would be good to make sure that any changes
>>> I make will work for your PowerPC port too. If you want to share any
>>> patches (even if you're not formally contributing them), it would be
>>> helpful for me to understand what you have already done on this path.
>>>
>>> Cheers,
>>> Tim
>>>
>>> On Fri, Mar 4, 2016 at 1:40 PM, Tim Armstrong
>>> <[email protected]> wrote:
>>>
>>> Hi Nishidha,
>>> It looks like Hive may be missing the native snappy library; I see
>>> this in the logs:
>>>
>>>     java.lang.Exception: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
>>>     Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>>>         at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
>>>         at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
>>>         at org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
>>>         at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:361)
>>>         at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:394)
>>>         at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:413)
>>>
>>> If you want to try making progress without Hive snappy support, I
>>> think you could disable some of the file formats by editing
>>> testdata/workloads/*/*.csv and removing some of the "snap" file
>>> formats. The Impala test suite generates data in many different file
>>> formats with different compression settings.
>>>
>>> On Wed, Mar 2, 2016 at 7:08 AM, nishidha panpaliya
>>> <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> After building Impala on ppc64le, I'm trying to run all of Impala's
>>> tests. In the process, I'm getting an error during test data creation.
>>>
>>> Command run:
>>> ${IMPALA_HOME}/buildall.sh -testdata -format
>>> Output: attached log (output.txt)
>>>
>>> I've also attached cluster_logs/data_loading/data-load-functional-exhaustive.log
>>> and hive.log.
>>>
>>> I tried setting the parameters below in hive-site.xml, but to no
>>> avail:
>>>
>>> hive.exec.max.dynamic.partitions=100000
>>> hive.exec.max.dynamic.partitions.pernode=100000
>>> hive.exec.parallel=false
>>>
>>> I'd be really thankful if you could provide some help here.
>>>
>>> Thanks in advance,
>>> Nishidha
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
