Oh, are you running with a debug build or a release build by the way?
On Tue, Mar 8, 2016 at 8:58 AM, Tim Armstrong <[email protected]>
wrote:
> I don't think we've seen that crash before. It looks like it's
> dereferencing a pointer that is causing the crash. Tracing back through the
> callstack, it looks like somehow the expression below is constructing a
> StringValue that is causing a segfault when dereferenced
> (0x000000ffff9b6aa0).
>
> inline Status
> HdfsParquetTableWriter::BaseColumnWriter::AppendRow(TupleRow* row) {
> ++num_values_;
> void* value = expr_ctx_->GetValue(row); <==
>
> I'm not sure why it would be returning an invalid pointer. It's not a NULL
> pointer and looks possibly valid and 16-byte aligned. If you have a core
> dump it would be interesting to know if that pointer is pointing into
> invalid memory or if something else is going on.
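>
> If you do have a core, something like this in gdb should answer that (a rough
> sketch - adjust the binary path for your build; the address is the one from
> the crash):
>
> gdb be/build/debug/service/impalad core
> (gdb) x/16xb 0x000000ffff9b6aa0   # "Cannot access memory" means it's unmapped
> (gdb) p *(impala::StringValue*) 0x000000ffff9b6aa0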
>
> > Thanks a lot Tim! I did check some of the impalad*.ERROR, impalad*.INFO,
> > and other logs in cluster_logs/data_loading. A couple of observations:
> > 1. Dependency on SSE3, error says exiting as hardware does not support
> > SSE3.
> I think if you were able to compile this is ok. We have some inline
> assembly and SSE3 intrinsics, but you probably had to work around those
> already to build. You could fix this so that the check isn't done if
> running on PowerPC.
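>
> Roughly something like this around the startup check (just a sketch - the
> actual function and enum names in our cpu-info code may differ a bit):
>
> #if defined(__x86_64__) || defined(__i386__)
>   // Only enforce the SSE/SSSE3 requirement when building for x86;
>   // skip the check entirely on PowerPC.
>   if (!CpuInfo::IsSupported(CpuInfo::SSSE3)) {
>     LOG(ERROR) << "CPU does not support the SSSE3 instruction set; exiting.";
>     exit(1);
>   }
> #endif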
> > 2. One error says to increase num_of_threads_per_disk (something related
> > to number of threads, not sure about exact variable name) while starting
> > impalad
> Hmm, this is probably because its detection of local disks fails. This is
> expected to happen if running on a remote filesystem (e.g. a cloud
> filesystem like S3, or some specialised disk hardware like Isilon or DSSD).
> If it's happening with local disks, it's probably because it assumes it's
> running on linux with specific filesystem nodes for devices.
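>
> If the flag it's asking about is num_threads_per_disk (I'm going from memory
> on the exact name), you could also just pass it explicitly when starting the
> cluster, e.g.:
>
> ./bin/start-impala-cluster.py --impalad_args=-num_threads_per_disk=8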
> > 3. A few log files mention bad_alloc
> I think this is the exception that gets thrown when malloc() fails (when
> called via the C++ new operator). I wonder if the system is low on memory?
> How much RAM do you have?
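>
> Checking "free -h" (or /proc/meminfo) on the box while data loading is
> running would show whether it's getting close to exhausting physical memory
> and swap.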
>
> I think how long it will take probably depends on what the end goal is: if
> it's just to get it running, probably a couple more weeks, maybe more if
> there are any particularly tricky bugs. If you want to do performance
> tuning, I feel like there's probably some more work there. I think we
> implicitly depend on certain properties on Intel hardware, e.g. recent
> Intel processors have reasonably fast unaligned loads and stores and in
> some cases we take advantage of that, but I'm not sure if that's also true
> of the processors you're targeting.
>
> On Tue, Mar 8, 2016 at 8:26 AM, Nishidha Panpaliya <[email protected]>
> wrote:
>
>> Hi Tim,
>>
>> As you suggested, I disabled codegen and also generated core dump.
>>
>> The core dump has pointed to HashUtil::MurmurHash2_64 as being problematic.
>> Please see the attached log file.
>> (See attached file: hs_err_pid15697.log)
>>
>> I tested this function individually in a small test app and it worked.
>> Maybe the data given to it was simple enough for it to pass. But in the case
>> of Impala, there is some issue with the data/arguments passed to this function in
>> a particular case. Looks like this function is not called on machines where
>> SSE is supported, so on x86, you might not see this crash. Do you suspect
>> anything in this function or the functions calling this function? I'm still
>> debugging more into this.
>> If you have any clue, please point it out to me so that I can try to nail
>> down the issue in that direction.
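>>
>> One thing I'm going to try, in case the problem is the unaligned 8-byte
>> reads that MurmurHash2_64 does, is replacing the direct pointer casts with
>> memcpy-based loads. Just a sketch of the idea, not the actual Impala code:
>>
>> #include <cstdint>
>> #include <cstring>
>>
>> // Alignment-safe 64-bit load: memcpy compiles down to a single load on
>> // targets that allow unaligned access, and to safe byte-wise code elsewhere.
>> static inline uint64_t LoadUnaligned64(const void* p) {
>>   uint64_t v;
>>   memcpy(&v, p, sizeof(v));
>>   return v;
>> }
>>
>> // In the MurmurHash2_64 loop, instead of
>> //   uint64_t k = *reinterpret_cast<const uint64_t*>(data);
>> // it would become
>> //   uint64_t k = LoadUnaligned64(data);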
>>
>> Thanks,
>> Nishidha
>>
>>
>> From: nishidha randad <[email protected]>
>> To: Tim Armstrong <[email protected]>
>> Cc: Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
>> Jagadale/Austin/Contr/IBM@IBMUS, [email protected]
>> Date: 03/07/2016 10:35 PM
>>
>> Subject: Re: LeaseExpiredException while test data creation on ppc64le
>> ------------------------------
>>
>>
>>
>> Thanks a lot Tim! I did check some of the impalad*.ERROR, impalad*.INFO,
>> and other logs in cluster_logs/data_loading. A couple of observations:
>> 1. Dependency on SSE3, error says exiting as hardware does not support
>> SSE3.
>> 2. One error says to increase num_of_threads_per_disk (something related
>> to number of threads, not sure about exact variable name) while starting
>> impalad
>> 3. A few log files mention bad_alloc
>>
>> I'm analysing all these errors. I'll dig more into this tomorrow and
>> update you.
>> One more thing I wanted your help with is estimating how much work may be
>> left and the possible challenges ahead. It would be really great if you
>> could gauge that from the logs I had posted.
>>
>> Also, about the LLVM 3.7 fixes you did, I was wondering if you have
>> completed the upgrade, since you have also started encountering crashes.
>>
>> Thanks again!
>>
>> Nishidha
>>
>> On 7 Mar 2016 21:56, "Tim Armstrong" <[email protected]> wrote:
>>
>> Hi Nishidha,
>> I started working on our next release cycle towards the end of last
>> week, so I've been looking at LLVM 3.7 and have made a bit of progress
>> getting it working on Intel. We're trying to get it done early so we have
>> plenty of chance to test it.
>>
>> RE the TTransportException error, that is often because of a crash.
>> Usually to debug I would first look at the /tmp/impalad.ERROR and
>> /tmp/impalad.INFO logs for the cause of the crash. The embedded JVM also
>> generates hs_err_pid*.log files with a crash report that can sometimes be
>> useful. If that doesn't reveal the cause, then I'd look to see if there is
>> a core dump in the Impala directory (I normally run with "ulimit -c
>> unlimited" set so that a crash will generate a core file).
>>
>> I already fixed a couple of problems with codegen in LLVM 3.7,
>> including one crash that was an assertion about struct sizes. I'll be
>> posting the patch soon once I've done a bit more testing.
>>
>> It might help you make progress if you disable LLVM codegen by default
>> during data loading by setting the following environment variable:
>>
>> export
>>
>> START_CLUSTER_ARGS='--impalad_args=-default_query_options="disable_codegen=1"'
>>
>> You can also start the test cluster with the same arguments, or just
>> set it in the shell with "set disable_codegen=1".
>>
>> ./bin/start-impala-cluster.py
>> --impalad_args=-default_query_options="disable_codegen=1"
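>>
>> or, once connected, inside an impala-shell session:
>>
>> set disable_codegen=1;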
>>
>> On Mon, Mar 7, 2016 at 5:13 AM, Nishidha Panpaliya <[email protected]> wrote:
>> Hi Tim,
>>
>> Yes, I could fix this snappyError by building snappy-java for Power
>> and adding the native library for power into existing
>> snappy-java-1.0.4.1.jar used by hbase, hive, sentry and hadoop.
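>>
>> For reference, adding the new native library to the existing jar is roughly
>> the following (the path inside the jar follows snappy-java's native-library
>> layout, so adjust it for the actual arch directory name):
>>
>> jar uf snappy-java-1.0.4.1.jar \
>>     org/xerial/snappy/native/Linux/ppc64le/libsnappyjava.so
>>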
>> The test data loading has proceeded further and gave a new exception,
>> which I'm looking into, as below.
>>
>> Data Loading from Impala failed with error: ImpalaBeeswaxException:
>> INNER EXCEPTION: <class
>> 'thrift.transport.TTransport.TTransportException'>
>> MESSAGE: None
>>
>> Also, I've been able to start Impala and try just the following query, as
>> given in https://github.com/cloudera/Impala/wiki/How-to-build-Impala -
>> impala-shell.sh -q "SELECT version()"
>>
>> And regarding the patch of my work, I'm sorry for the delay. Although it
>> does not need any CLA to be signed, it is under discussion with our IBM
>> legal team, just to ensure we are compliant with the policies. Hoping to
>> update you on this soon. Could you tell me when you are going to start
>> with this new release cycle?
>>
>> Thanks,
>> Nishidha
>>
>>
>> From: Tim Armstrong <[email protected]>
>> To: nishidha panpaliya <[email protected]>
>> Cc: Impala Dev <[email protected]>, Nishidha
>> Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan Jagadale/Austin/Contr/IBM@IBMUS
>> Date: 03/05/2016 03:14 AM
>> Subject: Re: LeaseExpiredException while test data creation on ppc64le
>> ------------------------------
>>
>>
>>
>>
>> It also looks like it got far enough that you should have a bit of
>> data loaded - have you been able to start impala and run queries on some
>> of
>> those tables?
>>
>> We're starting a new release cycle so I'm actually about to focus on
>> upgrading our version of LLVM to 3.7 and getting the Intel support
>> working.
>> I think we're going to be putting a bit of effort into reducing LLVM code
>> generation time: it seems like LLVM 3.7 is slightly slower in some cases.
>>
>> We should stay in sync; it would be good to make sure that any
>> changes I make will work for your PowerPC work too. If you want to share
>> any patches (even if you're not formally contributing them) it would be
>> helpful for me to understand what you have already done on this path.
>>
>> Cheers,
>> Tim
>>
>> On Fri, Mar 4, 2016 at 1:40 PM, Tim Armstrong <[email protected]> wrote:
>>
>> Hi Nishidha,
>> It looks like Hive is maybe missing the native snappy
>> library: I see this in the logs:
>>
>> java.lang.Exception: org.xerial.snappy.SnappyError:
>> [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>> at
>>
>> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>> at
>>
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
>> Caused by: org.xerial.snappy.SnappyError:
>> [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>> at
>> org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
>> at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
>> at
>> org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
>> at
>>
>> org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:361)
>> at
>>
>> org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:394)
>> at
>> org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:413)
>>
>>
>>
>> If you want to try making progress without Hive snappy support,
>> I think you could disable some of the file formats by editing
>> testdata/workloads/*/*.csv and removing some of the "snap" file
>> formats.
>> The impala test suite generates data in many different file formats
>> with
>> different compression settings.
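>>
>> For example, to see which test vector entries mention snappy before removing
>> them:
>>
>> grep -n snap testdata/workloads/*/*.csv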
>>
>>
>> On Wed, Mar 2, 2016 at 7:08 AM, nishidha panpaliya <[email protected]> wrote:
>> Hello,
>>
>> After building Impala on ppc64le, I'm trying to run all the
>> tests of Impala. In the process, I'm getting an error during test
>> data creation.
>> Command ran -
>> ${IMPALA_HOME}/buildall.sh -testdata -format
>> Output - Attached log (output.txt)
>>
>> Also attached logs named
>> cluster_logs/data_loading/data-load-functional-exhaustive.log. And
>> hive.log.
>>
>> I tried setting the below parameters in hive-site.xml, but to no avail
>> (an example property entry is shown after the list).
>> hive.exec.max.dynamic.partitions=100000;
>>
>> hive.exec.max.dynamic.partitions.pernode=100000;
>> hive.exec.parallel=false
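>>
>> Each of these is a standard property entry in hive-site.xml, e.g.:
>>
>> <property>
>>   <name>hive.exec.max.dynamic.partitions</name>
>>   <value>100000</value>
>> </property>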
>>
>> I'll be really thankful if you could provide me some help here.
>>
>> Thanks in advance,
>> Nishidha
>>
>>
>>
>>
>>
>