We do use MurmurHash in many cases in Impala, e.g. when repartitioning data for a large join, so it definitely works fine. You could try running the hash-benchmark binary that is built with Impala; that would exercise the hash function directly.
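If hash-benchmark doesn't reproduce it, a tiny standalone harness can exercise the function on deliberately misaligned buffers. A minimal sketch below, using the classic public-domain MurmurHash64A (Austin Appleby), which does the same cast-based 8-byte loads that are under suspicion; it is not Impala's exact copy of the function:

    // Hash the same 64 bytes at every start offset 0..7 so the 8-byte loads
    // hit each possible alignment. If misaligned loads are the problem on
    // ppc64le, some offsets should crash or misbehave here too.
    #include <cinttypes>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Classic public-domain MurmurHash64A (Austin Appleby), cast-based
    // loads and all -- the same suspect pattern, not Impala's exact copy.
    uint64_t MurmurHash64A(const void* key, int len, uint64_t seed) {
      const uint64_t m = 0xc6a4a7935bd1e995ULL;
      const int r = 47;
      uint64_t h = seed ^ (len * m);
      const uint64_t* data = reinterpret_cast<const uint64_t*>(key);
      const uint64_t* end = data + (len / 8);
      while (data != end) {
        uint64_t k = *data++;  // unaligned load when key is not 8-byte aligned
        k *= m; k ^= k >> r; k *= m;
        h ^= k; h *= m;
      }
      const unsigned char* tail = reinterpret_cast<const unsigned char*>(data);
      switch (len & 7) {
        case 7: h ^= uint64_t(tail[6]) << 48;  // fall through
        case 6: h ^= uint64_t(tail[5]) << 40;  // fall through
        case 5: h ^= uint64_t(tail[4]) << 32;  // fall through
        case 4: h ^= uint64_t(tail[3]) << 24;  // fall through
        case 3: h ^= uint64_t(tail[2]) << 16;  // fall through
        case 2: h ^= uint64_t(tail[1]) << 8;   // fall through
        case 1: h ^= uint64_t(tail[0]); h *= m;
      }
      h ^= h >> r; h *= m; h ^= h >> r;
      return h;
    }

    int main() {
      std::vector<unsigned char> arena(72);  // 64 bytes + up to 7 bytes of skew
      for (size_t i = 0; i < arena.size(); ++i) arena[i] = static_cast<unsigned char>(i);
      for (int off = 0; off < 8; ++off) {
        // off != 0 forces the cast-based 8-byte loads to be misaligned.
        uint64_t h = MurmurHash64A(arena.data() + off, 64, 0);
        std::printf("offset %d: %016" PRIx64 "\n", off, h);
      }
      return 0;
    }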
On Tue, Mar 8, 2016 at 10:20 AM, Todd Lipcon <[email protected]> wrote:

> On Tue, Mar 8, 2016 at 8:26 AM, Nishidha Panpaliya <[email protected]> wrote:
>
>> I tested this function individually in a small test app and it worked.
>> Maybe the data given to it was simple enough for it to pass, but in
>> Impala's case there is some issue with the data/arguments passed to this
>> function in a particular case. It looks like this function is not called
>> on machines where SSE is supported, so on x86 you might not see this
>> crash. Do you suspect anything in this function or in the functions that
>> call it? I'm still debugging. If you have any clue, please point it out
>> so that I can try to nail down the issue in that direction.
>>
>> Thanks,
>> Nishidha
>
> Kudu uses the same function on x86 and it works fine. You might try
> changing the *reinterpret_cast<uint64_t> stuff to use the UNALIGNED_LOAD64
> macro from gutil/port.h and see if it helps, although reading the comments
> in that file, it says that modern PPC chips can do unaligned loads fine.
>
> -Todd
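For reference, a sketch of the change Todd describes: UNALIGNED_LOAD64 in gutil/port.h reduces to a memcpy-based load on platforms without cheap native unaligned access. The hash loop below is illustrative only, not Impala's actual MurmurHash source:

    // Replace a cast-based (potentially misaligned) 8-byte load with a
    // memcpy-based one, which is what gutil/port.h's UNALIGNED_LOAD64 does
    // on platforms that need it. The mixing loop here is illustrative.
    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    static inline uint64_t UnalignedLoad64(const void* p) {
      uint64_t t;
      std::memcpy(&t, p, sizeof(t));  // compilers emit a single load where legal
      return t;
    }

    uint64_t HashBody(const uint8_t* data, size_t len, uint64_t h) {
      const uint64_t m = 0xc6a4a7935bd1e995ULL;
      for (size_t i = 0; i + 8 <= len; i += 8) {
        // Before: uint64_t k = *reinterpret_cast<const uint64_t*>(data + i);
        // That is undefined behavior when data + i is not 8-byte aligned.
        uint64_t k = UnalignedLoad64(data + i);
        k *= m;
        h = (h ^ k) * m;
      }
      return h;
    }

If the crash disappears with the memcpy form, that points strongly at an alignment-sensitive load on the non-SSE fallback path.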
>> From: nishidha randad <[email protected]>
>> To: Tim Armstrong <[email protected]>
>> Cc: Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, [email protected]
>> Date: 03/07/2016 10:35 PM
>> Subject: Re: LeaseExpiredException while test data creation on ppc64le
>> ------------------------------
>>
>> Thanks a lot, Tim! I did check some of the impalad*.ERROR, impalad*.INFO,
>> and other logs in cluster_logs/data_loading. A couple of observations:
>>
>> 1. A dependency on SSE3: the error says Impala is exiting because the
>> hardware does not support SSE3.
>> 2. One error says to increase num_of_threads_per_disk (something related
>> to the number of threads per disk; I'm not sure of the exact variable
>> name) when starting impalad.
>> 3. A few log files report bad_alloc.
>>
>> I'm analysing all these errors. I'll dig more into this tomorrow and
>> update you. One more thing I'd like your help with is estimating how much
>> work may be left and the possible challenges ahead. It would be really
>> great if you could gauge that from the logs I posted.
>>
>> Also, regarding the LLVM 3.7 fixes you did, I was wondering whether you
>> have completed the upgrade, since you have also started encountering
>> crashes.
>>
>> Thanks again!
>> Nishidha
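Observation 1 and the earlier note that the crashing function only runs without SSE point at the same mechanism: a runtime CPU-capability check that selects an SSE-accelerated hash on x86 and a portable fallback elsewhere. A minimal sketch of that dispatch pattern; CpuInfo, CrcHash, and Hash are illustrative names with stub bodies, not Impala's exact API:

    // Hedged sketch of runtime CPU-feature dispatch: an SSE4.2 CRC hash on
    // capable x86 chips, a portable MurmurHash-style fallback elsewhere.
    #include <cstddef>
    #include <cstdint>

    struct CpuInfo {
      // Populated from CPUID on x86; always false on ppc64le.
      static bool has_sse4_2;
    };
    bool CpuInfo::has_sse4_2 = false;

    // Stub for the SSE4.2 path (a real one would use _mm_crc32_u64).
    uint64_t CrcHash(const void* data, size_t len, uint64_t seed) { return seed; }

    // Stub for the portable path (e.g. MurmurHash64A as sketched earlier).
    uint64_t MurmurHash64(const void* data, size_t len, uint64_t seed) { return seed; }

    uint64_t Hash(const void* data, size_t len, uint64_t seed) {
      // Only the fallback ever runs on hardware without SSE4.2, which
      // matches the observation that the crash never reproduces on x86.
      return CpuInfo::has_sse4_2 ? CrcHash(data, len, seed)
                                 : MurmurHash64(data, len, seed);
    }

Under a pattern like this, non-x86 machines are the only ones that ever execute the fallback, which is consistent with the crash being invisible on Intel hardware.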
>> On 7 Mar 2016 21:56, "Tim Armstrong" <[email protected]> wrote:
>>
>> Hi Nishidha,
>> I started working on our next release cycle towards the end of last week,
>> so I've been looking at LLVM 3.7 and have made a bit of progress getting
>> it working on Intel. We're trying to get it working early so we have
>> plenty of chance to test it.
>>
>> Re the TTransportException error: that is often caused by a crash. To
>> debug it, I would first look at the /tmp/impalad.ERROR and
>> /tmp/impalad.INFO logs for the cause of the crash. The embedded JVM also
>> generates hs_err_pid*.log files with a crash report that can sometimes be
>> useful. If that doesn't reveal the cause, then I'd look for a core dump
>> in the Impala directory (I normally run with "ulimit -c unlimited" set so
>> that a crash will generate a core file).
>>
>> I already fixed a couple of problems with codegen in LLVM 3.7, including
>> one crash that was an assertion about struct sizes. I'll post the patch
>> soon, once I've done a bit more testing.
>>
>> It might help you make progress to disable LLVM codegen by default during
>> data loading, by setting the following environment variable:
>>
>> export START_CLUSTER_ARGS='--impalad_args=-default_query_options="disable_codegen=1"'
>>
>> You can also start the test cluster with the same arguments, or just set
>> it in the shell with "set disable_codegen=1":
>>
>> ./bin/start-impala-cluster.py --impalad_args=-default_query_options="disable_codegen=1"
>>
>> On Mon, Mar 7, 2016 at 5:13 AM, Nishidha Panpaliya <[email protected]> wrote:
>>
>> Hi Tim,
>>
>> Yes, I could fix the SnappyError by building snappy-java for Power and
>> adding the native library for Power into the existing
>> snappy-java-1.0.4.1.jar used by HBase, Hive, Sentry, and Hadoop.
>> Test data loading then proceeded further and raised a new exception,
>> shown below, which I'm looking into:
>>
>> Data Loading from Impala failed with error: ImpalaBeeswaxException:
>> INNER EXCEPTION: <class 'thrift.transport.TTransport.TTransportException'>
>> MESSAGE: None
>>
>> Also, I've been able to start Impala and try just the following query, as
>> given in https://github.com/cloudera/Impala/wiki/How-to-build-Impala:
>>
>> impala-shell.sh -q "SELECT version()"
>>
>> And regarding the patch for my work, I'm sorry for the delay. Although it
>> does not need a CLA to be signed, it is under discussion with our IBM
>> legal team, just to ensure we are compliant with the policies. I hope to
>> update you on this soon. Could you tell me when you are going to start
>> the new release cycle?
>>
>> Thanks,
>> Nishidha
>>
>> From: Tim Armstrong <[email protected]>
>> To: nishidha panpaliya <[email protected]>
>> Cc: Impala Dev <[email protected]>, Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan Jagadale/Austin/Contr/IBM@IBMUS
>> Date: 03/05/2016 03:14 AM
>> Subject: Re: LeaseExpiredException while test data creation on ppc64le
>> ------------------------------
>>
>> It also looks like it got far enough that you should have a bit of data
>> loaded - have you been able to start Impala and run queries on some of
>> those tables?
>>
>> We're starting a new release cycle, so I'm actually about to focus on
>> upgrading our version of LLVM to 3.7 and getting the Intel support
>> working. I think we're going to put a bit of effort into reducing LLVM
>> code generation time: it seems like LLVM 3.7 is slightly slower in some
>> cases.
>>
>> We should stay in sync; it would be good to make sure that any changes I
>> make will work for your PowerPC work too. If you want to share any
>> patches (even if you're not formally contributing them), it would be
>> helpful for me to understand what you have already done on this path.
>>
>> Cheers,
>> Tim
>>
>> On Fri, Mar 4, 2016 at 1:40 PM, Tim Armstrong <[email protected]> wrote:
>>
>> Hi Nishidha,
>> It looks like Hive may be missing the native snappy library; I see this
>> in the logs:
>>
>> java.lang.Exception: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
>> Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>>     at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
>>     at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
>>     at org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
>>     at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:361)
>>     at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:394)
>>     at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:413)
>>
>> If you want to try making progress without Hive snappy support, I think
>> you could disable some of the file formats by editing
>> testdata/workloads/*/*.csv and removing some of the "snap" file formats.
>> The Impala test suite generates data in many different file formats with
>> different compression settings.
>>
>> On Wed, Mar 2, 2016 at 7:08 AM, nishidha panpaliya <[email protected]> wrote:
>>
>> Hello,
>>
>> After building Impala on ppc64le, I'm trying to run all of Impala's
>> tests. In the process, I'm getting an error during test data creation.
>>
>> Command run:
>> ${IMPALA_HOME}/buildall.sh -testdata -format
>>
>> Output: attached log (output.txt). Also attached are
>> cluster_logs/data_loading/data-load-functional-exhaustive.log and
>> hive.log.
>>
>> I tried setting the parameters below in hive-site.xml, but to no avail:
>>
>> hive.exec.max.dynamic.partitions=100000
>> hive.exec.max.dynamic.partitions.pernode=100000
>> hive.exec.parallel=false
>>
>> I'll be really thankful if you could provide me some help here.
>>
>> Thanks in advance,
>> Nishidha
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
