Thanks Xuefu.

 

As I understand from your statement below:

 

…… Thus, you only need to build the spark-assembly.jar w/o Hive and put it in Hive's /lib directory….

 

 

So this is the procedure I followed:

 

1.      Used spark 1.5.2 pre-built with Hadoop 2.6 as $SPARK_HOME

2.      Copied /usr/lib/spark_1.5.2_build/lib/spark-assembly-1.5.2-hadoop2.4.0.jar (the assembly built without Hive) to $HIVE_HOME/lib

3.      Started spark master

4.      Started spark slave

5.      Logged in to hive in debug mode

6.    Ran the following settings

a.  set spark.home=/usr/lib/spark;

b.  set hive.execution.engine=spark;

c.  set spark.master=spark://50.140.197.217:7077;  -- (50.140.197.217 loops back to localhost 127.0.0.1)

d.  set spark.eventLog.enabled=true;

e.  set spark.eventLog.dir=/usr/lib/spark/logs;

f.  set spark.executor.memory=512m;

g.  set spark.serializer=org.apache.spark.serializer.KryoSerializer;
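
For completeness, the same Spark settings can also be kept in hive-site.xml rather than set per session. A sketch of the equivalent entries using the values above (the remaining spark.* properties follow the same pattern):

<!-- hive-site.xml: Hive on Spark settings (sketch only) -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>spark://50.140.197.217:7077</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>512m</value>
</property>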

 

This is the log:

 

15/11/28 10:00:04 [main]: INFO spark.SparkTask:   set 
mapreduce.job.reduces=<number>

15/11/28 10:00:04 [main]: INFO session.SparkSessionManagerImpl: Setting up the 
session manager.

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.connect.timeout -> 1000).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.serializer -> 
org.apache.spark.serializer.KryoSerializer).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.eventLog.enabled -> true).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.eventLog.dir -> /usr/lib/spark/logs).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.rpc.threads -> 8).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.secret.bits -> 256).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.home -> /usr/lib/spark).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.rpc.max.size -> 52428800).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.master -> spark://50.140.197.217:1077).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.executor.memory -> 512m).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.server.connect.timeout -> 90000).

15/11/28 10:00:04 [main]: DEBUG logging.InternalLoggerFactory: Using SLF4J as 
the default logging framework

15/11/28 10:00:04 [main]: DEBUG channel.MultithreadEventLoopGroup: 
-Dio.netty.eventLoopThreads: 24

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent0: 
java.nio.Buffer.address: available

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent0: 
sun.misc.Unsafe.theUnsafe: available

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent0: 
sun.misc.Unsafe.copyMemory: available

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent0: 
java.nio.Bits.unaligned: true

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent: Java version: 7

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent: 
-Dio.netty.noUnsafe: false

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent: sun.misc.Unsafe: 
available

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent: 
-Dio.netty.noJavassist: false

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent: Javassist: 
unavailable

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent: You don't have 
Javassist in your class path or you don't have enough permission to load 
dynamically generated classes.  Please check the configuration for better 
performance.

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent: -Dio.netty.tmpdir: 
/tmp (java.io.tmpdir)

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent: -Dio.netty.bitMode: 
64 (sun.arch.data.model)

15/11/28 10:00:04 [main]: DEBUG internal.PlatformDependent: 
-Dio.netty.noPreferDirect: false

15/11/28 10:00:04 [main]: DEBUG nio.NioEventLoop: 
-Dio.netty.noKeySetOptimization: false

15/11/28 10:00:04 [main]: DEBUG nio.NioEventLoop: 
-Dio.netty.selectorAutoRebuildThreshold: 512

15/11/28 10:00:04 [main]: DEBUG internal.ThreadLocalRandom: 
-Dio.netty.initialSeedUniquifier: 0x615383bf736b2c17 (took 0 ms)

15/11/28 10:00:04 [main]: DEBUG buffer.ByteBufUtil: -Dio.netty.allocator.type: 
unpooled

15/11/28 10:00:04 [main]: DEBUG buffer.ByteBufUtil: 
-Dio.netty.threadLocalDirectBufferSize: 65536

15/11/28 10:00:04 [main]: DEBUG util.NetUtil: Loopback interface: lo (lo, 
127.0.0.1)

15/11/28 10:00:04 [main]: DEBUG util.NetUtil: /proc/sys/net/core/somaxconn: 128

15/11/28 10:00:04 [main]: WARN rpc.RpcConfiguration: Your hostname, rhes564, 
resolves to a loopback address; using 192.168.0.9  instead (on interface eth1)

15/11/28 10:00:04 [main]: WARN rpc.RpcConfiguration: Set 
'hive.spark.client.server.address' if you need to bind to another address.

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.connect.timeout -> 1000).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.serializer -> 
org.apache.spark.serializer.KryoSerializer).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.eventLog.enabled -> true).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.eventLog.dir -> /usr/lib/spark/logs).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.rpc.threads -> 8).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.secret.bits -> 256).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.home -> /usr/lib/spark).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.rpc.max.size -> 52428800).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.master -> spark://50.140.197.217:1077).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load spark 
property from hive configuration (spark.executor.memory -> 512m).

15/11/28 10:00:04 [main]: INFO spark.HiveSparkClientFactory: load RPC property 
from hive configuration (hive.spark.client.server.connect.timeout -> 90000).

15/11/28 10:00:04 [main]: INFO client.SparkClientImpl: Running client driver 
with argv: /usr/lib/spark/bin/spark-submit --properties-file 
/tmp/spark-submit.8364457496527994996.properties --class 
org.apache.hive.spark.client.RemoteDriver /usr/lib/hive/lib/hive-exec-1.2.1.jar 
--remote-host 192.168.0.9 --remote-port 38939 --conf 
hive.spark.client.connect.timeout=1000 --conf 
hive.spark.client.server.connect.timeout=90000 --conf 
hive.spark.client.channel.log.level=null --conf 
hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 
--conf hive.spark.client.secret.bits=256

Failed to execute spark task, with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark 
client.)'

15/11/28 10:00:04 [main]: ERROR spark.SparkTask: Failed to execute spark task, 
with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to 
create spark client.)'

 

These two lines stand out:

 

15/11/28 10:00:04 [main]: WARN rpc.RpcConfiguration: Your hostname, rhes564, 
resolves to a loopback address; using 192.168.0.9  instead (on interface eth1)

15/11/28 10:00:04 [main]: WARN rpc.RpcConfiguration: Set 
'hive.spark.client.server.address' if you need to bind to another address.

 

 

OK, the big question is where I can specify an explicit value for hive.spark.client.server.address that actually works. Setting it in hive-site.xml does not work:

 

Starting Hive Metastore Server

2015-11-28 10:30:15,372 WARN  [main] conf.HiveConf: HiveConf of name 
hive.spark.client.server.address does not exist
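
One thing that might be worth trying, though I have not verified it against Hive 1.2.1, is supplying the property on the command line or in the session rather than in hive-site.xml, for example:

# start the CLI (or HiveServer2) with the address passed explicitly -- a sketch, not verified
hive --hiveconf hive.spark.client.server.address=192.168.0.9

-- or set it in the Hive session before switching the engine
set hive.spark.client.server.address=192.168.0.9;
set hive.execution.engine=spark;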

 

 

Mich Talebzadeh

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", 
ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 
978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one 
out shortly

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This 
message is for the designated recipient only, if you are not the intended 
recipient, you should destroy it immediately. Any information in this message 
shall not be understood as given or endorsed by Peridale Technology Ltd, its 
subsidiaries or their employees, unless expressly so stated. It is the 
responsibility of the recipient to ensure that this email is virus free, 
therefore neither Peridale Ltd, its subsidiaries nor their employees accept any 
responsibility.

 

From: Xuefu Zhang [mailto:xzh...@cloudera.com] 
Sent: 28 November 2015 04:35
To: u...@hive.apache.org
Cc: dev@hive.apache.org
Subject: Re: Answers to recent questions on Hive on Spark

 

Okay. I think I know what problem you have now. To run Hive on Spark, 
spark-assembly.jar is needed and it's also recommended that you have a spark 
installation (identified by spark.home) on the same host where HS2 is running. 
You only need spark-assembly.jar in HS2's /lib directory. Other than those, 
Hive on Spark doesn't have any other dependency at service level. On the job 
level, Hive on Spark jobs of course run on a spark cluster, which could be 
standalone, yarn-cluster, etc. However, how you get the binaries for your spark 
cluster and how you start them is completely independent of Hive.

Thus, you only need to build the spark-assembly.jar w/o Hive and put it in
Hive's /lib directory. The one in the existing spark build may contain Hive 
classes and that's why you need to build your own. Your spark installation can 
still have a jar that's different from what you build for Hive on Spark. Your 
spark.home can still point to your existing spark installation. In fact, Hive 
on Spark only needs spark-submit from your Spark installation. Therefore, you 
should be okay even if your spark installation contains Hive classes.
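
In other words, something along these lines (a sketch only; the paths, the dist/ layout, and the hadoop-2.6 profile are assumptions, chosen to match the Hadoop 2.6.0 cluster described below):

# build a Spark 1.5.2 assembly without Hive classes, then place it in Hive's lib directory
cd $SPARK_SRC          # Spark 1.5.2 source tree (assumed location)
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
cp dist/lib/spark-assembly-1.5.2-*.jar $HIVE_HOME/lib/
# spark.home keeps pointing at the normal Spark installation, which only needs to supply spark-submit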

By following this, I'm sure you will get Hive on Spark to work. Depending on the Hive version that your Spark installation contains, you may have problems with Spark applications such as SparkSQL, but that shouldn't be a concern if you decide to use Hive itself.

Let me know if you are still confused.

Thanks,

Xuefu

 

On Fri, Nov 27, 2015 at 4:34 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:

Hi,

 

Thanks for heads up and comments.

 

Sounds like when it comes to using Spark as the execution engine for Hive, we are in no man's land, so to speak. I have opened questions in both the Hive and Spark user forums. Not much luck, for the reasons that you alluded to.

 

OK, just to clarify: the pre-built version of Spark (as opposed to getting the source code and building it yourself) works fine for me.

 

Components are

 

hadoop version

Hadoop 2.6.0

 

hive --version

Hive 1.2.1

 

Spark 

version 1.5.2

 

It does what it says on the tin. For example, I can start the master node fine with start-master.sh.
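
For reference, the master and the slave were started roughly as follows (paths assumed):

$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://127.0.0.1:7077   # worker pointed at the master URL reported in the log below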

 

 

Spark Command: /usr/java/latest/bin/java -cp 
/usr/lib/spark_1.5.2_bin/sbin/../conf/:/usr/lib/spark_1.5.2_bin/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/usr/lib/spark_1.5.2_bin/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark_1.5.2_bin/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark_1.5.2_bin/lib/datanucleus-rdbms-3.2.9.jar:/home/hduser/hadoop-2.6.0/etc/hadoop/
 -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 
127.0.0.1 --port 7077 --webui-port 8080

========================================

15/11/28 00:05:23 INFO master.Master: Registered signal handlers for [TERM, 
HUP, INT]

15/11/28 00:05:23 WARN util.Utils: Your hostname, rhes564 resolves to a 
loopback address: 127.0.0.1; using 50.140.197.217 instead (on interface eth0)

15/11/28 00:05:23 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to 
another address

15/11/28 00:05:24 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

15/11/28 00:05:24 INFO spark.SecurityManager: Changing view acls to: hduser

15/11/28 00:05:24 INFO spark.SecurityManager: Changing modify acls to: hduser

15/11/28 00:05:24 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hduser); users 
with modify permissions: Set(hduser)

15/11/28 00:05:25 INFO slf4j.Slf4jLogger: Slf4jLogger started

15/11/28 00:05:25 INFO Remoting: Starting remoting

15/11/28 00:05:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@127.0.0.1:7077]

15/11/28 00:05:25 INFO util.Utils: Successfully started service 'sparkMaster' 
on port 7077.

15/11/28 00:05:25 INFO master.Master: Starting Spark master at spark://127.0.0.1:7077

15/11/28 00:05:25 INFO master.Master: Running Spark version 1.5.2

15/11/28 00:05:25 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/28 00:05:25 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8080

15/11/28 00:05:25 INFO util.Utils: Successfully started service 'MasterUI' on 
port 8080.

15/11/28 00:05:25 INFO ui.MasterWebUI: Started MasterWebUI at 
http://50.140.197.217:8080

15/11/28 00:05:25 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/28 00:05:25 INFO server.AbstractConnector: Started 
SelectChannelConnector@rhes564:6066

15/11/28 00:05:25 INFO util.Utils: Successfully started service on port 6066.

15/11/28 00:05:25 INFO rest.StandaloneRestServer: Started REST server for 
submitting applications on port 6066

15/11/28 00:05:25 INFO master.Master: I have been elected leader! New state: 
ALIVE

 

However, I cannot use Spark in place of the MapReduce engine with this build. It fails.

 

The instructions say to download the Spark source code and build it excluding the Hive jar files, so that Spark can be used as the execution engine.

 

Ok

 

I downloaded the Spark 1.5.2 source code and used the following to create the tarred and zipped file:

 

./make-distribution.sh --name "hadoop2-without-hive" --tgz 
"-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

 

After unpacking the file, I attempted to start the master node as above with start-master.sh. However, regrettably, it fails with the following error:

 

 

Spark Command: /usr/java/latest/bin/java -cp 
/usr/lib/spark_1.5.2_build/sbin/../conf/:/usr/lib/spark_1.5.2_build/lib/spark-assembly-1.5.2-hadoop2.4.0.jar:/home/hduser/hadoop-2.6.0/etc/hadoop/
 -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 
127.0.0.1 --port 7077 --webui-port 8080

========================================

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger

        at java.lang.Class.getDeclaredMethods0(Native Method)

        at java.lang.Class.privateGetDeclaredMethods(Class.java:2521)

        at java.lang.Class.getMethod0(Class.java:2764)

        at java.lang.Class.getMethod(Class.java:1653)

        at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)

        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)

Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger

        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

        at java.security.AccessController.doPrivileged(Native Method)

        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

        ... 6 more 

 

 

I believe the problem lies in the spark-assembly-1.5.2-hadoop2.4.0.jar file. Case in point: if I copy the jar file spark-assembly-1.5.2-hadoop2.6.0.jar to the lib directory above, I can start the master node.
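
To confirm where the difference lies, one quick check (a sketch) is to see whether the slf4j classes are bundled in each assembly:

# compare the two assemblies for bundled slf4j classes
jar tvf /usr/lib/spark_1.5.2_bin/lib/spark-assembly-1.5.2-hadoop2.6.0.jar | grep -i "org/slf4j" | head
jar tvf /usr/lib/spark_1.5.2_build/lib/spark-assembly-1.5.2-hadoop2.4.0.jar | grep -i "org/slf4j" | head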

 

hduser@rhes564::/usr/lib/spark_1.5.2_build/lib> mv spark-assembly-1.5.2-hadoop2.4.0.jar spark-assembly-1.5.2-hadoop2.4.0.jar_old

hduser@rhes564::/usr/lib/spark_1.5.2_build/lib> cp /usr/lib/spark_1.5.2_bin/lib/spark-assembly-1.5.2-hadoop2.6.0.jar .

 

hduser@rhes564::/usr/lib/spark_1.5.2_build/lib> cd ../sbin

hduser@rhes564::/usr/lib/spark_1.5.2_build/sbin> start-master.sh

starting org.apache.spark.deploy.master.Master, logging to 
/usr/lib/spark_1.5.2_build/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out

hduser@rhes564::/usr/lib/spark_1.5.2_build/sbin> cat /usr/lib/spark_1.5.2_build/sbin/../logs/spark-hduser-org.apache.spark.deploy.master.Master-1-rhes564.out

Spark Command: /usr/java/latest/bin/java -cp 
/usr/lib/spark_1.5.2_build/sbin/../conf/:/usr/lib/spark_1.5.2_build/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/home/hduser/hadoop-2.6.0/etc/hadoop/
 -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master --ip 
50.140.197.217 --port 7077 --webui-port 8080

========================================

15/11/28 00:31:24 INFO master.Master: Registered signal handlers for [TERM, 
HUP, INT]

15/11/28 00:31:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable

15/11/28 00:31:25 INFO spark.SecurityManager: Changing view acls to: hduser

15/11/28 00:31:25 INFO spark.SecurityManager: Changing modify acls to: hduser

15/11/28 00:31:25 INFO spark.SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(hduser); users 
with modify permissions: Set(hduser)

15/11/28 00:31:25 INFO slf4j.Slf4jLogger: Slf4jLogger started

15/11/28 00:31:26 INFO Remoting: Starting remoting

15/11/28 00:31:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@50.140.197.217:7077]

15/11/28 00:31:26 INFO util.Utils: Successfully started service 'sparkMaster' 
on port 7077.

15/11/28 00:31:26 INFO master.Master: Starting Spark master at spark://50.140.197.217:7077

15/11/28 00:31:26 INFO master.Master: Running Spark version 1.5.2

15/11/28 00:31:26 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/28 00:31:26 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8080

15/11/28 00:31:26 INFO util.Utils: Successfully started service 'MasterUI' on 
port 8080.

15/11/28 00:31:26 INFO ui.MasterWebUI: Started MasterWebUI at 
http://50.140.197.217:8080

15/11/28 00:31:26 INFO server.Server: jetty-8.y.z-SNAPSHOT

15/11/28 00:31:26 INFO server.AbstractConnector: Started selectchannelconnec...@c-50-140-197-217.hsd1.fl.comcast.net:6066

15/11/28 00:31:26 INFO util.Utils: Successfully started service on port 6066.

15/11/28 00:31:26 INFO rest.StandaloneRestServer: Started REST server for 
submitting applications on port 6066

15/11/28 00:31:27 INFO master.Master: I have been elected leader! New state: 
ALIVE

 

Thanks again.

 

 


 

From: Xuefu Zhang [mailto:xzh...@cloudera.com] 
Sent: 27 November 2015 18:12
To: u...@hive.apache.org; dev@hive.apache.org
Subject: Answers to recent questions on Hive on Spark

 

Hi there,

There seems to be an increasing interest in Hive on Spark from Hive users. I understand that there have been a few questions or problems reported, and I can see some frustration at times. It's impossible for the Hive on Spark team to respond to every inquiry even though we wish we could. However, there are a few items to be noted:

1. Hive on Spark is being tested as part of the precommit tests.

2. Hive on Spark is supported in some distributions such as CDH.

3. I tried a couple of days ago with latest master and branch-1, and they all 
worked with my Spark 1.5 build.

Therefore, if you are facing a problem, it's likely due to your setup. Please refer to the wiki on how to do it right. Nevertheless, I have a few suggestions here:

1. Start simple. Try out a CDH sandbox or distribution first to see it work in action before building your own. Comparing it with your setup may give you some clues.

2. Try with spark.master=local first, making sure that you have all the necessary dependent jars, and then move to your production setup (a minimal example follows after this list). Please note that yarn-cluster is recommended and mesos is not supported. I tried both yarn-cluster and local-cluster and both worked for me.

3. Check logs beyond hive.log, such as the Spark log and the YARN log, to get more error messages.
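
As a concrete example of suggestion 2, a minimal smoke test from the Hive CLI could look like this (the table name is only a placeholder):

-- run one trivial query with the local master before moving to standalone or yarn-cluster
set hive.execution.engine=spark;
set spark.master=local;
select count(*) from some_table;   -- any small existing table will do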

When you report your problem, please provide as much info as possible, such as 
your platform, your builds, your configurations, and relevant logs so that 
others can reproduce.

Please note that we are not in a good position to answer questions about Spark itself, such as spark-shell. Not only is that beyond the scope of Hive on Spark, but the team may also not have the expertise to give you meaningful answers. One thing to emphasize: when you build your Spark jar, don't include Hive, as it's very likely there would be a version mismatch. Again, a distribution may have solved the problem for you, if you'd like to give it a try.

Hope this helps.

Thanks,

Xuefu

 
