I installed a single-node Hadoop on my Mac Pro, using the official Hadoop 3.3.6 release. My OS username is shuai.chen
— yes, the name contains a dot! All of the steps below were performed as the OS user shuai.chen.
core-site.xml is configured as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020/</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/shuai.chen/dev/hadoop-3.3.6/hdfs/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.shuai.chen.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.shuai.chen.groups</name>
<value>*</value>
</property>
</configuration>
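One thing worth noting about the proxyuser entries above: because the username itself contains a dot, a key like `hadoop.proxyuser.shuai.chen.hosts` is ambiguous to any parser that treats dots as separators. A minimal sketch of the ambiguity (plain Python illustration; Hadoop's real parser lives in DefaultImpersonationProvider and is more involved):

```python
# Sketch of why a dotted username is awkward inside proxyuser keys.
key = "hadoop.proxyuser.shuai.chen.hosts"

# A naive dot-split cannot tell where the username ends:
naive_user = key.split(".")[2]        # "shuai" -- wrong

# A parser has to anchor on the fixed prefix and suffix instead:
prefix, suffix = "hadoop.proxyuser.", ".hosts"
user = key[len(prefix):-len(suffix)]  # "shuai.chen" -- correct

print(naive_user)
print(user)
```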
I then installed Hive 3.1.2; accessing it through Beeline as the hive user works fine, and data can be read and written.
hive-site.xml is configured as follows:
<?xml version="1.0"?>
<configuration>
<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<property>
<name>beeline.hs2.connection.user</name>
<value>hive</value>
</property>
<property>
<name>beeline.hs2.connection.password</name>
<value>hive</value>
</property>
<property>
<name>beeline.hs2.connection.hosts</name>
<value>localhost:10000</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>/Users/shuai.chen/dev/apache-hive-3.1.2-bin/auxlib</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>mr</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>localhost</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.server2.webui.host</name>
<value>localhost</value>
</property>
<property>
<name>hive.server2.webui.port</name>
<value>10002</value>
</property>
</configuration>
0: jdbc:hive2://localhost:10000> select * from student;
INFO : Compiling
command(queryId=shuai.chen_20241025121808_1285a60c-aef9-42da-a839-347e30586aa6):
select * from student
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema:
Schema(fieldSchemas:[FieldSchema(name:student.id, type:int, comment:null),
FieldSchema(name:student.name, type:string, comment:null)], properties:null)
INFO : Completed compiling
command(queryId=shuai.chen_20241025121808_1285a60c-aef9-42da-a839-347e30586aa6);
Time taken: 2.692 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing
command(queryId=shuai.chen_20241025121808_1285a60c-aef9-42da-a839-347e30586aa6):
select * from student
INFO : Completed executing
command(queryId=shuai.chen_20241025121808_1285a60c-aef9-42da-a839-347e30586aa6);
Time taken: 0.006 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+-------------+---------------+
| student.id | student.name |
+-------------+---------------+
| 1 | Jack |
| 2 | Rose |
+-------------+---------------+
2 rows selected (3.162 seconds)
I also installed Spark 3.3.1 (with Hadoop 3) and can query the Hive table through spark-sql:
24/10/25 12:18:57 WARN HiveConf: HiveConf of name hive.metastore.local does not
exist
24/10/25 12:18:57 WARN HiveConf: HiveConf of name
hive.metastore.event.db.notification.api.auth does not exist
Spark master: local[*], Application Id: local-1729829938805
spark-sql (default)> select * from student;
24/10/25 12:19:11 WARN SessionState: METASTORE_FILTER_HOOK will be ignored,
since hive.security.authorization.manager is set to instance of
HiveAuthorizerFactory.
id	name
1	Jack
2	Rose
Time taken: 2.274 seconds, Fetched 2 row(s)
But when I access the Hive table through Kyuubi 1.9.2, I get the following error:
bin/beeline -u 'jdbc:hive2://localhost:10009/' -n apache
Connecting to jdbc:hive2://localhost:10009/
2024-10-25 12:28:49.658 INFO KyuubiSessionManager-exec-pool: Thread-63
org.apache.kyuubi.operation.LaunchEngine: Processing apache's
query[3961940e-7d87-46a6-a2c8-edd3677b5d96]: PENDING_STATE -> RUNNING_STATE,
statement:
LaunchEngine
2024-10-25 12:28:49.661 INFO KyuubiSessionManager-exec-pool: Thread-63
org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: Starting
2024-10-25 12:28:49.661 INFO KyuubiSessionManager-exec-pool: Thread-63
org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=60000
watcher=org.apache.kyuubi.shaded.curator.ConnectionState@8b80d8c
2024-10-25 12:28:49.664 INFO KyuubiSessionManager-exec-pool:
Thread-63-SendThread(localhost:2181)
org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Opening socket connection to
server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using
SASL (unknown error)
2024-10-25 12:28:49.665 INFO KyuubiSessionManager-exec-pool:
Thread-63-SendThread(localhost:2181)
org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Socket connection established to
localhost/0:0:0:0:0:0:0:1:2181, initiating session
2024-10-25 12:28:49.667 INFO KyuubiSessionManager-exec-pool:
Thread-63-SendThread(localhost:2181)
org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Session establishment complete
on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x10009bf7b340013,
negotiated timeout = 40000
2024-10-25 12:28:49.667 INFO KyuubiSessionManager-exec-pool:
Thread-63-EventThread
org.apache.kyuubi.shaded.curator.framework.state.ConnectionStateManager: State
change: CONNECTED
2024-10-25 12:28:49.684 INFO KyuubiSessionManager-exec-pool: Thread-63
org.apache.kyuubi.engine.ProcBuilder: Logging to
/Users/shuai.chen/dev/apache-kyuubi-1.9.2-bin/work/apache/kyuubi-spark-sql-engine.log.5
2024-10-25 12:28:49.685 INFO KyuubiSessionManager-exec-pool: Thread-63
org.apache.kyuubi.engine.EngineRef: Launching engine:
/Users/shuai.chen/dev/spark-3.3.1-bin-hadoop3/bin/spark-submit \
--class org.apache.kyuubi.engine.spark.SparkSQLEngine \
--conf spark.hive.server2.thrift.resultset.default.fetch.size=1000 \
--conf spark.kyuubi.client.ipAddress=127.0.0.1 \
--conf spark.kyuubi.client.version=1.9.2 \
--conf
spark.kyuubi.engine.engineLog.path=/Users/shuai.chen/dev/apache-kyuubi-1.9.2-bin/work/apache/kyuubi-spark-sql-engine.log.5
\
--conf spark.kyuubi.engine.share.level=USER \
--conf spark.kyuubi.engine.submit.time=1729830529677 \
--conf spark.kyuubi.engine.type=SPARK_SQL \
--conf spark.kyuubi.frontend.protocols=THRIFT_BINARY,REST \
--conf spark.kyuubi.ha.addresses=localhost:2181 \
--conf spark.kyuubi.ha.engine.ref.id=6498a13e-ca86-4f7b-9515-b9b59d19a6dd \
--conf spark.kyuubi.ha.namespace=/kyuubi_1.9.2_USER_SPARK_SQL/apache/default \
--conf spark.kyuubi.server.ipAddress=127.0.0.1 \
--conf spark.kyuubi.session.connection.url=localhost:10009 \
--conf spark.kyuubi.session.engine.initialize.timeout=PT3M \
--conf spark.kyuubi.session.real.user=apache \
--conf
spark.app.name=kyuubi_USER_SPARK_SQL_apache_default_6498a13e-ca86-4f7b-9515-b9b59d19a6dd
\
--conf spark.master=yarn \
--conf spark.submit.deployMode=cluster \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.yarn.tags=KYUUBI,6498a13e-ca86-4f7b-9515-b9b59d19a6dd \
--proxy-user apache
/Users/shuai.chen/dev/apache-kyuubi-1.9.2-bin/externals/engines/spark/kyuubi-spark-sql-engine_2.12-1.9.2.jar
2024-10-25 12:28:53.735 INFO Curator-Framework-0
org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl:
backgroundOperationsLoop exiting
2024-10-25 12:28:53.737 INFO KyuubiSessionManager-exec-pool: Thread-63
org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Session: 0x10009bf7b340013 closed
2024-10-25 12:28:53.737 INFO KyuubiSessionManager-exec-pool:
Thread-63-EventThread org.apache.kyuubi.shaded.zookeeper.ClientCnxn:
EventThread shut down for session: 0x10009bf7b340013
2024-10-25 12:28:53.738 INFO KyuubiSessionManager-exec-pool: Thread-63
org.apache.kyuubi.operation.LaunchEngine: Processing apache's
query[3961940e-7d87-46a6-a2c8-edd3677b5d96]: RUNNING_STATE -> ERROR_STATE, time
taken: 4.079 seconds
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/Users/shuai.chen/dev/spark-3.3.1-bin-hadoop3/jars/log4j-slf4j-impl-2.17.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/Users/shuai.chen/dev/hadoop-3.3.6/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
24/10/25 12:28:51 WARN Utils: Your hostname, shuaichendeMacBook-Pro.local
resolves to a loopback address: 127.0.0.1; using 172.31.21.68 instead (on
interface en0)
24/10/25 12:28:51 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another
address
24/10/25 12:28:52 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
24/10/25 12:28:52 INFO DefaultNoHARMFailoverProxyProvider: Connecting to
ResourceManager at localhost/127.0.0.1:8032
Exception in thread "main" org.apache.spark.SparkException: ERROR:
org.apache.hadoop.security.authorize.AuthorizationException: User: shuai.chen
is not allowed to impersonate apache
at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:975)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/10/25 12:28:52 INFO ShutdownHookManager: Shutdown hook called
24/10/25 12:28:52 INFO ShutdownHookManager: Deleting directory
/private/var/folders/t9/q0g6dydj28ncjkn18rfx_mhc0000gn/T/spark-8025071a-0f2c-4377-bf68-a7b92aee08b9
Error: org.apache.kyuubi.KyuubiSQLException:
org.apache.kyuubi.KyuubiSQLException: Exception in thread "main"
org.apache.spark.SparkException: ERROR:
org.apache.hadoop.security.authorize.AuthorizationException: User: shuai.chen
is not allowed to impersonate apache
at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:975)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
See more:
/Users/shuai.chen/dev/apache-kyuubi-1.9.2-bin/work/apache/kyuubi-spark-sql-engine.log.5
at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:69)
at org.apache.kyuubi.engine.ProcBuilder.$anonfun$start$1(ProcBuilder.scala:234)
at java.lang.Thread.run(Thread.java:750)
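For context on what fails here: Kyuubi launches the engine via spark-submit with `--proxy-user apache`, so the ResourceManager checks whether the real user (shuai.chen) may impersonate apache against the `hadoop.proxyuser.*` entries. A rough, simplified model of that check (illustration only; the real logic is in org.apache.hadoop.security.authorize.ProxyUsers and also matches the caller's host and group membership):

```python
# Simplified model of Hadoop's impersonation check.
def authorize(real_user, proxy_user, conf):
    hosts = conf.get(f"hadoop.proxyuser.{real_user}.hosts")
    groups = conf.get(f"hadoop.proxyuser.{real_user}.groups")
    # "*" means any host / any group; a missing entry means: deny.
    if hosts is None or groups is None:
        raise PermissionError(
            f"User: {real_user} is not allowed to impersonate {proxy_user}")
    return True

conf = {
    "hadoop.proxyuser.shuai.chen.hosts": "*",
    "hadoop.proxyuser.shuai.chen.groups": "*",
}

# With the core-site.xml entries above, the check should pass -- which is
# why the error suggests the dotted username is not being matched
# against these keys.
print(authorize("shuai.chen", "apache", conf))
```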
I found that the Hadoop community already fixed this issue in 3.2.0; see
https://issues.apache.org/jira/browse/HADOOP-15395
Is there something I have configured incorrectly?
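As I understand the HADOOP-15395 fix, it amounts to escaping the username before it is used in a regex, so that the dots in shuai.chen match literally rather than as wildcards. A rough sketch of the difference (hypothetical key names, not Hadoop's actual code):

```python
import re

username = "shuai.chen"

# Pre-fix style: the username is concatenated into the regex verbatim,
# so its dots act as "match any character" wildcards.
unsafe = re.compile(r"hadoop\.proxyuser\." + username + r"\.(hosts|groups)")

# Post-fix style: the username is quoted (Pattern.quote in Java,
# re.escape in Python), so its dots match only literal dots.
safe = re.compile(r"hadoop\.proxyuser\." + re.escape(username) + r"\.(hosts|groups)")

near_miss = "hadoop.proxyuser.shuaiXchen.hosts"  # hypothetical key
print(bool(unsafe.fullmatch(near_miss)))  # True  -- wildcard dot matches "X"
print(bool(safe.fullmatch(near_miss)))    # False -- literal dot required
```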
Attached: kyuubi-defaults.conf
```
kyuubi.authentication NONE
kyuubi.frontend.bind.host localhost
kyuubi.frontend.protocols THRIFT_BINARY,REST
kyuubi.frontend.thrift.binary.bind.port 10009
kyuubi.frontend.rest.bind.port 10099
kyuubi.engine.type SPARK_SQL
#kyuubi.engine.type=FLINK_SQL
#kyuubi.engine.type=TRINO
#kyuubi.session.engine.trino.connection.url=http://localhost:18080
#kyuubi.session.engine.trino.connection.catalog=hive
kyuubi.engine.share.level USER
kyuubi.session.engine.initialize.timeout PT3M
#kyuubi.ha.addresses localhost:2181
#kyuubi.ha.namespace kyuubi
spark.master=yarn
spark.submit.deployMode=cluster
# Details in https://kyuubi.readthedocs.io/en/master/configuration/settings.html
```
and kyuubi-env.sh
export JAVA_HOME=/Users/shuai.chen/.sdkman/candidates/java/8.0.422-zulu
export SPARK_HOME=/Users/shuai.chen/dev/spark-3.3.1-bin-hadoop3
export FLINK_HOME=/Users/shuai.chen/dev/flink
export FLINK_ENGINE_HOME=/Users/shuai.chen/dev/flink
export TRINO_HOME=/Users/shuai.chen/dev/trino-server-427
export TRINO_ENGINE_HOME=/Users/shuai.chen/dev/trino-server-427
export HADOOP_HOME=/Users/shuai.chen/dev/hadoop-3.3.6
export HADOOP_CONF_DIR=/Users/shuai.chen/dev/hadoop-3.3.6/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/Users/shuai.chen/dev/hadoop-3.3.6/bin/hadoop classpath)