[
https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315951#comment-14315951
]
Tao Wang edited comment on SPARK-5159 at 2/11/15 10:18 AM:
-----------------------------------------------------------
I have tested this on branch 1.2; the results are below:
1. With hive.server2.enable.doAs=false, I used the `hdfs` user to connect to the
ThriftServer and ran some operations. The NameNode audit log shows:
bq. 2015-02-11 18:07:50,568 | INFO | IPC Server handler 62 on 25000 |
allowed=true ugi=hdfs (auth:PROXY) via spark/[email protected] (auth:KERBEROS)
ip=/9.91.11.204 cmd=getfileinfo
src=/user/sparkhive/warehouse/yarn.db/child dst=null perm=null |
org.apache.hadoop.hdfs.server.namenode.FSNamesystem$DefaultAuditLogger.logAuditMessage(FSNamesystem.java:7950)
2015-02-11 18:07:50,577 | INFO | IPC Server handler 16 on 25000 | allowed=true
ugi=hdfs (auth:PROXY) via spark/[email protected] (auth:KERBEROS)
ip=/9.91.11.204 cmd=mkdirs src=/user/sparkhive/warehouse/yarn.db/child
dst=null perm=hdfs:hadoop:rwxr-xr-x |
org.apache.hadoop.hdfs.server.namenode.FSNamesystem$DefaultAuditLogger.logAuditMessage(FSNamesystem.java:7950)
and the ThriftServer's log shows:
bq. 2015-02-11 18:07:50,471 | INFO | [pool-9-thread-2] | ugi=hdfs
ip=unknown-ip-addr cmd=create_table: Table(tableName:child, dbName:yarn,
owner:hdfs, createTime:1423649270, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:null),
FieldSchema(name:age, type:int, comment:null)], location:null,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{serialization.format=,, field.delim=,}), bucketCols:[],
sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
skewedColValues:[], skewedColValueLocationMaps:{}),
storedAsSubDirectories:false), partitionKeys:[], parameters:{},
viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) |
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:305)
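For reference, the settings behind a run like this would look roughly as follows. The property names (hive.server2.enable.doAs, hadoop.proxyuser.*) are real Hive/Hadoop configuration keys; the values shown, in particular the wildcard proxy-user rules, are illustrative assumptions, not copied from the test cluster. The `(auth:PROXY) via spark/...` entries above imply that proxy-user rules for the spark principal are configured on the Hadoop side.

```xml
<!-- hive-site.xml: toggle impersonation in HiveServer2 / the ThriftServer -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>

<!-- core-site.xml: proxy-user rules allowing the spark service principal
     to impersonate end users (wildcard values here are illustrative) -->
<property>
  <name>hadoop.proxyuser.spark.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.spark.groups</name>
  <value>*</value>
</property>
```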
2. With hive.server2.enable.doAs=true, the NameNode's log shows:
bq. 2015-02-11 18:00:05,599 | INFO | IPC Server handler 32 on 25000 |
allowed=true ugi=spark/[email protected] (auth:KERBEROS) ip=/9.91.11.204
cmd=getfileinfo src=/user/sparkhive/warehouse/yarn.db dst=null perm=null
|
org.apache.hadoop.hdfs.server.namenode.FSNamesystem$DefaultAuditLogger.logAuditMessage(FSNamesystem.java:7950)
2015-02-11 18:00:05,607 | INFO | IPC Server handler 24 on 25000 | allowed=true
ugi=spark/[email protected] (auth:KERBEROS) ip=/9.91.11.204 cmd=mkdirs
src=/user/sparkhive/warehouse/yarn.db dst=null
perm=spark:hadoop:rwxr-xr-x |
org.apache.hadoop.hdfs.server.namenode.FSNamesystem$DefaultAuditLogger.logAuditMessage(FSNamesystem.java:7950)
The ThriftServer's log shows:
bq. 2015-02-11 18:00:05,437 | INFO | [pool-9-thread-2] |
ugi=spark/[email protected] ip=unknown-ip-addr cmd=create_database:
Database(name:yarn, description:null, locationUri:null, parameters:null,
ownerName:spark, ownerType:USER) |
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:305)
2015-02-11 18:00:05,437 | INFO | [pool-9-thread-2] | 2: get_database: yarn |
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:623)
2015-02-11 18:00:05,438 | INFO | [pool-9-thread-2] |
ugi=spark/[email protected] ip=unknown-ip-addr cmd=get_database:
yarn |
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:305)
I am not an expert on Hive or the `doAs` feature, but from my point of view
this met my expectations.
P.S. spark/[email protected] is the principle for HiveServer2 to access HDFS.
> Thrift server does not respect hive.server2.enable.doAs=true
> ------------------------------------------------------------
>
> Key: SPARK-5159
> URL: https://issues.apache.org/jira/browse/SPARK-5159
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0
> Reporter: Andrew Ray
>
> I'm currently testing the Spark SQL Thrift server on a Kerberos-secured
> cluster in YARN mode. Currently any user can access any table regardless of
> HDFS permissions, as all data is read as the hive user. In HiveServer2, the
> property hive.server2.enable.doAs=true causes all access to be performed as
> the submitting user. We should do the same.
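As an illustration of what the description asks for, a conceptual sketch of the doAs pattern in plain Python. The names here are invented for illustration and do not correspond to Hadoop's real UserGroupInformation API: with impersonation on, the server performs each storage call under the submitting user's identity; with it off, everything runs as the service user, which is what the audit log then records.

```python
# Conceptual sketch of the doAs pattern; all names are hypothetical.
def run_query(service_user, submitting_user, do_as_enabled, action):
    # The identity the storage layer (and its audit log) will see.
    effective_user = submitting_user if do_as_enabled else service_user
    return action(effective_user)

audit = []
read_table = lambda user: audit.append(f"ugi={user} cmd=getfileinfo")

run_query("hive", "andrew", do_as_enabled=True, action=read_table)
run_query("hive", "andrew", do_as_enabled=False, action=read_table)
print(audit)  # first entry as the end user, second as the service user
```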
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)