Yeah, I think that the conf var is a better solution, because it would give 
consistent behavior once the switch is made. Plus, it would avoid cluttering up 
the metastore API (at the expense of another conf var...). If the CLI were 
configured to use a remote metastore, it would need additional checks to see 
whether the directory was created by the metastore call.

From: Pradeep Kamath [mailto:[email protected]]
Sent: Wednesday, July 21, 2010 9:07 AM
To: [email protected]
Subject: RE: Thrift metastore server and dfs file owner

I favor the option of a conf variable - "strict.owner.mode" - to indicate that 
dirs will not be created by the server but by the client instead. In 
installations where there are thrift clients, this can be set to false until 
the clients are ready to create the dirs themselves. Is this an acceptable 
solution? If so, I can open a jira with this proposed solution.
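A minimal sketch of how this might look in hive-site.xml - "strict.owner.mode" 
is just the working name proposed in this thread, not an existing Hive conf var:

<property>
  <name>strict.owner.mode</name>
  <!-- when true, the metastore server skips dir creation; clients create dirs themselves -->
  <value>true</value>
</property>

Client and server would need to agree on the value for behavior to be 
consistent (as noted in option 1 below).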

Thoughts?

Pradeep

________________________________
From: Pradeep Kamath [mailto:[email protected]]
Sent: Tuesday, July 20, 2010 10:10 AM
To: [email protected]
Subject: RE: Thrift metastore server and dfs file owner

In addition to the options below, if there is some way to add custom code to 
the thrift clients, then that could be a third option - from what little I know 
of thrift, the client code is generated and there is no way to add additional 
logic into the methods - but if there is a way to do that, then that might be 
the best option.

________________________________
From: Pradeep Kamath [mailto:[email protected]]
Sent: Monday, July 19, 2010 1:09 PM
To: [email protected]
Subject: RE: Thrift metastore server and dfs file owner

I agree this will be an issue for direct thrift clients. How about the 
following options:

1) Add a conf variable - "strict.owner.mode" - if this is set to true on the 
server, dirs will not be created by the server and will instead be created on 
the client (both client and server should have the same value, true or false).

OR

2) Add a new API method in the thrift API which takes an extra boolean arg 
indicating whether or not to create dirs. The HiveMetaStoreClient code would 
use this new api with a "false" argument value and create the dir on the 
client side. The issue with this is that existing thrift clients would be 
calling the current API method, which would create dirs as the thrift server 
user. So depending on whether you create the table using thrift (with the old 
method) or the CLI, you get different results. The old method could be 
deprecated and thrift clients could migrate to the new one.
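To make option 2 concrete, a hypothetical sketch against the generated Java 
view of the thrift interface - create_table exists today, but create_table_ext 
and its createDir flag are invented names, and imports/context are elided:

// Hypothetical additions to the generated metastore interface.
// Table, MetaException come from org.apache.hadoop.hive.metastore.api.
public interface ThriftHiveMetastoreSketch {
  // Existing method - would be deprecated; it keeps creating dirs so old
  // thrift clients see no behavior change.
  @Deprecated
  void create_table(Table tbl) throws MetaException, TException;

  // Hypothetical new method: when createDir is false, the server skips
  // dir creation and the client is expected to mkdir the location itself.
  void create_table_ext(Table tbl, boolean createDir)
      throws MetaException, TException;
}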

Thoughts?

(This directory creation/deletion is relevant to create table/drop table/add 
partition/alter table/alter partition, I think.)

Pradeep

-----Original Message-----
From: Paul Yang [mailto:[email protected]]
Sent: Monday, July 19, 2010 10:53 AM
To: [email protected]
Subject: RE: Thrift metastore server and dfs file owner

That approach would work for the CLI, but then the semantics of the create 
table/create partition calls for thrift clients would be different - they 
would no longer create the table directory. This might be a problem if there 
are scripts that rely on this behavior for copying/moving files. Table 
renaming code would need to be modified as well.

-----Original Message-----
From: Pradeep Kamath [mailto:[email protected]]
Sent: Monday, July 19, 2010 10:24 AM
To: [email protected]
Subject: RE: Thrift metastore server and dfs file owner

I was thinking about this a little more and was wondering if the following 
alternative approach is feasible:

Instead of the Metastore code creating the directories, why not have 
HiveMetaStoreClient create them in createTable() after the table is created - 
i.e. it can do a getTable().getSd().getLocation() and perform wh.mkdirs() on 
that path. We could do the same thing with addPartition().
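A rough sketch of that client-side approach - illustrative only; client and wh 
are assumed to be the thrift client and client-side Warehouse fields of 
HiveMetaStoreClient, and it assumes the server no longer creates the dir:

// Hypothetical client-side logic in HiveMetaStoreClient.createTable():
// let the server write the metadata only, then create the dir locally so
// it is owned by the user running the client, not the thrift server user.
public void createTable(Table tbl) throws TException {
  client.create_table(tbl);  // existing thrift call
  Table created = client.get_table(tbl.getDbName(), tbl.getTableName());
  Path tblPath = new Path(created.getSd().getLocation());
  if (!wh.mkdirs(tblPath)) {
    throw new MetaException("Unable to create directory " + tblPath);
  }
}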

This way, we can have the metastore thrift server running as a 
non-hdfs-superuser. Also, we no longer need to keep track of user/group 
information since the client is already running with the right user/group 
credentials.

Thoughts?

Pradeep
-----Original Message-----
From: Pradeep Kamath [mailto:[email protected]]
Sent: Thursday, July 15, 2010 10:23 AM
To: [email protected]
Subject: RE: Thrift metastore server and dfs file owner

Currently group information is not present in the Table, and both owner and 
group information are absent from Database. If these are added to these 
classes, we could change Warehouse.mkdirs(). This method is also called from 
addPartition() - should we just use the table's owner/group in this case? That 
could potentially fail in the non-thrift case if some other user is creating 
the partitions, OR we would need to add owner/group to Partition as well, with 
the implication that table and partition owners could differ, causing query 
failures.

Paul's concern about security is valid, but is there any other way around this?

Pradeep

-----Original Message-----
From: Paul Yang [mailto:[email protected]]
Sent: Wednesday, July 14, 2010 3:18 PM
To: [email protected]
Subject: RE: Thrift metastore server and dfs file owner

Yeah, you could overload Warehouse.mkdirs() to allow specification of an 
owner/group and then use FileSystem.setOwner() within the method.
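A minimal sketch of what that overload might look like - illustrative only; 
getFs() is assumed to be the existing Warehouse helper that resolves the 
FileSystem for a path, and IOException/FileSystem come from Hadoop:

// Hypothetical overload in Warehouse.java. Creates the directory as the
// (privileged) server user, then chowns it to the caller so the dir ends
// up owned by the actual user rather than the thrift server user.
public boolean mkdirs(Path f, String owner, String group) throws MetaException {
  try {
    FileSystem fs = getFs(f);
    if (!fs.mkdirs(f)) {
      return false;
    }
    fs.setOwner(f, owner, group);  // needs the server to run as an HDFS superuser
    return true;
  } catch (IOException e) {
    throw new MetaException("Unable to create " + f + ": " + e.getMessage());
  }
}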

If the thrift server has full permissions for DFS though, wouldn't this 
present a security hole?

-----Original Message-----
From: Ashish Thusoo [mailto:[email protected]]
Sent: Wednesday, July 14, 2010 12:34 PM
To: [email protected]
Subject: RE: Thrift metastore server and dfs file owner

We could just fix this in Warehouse.java so that the mkdirs call makes the 
directories according to the owner field that is passed in the table? That 
would probably be a simple fix for this, no?

Ashish

-----Original Message-----
From: Pradeep Kamath [mailto:[email protected]]
Sent: Wednesday, July 14, 2010 11:14 AM
To: [email protected]
Subject: RE: Thrift metastore server and dfs file owner

<name>dfs.permissions</name>
<value>true</value>
..
<name>dfs.permissions.supergroup</name>
<value>hdfs</value>

You mentioned: "I think the thrift server can use the dfs processor." - were 
you suggesting that the metastore implementation in HiveMetaStore should 
always do chown user:user in create_table_core() (or selectively look at the 
conf to know it is being run as a thrift server and chown only in that case)?
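To make the question concrete, a hypothetical sketch of the selective variant 
- the conf key and the requestingUser/requestingGroup variables are invented 
here for illustration, not existing Hive names:

// Hypothetical fragment inside create_table_core(), after the dir is made.
// "hive.metastore.chown.dirs" is an invented conf key, not a real one.
if (conf.getBoolean("hive.metastore.chown.dirs", false)) {
  // Running as a thrift server with HDFS superuser rights: hand the dir
  // back to the requesting user so it is not owned by the server user.
  fs.setOwner(tblPath, requestingUser, requestingGroup);
}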

Pradeep

-----Original Message-----
From: Edward Capriolo [mailto:[email protected]]
Sent: Tuesday, July 13, 2010 4:52 PM
To: [email protected]
Subject: Re: Thrift metastore server and dfs file owner

On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <[email protected]> wrote:
> I tried:
> hive -e "set user.name=$USER;create table foo2 ( name string);"
>
> My warehouse table dir still got created by "root" (the user my thrift
> server is running as) drwxr-xr-x   - root supergroup          0
> 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2
>
> -----Original Message-----
> From: Edward Capriolo [mailto:[email protected]]
> Sent: Tuesday, July 13, 2010 2:47 PM
> To: [email protected]
> Subject: Re: Thrift metastore server and dfs file owner
>
> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath <[email protected]> wrote:
>> Hi,
>>
>>    I suspect this is true but wanted to confirm: If I start a thrift
>> metastore service as user "joe" then all internal tables created will
>> have directories under the warehouse directory owned by "joe"
>> regardless of the actual user running the create table statement - is
>> this correct? There is no way for the thrift server to create the
>> directory as the actual user?
>> However if thrift service is not used and the hive client directly
>> works against the metastore database, then the directories are
>> created by the actual user - is this correct?
>>
>> Thanks,
>>
>> Pradeep
>
> The hive web interface does this:
>
>    queries.add("set hadoop.job.ugi=" + auth.getUser() + ","
>        + auth.getGroups()[0]);
>    queries.add("set user.name=" + auth.getUser());
>
> You should be able to accomplish the same thing using set commands
> with the Thrift Server to impersonate.
>
> Regards,
> Edward
>

You are right. That technique may only affect files created during the 
map/reduce job. I think the thrift server can use the dfs processor.

hive> dfs -chown user:user /user/hive/warehouse/foo2;

Questions:
Who is your hadoop superuser?
Are you enforcing dfs permissions?

If you are enforcing permissions, only the hadoop superuser (hadoop) will be 
able to chown files to other users and groups.
