[ 
https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219887#comment-13219887
 ] 

Francis Liu commented on HBASE-5498:
------------------------------------

Andrew, thanks for pointing that discussion out. Can't those two hooks be
combined into one? The user can just ignore the scanner if he doesn't need it.
Or is there a large overhead in even just creating the scanner? If the
HDFS-level 'chown' enhancement is implemented, wouldn't you need to change the
method signature, which would make HBase dependent on security-enabled Hadoop
deployments?

The bulk load enhancement I am proposing is useful for more than just 'chown'.
Correct me if I'm wrong here, but given the partitioning constraint needed to
generate the HFiles, very few users will actually call completeBulkUpload
directly after their processing job. A lot of them will have their own import
MR jobs, which convert processed data from some other format into HFiles, and
then call completeBulkUpload. Users can be smart and create a job which does
most of its work map-side and is then able to do the correct partitioning. But
the trend, at least at Y!, is that the majority of users are using DSLs, and
it's going to keep growing. In effect we are not introducing any added overhead
for the user, only making their lives easier. With the 'chown' enhancement we
can make it so that an MR job doesn't have to be launched for importing HFiles.
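
To make the partitioning constraint concrete, here is a minimal sketch of what
an import job has to get right: every row must be routed to the partition whose
key range matches the region that will host it, so that each generated HFile
falls inside a single region. The class and method names below
(RegionPartitionSketch, regionFor, compareKeys) are made up for illustration
and are not HBase APIs. The sketch assumes region start keys are sorted, that
the first region has an empty start key, and that row keys are compared as
unsigned bytes.

```java
import java.nio.charset.StandardCharsets;

public class RegionPartitionSketch {

    // Compare two row keys as unsigned bytes, lexicographically.
    // (Assumption: this mirrors the ordering used for row keys.)
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Return the index of the region whose start key is the greatest one
    // that is <= rowKey. sortedStartKeys must be sorted ascending and its
    // first entry must be the empty key, so every row maps to some region.
    static int regionFor(byte[][] sortedStartKeys, byte[] rowKey) {
        int lo = 0;
        int hi = sortedStartKeys.length - 1;
        int answer = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compareKeys(sortedStartKeys[mid], rowKey) <= 0) {
                answer = mid;   // candidate region; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return answer;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical table with three regions: [ "", g ), [ g, p ), [ p, end )
        byte[][] startKeys = { bytes(""), bytes("g"), bytes("p") };
        for (String row : new String[] { "apple", "g", "grape", "zebra" }) {
            System.out.println(row + " -> region " + regionFor(startKeys, bytes(row)));
        }
    }
}
```

A job that already shuffles its output can plug a lookup like this into its
partitioning step. A map-only job, or one generated by a DSL, typically cannot,
which is why a separate import MR job ends up being needed.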
                
> Secure Bulk Load
> ----------------
>
>                 Key: HBASE-5498
>                 URL: https://issues.apache.org/jira/browse/HBASE-5498
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Francis Liu
>
> Design doc: 
> https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load
> Short summary:
> Security as it stands does not cover the bulkLoadHFiles() feature. Users 
> calling this method will bypass ACLs. Also loading is made more cumbersome in 
> a secure setting because of hdfs privileges. bulkLoadHFiles() moves the data 
> from user's directory to the hbase directory, which would require certain 
> write access privileges set.
> Our solution is to create a coprocessor which makes use of AuthManager to 
> verify if a user has write access to the table. If so, launches a MR job as 
> the hbase user to do the importing (ie rewrite from text to hfiles). One 
> tricky part this job will have to do is impersonate the calling user when 
> reading the input files. We can do this by expecting the user to pass an hdfs 
> delegation token as part of the secureBulkLoad() coprocessor call and extend 
> an inputformat to make use of that token. The output is written to a 
> temporary directory accessible only by hbase and then bulkloadHFiles() is 
> called.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
