[ https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219887#comment-13219887 ]
Francis Liu commented on HBASE-5498:
------------------------------------

Andrew, thanks for pointing that discussion out. Can't those two hooks be combined into one? The user can just ignore the scanner if he doesn't need it. Or is there a large overhead in even just creating the scanner?

If the hdfs level 'chown' enhancement is implemented, wouldn't you need to change the method signature, which would make hbase dependent on security-enabled hadoop deployments?

The bulk enhancement I am proposing is used for more than just 'chown'. Correct me if I'm wrong here, but given the partitioning constraint needed to generate the HFiles, very few users will actually call completeBulkUpload after their processing job. A lot of them will have their own import MR jobs which convert processed data from one format into HFiles and then call completeBulkUpload. Users can be smart and create a job which does most of its work map-side and is then able to do the correct partitioning. But the trend, at least at Y!, is that the majority of the users are using DSLs, and it's going to keep growing. In effect we are not introducing any added overhead to the user, only making their lives easier. With the 'chown' enhancement we can make it so that an MR job doesn't have to be launched for importing hfiles.

> Secure Bulk Load
> ----------------
>
>                 Key: HBASE-5498
>                 URL: https://issues.apache.org/jira/browse/HBASE-5498
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Francis Liu
>
> Design doc:
> https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load
> Short summary:
> Security as it stands does not cover the bulkLoadHFiles() feature. Users
> calling this method will bypass ACLs. Also, loading is made more cumbersome
> in a secure setting because of hdfs privileges. bulkLoadHFiles() moves the
> data from the user's directory to the hbase directory, which would require
> certain write access privileges to be set.
> Our solution is to create a coprocessor which makes use of AuthManager to
> verify if a user has write access to the table. If so, it launches an MR job
> as the hbase user to do the importing (i.e. rewrite from text to hfiles). One
> tricky part is that this job will have to impersonate the calling user when
> reading the input files. We can do this by expecting the user to pass an hdfs
> delegation token as part of the secureBulkLoad() coprocessor call and
> extending an inputformat to make use of that token. The output is written to
> a temporary directory accessible only by hbase, and then bulkLoadHFiles() is
> called.
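
To make the order of steps in the description easier to follow, here is a rough Java sketch of the control flow only. It deliberately uses no real HBase or Hadoop classes: AccessChecker, ImportJob, BulkLoader, the stagingRoot parameter, and the secureBulkLoad() signature shown here are all hypothetical stand-ins invented for illustration, and the delegation token is modelled as a plain string. It is a sketch under those assumptions, not the proposed implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Illustrative only: every type and signature here is a hypothetical stand-in.
final class SecureBulkLoadSketch {
    /** Stand-in for the ACL check that the description delegates to AuthManager. */
    interface AccessChecker { boolean canWrite(String user, String table); }

    /** Stand-in for the import MR job run as the hbase user; returns the HFile paths it wrote. */
    interface ImportJob { List<String> run(String inputPath, String delegationToken, String stagingDir); }

    /** Stand-in for the existing bulkLoadHFiles() step. */
    interface BulkLoader { void bulkLoadHFiles(String table, List<String> hfiles); }

    private final AccessChecker acl;
    private final ImportJob importJob;
    private final BulkLoader loader;
    private final String stagingRoot; // assumed to be a directory only hbase can access

    SecureBulkLoadSketch(AccessChecker acl, ImportJob importJob, BulkLoader loader, String stagingRoot) {
        this.acl = acl;
        this.importJob = importJob;
        this.loader = loader;
        this.stagingRoot = stagingRoot;
    }

    /** Returns the number of HFiles handed to the bulk loader. */
    int secureBulkLoad(String user, String table, String inputPath, String delegationToken) {
        // 1. Verify that the calling user has write access to the table.
        if (!acl.canWrite(user, table)) {
            throw new SecurityException(user + " has no write access to " + table);
        }
        // 2. The job reads the user's input on the user's behalf, so a token is required.
        if (delegationToken == null || delegationToken.isEmpty()) {
            throw new IllegalArgumentException("an hdfs delegation token is required");
        }
        // 3. Write the generated HFiles to a fresh temporary directory under the staging root.
        String stagingDir = stagingRoot + "/" + table + "-" + UUID.randomUUID();
        List<String> hfiles = importJob.run(inputPath, delegationToken, stagingDir);
        // 4. Hand the staged HFiles to the regular bulk load step.
        loader.bulkLoadHFiles(table, hfiles);
        return hfiles.size();
    }

    public static void main(String[] args) {
        SecureBulkLoadSketch sketch = new SecureBulkLoadSketch(
            (user, table) -> user.equals("alice"),
            (in, token, dir) -> Arrays.asList(dir + "/hfile-0", dir + "/hfile-1"),
            (table, hfiles) -> System.out.println("loading " + hfiles.size() + " hfiles into " + table),
            "/tmp/hbase-staging");
        System.out.println(sketch.secureBulkLoad("alice", "t1", "/user/alice/input", "dummy-token"));
    }
}
```

The point of the ordering is that the ACL check happens before any work is done as the hbase user, and the user's files are only ever read through the token the user supplied.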