[ https://issues.apache.org/jira/browse/HBASE-27044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537812#comment-17537812 ]
Josh Elser commented on HBASE-27044:
------------------------------------

We could do a pretty naive "change" here where we just return a {{User}} which is "unknown" when we fail to parse the serialized protobuf, which would be enough to fix this problem on the surface. However, I think this change misses the root of the problem (the expectation that HBase should just be able to "reattach" itself to an hbase.rootdir). I can't think of any way in which the above exception would be thrown other than the cloud storage reattachment case I described.

I'm happy to put up a patch to gracefully handle the failure to create the UGI if folks think there is merit in that.

> Serialized procedures which point to users from other Kerberos domains can
> prevent master startup
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-27044
>                 URL: https://issues.apache.org/jira/browse/HBASE-27044
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>            Reporter: Josh Elser
>            Priority: Major
>
> We ran into an interesting bug when test teams were running HBase against
> cloud storage without ensuring that the previous location was cleaned. This
> resulted in an hbase.rootdir that had:
> * A valid HBase MasterData Region
> * A valid hbase:meta
> * A valid collection of HBase tables
> * An empty ZooKeeper
> Of the changes that we've worked on previously, those described in
> HBASE-24286 were effective in getting everything _except_ the Procedures back
> online without issue.
> Parsing the existing procedures produced an interesting
> error:
> {noformat}
> java.lang.IllegalArgumentException: Illegal principal name hbase/wrong-hostname.domain@WRONG_REALM: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to hbase/wrong-hostname.domain@WRONG_REALM
>     at org.apache.hadoop.security.User.<init>(User.java:51)
>     at org.apache.hadoop.security.User.<init>(User.java:43)
>     at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1418)
>     at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1402)
>     at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.toUserInfo(MasterProcedureUtil.java:60)
>     at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.deserializeStateData(ModifyTableProcedure.java:262)
>     at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:294)
>     at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>     at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:411)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:78)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.load(ProcedureExecutor.java:339)
>     at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:285)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:330)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:600)
>     at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1581)
>     at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:835)
>     at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2205)
>     at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:514)
>     at java.lang.Thread.run(Thread.java:750)
> {noformat}
> What's actually happening is that we are storing the {{User}} into the
> procedure and then relying on UserGroupInformation to parse the {{User}}
> protobuf into a UGI to get the "short" username.
> When the serialized procedure (whether in the MasterData region or via PV2
> WAL files, I think) gets loaded, we end up needing Hadoop auth_to_local
> configuration to be able to parse that Kerberos principal back to a name.
> However, Hadoop's KerberosName will only unwrap Kerberos principals which
> match the local Kerberos realm (defined by the krb5.conf's default_realm,
> [ref|https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/classes/sun/security/krb5/Config.java#L978-L983]).
> The interesting part is that we don't seem to ever use the user _other_ than
> to display the {{owner}} attribute for procedures on the HBase UI. There is a
> method in hbase-procedure which can filter procedures based on Owner, but I
> didn't see any usages of that method.
> Given the pushback against HBASE-24286, I assume that, for the same reasons,
> we would see pushback against fixing this issue. However, I wanted to call it
> out for posterity. The expectation of users is that HBase _should_ implicitly
> handle this case.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
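
For illustration only, the naive "unknown" fallback mentioned in the comment might look roughly like the sketch below. It is not the HBase patch and does not call Hadoop: {{OwnerFallbackSketch}}, {{shortNameOrUnknown}} and {{localRealmOnly}} are hypothetical names, the realm EXAMPLE.COM is made up, and the resolver is a deliberately simplified stand-in for the UserGroupInformation / auth_to_local lookup that only mimics the IllegalArgumentException seen in the stack trace.

```java
import java.util.function.Function;

public class OwnerFallbackSketch {
  /** Placeholder owner, following the "unknown" user suggested in the comment. */
  static final String UNKNOWN_OWNER = "unknown";

  /**
   * Hypothetical helper: resolve a short name for display purposes, falling back
   * to a placeholder instead of propagating the failure to the caller.
   */
  static String shortNameOrUnknown(String principal, Function<String, String> resolver) {
    try {
      return resolver.apply(principal);
    } catch (IllegalArgumentException e) {
      // Same exception type as in the stack trace; the owner is only shown in the UI.
      return UNKNOWN_OWNER;
    }
  }

  /**
   * Simplified stand-in for the Hadoop lookup (an assumption, not the real rules):
   * accepts principals from localRealm and rejects all other realms.
   */
  static Function<String, String> localRealmOnly(String localRealm) {
    return principal -> {
      int at = principal.lastIndexOf('@');
      if (at >= 0 && !principal.substring(at + 1).equals(localRealm)) {
        throw new IllegalArgumentException("Illegal principal name " + principal);
      }
      // Drop the realm, then the host component, to get a "short" name.
      String name = at >= 0 ? principal.substring(0, at) : principal;
      int slash = name.indexOf('/');
      return slash >= 0 ? name.substring(0, slash) : name;
    };
  }

  public static void main(String[] args) {
    Function<String, String> resolver = localRealmOnly("EXAMPLE.COM");
    System.out.println(shortNameOrUnknown("hbase/host.example.com@EXAMPLE.COM", resolver));
    System.out.println(shortNameOrUnknown("hbase/wrong-hostname.domain@WRONG_REALM", resolver));
  }
}
```

Run as written, this prints "hbase" for the local-realm principal and "unknown" for the foreign one. As the comment notes, a fallback like this only addresses the symptom; it does nothing about the expectation that HBase can reattach to an existing hbase.rootdir.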