[ 
https://issues.apache.org/jira/browse/HBASE-24961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-24961:
-----------------------------------------
    Description: 
This came up when running bulk loads on HBase deployments using HBOSS. The 
fixes introduced by HBASE-23679 use _*FileSystem.closeAllForUGI(ugi)*_ to make 
sure _*FileSystem*_ instances get cleared for the specific running UGI. The 
problem is that _*FileSystem.closeAllForUGI*_ does not remove the instance from 
_*FileSystem.CACHE*_ explicitly. Instead, it calls _*FileSystem.close*_, which 
in turn removes the instance from _*FileSystem.CACHE*_. In this case, though, 
our _*FileSystem*_ implementation is _*HBaseObjectStoreSemantics*_, whose 
_*close*_ does not call _*super.close*_. So _*FileSystem.closeAllForUGI*_ 
closes it but does not remove it from _*FileSystem.CACHE*_, and every 
subsequent _*FileSystem.get*_ by the same UGI retrieves a closed 
_*HBaseObjectStoreSemantics*_ instance, ultimately failing as below:

 
{noformat}
2020-08-26 12:43:57,528 ERROR org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to complete bulk load
java.io.IOException: Exception while testing a lock
        at org.apache.hadoop.hbase.oss.sync.ZKTreeLockManager.isLocked(ZKTreeLockManager.java:312)
        at org.apache.hadoop.hbase.oss.sync.ZKTreeLockManager.writeLockAbove(ZKTreeLockManager.java:183)
        at org.apache.hadoop.hbase.oss.sync.TreeLockManager.treeReadLock(TreeLockManager.java:282)
        at org.apache.hadoop.hbase.oss.sync.TreeLockManager.lock(TreeLockManager.java:449)
        at org.apache.hadoop.hbase.oss.HBaseObjectStoreSemantics.exists(HBaseObjectStoreSemantics.java:498)
        at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:281)
        at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1856)
        at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2445)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42280)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
Caused by: java.lang.IllegalStateException: Expected state [STARTED] was [STOPPED]
{noformat}

  was:
This came up when running bulkloads on hbase deployments using HBOSS. The fixes 
introduced by HBASE-23679 use `FileSystem.closeAllForUGI(ugi)` to make sure 
`FileSystem` instances get cleared for the specific running UGI. Problem is 
that `FileSystem.closeAllForUGI` does not remove the instance from 
`FileSystem.CACHE` explicitly, it rather calls `FileSystem.close`, which in 
turn removes itself from `FileSystem.CACHE`. In this case, though, our 
`FileSystem` implementation is `HBaseObjectStoreSemantics`, so 
`FileSystem.closeAllForUGI` closes it, but does not remove it from 
`FileSystem.CACHE`, leading to all attempts to `FileSystem.get` by the same UGI 
retrieving a closed `HBaseObjectStoreSemantics` instance, ultimately failing as 
below:

 
{noformat}
2020-08-26 12:43:57,528 ERROR org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to complete bulk load
java.io.IOException: Exception while testing a lock
        at org.apache.hadoop.hbase.oss.sync.ZKTreeLockManager.isLocked(ZKTreeLockManager.java:312)
        at org.apache.hadoop.hbase.oss.sync.ZKTreeLockManager.writeLockAbove(ZKTreeLockManager.java:183)
        at org.apache.hadoop.hbase.oss.sync.TreeLockManager.treeReadLock(TreeLockManager.java:282)
        at org.apache.hadoop.hbase.oss.sync.TreeLockManager.lock(TreeLockManager.java:449)
        at org.apache.hadoop.hbase.oss.HBaseObjectStoreSemantics.exists(HBaseObjectStoreSemantics.java:498)
        at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:281)
        at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1856)
        at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2445)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42280)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
Caused by: java.lang.IllegalStateException: Expected state [STARTED] was [STOPPED]
{noformat}


> [HBOSS] HBaseObjectStoreSemantics.close should call super.close to make sure 
> its own instance always gets removed from FileSystem.CACHE
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24961
>                 URL: https://issues.apache.org/jira/browse/HBASE-24961
>             Project: HBase
>          Issue Type: Bug
>          Components: Filesystem Integration, hboss
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>
> This came up when running bulk loads on HBase deployments using HBOSS. The 
> fixes introduced by HBASE-23679 use _*FileSystem.closeAllForUGI(ugi)*_ to 
> make sure _*FileSystem*_ instances get cleared for the specific running UGI. 
> The problem is that _*FileSystem.closeAllForUGI*_ does not remove the 
> instance from _*FileSystem.CACHE*_ explicitly. Instead, it calls 
> _*FileSystem.close*_, which in turn removes the instance from 
> _*FileSystem.CACHE*_. In this case, though, our _*FileSystem*_ 
> implementation is _*HBaseObjectStoreSemantics*_, whose _*close*_ does not 
> call _*super.close*_. So _*FileSystem.closeAllForUGI*_ closes it but does 
> not remove it from _*FileSystem.CACHE*_, and every subsequent 
> _*FileSystem.get*_ by the same UGI retrieves a closed 
> _*HBaseObjectStoreSemantics*_ instance, ultimately failing as below:
>  
> {noformat}
> 2020-08-26 12:43:57,528 ERROR org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to complete bulk load
> java.io.IOException: Exception while testing a lock
>         at org.apache.hadoop.hbase.oss.sync.ZKTreeLockManager.isLocked(ZKTreeLockManager.java:312)
>         at org.apache.hadoop.hbase.oss.sync.ZKTreeLockManager.writeLockAbove(ZKTreeLockManager.java:183)
>         at org.apache.hadoop.hbase.oss.sync.TreeLockManager.treeReadLock(TreeLockManager.java:282)
>         at org.apache.hadoop.hbase.oss.sync.TreeLockManager.lock(TreeLockManager.java:449)
>         at org.apache.hadoop.hbase.oss.HBaseObjectStoreSemantics.exists(HBaseObjectStoreSemantics.java:498)
>         at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:281)
>         at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1856)
>         at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2445)
>         at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42280)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:418)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> Caused by: java.lang.IllegalStateException: Expected state [STARTED] was [STOPPED]
> {noformat}
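
For readers unfamiliar with the caching pattern, the failure mode in the
description can be illustrated with a small stand-alone model. This is only a
sketch under stated assumptions: none of it is Hadoop or HBOSS code, and every
name in it (FsCacheSketch, BaseFs, LeakyFs, FixedFs, closeAllFor,
staleAfterClose, lockManagerStarted) is hypothetical. It assumes only what the
description states: the base close() is what evicts an instance from the
cache, and the bulk-close helper relies on that.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the pattern in the description. Not Hadoop or HBOSS
// code; all class, field, and method names are hypothetical.
class FsCacheSketch {

    // Stand-in for FileSystem.CACHE, keyed by user for simplicity.
    static final Map<String, BaseFs> CACHE = new HashMap<>();

    // Stand-in for FileSystem: the base close() evicts the instance.
    static class BaseFs {
        final String user;

        BaseFs(String user) {
            this.user = user;
        }

        void close() {
            CACHE.remove(user, this);
        }
    }

    // Models the bug: close() stops its own resources but skips super.close(),
    // so the closed instance is never evicted from the cache.
    static class LeakyFs extends BaseFs {
        boolean lockManagerStarted = true;

        LeakyFs(String user) {
            super(user);
        }

        @Override
        void close() {
            lockManagerStarted = false;
        }
    }

    // Models the proposed fix: always delegate to super.close().
    static class FixedFs extends BaseFs {
        boolean lockManagerStarted = true;

        FixedFs(String user) {
            super(user);
        }

        @Override
        void close() {
            try {
                lockManagerStarted = false;
            } finally {
                super.close();
            }
        }
    }

    // Stand-in for closeAllForUGI: closes the cached instance for a user and
    // relies on close() to evict it; it does not touch the cache itself.
    static void closeAllFor(String user) {
        BaseFs fs = CACHE.get(user);
        if (fs != null) {
            fs.close();
        }
    }

    // Caches fs, bulk-closes it, and reports whether a later lookup would
    // still hand back the same (now closed) instance.
    static boolean staleAfterClose(BaseFs fs) {
        CACHE.put(fs.user, fs);
        closeAllFor(fs.user);
        return CACHE.get(fs.user) == fs;
    }

    public static void main(String[] args) {
        System.out.println("leaky stays cached: " + staleAfterClose(new LeakyFs("alice")));
        System.out.println("fixed stays cached: " + staleAfterClose(new FixedFs("bob")));
    }
}
```

In this model the leaky variant remains in the cache after the bulk close, so
a later lookup for the same user would return an instance whose lock manager
is already stopped, which corresponds to the IllegalStateException in the
trace. The try/finally in the fixed variant is one way to guarantee eviction
even if releasing the wrapper's own resources throws; the actual HBOSS change
may differ.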



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
