[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

Mike Drob (JIRA) Mon, 22 Oct 2018 08:59:18 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659108#comment-16659108
 ]


Mike Drob commented on HBASE-21342:
-----------------------------------

Thanks for the patch, [~mazhenlin]! It's looking much better now, I think the 
structure is all there. Just a few more minor issues with implementation.

{code}
+  ConcurrentHashMap<UserGroupInformation, Integer> ugiReferenceCounter;
+  InternalObserverForFSOperation testObserver = null;
+  void incrementUgiReference(UserGroupInformation ugi) {
+  void decrementUgiReference(UserGroupInformation ugi) {
+  boolean isUserReferenced(UserGroupInformation ugi) {
{code}
should be private

bq. +    ugiReferenceCounter = new ConcurrentHashMap();
Needs generics.

{code}
+  @VisibleForTesting
+  interface InternalObserverForFSOperation {
+    void afterFileSystemCreated(HRegion r) throws Exception;
+  }
{code}
{code}
  @Override
  public void afterFileSystemCreated(HRegion r) throws Exception{
    if (r.getRegionInfo().containsRow(key3)) {
      ealierBulkload.join(); /// wait util the other bulkload finished
    }
  }
{code}
Is the exception throwing necessary? I can't tell if we actually use that to 
fail the test or not. If not, then the interface could be {{Consumer<HRegion>}} 
instead of declaring our own.

{code}
+  void incrementUgiReference(UserGroupInformation ugi) {
+    ugiReferenceCounter.merge(ugi, 1, new BiFunction<Integer, Integer, 
Integer>() {
+      @Override
+      public Integer apply(Integer integer, Integer integer2) {
+        return ++integer;
+      }
+    });
+  }
{code}
Personal preference would be to use a lambda here, but that's strictly personal 
preference. Would be good practice to call those integer arguments something 
more descriptive than "int" and "int2" for folks unfamiliar with how merge 
works. Maybe "oldvalue" and "value"

{code}
+  boolean isUserReferenced(UserGroupInformation ugi) {
+    return ugiReferenceCounter.containsKey(ugi) && 
ugiReferenceCounter.get(ugi) > 0;
+  }
{code}
Contains is effectively a get under the hood, so better performance would be to 
do a single get and check against null and > 0.

{code}
+            /// just for test
{code}
Unnecessary comment.

{code}
+    /// create table
+    HTableDescriptor htd2 = new HTableDescriptor(TABLE);
+    htd2.addFamily(new HColumnDescriptor(FAMILY));
+    testUtil.getHBaseAdmin().createTable(htd2, 
Bytes.toByteArrays(SPLIT_ROWKEY));
{code}
Please don't use deprecated APIs. There's a helper method on HBaseTestUtility 
for create table to make this easier. {{createTable(TableName, byte[] family, 
byte[][] splits)}}

{code}
+    HTableDescriptor desc = testUtil.getAdmin().getTableDescriptor(TABLE);
+    HColumnDescriptor family = desc.getFamily(FAMILY);
{code}
Again, avoid deprecated APIs.

{code}
+  class MyExceptionToAvoidRetry extends DoNotRetryIOException {
{code}
Why is the original DoNotRetryIOException insufficient here? Probably best to 
add a comment.

{code}
+    ealierBulkload = new Thread(new Runnable() {
+      @Override
+      public void run() {
+        try {
+          doBulkloadWithoutRetry(dir1);
+        } catch (Exception e) {
+          LOG.error("bulk load failed .",e);
+        }
+      }
+    });
{code}
If this bulk load fails, do we want that to fail the whole test? I would assume 
we do, you can look at 
https://github.com/apache/hbase/blob/master/hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureNonce.java#L181-L182
 for a pattern of how we do this in other places.

{code}
+@Category({RegionServerTests.class, SmallTests.class})
{code}
This starts a mini-cluster, so I think that would make it a medium test at a 
minimum.

> FileSystem in use may get closed by other bulk load call  in secure bulkLoad
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-21342
>                 URL: https://issues.apache.org/jira/browse/HBASE-21342
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
>            Reporter: mazhenlin
>            Assignee: mazhenlin
>            Priority: Major
>         Attachments: 21342.v1.txt, HBASE-21342.002.patch, 
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch, 
> race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition.   If 
> Two secure bulkload calls  from the same UGI into two different regions and 
> one region finishes earlier, it will close the bulk load fs, and the other 
> region will fail.
>  
> Another case would be more serious. The FileSystem.close() function needs two 
> synchronized variables : CACHE and deleteOnExit. If one region calls 
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while 
> another region is trying to close srcFS ( in  
> SecureBulkLoadListener.closeSrcFs)   , can cause deadlock here.
>  
> I have wrote a UT for this and fixed it using reference counter.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21342) FileSystem in use may get closed by other bulk load call in secure bulkLoad

Reply via email to