[
https://issues.apache.org/jira/browse/HBASE-21342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659108#comment-16659108
]
Mike Drob commented on HBASE-21342:
-----------------------------------
Thanks for the patch, [~mazhenlin]! It's looking much better now, I think the
structure is all there. Just a few more minor issues with implementation.
{code}
+ ConcurrentHashMap<UserGroupInformation, Integer> ugiReferenceCounter;
+ InternalObserverForFSOperation testObserver = null;
+ void incrementUgiReference(UserGroupInformation ugi) {
+ void decrementUgiReference(UserGroupInformation ugi) {
+ boolean isUserReferenced(UserGroupInformation ugi) {
{code}
should be private
bq. + ugiReferenceCounter = new ConcurrentHashMap();
Needs generics.
{code}
+ @VisibleForTesting
+ interface InternalObserverForFSOperation {
+ void afterFileSystemCreated(HRegion r) throws Exception;
+ }
{code}
{code}
@Override
public void afterFileSystemCreated(HRegion r) throws Exception{
if (r.getRegionInfo().containsRow(key3)) {
ealierBulkload.join(); /// wait util the other bulkload finished
}
}
{code}
Is the exception throwing necessary? I can't tell if we actually use that to
fail the test or not. If not, then the interface could be {{Consumer<HRegion>}}
instead of declaring our own.
{code}
+ void incrementUgiReference(UserGroupInformation ugi) {
+ ugiReferenceCounter.merge(ugi, 1, new BiFunction<Integer, Integer,
Integer>() {
+ @Override
+ public Integer apply(Integer integer, Integer integer2) {
+ return ++integer;
+ }
+ });
+ }
{code}
Personal preference would be to use a lambda here, but that's strictly personal
preference. Would be good practice to call those integer arguments something
more descriptive than "int" and "int2" for folks unfamiliar with how merge
works. Maybe "oldvalue" and "value"
{code}
+ boolean isUserReferenced(UserGroupInformation ugi) {
+ return ugiReferenceCounter.containsKey(ugi) &&
ugiReferenceCounter.get(ugi) > 0;
+ }
{code}
Contains is effectively a get under the hood, so better performance would be to
do a single get and check against null and > 0.
{code}
+ /// just for test
{code}
Unnecessary comment.
{code}
+ /// create table
+ HTableDescriptor htd2 = new HTableDescriptor(TABLE);
+ htd2.addFamily(new HColumnDescriptor(FAMILY));
+ testUtil.getHBaseAdmin().createTable(htd2,
Bytes.toByteArrays(SPLIT_ROWKEY));
{code}
Please don't use deprecated APIs. There's a helper method on HBaseTestUtility
for create table to make this easier. {{createTable(TableName, byte[] family,
byte[][] splits)}}
{code}
+ HTableDescriptor desc = testUtil.getAdmin().getTableDescriptor(TABLE);
+ HColumnDescriptor family = desc.getFamily(FAMILY);
{code}
Again, avoid deprecated APIs.
{code}
+ class MyExceptionToAvoidRetry extends DoNotRetryIOException {
{code}
Why is the original DoNotRetryIOException insufficient here? Probably best to
add a comment.
{code}
+ ealierBulkload = new Thread(new Runnable() {
+ @Override
+ public void run() {
+ try {
+ doBulkloadWithoutRetry(dir1);
+ } catch (Exception e) {
+ LOG.error("bulk load failed .",e);
+ }
+ }
+ });
{code}
If this bulk load fails, do we want that to fail the whole test? I would assume
we do, you can look at
https://github.com/apache/hbase/blob/master/hbase-procedure/src/test/java/org/apache/hadoop/hbase/procedure2/TestProcedureNonce.java#L181-L182
for a pattern of how we do this in other places.
{code}
+@Category({RegionServerTests.class, SmallTests.class})
{code}
This starts a mini-cluster, so I think that would make it a medium test at a
minimum.
> FileSystem in use may get closed by other bulk load call in secure bulkLoad
> ----------------------------------------------------------------------------
>
> Key: HBASE-21342
> URL: https://issues.apache.org/jira/browse/HBASE-21342
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.1.0, 1.5.0, 1.3.3, 1.4.4, 2.0.1, 1.2.7
> Reporter: mazhenlin
> Assignee: mazhenlin
> Priority: Major
> Attachments: 21342.v1.txt, HBASE-21342.002.patch,
> HBASE-21342.003.patch, HBASE-21342.004.patch, HBASE-21342.005.patch,
> race.patch
>
>
> As mentioned in [HBASE-15291|#HBASE-15291], there is a race condition. If
> Two secure bulkload calls from the same UGI into two different regions and
> one region finishes earlier, it will close the bulk load fs, and the other
> region will fail.
>
> Another case would be more serious. The FileSystem.close() function needs two
> synchronized variables : CACHE and deleteOnExit. If one region calls
> FileSystem.closeAllForUGI ( in SecureBulkLoadManager.cleanupBulkLoad) while
> another region is trying to close srcFS ( in
> SecureBulkLoadListener.closeSrcFs) , can cause deadlock here.
>
> I have wrote a UT for this and fixed it using reference counter.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)