[ 
https://issues.apache.org/jira/browse/HBASE-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873431#comment-16873431
 ] 

Josh Elser commented on HBASE-22634:
------------------------------------

[~Apache9], you might be interested in this one. I think you've been poking 
around this code recently.

Any chance you can give us some more detail on each of the bullet-points you 
included in the description, Sebastien? Knowing what we're looking for would 
help in reading your patch.

bq. The default ThreadPoolExecutor uses a default size of 1 (property 
hbase.htable.threads.max)

The code in branch-2 reads like it allows {{Integer.MAX_VALUE}} threads at 
most, only defaulting to {{1}} when you explicitly set {{0}} in the 
configuration. Are you talking about the pool's core size?

{code}
+  void waitForFreeSlot(int size, long id, int periodToLog, Consumer<Long> 
logger);
+
+  void waitForAllFreeSlot(long id);
{code}

Add some javadoc for these methods, please.

{code}
-    this.cleanupIdleConnectionTask = IDLE_CONN_SWEEPER.scheduleAtFixedRate(new 
Runnable() {
{code}

In isolation, it looks like you removed the internal code to clean up 
connections which the caller has left idle. Was this intentional to do so 
(without replacement)?

{code}
+    try {
+        isOverKrb = shouldAuthenticateOverKrb();
+    } catch (IOException e) {
+        return;
+    }
{code}

I don't see any expected reason that this method should throw an IOException. 
Were you seeing some instance where this catch was important to add? If it's 
just because that method is throwing an Exception, I think we should re-throw 
loudly as that indicates a bug.

{code}
+    // SG: Must be synchronized because of the background thread 
writeBufferPeriodicFlushTimer
{code}

Nit: we don't put attribution comments into code. Folks can work back from the 
changelog to find the original author of some part of the codebase.

{code}
-  void sendMultiAction(Map<ServerName, MultiAction> actionsByServer,
+    // SG: Must be synchronized because of the background thread 
writeBufferPeriodicFlushTimer
+    synchronized void sendMultiAction(Map<ServerName, MultiAction> 
actionsByServer,
                                int numAttempt, List<Action> 
actionsForReplicaThread, boolean reuseThread) {
{code}

I'm not following this -- if the BufferedMutator does the periodic flush, that 
should get a bin of new mutations that need to be flushed out, creating new 
async actions to submit to AP. How would we have two competing threads 
submitting the same work? If that _is_ happening from BM, it sounds like that's 
an issue in BM, not AP.

{code}
+public class BufferedMutatorThreadPoolExecutor extends ThreadPoolExecutor {
{code}

What is this customization doing different than ThreadPoolExecutor? Needs 
comments.

{code}
+                callableInFutureTask = 
FutureTask.class.getDeclaredField("callable");
+                callableInFutureTask.setAccessible(true);
{code}

This is a really big smell. We shouldn't be peering into JDK classes and 
mucking with them like this. Makes HBase extremely brittle to the JDK version 
that we require.

{code}
   public void flush() throws InterruptedIOException, 
RetriesExhaustedWithDetailsException {
-    checkClose();
-    doFlush(true);
+    // This will avoid concurrency between period flush, flush() and close()
+    // mutate are not synchronized, because it use doFlush(false)
+    if (writeBufferPeriodicFlushTimer != null) {
+      lock.lock();
+    }
{code}

Why a lock in {{flush()}} and not just make {{doFlush(boolean)}} synchronized? 
The javadoc seems unclear in what should happen with multiple concurrent 
callers of {{flush()}}. Are they blocking? Does the new call wait for the 
previous flush() to return? Does the new call wait for the previous flush() to 
return, and then run another flush()?

Finally, a couple of places that need license headers on the files. Thanks for 
putting up this patch, Sebastien.

> Improve performance of BufferedMutator
> --------------------------------------
>
>                 Key: HBASE-22634
>                 URL: https://issues.apache.org/jira/browse/HBASE-22634
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>    Affects Versions: 2.1.5
>         Environment: HDP 2.6.5
> Linux RedHat
>            Reporter: Sebastien Barnoud
>            Priority: Major
>         Attachments: HBASE-22634.001.branch-2.patch
>
>
> The default ThreadPoolExecutor uses a default size of 1 (property 
> hbase.htable.threads.max). When using a size > 1, we still encountered poor 
> performance and exception while submitting to the pool (pool exceed its 
> capacity).
> This patch propose a fix on different issues encountered when the pool size 
> is > 1:
>  * thread safety issue
>  * concurrent cleanup by Netty and the "legacy" code
>  * errors in the backpressure
>  * Netty memory leak
> And propose a BufferedMutatorThreadPoolExecutor which:
>  * uses hbase.client.max.total.tasks as the default size (instead of 1)
>  * some usefull metrics
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to