[
https://issues.apache.org/jira/browse/HBASE-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873431#comment-16873431
]
Josh Elser commented on HBASE-22634:
------------------------------------
[~Apache9], you might be interested in this one. I think you've been poking
around this code recently.
Any chance you can give us some more detail on each of the bullet-points you
included in the description, Sebastien? Knowing what we're looking for would
help in reading your patch.
bq. The default ThreadPoolExecutor uses a default size of 1 (property
hbase.htable.threads.max)
The code in branch-2 reads like it allows {{Integer.MAX_VALUE}} threads at
most, only defaulting to {{1}} when you explicitly set {{0}} in the
configuration. Are you talking about the pool's core size?
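For reference, a generic JDK sketch (not the HBase code, just an illustrative pool) of why the core size still matters even when the maximum is effectively unbounded:
{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// With an unbounded work queue the executor never creates threads beyond the
// core size, so a core of 1 serializes everything even though the maximum is
// Integer.MAX_VALUE: submitted tasks just queue up instead of spawning threads.
ThreadPoolExecutor pool = new ThreadPoolExecutor(
    1,                                    // core pool size
    Integer.MAX_VALUE,                    // maximum pool size
    60, TimeUnit.SECONDS,
    new LinkedBlockingQueue<Runnable>()); // unbounded queue
{code}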
{code}
+ void waitForFreeSlot(int size, long id, int periodToLog, Consumer<Long> logger);
+
+ void waitForAllFreeSlot(long id);
{code}
Add some javadoc for these methods, please.
{code}
- this.cleanupIdleConnectionTask = IDLE_CONN_SWEEPER.scheduleAtFixedRate(new Runnable() {
{code}
In isolation, it looks like you removed the internal code that cleans up
connections the caller has left idle. Was it intentional to remove this
without a replacement?
{code}
+    try {
+      isOverKrb = shouldAuthenticateOverKrb();
+    } catch (IOException e) {
+      return;
+    }
{code}
I don't see any expected reason that this method should throw an IOException.
Were you seeing some instance where this catch was important to add? If it's
just because that method is throwing an Exception, I think we should re-throw
loudly as that indicates a bug.
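Something in this spirit, say (just a sketch, not a drop-in patch):
{code}
try {
  isOverKrb = shouldAuthenticateOverKrb();
} catch (IOException e) {
  // Surface the unexpected failure instead of silently skipping the Kerberos path.
  throw new RuntimeException("Unexpected failure checking Kerberos authentication", e);
}
{code}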
{code}
+ // SG: Must be synchronized because of the background thread writeBufferPeriodicFlushTimer
{code}
Nit: we don't put attribution comments into code. Folks can work back from the
changelog to find the original author of some part of the codebase.
{code}
- void sendMultiAction(Map<ServerName, MultiAction> actionsByServer,
+ // SG: Must be synchronized because of the background thread writeBufferPeriodicFlushTimer
+ synchronized void sendMultiAction(Map<ServerName, MultiAction> actionsByServer,
      int numAttempt, List<Action> actionsForReplicaThread, boolean reuseThread) {
{code}
I'm not following this -- if the BufferedMutator does the periodic flush, that
should pick up a batch of new mutations that need to be flushed out, creating
new async actions to submit to AP. How would we end up with two competing
threads submitting the same work? If that _is_ happening from BM, it sounds
like that's an issue in BM, not AP.
{code}
+public class BufferedMutatorThreadPoolExecutor extends ThreadPoolExecutor {
{code}
What does this customization do differently from ThreadPoolExecutor? It needs
comments.
{code}
+ callableInFutureTask = FutureTask.class.getDeclaredField("callable");
+ callableInFutureTask.setAccessible(true);
{code}
This is a really big smell. We shouldn't be peering into JDK classes and
mucking with them like this. Makes HBase extremely brittle to the JDK version
that we require.
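If the goal is just to get at the Callable behind each submitted task, a
reflection-free sketch (hypothetical names, untested against your patch) would
be to override newTaskFor(...) in the executor subclass:
{code}
import java.util.concurrent.*;

// Hypothetical wrapper that keeps its own reference to the Callable, so no
// setAccessible() on FutureTask's private field is needed.
class TrackedFutureTask<T> extends FutureTask<T> {
  final Callable<T> callable;
  TrackedFutureTask(Callable<T> callable) {
    super(callable);
    this.callable = callable;
  }
}

class TrackingThreadPoolExecutor extends ThreadPoolExecutor {
  TrackingThreadPoolExecutor(int core, int max) {
    super(core, max, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
  }

  @Override
  protected <T> RunnableFuture<T> newTaskFor(Callable<T> callable) {
    // Every submit(Callable) now produces a task we can inspect without reflection.
    return new TrackedFutureTask<>(callable);
  }
}
{code}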
{code}
 public void flush() throws InterruptedIOException, RetriesExhaustedWithDetailsException {
-   checkClose();
-   doFlush(true);
+   // This will avoid concurrency between period flush, flush() and close()
+   // mutate are not synchronized, because it use doFlush(false)
+   if (writeBufferPeriodicFlushTimer != null) {
+     lock.lock();
+   }
{code}
Why a lock in {{flush()}} and not just make {{doFlush(boolean)}} synchronized?
The javadoc seems unclear about what should happen with multiple concurrent
callers of {{flush()}}. Are they blocking? Does a new call wait for the
previous flush() to return? Or does it wait for the previous flush() to return
and then run another flush()?
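Just to be concrete about the alternative I have in mind (a sketch only,
untested against your patch):
{code}
public void flush() throws InterruptedIOException, RetriesExhaustedWithDetailsException {
  checkClose();
  doFlush(true);
}

// The periodic flush task, flush() and close() would all funnel through the
// same monitor here, instead of an explicit lock taken only in flush().
private synchronized void doFlush(boolean flushAll)
    throws InterruptedIOException, RetriesExhaustedWithDetailsException {
  // ... existing flush logic, now serialized across concurrent callers ...
}
{code}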
Finally, there are a couple of files that need license headers. Thanks for
putting up this patch, Sebastien.
> Improve performance of BufferedMutator
> --------------------------------------
>
> Key: HBASE-22634
> URL: https://issues.apache.org/jira/browse/HBASE-22634
> Project: HBase
> Issue Type: Improvement
> Components: Client
> Affects Versions: 2.1.5
> Environment: HDP 2.6.5
> Linux RedHat
> Reporter: Sebastien Barnoud
> Priority: Major
> Attachments: HBASE-22634.001.branch-2.patch
>
>
> The default ThreadPoolExecutor uses a default size of 1 (property
> hbase.htable.threads.max). When using a size > 1, we still encountered poor
> performance and exceptions while submitting to the pool (the pool exceeds its
> capacity).
> This patch proposes fixes for different issues encountered when the pool size
> is > 1:
> * thread safety issue
> * concurrent cleanup by Netty and the "legacy" code
> * errors in the backpressure
> * Netty memory leak
> It also proposes a BufferedMutatorThreadPoolExecutor which:
> * uses hbase.client.max.total.tasks as the default size (instead of 1)
> * adds some useful metrics
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)