[
https://issues.apache.org/jira/browse/PHOENIX-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086940#comment-16086940
]
Samarth Jain commented on PHOENIX-4021:
---------------------------------------
[~jamestaylor] - the RegionCoprocessorEnvironment that is passed in to the
setup method already has the right config setup in there.
One important thing to determine would be the kind of backing worker thread
pool to use. The current CachedHTableFactory shares this pool among the HTables:
{code}
this.pool = new ThreadPoolExecutor(1,
env.getConfiguration().getInt(INDEX_WRITES_THREAD_MAX_PER_REGIONSERVER_KEY,
Integer.MAX_VALUE),
env.getConfiguration().getInt(HTABLE_KEEP_ALIVE_KEY, 60),
TimeUnit.SECONDS,
new SynchronousQueue<Runnable>(),
Threads.newDaemonThreadFactory("CachedHtables"));
pool.allowCoreThreadTimeOut(true);
{code}
Notice the kind of worker queue we are using - SynchronousQueue. Also, we are
configuring the max number of threads as Integer.MAX and core threads to 1.
If you still want to use the above kind of pool, you might have to tweak
IndexWriterUtils#CoprocessorHConnectionTableFactory a little bit to make sure
this pool is passed in to all the HTable constructors.
For ex:
{code}
@Override
public HTableInterface getTable(ImmutableBytesPtr tablename) throws
IOException {
return
getConnection(conf).getTable(tablename.copyBytesIfNecessary(), sharedPool);
}
{code}
If you don't specify the thread pool to use, then by default we use the use
pool returned by HConnectionImplementation#getBatchPool(). This pool uses a
LinkedBlockingQueue. And configures number of core and max threads differently.
{code}
int maxThreads = conf.getInt("hbase.hconnection.threads.max", 256);
int coreThreads = conf.getInt("hbase.hconnection.threads.core",
256);
if (maxThreads == 0) {
maxThreads = Runtime.getRuntime().availableProcessors() * 8;
}
if (coreThreads == 0) {
coreThreads = Runtime.getRuntime().availableProcessors() * 8;
}
long keepAliveTime =
conf.getLong("hbase.hconnection.threads.keepalivetime", 60);
LinkedBlockingQueue<Runnable> workQueue =
new LinkedBlockingQueue<Runnable>(maxThreads *
conf.getInt(HConstants.HBASE_CLIENT_MAX_TOTAL_TASKS,
HConstants.DEFAULT_HBASE_CLIENT_MAX_TOTAL_TASKS));
ThreadPoolExecutor tpe = new ThreadPoolExecutor(
coreThreads,
maxThreads,
keepAliveTime,
TimeUnit.SECONDS,
workQueue,
Threads.newDaemonThreadFactory(toString() + "-shared-"));
tpe.allowCoreThreadTimeOut(true);
this.batchPool = tpe;
{code}
We should definitely perf test which pool to use here. My hunch is that the way
getBatchPool() is configured will perform better since it configures core pool
size to a more reasonable value. Also, the executor in CachedHTableFactory ends
up creating a new thread (up to the limit of INTEGER.MAX), every time there is
more work than the number of threads seems a bit wasteful. Either way, we
should perf test it.
> Remove CachingHTableFactory
> ---------------------------
>
> Key: PHOENIX-4021
> URL: https://issues.apache.org/jira/browse/PHOENIX-4021
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.11.0
> Reporter: Geoffrey Jacoby
> Assignee: Geoffrey Jacoby
> Labels: globalMutableSecondaryIndex
> Fix For: 4.12.0
>
> Attachments: PHOENIX-4021.patch
>
>
> CachingHTableFactory is used as a performance optimization when writing to
> global indexes so that HTable instances are cached and later automatically
> cleaned up, rather than instantiated each time we write to an index.
> This should be removed for two reasons:
> 1. It opens us up to race conditions, because HTables aren't threadsafe, but
> CachingHTableFactory doesn't guard against two threads both grabbing the same
> HTable and using it simultaneously. Since all ops going through a region
> share the same IndexWriter and ParallelWriterIndexCommitter, and hence the
> same CachingHTableFactory, that means separate operations can both be holding
> the same HTable.
> 2. According to discussion on PHOENIX-3159, and offline discussions I've had
> with [~apurtell], HBase 1.x and above make creating throwaway HTable
> instances cheap so the caching is no longer needed.
> For 4.x-HBase-1.x and master, we should remove CachingHTableFactory, and for
> 4.x-HBase-0.98, we should either get rid of it (if it's not too much of a
> perf hit) or at least make it threadsafe.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)