[jira] [Commented] (PHOENIX-4021) Remove CachingHTableFactory

Samarth Jain (JIRA) Thu, 13 Jul 2017 23:52:13 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086940#comment-16086940
 ]


Samarth Jain commented on PHOENIX-4021:
---------------------------------------

[~jamestaylor] - the RegionCoprocessorEnvironment that is passed in to the 
setup method already has the right config setup in there. 

One important thing to determine would be the kind of backing worker thread 
pool to use. The current CachedHTableFactory shares this pool among the HTables:

{code}
this.pool = new ThreadPoolExecutor(1,
                
env.getConfiguration().getInt(INDEX_WRITES_THREAD_MAX_PER_REGIONSERVER_KEY, 
Integer.MAX_VALUE),
                env.getConfiguration().getInt(HTABLE_KEEP_ALIVE_KEY, 60), 
TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(), 
Threads.newDaemonThreadFactory("CachedHtables"));
        pool.allowCoreThreadTimeOut(true);
{code}
Notice the kind of worker queue we are using - SynchronousQueue. Also, we are 
configuring the max number of threads as Integer.MAX and core threads to 1. 

If you still want to use the above kind of pool, you might have to tweak 
IndexWriterUtils#CoprocessorHConnectionTableFactory a little bit to make sure 
this pool is passed in to all the HTable constructors.

For ex:

{code}
@Override
        public HTableInterface getTable(ImmutableBytesPtr tablename) throws 
IOException {
            return 
getConnection(conf).getTable(tablename.copyBytesIfNecessary(), sharedPool);
        }
{code}

If you don't specify the thread pool to use, then by default  we use the use 
pool returned by HConnectionImplementation#getBatchPool(). This pool uses a 
LinkedBlockingQueue. And configures number of core and max threads differently.

{code}
int maxThreads = conf.getInt("hbase.hconnection.threads.max", 256);
            int coreThreads = conf.getInt("hbase.hconnection.threads.core", 
256);
            if (maxThreads == 0) {
              maxThreads = Runtime.getRuntime().availableProcessors() * 8;
            }
            if (coreThreads == 0) {
              coreThreads = Runtime.getRuntime().availableProcessors() * 8;
            }
            long keepAliveTime = 
conf.getLong("hbase.hconnection.threads.keepalivetime", 60);
            LinkedBlockingQueue<Runnable> workQueue =
              new LinkedBlockingQueue<Runnable>(maxThreads *
                conf.getInt(HConstants.HBASE_CLIENT_MAX_TOTAL_TASKS,
                  HConstants.DEFAULT_HBASE_CLIENT_MAX_TOTAL_TASKS));
            ThreadPoolExecutor tpe = new ThreadPoolExecutor(
                coreThreads,
                maxThreads,
                keepAliveTime,
                TimeUnit.SECONDS,
                workQueue,
                Threads.newDaemonThreadFactory(toString() + "-shared-"));
            tpe.allowCoreThreadTimeOut(true);
            this.batchPool = tpe;
{code}

We should definitely perf test which pool to use here. My hunch is that the way 
getBatchPool() is configured will perform better since it configures core pool 
size to a more reasonable value. Also, the executor in CachedHTableFactory ends 
up creating a new thread (up to the limit of INTEGER.MAX), every time there is 
more work than the number of threads seems a bit wasteful. Either way, we 
should perf test it.

> Remove CachingHTableFactory
> ---------------------------
>
>                 Key: PHOENIX-4021
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4021
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>              Labels: globalMutableSecondaryIndex
>             Fix For: 4.12.0
>
>         Attachments: PHOENIX-4021.patch
>
>
> CachingHTableFactory is used as a performance optimization when writing to 
> global indexes so that HTable instances are cached and later automatically 
> cleaned up, rather than instantiated each time we write to an index.
> This should be removed for two reasons:
> 1. It opens us up to race conditions, because HTables aren't threadsafe, but 
> CachingHTableFactory doesn't guard against two threads both grabbing the same 
> HTable and using it simultaneously. Since all ops going through a region 
> share the same IndexWriter and ParallelWriterIndexCommitter, and hence the 
> same CachingHTableFactory, that means separate operations can both be holding 
> the same HTable. 
> 2. According to discussion on PHOENIX-3159, and offline discussions I've had 
> with [~apurtell], HBase 1.x and above make creating throwaway HTable 
> instances cheap so the caching is no longer needed.
> For 4.x-HBase-1.x and master, we should remove CachingHTableFactory, and for 
> 4.x-HBase-0.98, we should either get rid of it (if it's not too much of a 
> perf hit) or at least make it threadsafe.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (PHOENIX-4021) Remove CachingHTableFactory

Reply via email to