[
https://issues.apache.org/jira/browse/HIVE-26794?focusedWorklogId=837600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-837600
]
ASF GitHub Bot logged work on HIVE-26794:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Jan/23 21:09
Start Date: 06/Jan/23 21:09
Worklog Time Spent: 10m
Work Description: cnauroth commented on code in PR #3817:
URL: https://github.com/apache/hive/pull/3817#discussion_r1063738486
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java:
##########
@@ -112,9 +113,17 @@ public DataSource create(Configuration hdpConfig, int
maxPoolSize) throws SQLExc
objectPool.setSoftMinEvictableIdleTimeMillis(softMinEvictableIdleTimeMillis);
objectPool.setLifo(lifo);
+ if ("mutex".equalsIgnoreCase(poolName)) {
Review Comment:
I suggest adding a comment to explain why we are limiting database
connections related to mutexes, similar to the comment you already put in
`HikariCPDataSourceProvider`.
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java:
##########
@@ -112,9 +113,17 @@ public DataSource create(Configuration hdpConfig, int
maxPoolSize) throws SQLExc
objectPool.setSoftMinEvictableIdleTimeMillis(softMinEvictableIdleTimeMillis);
objectPool.setLifo(lifo);
+ if ("mutex".equalsIgnoreCase(poolName)) {
+ if (timeBetweenEvictionRuns < 0) {
+ objectPool.setTimeBetweenEvictionRunsMillis(30 * 1000);
+ }
+ if (softMinEvictableIdleTimeMillis < 0) {
+ objectPool.setSoftMinEvictableIdleTimeMillis(600 * 1000);
+ }
+ }
String stmt = dbProduct.getPrepareTxnStmt();
if (stmt != null) {
- poolableConnFactory.setValidationQuery(stmt);
+ poolableConnFactory.setConnectionInitSql(Arrays.asList(stmt));
Review Comment:
Nitpick: consider `Collections#singletonList` for a single element.
##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java:
##########
@@ -143,6 +148,62 @@ public void testCreateDbCpDataSource() throws SQLException
{
Assert.assertTrue(ds instanceof PoolingDataSource);
}
+ @Test
+ public void testEvictIdleConnection() throws Exception {
+ String[] dataSourceType = {HikariCPDataSourceProvider.HIKARI,
DbCPDataSourceProvider.DBCP};
+ try (DataSourceProvider.DataSourceNameConfigurator configurator =
+ new DataSourceProvider.DataSourceNameConfigurator(conf, "mutex"))
{
+ for (final String type: dataSourceType) {
+ MetastoreConf.setVar(conf, ConfVars.CONNECTION_POOLING_TYPE, type);
+ boolean isHikari = HikariCPDataSourceProvider.HIKARI.equals(type);
+ if (isHikari) {
+ conf.unset("hikaricp.connectionInitSql");
+ // The minimum of idleTimeout is 10s
+ conf.set("hikaricp.idleTimeout", "10000");
+ System.setProperty("com.zaxxer.hikari.housekeeping.periodMs",
"1000");
+ } else {
+ conf.set("dbcp.timeBetweenEvictionRunsMillis", "1000");
+ conf.set("dbcp.softMinEvictableIdleTimeMillis", "3000");
+ conf.set("dbcp.maxIdle", "0");
+ }
+ DataSourceProvider dsp =
DataSourceProviderFactory.tryGetDataSourceProviderOrNull(conf);
+ DataSource ds = dsp.create(conf, 5);
+ List<Connection> connections = new ArrayList<>();
+ for (int i = 0; i < 5; i++) {
+ connections.add(ds.getConnection());
+ }
+ HikariPoolMXBean poolMXBean = null;
+ GenericObjectPool objectPool = null;
+ if (isHikari) {
+ poolMXBean = ((HikariDataSource) ds).getHikariPoolMXBean();
+ Assert.assertEquals(5, poolMXBean.getTotalConnections());
+ Assert.assertEquals(5, poolMXBean.getActiveConnections());
+ } else {
+ Method getPool =
PoolingDataSource.class.getDeclaredMethod("getPool");
Review Comment:
These 3 lines could be shortened to 1 using `invokeMethod` from Commons Lang
[MethodUtils](https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/reflect/MethodUtils.html).
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java:
##########
@@ -72,6 +72,17 @@ public DataSource create(Configuration hdpConfig, int
maxPoolSize) throws SQLExc
config.setPoolName(poolName);
}
+ // It's kind of a waste to create a fixed size connection pool as same as
the TxnHandler#connPool,
+ // TxnHandler#connPoolMutex is mostly used for MutexAPI that is primarily
designed to
+ // provide coarse-grained mutex support to maintenance tasks running
inside the Metastore,
+ // add minimumIdle=2 and idleTimeout=5min to the pool, so that the
connection pool can retire
+ // the idle connection aggressively, this will make Metastore more
scalable especially if
+ // there is a leader in the warehouse.
+ if ("mutex".equals(poolName)) {
+ config.setMinimumIdle(Math.min(maxPoolSize, 2));
Review Comment:
It looks like the behavior is different between `DbCPDataSourceProvider` and
`HikariCPDataSourceProvider` now:
* **Hikari**: Set minimum idle connections to 2 (or smaller). Note that this
will override any user configuration of `hikaricp.minimumIdle` in
metastore-site.xml or hive-site.xml.
* **DbCP**: Keep using the same user-configured minimum idle connections,
and instead override the eviction timing.
Can you please explain why they are different? FWIW, the Hikari logic is the
one that looks right to me: decrease idle connections for mutexes.
##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java:
##########
@@ -143,6 +148,62 @@ public void testCreateDbCpDataSource() throws SQLException
{
Assert.assertTrue(ds instanceof PoolingDataSource);
}
+ @Test
+ public void testEvictIdleConnection() throws Exception {
+ String[] dataSourceType = {HikariCPDataSourceProvider.HIKARI,
DbCPDataSourceProvider.DBCP};
+ try (DataSourceProvider.DataSourceNameConfigurator configurator =
+ new DataSourceProvider.DataSourceNameConfigurator(conf, "mutex"))
{
+ for (final String type: dataSourceType) {
Review Comment:
If a test fails, it will be hard to tell if it failed while testing Hikari
vs. DbCP. I suggest using customized assert messages that include
`dataSourceType`.
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java:
##########
@@ -72,6 +72,17 @@ public DataSource create(Configuration hdpConfig, int
maxPoolSize) throws SQLExc
config.setPoolName(poolName);
}
+ // It's kind of a waste to create a fixed size connection pool as same as
the TxnHandler#connPool,
+ // TxnHandler#connPoolMutex is mostly used for MutexAPI that is primarily
designed to
+ // provide coarse-grained mutex support to maintenance tasks running
inside the Metastore,
+ // add minimumIdle=2 and idleTimeout=5min to the pool, so that the
connection pool can retire
+ // the idle connection aggressively, this will make Metastore more
scalable especially if
+ // there is a leader in the warehouse.
+ if ("mutex".equals(poolName)) {
Review Comment:
`"mutex"` is repeated in a few places, and correctness of this logic depends
on the `DataSourceProvider` code being in sync with the pool name chosen in
`TxnHandler`. I guess ideally, all of the data source name strings should be
moved to shared constants somewhere, maybe as a separate refactoring patch.
Issue Time Tracking
-------------------
Worklog Id: (was: 837600)
Time Spent: 4h 10m (was: 4h)
> Explore retiring TxnHandler#connPoolMutex idle connections
> ----------------------------------------------------------
>
> Key: HIVE-26794
> URL: https://issues.apache.org/jira/browse/HIVE-26794
> Project: Hive
> Issue Type: Improvement
> Components: Standalone Metastore
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> Instead of creating a fixed size connection pool for TxnHandler#MutexAPI, the
> pool can be assigned to a more dynamic size pool due to:
> * TxnHandler#MutexAPI is primarily designed to provide coarse-grained mutex
> support to maintenance tasks running inside the Metastore, these tasks are
> not user faced;
> * A fixed size connection pool as same as the pool used in ObjectStore is a
> waste for other non leaders in the warehouse;
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)