cnauroth commented on code in PR #3817:
URL: https://github.com/apache/hive/pull/3817#discussion_r1063738486
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java:
##########
@@ -112,9 +113,17 @@ public DataSource create(Configuration hdpConfig, int
maxPoolSize) throws SQLExc
objectPool.setSoftMinEvictableIdleTimeMillis(softMinEvictableIdleTimeMillis);
objectPool.setLifo(lifo);
+ if ("mutex".equalsIgnoreCase(poolName)) {
Review Comment:
I suggest adding a comment to explain why we are limiting database
connections related to mutexes, similar to the comment you already put in
`HikariCPDataSourceProvider`.
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java:
##########
@@ -112,9 +113,17 @@ public DataSource create(Configuration hdpConfig, int
maxPoolSize) throws SQLExc
objectPool.setSoftMinEvictableIdleTimeMillis(softMinEvictableIdleTimeMillis);
objectPool.setLifo(lifo);
+ if ("mutex".equalsIgnoreCase(poolName)) {
+ if (timeBetweenEvictionRuns < 0) {
+ objectPool.setTimeBetweenEvictionRunsMillis(30 * 1000);
+ }
+ if (softMinEvictableIdleTimeMillis < 0) {
+ objectPool.setSoftMinEvictableIdleTimeMillis(600 * 1000);
+ }
+ }
String stmt = dbProduct.getPrepareTxnStmt();
if (stmt != null) {
- poolableConnFactory.setValidationQuery(stmt);
+ poolableConnFactory.setConnectionInitSql(Arrays.asList(stmt));
Review Comment:
Nitpick: consider `Collections#singletonList` for a single element.
##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java:
##########
@@ -143,6 +148,62 @@ public void testCreateDbCpDataSource() throws SQLException
{
Assert.assertTrue(ds instanceof PoolingDataSource);
}
+ @Test
+ public void testEvictIdleConnection() throws Exception {
+ String[] dataSourceType = {HikariCPDataSourceProvider.HIKARI,
DbCPDataSourceProvider.DBCP};
+ try (DataSourceProvider.DataSourceNameConfigurator configurator =
+ new DataSourceProvider.DataSourceNameConfigurator(conf, "mutex"))
{
+ for (final String type: dataSourceType) {
+ MetastoreConf.setVar(conf, ConfVars.CONNECTION_POOLING_TYPE, type);
+ boolean isHikari = HikariCPDataSourceProvider.HIKARI.equals(type);
+ if (isHikari) {
+ conf.unset("hikaricp.connectionInitSql");
+ // The minimum of idleTimeout is 10s
+ conf.set("hikaricp.idleTimeout", "10000");
+ System.setProperty("com.zaxxer.hikari.housekeeping.periodMs",
"1000");
+ } else {
+ conf.set("dbcp.timeBetweenEvictionRunsMillis", "1000");
+ conf.set("dbcp.softMinEvictableIdleTimeMillis", "3000");
+ conf.set("dbcp.maxIdle", "0");
+ }
+ DataSourceProvider dsp =
DataSourceProviderFactory.tryGetDataSourceProviderOrNull(conf);
+ DataSource ds = dsp.create(conf, 5);
+ List<Connection> connections = new ArrayList<>();
+ for (int i = 0; i < 5; i++) {
+ connections.add(ds.getConnection());
+ }
+ HikariPoolMXBean poolMXBean = null;
+ GenericObjectPool objectPool = null;
+ if (isHikari) {
+ poolMXBean = ((HikariDataSource) ds).getHikariPoolMXBean();
+ Assert.assertEquals(5, poolMXBean.getTotalConnections());
+ Assert.assertEquals(5, poolMXBean.getActiveConnections());
+ } else {
+ Method getPool =
PoolingDataSource.class.getDeclaredMethod("getPool");
Review Comment:
These 3 lines could be shortened to 1 using `invokeMethod` from Commons Lang
[MethodUtils](https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/reflect/MethodUtils.html).
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java:
##########
@@ -72,6 +72,17 @@ public DataSource create(Configuration hdpConfig, int
maxPoolSize) throws SQLExc
config.setPoolName(poolName);
}
+ // It's kind of a waste to create a fixed size connection pool as same as
the TxnHandler#connPool,
+ // TxnHandler#connPoolMutex is mostly used for MutexAPI that is primarily
designed to
+ // provide coarse-grained mutex support to maintenance tasks running
inside the Metastore,
+ // add minimumIdle=2 and idleTimeout=5min to the pool, so that the
connection pool can retire
+ // the idle connection aggressively, this will make Metastore more
scalable especially if
+ // there is a leader in the warehouse.
+ if ("mutex".equals(poolName)) {
+ config.setMinimumIdle(Math.min(maxPoolSize, 2));
Review Comment:
It looks like the behavior is different between `DbCPDataSourceProvider` and
`HikariCPDataSourceProvider` now:
* **Hikari**: Set minimum idle connections to 2 (or smaller). Note that this
will override any user configuration of `hikaricp.minimumIdle` in
metastore-site.xml or hive-site.xml.
* **DbCP**: Keep using the same user-configured minimum idle connections,
and instead override the eviction timing.
Can you please explain why they are different? FWIW, the Hikari logic is the
one that looks right to me: decrease idle connections for mutexes.
##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java:
##########
@@ -143,6 +148,62 @@ public void testCreateDbCpDataSource() throws SQLException
{
Assert.assertTrue(ds instanceof PoolingDataSource);
}
+ @Test
+ public void testEvictIdleConnection() throws Exception {
+ String[] dataSourceType = {HikariCPDataSourceProvider.HIKARI,
DbCPDataSourceProvider.DBCP};
+ try (DataSourceProvider.DataSourceNameConfigurator configurator =
+ new DataSourceProvider.DataSourceNameConfigurator(conf, "mutex"))
{
+ for (final String type: dataSourceType) {
Review Comment:
If a test fails, it will be hard to tell if it failed while testing Hikari
vs. DbCP. I suggest using customized assert messages that include
`dataSourceType`.
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java:
##########
@@ -72,6 +72,17 @@ public DataSource create(Configuration hdpConfig, int
maxPoolSize) throws SQLExc
config.setPoolName(poolName);
}
+ // It's kind of a waste to create a fixed size connection pool as same as
the TxnHandler#connPool,
+ // TxnHandler#connPoolMutex is mostly used for MutexAPI that is primarily
designed to
+ // provide coarse-grained mutex support to maintenance tasks running
inside the Metastore,
+ // add minimumIdle=2 and idleTimeout=5min to the pool, so that the
connection pool can retire
+ // the idle connection aggressively, this will make Metastore more
scalable especially if
+ // there is a leader in the warehouse.
+ if ("mutex".equals(poolName)) {
Review Comment:
`"mutex"` is repeated in a few places, and correctness of this logic depends
on the `DataSourceProvider` code being in sync with the pool name chosen in
`TxnHandler`. I guess ideally, all of the data source name strings should be
moved to shared constants somewhere, maybe as a separate refactoring patch.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]