cnauroth commented on code in PR #3817:
URL: https://github.com/apache/hive/pull/3817#discussion_r1063738486


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java:
##########
@@ -112,9 +113,17 @@ public DataSource create(Configuration hdpConfig, int 
maxPoolSize) throws SQLExc
     
objectPool.setSoftMinEvictableIdleTimeMillis(softMinEvictableIdleTimeMillis);
     objectPool.setLifo(lifo);
 
+    if ("mutex".equalsIgnoreCase(poolName)) {

Review Comment:
   I suggest adding a comment to explain why we are limiting database 
connections related to mutexes, similar to the comment you already put in 
`HikariCPDataSourceProvider`.



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/DbCPDataSourceProvider.java:
##########
@@ -112,9 +113,17 @@ public DataSource create(Configuration hdpConfig, int 
maxPoolSize) throws SQLExc
     
objectPool.setSoftMinEvictableIdleTimeMillis(softMinEvictableIdleTimeMillis);
     objectPool.setLifo(lifo);
 
+    if ("mutex".equalsIgnoreCase(poolName)) {
+      if (timeBetweenEvictionRuns < 0) {
+        objectPool.setTimeBetweenEvictionRunsMillis(30 * 1000);
+      }
+      if (softMinEvictableIdleTimeMillis < 0) {
+        objectPool.setSoftMinEvictableIdleTimeMillis(600 * 1000);
+      }
+    }
     String stmt = dbProduct.getPrepareTxnStmt();
     if (stmt != null) {
-      poolableConnFactory.setValidationQuery(stmt);
+      poolableConnFactory.setConnectionInitSql(Arrays.asList(stmt));

Review Comment:
   Nitpick: consider `Collections#singletonList` for a single element.



##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java:
##########
@@ -143,6 +148,62 @@ public void testCreateDbCpDataSource() throws SQLException 
{
     Assert.assertTrue(ds instanceof PoolingDataSource);
   }
 
+  @Test
+  public void testEvictIdleConnection() throws Exception {
+    String[] dataSourceType = {HikariCPDataSourceProvider.HIKARI, 
DbCPDataSourceProvider.DBCP};
+    try (DataSourceProvider.DataSourceNameConfigurator configurator =
+             new DataSourceProvider.DataSourceNameConfigurator(conf, "mutex")) 
{
+      for (final String type: dataSourceType) {
+        MetastoreConf.setVar(conf, ConfVars.CONNECTION_POOLING_TYPE, type);
+        boolean isHikari = HikariCPDataSourceProvider.HIKARI.equals(type);
+        if (isHikari) {
+          conf.unset("hikaricp.connectionInitSql");
+          // The minimum of idleTimeout is 10s
+          conf.set("hikaricp.idleTimeout", "10000");
+          System.setProperty("com.zaxxer.hikari.housekeeping.periodMs", 
"1000");
+        } else {
+          conf.set("dbcp.timeBetweenEvictionRunsMillis", "1000");
+          conf.set("dbcp.softMinEvictableIdleTimeMillis", "3000");
+          conf.set("dbcp.maxIdle", "0");
+        }
+        DataSourceProvider dsp = 
DataSourceProviderFactory.tryGetDataSourceProviderOrNull(conf);
+        DataSource ds = dsp.create(conf, 5);
+        List<Connection> connections = new ArrayList<>();
+        for (int i = 0; i < 5; i++) {
+          connections.add(ds.getConnection());
+        }
+        HikariPoolMXBean poolMXBean = null;
+        GenericObjectPool objectPool = null;
+        if (isHikari) {
+          poolMXBean = ((HikariDataSource) ds).getHikariPoolMXBean();
+          Assert.assertEquals(5, poolMXBean.getTotalConnections());
+          Assert.assertEquals(5, poolMXBean.getActiveConnections());
+        } else {
+          Method getPool = 
PoolingDataSource.class.getDeclaredMethod("getPool");

Review Comment:
   These 3 lines could be shortened to 1 using `invokeMethod` from Commons Lang 
[MethodUtils](https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/reflect/MethodUtils.html).



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java:
##########
@@ -72,6 +72,17 @@ public DataSource create(Configuration hdpConfig, int 
maxPoolSize) throws SQLExc
       config.setPoolName(poolName);
     }
 
+    // It's kind of a waste to create a fixed size connection pool as same as 
the TxnHandler#connPool,
+    // TxnHandler#connPoolMutex is mostly used for MutexAPI that is primarily 
designed to
+    // provide coarse-grained mutex support to maintenance tasks running 
inside the Metastore,
+    // add minimumIdle=2 and idleTimeout=5min to the pool, so that the 
connection pool can retire
+    // the idle connection aggressively, this will make Metastore more 
scalable especially if
+    // there is a leader in the warehouse.
+    if ("mutex".equals(poolName)) {
+      config.setMinimumIdle(Math.min(maxPoolSize, 2));

Review Comment:
   It looks like the behavior is different between `DbCPDataSourceProvider` and 
`HikariCPDataSourceProvider` now:
   
   * **Hikari**: Set minimum idle connections to 2 (or smaller). Note that this 
will override any user configuration of `hikaricp.minimumIdle` in 
metastore-site.xml or hive-site.xml.
   * **DbCP**: Keep using the same user-configured minimum idle connections, 
and instead override the eviction timing.
   
   Can you please explain why they are different? FWIW, the Hikari logic is the 
one that looks right to me: decrease idle connections for mutexes.



##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/datasource/TestDataSourceProviderFactory.java:
##########
@@ -143,6 +148,62 @@ public void testCreateDbCpDataSource() throws SQLException 
{
     Assert.assertTrue(ds instanceof PoolingDataSource);
   }
 
+  @Test
+  public void testEvictIdleConnection() throws Exception {
+    String[] dataSourceType = {HikariCPDataSourceProvider.HIKARI, 
DbCPDataSourceProvider.DBCP};
+    try (DataSourceProvider.DataSourceNameConfigurator configurator =
+             new DataSourceProvider.DataSourceNameConfigurator(conf, "mutex")) 
{
+      for (final String type: dataSourceType) {

Review Comment:
   If a test fails, it will be hard to tell if it failed while testing Hikari 
vs. DbCP. I suggest using customized assert messages that include 
`dataSourceType`.



##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/datasource/HikariCPDataSourceProvider.java:
##########
@@ -72,6 +72,17 @@ public DataSource create(Configuration hdpConfig, int 
maxPoolSize) throws SQLExc
       config.setPoolName(poolName);
     }
 
+    // It's kind of a waste to create a fixed size connection pool as same as 
the TxnHandler#connPool,
+    // TxnHandler#connPoolMutex is mostly used for MutexAPI that is primarily 
designed to
+    // provide coarse-grained mutex support to maintenance tasks running 
inside the Metastore,
+    // add minimumIdle=2 and idleTimeout=5min to the pool, so that the 
connection pool can retire
+    // the idle connection aggressively, this will make Metastore more 
scalable especially if
+    // there is a leader in the warehouse.
+    if ("mutex".equals(poolName)) {

Review Comment:
   `"mutex"` is repeated in a few places, and correctness of this logic depends 
on the `DataSourceProvider` code being in sync with the pool name chosen in 
`TxnHandler`. I guess ideally, all of the data source name strings should be 
moved to shared constants somewhere, maybe as a separate refactoring patch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to