Jonathan Hurley created AMBARI-16830:
----------------------------------------

             Summary: Desired Configuration Cache Expiration Caused 10,000's of Database Hits In Large Deployments
                 Key: AMBARI-16830
                 URL: https://issues.apache.org/jira/browse/AMBARI-16830
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.2.2
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Blocker
             Fix For: 2.4.0


In large deployments where the number of hosts multiplied by the number of 
components is large (10,000, for example), the {{ConfigHelper.isStale()}} 
method can make tens of thousands of database queries every minute. 

Consider a 3-minute trace captured with the following profiler setting:

{code}
server.persistence.properties.eclipselink.profiler=PerformanceMonitor
{code}

{code:title=Time = 3 minutes}
Counter:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null
    11,716

Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null
    80,520,541,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:ObjectBuilding
    19,741,257,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:QueryPreparation
    414,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:RowFetch
    6,032,673,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:SqlGeneration
    79,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:SqlPrepare
    232,532,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:StatementExecute
    33,624,662,000
{code}
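
To put the trace in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the class and method names are made up for illustration, and it assumes the timer values are reported in nanoseconds, which the trace itself does not state.

```java
import java.util.concurrent.TimeUnit;

/** Back-of-the-envelope arithmetic for the 3-minute trace (illustrative names only). */
final class TraceMath {
    // Values copied from the trace above.
    static final long QUERY_COUNT = 11_716L;
    // ASSUMPTION: the timer unit is nanoseconds; the trace does not say so.
    static final long TOTAL_QUERY_NANOS = 80_520_541_000L;
    static final long TRACE_MINUTES = 3L;

    /** Average number of ClusterConfigMappingEntity queries issued per second. */
    static double queriesPerSecond() {
        return (double) QUERY_COUNT / TimeUnit.MINUTES.toSeconds(TRACE_MINUTES);
    }

    /** Average time spent per query, in milliseconds. */
    static double averageMillisPerQuery() {
        return TOTAL_QUERY_NANOS / 1_000_000.0 / QUERY_COUNT;
    }

    /** Total time spent in this one query type, in seconds. */
    static double totalQuerySeconds() {
        return TOTAL_QUERY_NANOS / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        System.out.printf("queries/sec:      %.1f%n", queriesPerSecond());
        System.out.printf("avg ms per query: %.2f%n", averageMillisPerQuery());
        System.out.printf("total seconds:    %.1f%n", totalQuerySeconds());
    }
}
```

Under that assumption, the trace works out to roughly 65 queries per second at a little under 7 ms each, or about 80 seconds of query time within the 3-minute window for this single entity.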

The {{ClusterConfigMappingEntity:null}} query is executed over 10,000 times 
(11,716 in this trace). If this count exceeds the cache of stale configs (or 
even if it doesn't), it causes a massive performance delay in the Jetty 
threads, since the database is being hammered and other {{PropertyProviders}} 
must wait until the queries are done.

- Setting the {{server.cache.isStale.expiration}} value to 28800 improves the 
behavior of the system
-- Ambari goes from totally unusable to usable
-- Startup is still an issue: the code still has to make tens of thousands of 
calls, which level off only once the cache is populated. So, during startup, 
Ambari is unresponsive.
-- After startup, you can use Ambari to send commands and browse around without 
delay
-- If you change a config, however, the problem returns: the cache is emptied 
and we make 10,000 more calls. This leaves Ambari unresponsive until the 
cache is repopulated
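
The effect of the expiration value and of a config change can be illustrated with a small simulation. This is a minimal sketch, not Ambari's implementation: {{ExpiringStaleCache}}, its methods, the once-a-minute polling, and the 60-second comparison value are all hypothetical, and the expiration unit is assumed to be seconds.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

/** Hypothetical time-expiring cache of "is stale?" answers; NOT Ambari's actual class. */
final class ExpiringStaleCache {
    private static final class Entry {
        final boolean stale;
        final long loadedAtSeconds;
        Entry(boolean stale, long loadedAtSeconds) {
            this.stale = stale;
            this.loadedAtSeconds = loadedAtSeconds;
        }
    }

    private final Map<Integer, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlSeconds;              // ASSUMPTION: expiration is in seconds
    private final Predicate<Integer> loader;    // stands in for the database-backed check
    final AtomicLong loads = new AtomicLong();  // counts simulated database hits

    ExpiringStaleCache(long ttlSeconds, Predicate<Integer> loader) {
        this.ttlSeconds = ttlSeconds;
        this.loader = loader;
    }

    /** Returns the cached answer, reloading it if it is missing or expired. */
    boolean isStale(int key, long nowSeconds) {
        Entry e = entries.get(key);
        if (e == null || nowSeconds - e.loadedAtSeconds >= ttlSeconds) {
            loads.incrementAndGet();
            e = new Entry(loader.test(key), nowSeconds);
            entries.put(key, e);
        }
        return e.stale;
    }

    /** Models a config change: every cached answer is dropped. */
    void invalidateAll() {
        entries.clear();
    }

    /** Polls every key once a minute and returns the number of simulated database hits. */
    static long hitsWhilePolling(long ttlSeconds, int keys, int minutes) {
        ExpiringStaleCache cache = new ExpiringStaleCache(ttlSeconds, k -> false);
        for (int minute = 0; minute < minutes; minute++) {
            for (int key = 0; key < keys; key++) {
                cache.isStale(key, minute * 60L);
            }
        }
        return cache.loads.get();
    }

    public static void main(String[] args) {
        System.out.println("ttl=60:    " + hitsWhilePolling(60, 10_000, 10));
        System.out.println("ttl=28800: " + hitsWhilePolling(28_800, 10_000, 10));
    }
}
```

In this model a short expiration reloads all 10,000 entries on every poll, while a long one pays the cost once at startup and again after each {{invalidateAll()}}, which mirrors the startup and config-change behavior described in the list above.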

There are a ton of threads stuck at:
{code}
"qtp-ambari-client-275" prio=10 tid=0x00007f9de801b800 nid=0x6735 waiting for monitor entry [0x00007f9dd66e3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.ambari.server.controller.internal.AbstractProviderModule.checkInit(AbstractProviderModule.java:805)
        - waiting to lock <0x00007fa0744cc3b0> (a org.apache.ambari.server.controller.internal.DefaultProviderModule)
        at org.apache.ambari.server.controller.internal.AbstractProviderModule.getMetricsServiceType(AbstractProviderModule.java:275)
{code}

They're all blocked by {{qtp-ambari-client-247}}:
{code}
"qtp-ambari-client-247" prio=10 tid=0x00007f9dd8001000 nid=0x5915 runnable [0x00007f9ddd0c2000]
   java.lang.Thread.State: RUNNABLE
        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2961)
        at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2159)
        at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1964)
        at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3316)
        at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:463)
        at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3040)
        at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2288)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2681)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2551)
        - locked <0x00007fa075265510> (a com.mysql.jdbc.JDBC4Connection)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
        - locked <0x00007fa075265510> (a com.mysql.jdbc.JDBC4Connection)
        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1962)
        - locked <0x00007fa075265510> (a com.mysql.jdbc.JDBC4Connection)
        at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:353)
        at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeSelect(DatabaseAccessor.java:1009)
        at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:644)
        at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeCall(DatabaseAccessor.java:560)
        at org.eclipse.persistence.internal.sessions.AbstractSession.basicExecuteCall(AbstractSession.java:2055)
        at org.eclipse.persistence.sessions.server.ServerSession.executeCall(ServerSession.java:570)
        at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeCall(DatasourceCallQueryMechanism.java:242)
        at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeCall(DatasourceCallQueryMechanism.java:228)
        at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeSelectCall(DatasourceCallQueryMechanism.java:299)
        at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.selectAllRows(DatasourceCallQueryMechanism.java:694)
        at org.eclipse.persistence.internal.queries.ExpressionQueryMechanism.selectAllRowsFromTable(ExpressionQueryMechanism.java:2740)
        at org.eclipse.persistence.internal.queries.ExpressionQueryMechanism.selectAllRows(ExpressionQueryMechanism.java:2693)
        at org.eclipse.persistence.queries.ReadAllQuery.executeObjectLevelReadQuery(ReadAllQuery.java:559)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1175)
        at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:904)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1134)
        at org.eclipse.persistence.queries.ReadAllQuery.execute(ReadAllQuery.java:460)
        at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1222)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2896)
        at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1857)
        at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1839)
        at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1804)
        at org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:258)
        at org.eclipse.persistence.internal.jpa.QueryImpl.getResultList(QueryImpl.java:473)
        at org.apache.ambari.server.orm.dao.DaoUtils.selectList(DaoUtils.java:62)
        at org.apache.ambari.server.orm.dao.ClusterDAO.getClusterConfigMappingEntitiesByCluster(ClusterDAO.java:240)
{code}
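
The pattern in the two dumps above, one thread doing slow work while holding a monitor that every other request thread needs, can be reproduced in isolation. The following is a hedged sketch with invented names ({{MonitorContentionDemo}}, {{countBlockedWaiters}}); a latch stands in for the slow database read, so unlike the real dump the holder shows as WAITING rather than RUNNABLE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Illustrative only: one thread holds a monitor while the others pile up as BLOCKED. */
final class MonitorContentionDemo {

    /** Returns how many waiter threads were observed BLOCKED while the holder kept the monitor. */
    static int countBlockedWaiters(int waiters) {
        final Object monitor = new Object();               // plays the role of the provider module lock
        final CountDownLatch holderInside = new CountDownLatch(1);
        final CountDownLatch releaseHolder = new CountDownLatch(1);
        try {
            Thread holder = new Thread(() -> {
                synchronized (monitor) {
                    holderInside.countDown();
                    try {
                        releaseHolder.await();             // stands in for the slow database query
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "holder");
            holder.start();
            holderInside.await();                          // the monitor is now held

            List<Thread> threads = new ArrayList<>();
            for (int i = 0; i < waiters; i++) {
                Thread t = new Thread(() -> {
                    synchronized (monitor) {
                        // the real work would happen here once the lock is free
                    }
                }, "waiter-" + i);
                threads.add(t);
                t.start();
            }

            // Poll (for at most 5 seconds) until every waiter is parked on the monitor.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            int blocked = 0;
            while (System.nanoTime() < deadline) {
                blocked = 0;
                for (Thread t : threads) {
                    if (t.getState() == Thread.State.BLOCKED) {
                        blocked++;
                    }
                }
                if (blocked == waiters) {
                    break;
                }
                Thread.sleep(10);
            }

            releaseHolder.countDown();                     // the "query" finishes; waiters drain
            holder.join();
            for (Thread t : threads) {
                t.join();
            }
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the demo", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked waiters: " + countBlockedWaiters(8));
    }
}
```

The longer the holder stays inside the synchronized section, the longer every other thread remains BLOCKED, which is why a burst of slow queries issued under the lock stalls the whole request pool.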



