[
https://issues.apache.org/jira/browse/AMBARI-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604030#comment-14604030
]
Hudson commented on AMBARI-12178:
---------------------------------
FAILURE: Integrated in Ambari-branch-2.1 #128 (See
[https://builds.apache.org/job/Ambari-branch-2.1/128/])
AMBARI-12178 - Memory Exhausted During Upgrade Of Large Cluster
(jonathanhurley) (jhurley:
http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=77316937c92ba3465255bac5acd335317f58bdd7)
* ambari-server/src/main/java/org/apache/ambari/server/controller/internal/StageResourceProvider.java
* ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostEntity.java
* ambari-server/src/main/java/org/apache/ambari/server/topology/HostRequest.java
* ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java
* ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleCommand.java
* ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java
* ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java
* ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeGroupResourceProvider.java
> Memory Exhausted During Upgrade Of Large Cluster
> ------------------------------------------------
>
> Key: AMBARI-12178
> URL: https://issues.apache.org/jira/browse/AMBARI-12178
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.1.0
> Reporter: Jonathan Hurley
> Assignee: Jonathan Hurley
> Priority: Blocker
> Fix For: 2.1.0
>
>
> During an upgrade of a large cluster, the memory used by Ambari grows until
> it is fully consumed. This, however, only happens when the Upgrade Dialog
> page is open. If that popup is closed, the memory usage stays relatively
> constant.
> The offending call is:
> {code}
> api/v1/clusters/perf400/upgrades/31?upgrade_groups/UpgradeGroup/status!=PENDING&fields=Upgrade/progress_percent,Upgrade/request_context,Upgrade/request_status,Upgrade/direction,upgrade_groups/UpgradeGroup,upgrade_groups/upgrade_items/UpgradeItem/status,upgrade_groups/upgrade_items/UpgradeItem/context,upgrade_groups/upgrade_items/UpgradeItem/group_id,upgrade_groups/upgrade_items/UpgradeItem/progress_percent,upgrade_groups/upgrade_items/UpgradeItem/request_id,upgrade_groups/upgrade_items/UpgradeItem/skippable,upgrade_groups/upgrade_items/UpgradeItem/stage_id,upgrade_groups/upgrade_items/UpgradeItem/status,upgrade_groups/upgrade_items/UpgradeItem/text&minimal_response=true
> {code}
> Based on heap dumps, the largest offenders are {{StageEntity}} and, as a
> result, {{byte[]}}:
> {noformat}
> Class Name | Objects | Shallow Heap  | Retained Heap
> ----------------------------------------------------
> byte[]     | 351,907 | 3,147,710,224 |
> ----------------------------------------------------
>
> Class Name                                          | Objects | Shallow Heap | Retained Heap
> --------------------------------------------------------------------------------------------
> org.apache.ambari.server.orm.entities.StageEntity   | 192,356 |   18,466,176 | 3,075,693,136
> org.apache.ambari.server.orm.entities.StageEntity_  |       0 |            0 |
> org.apache.ambari.server.orm.entities.StageEntityPK |       0 |            0 |
> --------------------------------------------------------------------------------------------
> {noformat}
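As a rough cross-check of the histogram, dividing each total by its object count gives the average size per object: about 8,944 bytes of shallow heap per {{byte[]}} and about 15,989 bytes (roughly 16 KB) of retained heap per {{StageEntity}}. The totals are copied from the dump; the class name {{HeapAverages}} is purely illustrative. Because this is an average over all 192,356 entities, it is lower than the 28,576 bytes retained by the single sampled {{StageEntity}} in the breakdown that follows.

```java
// Rough per-object averages derived from the class histogram above.
// Totals and object counts are copied from the heap dump; the class name is illustrative.
final class HeapAverages {
    static final long BYTE_ARRAY_OBJECTS = 351_907L;
    static final long BYTE_ARRAY_SHALLOW_HEAP = 3_147_710_224L;
    static final long STAGE_ENTITY_OBJECTS = 192_356L;
    static final long STAGE_ENTITY_RETAINED_HEAP = 3_075_693_136L;

    private HeapAverages() {}

    /** Truncating integer average in bytes: totalBytes / objectCount. */
    static long averageBytes(long totalBytes, long objectCount) {
        if (objectCount <= 0) {
            throw new IllegalArgumentException("objectCount must be positive");
        }
        return totalBytes / objectCount;
    }

    public static void main(String[] args) {
        System.out.println("average byte[] shallow size:       "
                + averageBytes(BYTE_ARRAY_SHALLOW_HEAP, BYTE_ARRAY_OBJECTS) + " bytes");
        System.out.println("average StageEntity retained heap: "
                + averageBytes(STAGE_ENTITY_RETAINED_HEAP, STAGE_ENTITY_OBJECTS) + " bytes");
    }
}
```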
> The sampled {{StageEntity}} below retains about 28 KB, almost all of it in the serialized {{clusterHostInfo}} field:
> {noformat}
> Class Name | Shallow Heap | Retained Heap
> ------------------------------------------------------------------------------------------------------------
> org.apache.ambari.server.orm.entities.StageEntity @ 0x738e03260 | 96 | 28,576
> |- <class> class org.apache.ambari.server.orm.entities.StageEntity @ 0x64058d268 | 8 | 8
> |- skippable java.lang.Integer @ 0x6401e9738  0 | 16 | 16
> |- clusterId java.lang.Long @ 0x64026c908  2 | 24 | 24
> |- requestId java.lang.Long @ 0x64026d840  31 | 24 | 24
> |- _persistence_primaryKey org.eclipse.persistence.internal.identitymaps.CacheId @ 0x642ce20e0 | 24 | 48
> |- _persistence_cacheKey org.eclipse.persistence.internal.identitymaps.HardCacheWeakIdentityMap$ReferenceCacheKey @ 0x6469cf328 | 104 | 136
> |- request org.apache.ambari.server.orm.entities.RequestEntity @ 0x728d046e8 | 112 | 432
> |- _persistence_listener org.eclipse.persistence.internal.descriptors.changetracking.AttributeChangeListener @ 0x72f073f20 | 32 | 32
> |- stageId java.lang.Long @ 0x7350c8b08  1199 | 24 | 24
> |- logInfo java.lang.String @ 0x7350c8b20  /tmp/ambari | 24 | 64
> |- requestContext java.lang.String @ 0x7350c8b38  Restarting DataNode on perf400-c-371.c.pramod-thangali.internal | 24 | 168
> |- hostRoleCommands org.eclipse.persistence.indirection.IndirectList @ 0x738a0ceb0 | 64 | 184
> |- roleSuccessCriterias org.eclipse.persistence.indirection.IndirectList @ 0x738a0cef0 | 64 | 184
> |- commandParamsStage byte[141] @ 0x738c46cc8 | 160 | 160
>      {"restart_type":"rolling_upgrade","upgrade_direction":"upgrade","version":"2.2.6.0-2799","target_stack":"HDP-2.2","original_stack":"HDP-2.2"}
> |- hostParamsStage byte[776] @ 0x738dc16b0 | 792 | 792
>      {"ambari_db_rca_driver":"org.postgresql.Driver","ambari_db_rca_password":"mapred","ambari_db_rca_url":"jdbc:postgresql://perf400-a-1.c.pramod-thangali.internal/ambarirca","ambari_db_rca_username":"mapred","current_version":"2.2.0.0-2041","db_driver_filenam...
> |- clusterHostInfo byte[26774] @ 0x739006378 | 26,792 | 26,792
>      {"nimbus_hosts":["278"],"all_racks":["/default-rack:0-405"],"ambari_server_host":["perf400-a-1.c.pramod-thangali.internal"],"app_timeline_server_hosts":["138"],"hive_mysql_host":["247"],"falcon_server_hosts":["2"],"hbase_master_hosts":["2"],"accumulo_maste...
> ------------------------------------------------------------------------------------------------------------
> {noformat}
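The breakdown above shows where the ~28 KB goes. Using the shallow sizes listed for the three serialized {{byte[]}} fields, {{clusterHostInfo}} alone accounts for about 93.8% of this entity's 28,576-byte retained heap, and the three blobs together for about 97.1%. This is a worked calculation over this one sampled entity only; the class name {{StageEntityBreakdown}} is illustrative.

```java
// Share of one sampled StageEntity's retained heap held by its serialized
// byte[] fields. All sizes (in bytes) are copied from the heap dump breakdown above.
final class StageEntityBreakdown {
    static final long ENTITY_RETAINED = 28_576L;
    static final long COMMAND_PARAMS = 160L;        // commandParamsStage byte[141]
    static final long HOST_PARAMS = 792L;           // hostParamsStage byte[776]
    static final long CLUSTER_HOST_INFO = 26_792L;  // clusterHostInfo byte[26774]

    private StageEntityBreakdown() {}

    /** Percentage of the entity's retained heap, rounded to one decimal place. */
    static double percentOfEntity(long bytes) {
        return Math.round(bytes * 1000.0 / ENTITY_RETAINED) / 10.0;
    }

    public static void main(String[] args) {
        long allBlobs = COMMAND_PARAMS + HOST_PARAMS + CLUSTER_HOST_INFO;
        System.out.println("clusterHostInfo:   " + percentOfEntity(CLUSTER_HOST_INFO) + "%");
        System.out.println("all byte[] fields: " + percentOfEntity(allBlobs) + "%");
    }
}
```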
> It appears that a local Guava {{Cache}} ({{hostRoleCommandCache}}) in
> [ActionDBAccessorImpl|https://github.com/apache/ambari/blob/94c091e280a99e07db5f3910873e70aa3c18394f/ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java#L104]
> is holding onto these objects:
> {noformat:title=Shows the cache holding onto a HostEntity which holds onto a UnitOfWork map with lots of stale entities}
> Class Name | Ref. Objects | Shallow Heap | Ref. Shallow Heap | Retained Heap
> ------------------------------------------------------------------------------------------------------------
> java.lang.Thread @ 0x641af65b8 ambari-action-scheduler Native Stack, Thread | 76 | 120 | 7,296 | 4,960,776
> |- <Java Local> org.apache.ambari.server.actionmanager.ActionDBAccessorImpl$$EnhancerByGuice$$dcf333e8 @ 0x640538f40 | 75 | 248 | 7,200 | 640,497,232
> | '- hostRoleCommandCache com.google.common.cache.LocalCache$LocalManualCache @ 0x640474b58 | 75 | 16 | 7,200 | 640,496,984
> | '- localCache com.google.common.cache.LocalCache @ 0x640da1650 | 75 | 128 | 7,200 | 640,496,968
> | '- segments com.google.common.cache.LocalCache$Segment[4] @ 0x640f27e88 | 75 | 32 | 7,200 | 640,496,840
> | |- [1] com.google.common.cache.LocalCache$Segment @ 0x6410ee3c8 | 22 | 80 | 2,112 | 151,456,800
> | | |- table java.util.concurrent.atomic.AtomicReferenceArray @ 0x6470826f8 | 21 | 16 | 2,016 | 2,080
> | | | '- array java.lang.Object[512] @ 0x65dd9e088 | 21 | 2,064 | 2,016 | 2,064
> | | | |- [346] com.google.common.cache.LocalCache$StrongAccessEntry @ 0x670caa3d0 | 1 | 48 | 96 | 2,854,000
> | | | | '- valueReference com.google.common.cache.LocalCache$StrongValueReference @ 0x670caa418 | 1 | 16 | 96 | 2,853,928
> | | | | '- referent org.apache.ambari.server.actionmanager.HostRoleCommand @ 0x670caa430 | 1 | 128 | 96 | 2,853,912
> | | | | '- hostEntity org.apache.ambari.server.orm.entities.HostEntity @ 0x66f876d18 | 1 | 136 | 96 | 2,827,496
> | | | | '- _persistence_listener org.eclipse.persistence.internal.descriptors.changetracking.AttributeChangeListener @ 0x66f89f530 | 1 | 32 | 96 | 32
> | | | | '- uow org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork @ 0x670ca0b30 | 1 | 360 | 96 | 2,826,496
> | | | | '- identityMapAccessor org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor @ 0x66f7fbf38 | 1 | 24 | 96 | 2,825,688
> | | | | '- identityMapManager org.eclipse.persistence.internal.identitymaps.IdentityMapManager @ 0x670c2b320 | 1 | 48 | 96 | 2,825,664
> | | | | '- identityMaps java.util.HashMap @ 0x670c2b350 | 1 | 48 | 96 | 2,824,208
> | | | | '- table java.util.HashMap$Node[32] @ 0x670cb1608 | 1 | 144 | 96 | 2,824,160
> | | | | '- [5] java.util.HashMap$Node @ 0x670b71bd8 | 1 | 32 | 96 | 1,201,192
> | | | | '- value org.eclipse.persistence.internal.identitymaps.UnitOfWorkIdentityMap @ 0x670c5a390 | 1 | 32 | 96 | 1,201,160
> | | | | '- cacheKeys java.util.HashMap @ 0x670c2b4d0 | 1 | 48 | 96 | 1,201,128
> | | | | '- table java.util.HashMap$Node[4096] @ 0x66f7c83c8 | 1 | 16,400 | 96 | 1,201,080
> | | | | '- [3271] java.util.HashMap$Node @ 0x670c772e8 | 1 | 32 | 96 | 200
> | | | | '- value org.eclipse.persistence.internal.identitymaps.CacheKey @ 0x66f756e30 | 1 | 96 | 96 | 96
> | | | | '- object org.apache.ambari.server.orm.entities.StageEntity @ 0x66f4f6f98 | 1 | 96 | 96 | 568
> ------------------------------------------------------------------------------------------------------------
> {noformat}
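In the reference chain above, a {{HostRoleCommand}} held by {{hostRoleCommandCache}} reaches a {{RepeatableWriteUnitOfWork}}, and through its identity maps many {{StageEntity}} objects, via its {{hostEntity}} field. One general way to keep a long-lived cache from pinning persistence-context state is to cache lightweight values that hold only an entity's identifier (re-resolving the entity on demand) and to bound the cache's size. The sketch below shows that pattern in plain Java under those assumptions: {{HostRecord}}, {{CachedCommand}} and {{BoundedCommandCache}} are hypothetical names, it uses neither Guava nor EclipseLink, and it is not a description of the change made in the commit referenced above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical stand-in for a heavyweight entity such as HostEntity.
final class HostRecord {
    final long hostId;
    final String hostName;

    HostRecord(long hostId, String hostName) {
        this.hostId = hostId;
        this.hostName = hostName;
    }
}

// Hypothetical cached command: it keeps only the host's primary key, so caching
// it does not keep a managed entity (or anything that entity references) reachable.
final class CachedCommand {
    final long taskId;
    final long hostId;

    CachedCommand(long taskId, long hostId) {
        this.taskId = taskId;
        this.hostId = hostId;
    }

    // Resolve the host when it is actually needed, e.g. through a DAO lookup
    // supplied by the caller (assumed here as a plain function).
    HostRecord host(LongFunction<HostRecord> lookup) {
        return lookup.apply(hostId);
    }
}

// Size-bounded LRU cache: an access-ordered LinkedHashMap that evicts its
// least recently used entry once it holds more than maxEntries entries.
final class BoundedCommandCache extends LinkedHashMap<Long, CachedCommand> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCommandCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, CachedCommand> eldest) {
        return size() > maxEntries;
    }
}
```

Because the map is access-ordered, a {{get}} refreshes an entry, so the command that has gone unread the longest is evicted first when a new one is added past the limit.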
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)