Jonathan Hurley created AMBARI-12178:
----------------------------------------

             Summary: Memory Exhausted During Upgrade Of Large Cluster
                 Key: AMBARI-12178
                 URL: https://issues.apache.org/jira/browse/AMBARI-12178
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.1.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Blocker
             Fix For: 2.1.0


During an upgrade of a large cluster, the memory used by Ambari grows until the 
heap is exhausted. This only happens while the Upgrade dialog is open; once 
that popup is closed, memory usage stays relatively constant.

The offending call is:
{code}
api/v1/clusters/perf400/upgrades/31?upgrade_groups/UpgradeGroup/status!=PENDING&fields=Upgrade/progress_percent,Upgrade/request_context,Upgrade/request_status,Upgrade/direction,upgrade_groups/UpgradeGroup,upgrade_groups/upgrade_items/UpgradeItem/status,upgrade_groups/upgrade_items/UpgradeItem/context,upgrade_groups/upgrade_items/UpgradeItem/group_id,upgrade_groups/upgrade_items/UpgradeItem/progress_percent,upgrade_groups/upgrade_items/UpgradeItem/request_id,upgrade_groups/upgrade_items/UpgradeItem/skippable,upgrade_groups/upgrade_items/UpgradeItem/stage_id,upgrade_groups/upgrade_items/UpgradeItem/status,upgrade_groups/upgrade_items/UpgradeItem/text&minimal_response=true
{code}

Based on heap dumps, the largest offenders are {{StageEntity}} and, as a result, 
{{byte[]}}:

{noformat}
Class Name| Objects |  Shallow Heap | Retained Heap
----------------------------------------------------
byte[]    | 351,907 | 3,147,710,224 |              
----------------------------------------------------

Class Name                                         | Objects | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------------------
org.apache.ambari.server.orm.entities.StageEntity  | 192,356 |   18,466,176 | 3,075,693,136
org.apache.ambari.server.orm.entities.StageEntity_ |       0 |            0 |
org.apache.ambari.server.orm.entities.StageEntityPK|       0 |            0 |
--------------------------------------------------------------------------------------------
{noformat}

Each {{StageEntity}} is holding about 30 KB; in the sample below, almost all of 
that is the {{clusterHostInfo}} byte array:
{noformat}
Class Name | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------
org.apache.ambari.server.orm.entities.StageEntity @ 0x738e03260 | 96 | 28,576
|- <class> class org.apache.ambari.server.orm.entities.StageEntity @ 0x64058d268 | 8 | 8
|- skippable java.lang.Integer @ 0x6401e9738  0 | 16 | 16
|- clusterId java.lang.Long @ 0x64026c908  2 | 24 | 24
|- requestId java.lang.Long @ 0x64026d840  31 | 24 | 24
|- _persistence_primaryKey org.eclipse.persistence.internal.identitymaps.CacheId @ 0x642ce20e0 | 24 | 48
|- _persistence_cacheKey org.eclipse.persistence.internal.identitymaps.HardCacheWeakIdentityMap$ReferenceCacheKey @ 0x6469cf328 | 104 | 136
|- request org.apache.ambari.server.orm.entities.RequestEntity @ 0x728d046e8 | 112 | 432
|- _persistence_listener org.eclipse.persistence.internal.descriptors.changetracking.AttributeChangeListener @ 0x72f073f20 | 32 | 32
|- stageId java.lang.Long @ 0x7350c8b08  1199 | 24 | 24
|- logInfo java.lang.String @ 0x7350c8b20  /tmp/ambari | 24 | 64
|- requestContext java.lang.String @ 0x7350c8b38  Restarting DataNode on perf400-c-371.c.pramod-thangali.internal | 24 | 168
|- hostRoleCommands org.eclipse.persistence.indirection.IndirectList @ 0x738a0ceb0 | 64 | 184
|- roleSuccessCriterias org.eclipse.persistence.indirection.IndirectList @ 0x738a0cef0 | 64 | 184
|- commandParamsStage byte[141] @ 0x738c46cc8  {"restart_type":"rolling_upgrade","upgrade_direction":"upgrade","version":"2.2.6.0-2799","target_stack":"HDP-2.2","original_stack":"HDP-2.2"} | 160 | 160
|- hostParamsStage byte[776] @ 0x738dc16b0  {"ambari_db_rca_driver":"org.postgresql.Driver","ambari_db_rca_password":"mapred","ambari_db_rca_url":"jdbc:postgresql://perf400-a-1.c.pramod-thangali.internal/ambarirca","ambari_db_rca_username":"mapred","current_version":"2.2.0.0-2041","db_driver_filenam... | 792 | 792
|- clusterHostInfo byte[26774] @ 0x739006378  {"nimbus_hosts":["278"],"all_racks":["/default-rack:0-405"],"ambari_server_host":["perf400-a-1.c.pramod-thangali.internal"],"app_timeline_server_hosts":["138"],"hive_mysql_host":["247"],"falcon_server_hosts":["2"],"hbase_master_hosts":["2"],"accumulo_maste... | 26,792 | 26,792
--------------------------------------------------------------------------------
{noformat}
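
As a quick sanity check on the two dumps above, the figures can be turned into an average retained size per {{StageEntity}} and the share of the sampled instance taken up by {{clusterHostInfo}}. This is only arithmetic on the numbers already reported; the class and method names are made up for illustration.

```java
public class HeapShare {
    // Figures copied verbatim from the heap dump tables in this report.
    static final long STAGE_ENTITY_COUNT = 192_356L;
    static final long STAGE_ENTITY_RETAINED_TOTAL = 3_075_693_136L;
    static final long SAMPLE_RETAINED = 28_576L;          // StageEntity @ 0x738e03260
    static final long SAMPLE_CLUSTER_HOST_INFO = 26_792L; // its clusterHostInfo byte[]

    /** Average retained bytes per StageEntity across the whole dump (integer division). */
    static long averageRetainedBytes() {
        return STAGE_ENTITY_RETAINED_TOTAL / STAGE_ENTITY_COUNT;
    }

    /** Fraction of the sampled entity's retained heap held by clusterHostInfo. */
    static double clusterHostInfoShare() {
        return (double) SAMPLE_CLUSTER_HOST_INFO / SAMPLE_RETAINED;
    }

    public static void main(String[] args) {
        System.out.printf("average retained per StageEntity: %d bytes%n",
                averageRetainedBytes());
        System.out.printf("clusterHostInfo share of sample: %.1f%%%n",
                clusterHostInfoShare() * 100);
    }
}
```

The average works out to roughly 16 KB, which is lower than the sampled instance, so the sample may not be typical of every entity. Within the sample, {{clusterHostInfo}} accounts for roughly 94% of the retained heap.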

It appears as though a local {{Cache}} in 
[ActionDBAccessorImpl|https://github.com/apache/ambari/blob/94c091e280a99e07db5f3910873e70aa3c18394f/ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java#L104]
 is holding on to these objects:
{noformat:title=Shows the cache holding onto a HostEntity which holds onto a UnitOfWork map with lots of stale entities}
Class Name | Ref. Objects | Shallow Heap | Ref. Shallow Heap | Retained Heap
--------------------------------------------------------------------------------
java.lang.Thread @ 0x641af65b8  ambari-action-scheduler Native Stack, Thread | 76 | 120 | 7,296 | 4,960,776
|- <Java Local> org.apache.ambari.server.actionmanager.ActionDBAccessorImpl$$EnhancerByGuice$$dcf333e8 @ 0x640538f40 | 75 | 248 | 7,200 | 640,497,232
|  '- hostRoleCommandCache com.google.common.cache.LocalCache$LocalManualCache @ 0x640474b58 | 75 | 16 | 7,200 | 640,496,984
|     '- localCache com.google.common.cache.LocalCache @ 0x640da1650 | 75 | 128 | 7,200 | 640,496,968
|        '- segments com.google.common.cache.LocalCache$Segment[4] @ 0x640f27e88 | 75 | 32 | 7,200 | 640,496,840
|           |- [1] com.google.common.cache.LocalCache$Segment @ 0x6410ee3c8 | 22 | 80 | 2,112 | 151,456,800
|           |  |- table java.util.concurrent.atomic.AtomicReferenceArray @ 0x6470826f8 | 21 | 16 | 2,016 | 2,080
|           |  |  '- array java.lang.Object[512] @ 0x65dd9e088 | 21 | 2,064 | 2,016 | 2,064
|           |  |     |- [346] com.google.common.cache.LocalCache$StrongAccessEntry @ 0x670caa3d0 | 1 | 48 | 96 | 2,854,000
|           |  |     |  '- valueReference com.google.common.cache.LocalCache$StrongValueReference @ 0x670caa418 | 1 | 16 | 96 | 2,853,928
|           |  |     |     '- referent org.apache.ambari.server.actionmanager.HostRoleCommand @ 0x670caa430 | 1 | 128 | 96 | 2,853,912
|           |  |     |        '- hostEntity org.apache.ambari.server.orm.entities.HostEntity @ 0x66f876d18 | 1 | 136 | 96 | 2,827,496
|           |  |     |           '- _persistence_listener org.eclipse.persistence.internal.descriptors.changetracking.AttributeChangeListener @ 0x66f89f530 | 1 | 32 | 96 | 32
|           |  |     |              '- uow org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork @ 0x670ca0b30 | 1 | 360 | 96 | 2,826,496
|           |  |     |                 '- identityMapAccessor org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor @ 0x66f7fbf38 | 1 | 24 | 96 | 2,825,688
|           |  |     |                    '- identityMapManager org.eclipse.persistence.internal.identitymaps.IdentityMapManager @ 0x670c2b320 | 1 | 48 | 96 | 2,825,664
|           |  |     |                       '- identityMaps java.util.HashMap @ 0x670c2b350 | 1 | 48 | 96 | 2,824,208
|           |  |     |                          '- table java.util.HashMap$Node[32] @ 0x670cb1608 | 1 | 144 | 96 | 2,824,160
|           |  |     |                             '- [5] java.util.HashMap$Node @ 0x670b71bd8 | 1 | 32 | 96 | 1,201,192
|           |  |     |                                '- value org.eclipse.persistence.internal.identitymaps.UnitOfWorkIdentityMap @ 0x670c5a390 | 1 | 32 | 96 | 1,201,160
|           |  |     |                                   '- cacheKeys java.util.HashMap @ 0x670c2b4d0 | 1 | 48 | 96 | 1,201,128
|           |  |     |                                      '- table java.util.HashMap$Node[4096] @ 0x66f7c83c8 | 1 | 16,400 | 96 | 1,201,080
|           |  |     |                                         '- [3271] java.util.HashMap$Node @ 0x670c772e8 | 1 | 32 | 96 | 200
|           |  |     |                                            '- value org.eclipse.persistence.internal.identitymaps.CacheKey @ 0x66f756e30 | 1 | 96 | 96 | 96
|           |  |     |                                               '- object org.apache.ambari.server.orm.entities.StageEntity @ 0x66f4f6f98 | 1 | 96 | 96 | 568
--------------------------------------------------------------------------------
{noformat}
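
The reference chain above can be reduced to a small pattern: a long-lived cache stores an object that points back at a managed entity, and the entity in turn keeps its whole persistence context reachable. The sketch below is a plain-Java illustration of that pattern only. {{PersistenceContext}}, {{ManagedHost}} and {{HostSummary}} are hypothetical stand-ins, not Ambari or EclipseLink classes, and caching a detached summary is shown as one possible way to break the chain, not as the fix applied for this issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheRetentionSketch {

    /** Hypothetical stand-in for a unit of work that tracks every entity it loaded. */
    public static class PersistenceContext {
        public final List<byte[]> trackedPayloads = new ArrayList<>();
    }

    /** Hypothetical stand-in for a managed entity with a back-reference to its context. */
    public static class ManagedHost {
        public final String hostName;
        public final PersistenceContext context;

        public ManagedHost(String hostName, PersistenceContext context) {
            this.hostName = hostName;
            this.context = context;
        }
    }

    /** Hypothetical detached copy: only the fields a cache consumer needs. */
    public static class HostSummary {
        public final String hostName;

        public HostSummary(ManagedHost host) {
            this.hostName = host.hostName;
        }
    }

    /** Simulates loading a host in a context that also tracks stage payloads. */
    public static ManagedHost loadHost(String hostName, int trackedStages) {
        PersistenceContext context = new PersistenceContext();
        for (int i = 0; i < trackedStages; i++) {
            // 26,774 bytes is the clusterHostInfo array length seen in the dump.
            context.trackedPayloads.add(new byte[26_774]);
        }
        return new ManagedHost(hostName, context);
    }

    /** Payload bytes that stay reachable through a cached managed entity. */
    public static long reachablePayloadBytes(ManagedHost host) {
        long total = 0;
        for (byte[] payload : host.context.trackedPayloads) {
            total += payload.length;
        }
        return total;
    }

    public static void main(String[] args) {
        ManagedHost managed = loadHost("host-1", 3);

        // Caching the managed entity keeps every tracked payload reachable.
        Map<Long, ManagedHost> entityCache = new HashMap<>();
        entityCache.put(1L, managed);
        System.out.println(reachablePayloadBytes(entityCache.get(1L)));

        // Caching a detached summary holds no reference to the context.
        Map<Long, HostSummary> summaryCache = new HashMap<>();
        summaryCache.put(1L, new HostSummary(managed));
        System.out.println(summaryCache.get(1L).hostName);
    }
}
```

In the first cache, each entry pins its entire context, which mirrors how one {{HostRoleCommand}} entry retains about 2.8 MB in the dump. In the second, the context becomes collectable once the caller drops its own reference to the managed entity.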




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
