Alexander Kolbasov created SENTRY-1907:
------------------------------------------

             Summary: Potential memory optimization when handling big full snapshots.
                 Key: SENTRY-1907
                 URL: https://issues.apache.org/jira/browse/SENTRY-1907
             Project: Sentry
          Issue Type: Improvement
          Components: Sentry
    Affects Versions: 2.0.0
            Reporter: Alexander Kolbasov
            Assignee: Alexander Kolbasov
             Fix For: 2.0.0


PathImageRetriever.retrieveFullImage() has the following code:

{code}
      for (Map.Entry<String, Set<String>> pathEnt : pathImage.entrySet()) {
        TPathChanges pathChange = pathsUpdate.newPathChange(pathEnt.getKey());

        for (String path : pathEnt.getValue()) {
          pathChange.addToAddPaths(Lists.newArrayList(Splitter.on("/").split(path))); // here
        }
      }
{code}

We convert each path into a list of strings, one per path component, so /a/b/c
becomes {a, b, c}. Many components repeat across paths, so the resulting lists
hold tons of duplicate strings. After splitting, we should intern each
component before adding it to the list.
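
A minimal sketch of the proposed interning, not the actual patch. The class
name PathInterning and the helper splitAndIntern are hypothetical, and plain
String.split with String.intern() stands in for the Guava Splitter used in
the original code so that the example needs only the JDK:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Sentry.
final class PathInterning {

  // Splits a path on "/" and interns every component, so that equal
  // components from different paths share a single String instance.
  // The -1 limit keeps empty components (e.g. from a leading slash).
  static List<String> splitAndIntern(String path) {
    String[] parts = path.split("/", -1);
    List<String> components = new ArrayList<>(parts.length);
    for (String part : parts) {
      components.add(part.intern());
    }
    return components;
  }

  public static void main(String[] args) {
    List<String> first = splitAndIntern("db/table/part1");
    List<String> second = splitAndIntern("db/table/part2");
    // Reference comparison: the shared component "db" is one object.
    System.out.println(first.get(0) == second.get(0));
  }
}
```

With a helper like this, the marked line in the loop could become
pathChange.addToAddPaths(splitAndIntern(path)), assuming addToAddPaths accepts
any List<String>. String.intern() uses the JVM-wide string pool; an interner
such as Guava's Interners.newWeakInterner() would be an alternative if that
is preferred.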

This was observed by code inspection and confirmed by jxray analysis (thanks 
[[email protected]]), which shows that 61% of memory is used by duplicate 
strings and reports the following reference chains and GC root stack trace:

{code}
4. REFERENCE CHAINS WITH HIGH RETAINED MEMORY (MAY SIGNAL MEMORY LEAK)

 ---- Object tree for GC root(s) Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate) ----

  4,159,037K (33.4%) (1 of org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
     <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
  4,135,376K (33.3%) (4897951 of j.u.ArrayList)
     <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathChanges.addPaths <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathChanges <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
  3,652,177K (29.4%) (52086231 objects)
     <-- {j.u.ArrayList} <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathChanges.addPaths <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathChanges <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
  GC root stack trace:
    org.apache.sentry.hdfs.service.thrift.TPathsUpdate$TPathsUpdateStandardScheme.write(TPathsUpdate.java:754)
    org.apache.sentry.hdfs.service.thrift.TPathsUpdate$TPathsUpdateStandardScheme.write(TPathsUpdate.java:671)
    org.apache.sentry.hdfs.service.thrift.TPathsUpdate.write(TPathsUpdate.java:584)
    org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse$TAuthzUpdateResponseStandardScheme.write(TAuthzUpdateResponse.java:505)
    org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse$TAuthzUpdateResponseStandardScheme.write(TAuthzUpdateResponse.java:435)
    org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse.write(TAuthzUpdateResponse.java:377)
    org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result$get_authz_updates_resultStandardScheme.write(SentryHDFSService.java:3608)
    org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result$get_authz_updates_resultStandardScheme.write(SentryHDFSService.java:3572)
    org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result.write(SentryHDFSService.java:3523)
    org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
    org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    org.apache.sentry.hdfs.SentryHDFSServiceProcessorFactory$ProcessorWrapper.process(SentryHDFSServiceProcessorFactory.java:47)
    org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
    org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
