Alexander Kolbasov created SENTRY-1907:
------------------------------------------
Summary: Potential memory optimization when handling big full snapshots.
Key: SENTRY-1907
URL: https://issues.apache.org/jira/browse/SENTRY-1907
Project: Sentry
Issue Type: Improvement
Components: Sentry
Affects Versions: 2.0.0
Reporter: Alexander Kolbasov
Assignee: Alexander Kolbasov
Fix For: 2.0.0
PathImageRetriever.retrieveFullImage() has the following code:
{code}
for (Map.Entry<String, Set<String>> pathEnt : pathImage.entrySet()) {
  TPathChanges pathChange = pathsUpdate.newPathChange(pathEnt.getKey());
  for (String path : pathEnt.getValue()) {
    pathChange.addToAddPaths(Lists.newArrayList(Splitter.on("/").split(path))); // here
  }
}
{code}
We convert every path into a list of its string components, so /a/b/c becomes {a, b, c}. There are tons of duplicates among these components, so after splitting we should intern each component before adding it, as sketched below.
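A minimal sketch of the proposed change (this assumes plain String.intern(); a Guava Interner such as Interners.newWeakInterner() would work just as well and keeps the deduplicated strings off the JVM string table):
{code}
for (Map.Entry<String, Set<String>> pathEnt : pathImage.entrySet()) {
  TPathChanges pathChange = pathsUpdate.newPathChange(pathEnt.getKey());
  for (String path : pathEnt.getValue()) {
    List<String> components = new ArrayList<>();
    for (String component : Splitter.on("/").split(path)) {
      // Reuse a single String instance for each repeated path component
      // (e.g. "user", "hive", "warehouse" recur across millions of paths).
      components.add(component.intern());
    }
    pathChange.addToAddPaths(components);
  }
}
{code}
Each component list is short, so the per-path cost of interning should be negligible compared to the savings from collapsing the duplicate component strings.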
This was observed by code inspection and confirmed by jxray analysis (thanks [[email protected]]), which shows that 61% of the memory is used by duplicate strings and reports the following reference chains and GC root stack trace:
{code}
4. REFERENCE CHAINS WITH HIGH RETAINED MEMORY (MAY SIGNAL MEMORY LEAK)
---- Object tree for GC root(s) Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate) ----
4,159,037K (33.4%) (1 of org.apache.sentry.hdfs.service.thrift.TPathsUpdate) <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
4,135,376K (33.3%) (4897951 of j.u.ArrayList) <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathChanges.addPaths <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathChanges <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
3,652,177K (29.4%) (52086231 objects) <-- {j.u.ArrayList} <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathChanges.addPaths <-- {j.u.ArrayList} <-- org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathChanges <-- Java Local@3c8e00c80 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
GC root stack trace:
org.apache.sentry.hdfs.service.thrift.TPathsUpdate$TPathsUpdateStandardScheme.write(TPathsUpdate.java:754)
org.apache.sentry.hdfs.service.thrift.TPathsUpdate$TPathsUpdateStandardScheme.write(TPathsUpdate.java:671)
org.apache.sentry.hdfs.service.thrift.TPathsUpdate.write(TPathsUpdate.java:584)
org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse$TAuthzUpdateResponseStandardScheme.write(TAuthzUpdateResponse.java:505)
org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse$TAuthzUpdateResponseStandardScheme.write(TAuthzUpdateResponse.java:435)
org.apache.sentry.hdfs.service.thrift.TAuthzUpdateResponse.write(TAuthzUpdateResponse.java:377)
org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result$get_authz_updates_resultStandardScheme.write(SentryHDFSService.java:3608)
org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result$get_authz_updates_resultStandardScheme.write(SentryHDFSService.java:3572)
org.apache.sentry.hdfs.service.thrift.SentryHDFSService$get_authz_updates_result.write(SentryHDFSService.java:3523)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
org.apache.sentry.hdfs.SentryHDFSServiceProcessorFactory$ProcessorWrapper.process(SentryHDFSServiceProcessorFactory.java:47)
org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
{code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)