[ https://issues.apache.org/jira/browse/SENTRY-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Misha Dmitriev updated SENTRY-1827:
-----------------------------------
Fix Version/s: (was: sentry-ha-redesign)
> Minimize TPathsDump thrift message used in HDFS sync
> ----------------------------------------------------
>
> Key: SENTRY-1827
> URL: https://issues.apache.org/jira/browse/SENTRY-1827
> Project: Sentry
> Issue Type: Improvement
> Affects Versions: 1.8.0
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Fix For: 1.8.0
>
>
> We obtained a heap dump taken from the JVM running the Hive Metastore while a
> Sentry HDFS sync operation was in progress. I analyzed this dump with jxray
> (www.jxray.com) and found that more than 19% of memory is wasted on empty or
> suboptimally sized Java collections:
> {code}
> 9. BAD COLLECTIONS
> Total collections: 54,057,249 Bad collections: 31,569,606 Overhead:
> 5,292,821K (19.3%)
> {code}
> Most of these collections come from Thrift classes used by the Sentry plugin,
> as shown below. The associated memory waste could be significantly reduced or
> eliminated if these collections were allocated lazily and, once allocated,
> given an initial capacity smaller than the default of 16 elements for
> HashMap/HashSet.
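A minimal sketch of the lazy-allocation idea, for illustration only. `LazyPathNode`, `INITIAL_CAPACITY = 4` and the use of `Integer` child IDs are assumptions made for this example; this is not the Thrift-generated `TPathEntry` class, and wiring the pattern into generated code would need its own change.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical path-tree node whose child set is allocated lazily.
 * NOT the Thrift-generated TPathEntry; names here are illustrative.
 */
public class LazyPathNode {
  // Assumed small initial capacity for the common few-children case.
  // HashSet/HashMap default to 16 buckets.
  private static final int INITIAL_CAPACITY = 4;

  private final String pathElement;
  // Stays null until the first child is added, so leaf nodes
  // (the majority in a path tree) never allocate a set at all.
  private Set<Integer> children;

  public LazyPathNode(String pathElement) {
    this.pathElement = pathElement;
  }

  public String getPathElement() {
    return pathElement;
  }

  /** Allocates the child set on first use, with a small capacity. */
  public void addChild(int childId) {
    if (children == null) {
      children = new HashSet<>(INITIAL_CAPACITY);
    }
    children.add(childId);
  }

  /** Read-only view; the shared immutable empty set when no children exist. */
  public Set<Integer> getChildren() {
    return children == null
        ? Collections.<Integer>emptySet()
        : Collections.unmodifiableSet(children);
  }

  /** True once a backing set has actually been allocated. */
  public boolean hasAllocatedChildren() {
    return children != null;
  }

  public static void main(String[] args) {
    LazyPathNode leaf = new LazyPathNode("part=1");
    System.out.println(leaf.hasAllocatedChildren()); // false
    leaf.addChild(42);
    System.out.println(leaf.getChildren());          // [42]
  }
}
```

Returning `Collections.emptySet()` lets callers iterate without null checks while empty nodes share a single immutable instance instead of each holding its own empty `HashSet`.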
> {code}
> 1,869,023K (6.8%): j.u.HashSet: 3388670 of 1-elem 979,537K (3.6%), 5897806
> of empty 552,919K (2.0%), 1010321 of small 336,566K (1.2%)
> <-- org.apache.sentry.hdfs.service.thrift.TPathEntry.children <--
> {j.u.HashMap}.values <--
> org.apache.sentry.hdfs.service.thrift.TPathsDump.nodeMap <--
> org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathsDump <-- Java
> Local@7fea0851c360 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
> 1,190,050K (4.3%): j.u.HashMap: 3382765 of 1-elem 898,546K (3.3%), 1005341
> of small 291,503K (1.1%)
> <-- org.apache.sentry.hdfs.HMSPaths$Entry.children <--
> org.apache.sentry.hdfs.HMSPaths$Entry.{parent} <-- {j.u.HashSet} <--
> {j.u.TreeMap}.values <-- org.apache.sentry.hdfs.HMSPaths.authzObjToPath <--
> org.apache.sentry.hdfs.UpdateableAuthzPaths.paths <--
> org.apache.sentry.hdfs.MetastorePlugin.authzPaths <-- Java Local@7fe4fe84e030
> (org.apache.sentry.hdfs.MetastorePlugin)
> 969,442K (3.5%): j.u.TreeSet: 5907188 of 1-elem 969,148K (3.5%)
> <-- org.apache.sentry.hdfs.service.thrift.TPathEntry.authzObjs <--
> {j.u.HashMap}.values <--
> org.apache.sentry.hdfs.service.thrift.TPathsDump.nodeMap <--
> org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathsDump <-- Java
> Local@7fea0851c360 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
> 487,690K (1.8%): j.u.TreeSet: 4801877 of empty 487,690K (1.8%)
> <-- org.apache.sentry.hdfs.HMSPaths$Entry.authzObjs <--
> org.apache.sentry.hdfs.HMSPaths$Entry.{parent} <-- {j.u.HashSet} <--
> {j.u.TreeMap}.values <-- org.apache.sentry.hdfs.HMSPaths.authzObjToPath <--
> org.apache.sentry.hdfs.UpdateableAuthzPaths.paths <--
> org.apache.sentry.hdfs.MetastorePlugin.authzPaths <-- Java Local@7fe4fe84e030
> (org.apache.sentry.hdfs.MetastorePlugin)
> 415,064K (1.5%): j.u.HashMap: 5897806 of empty 414,689K (1.5%)
> <-- org.apache.sentry.hdfs.HMSPaths$Entry.children <-- {j.u.HashSet}
> <-- {j.u.TreeMap}.values <-- org.apache.sentry.hdfs.HMSPaths.authzObjToPath
> <-- org.apache.sentry.hdfs.UpdateableAuthzPaths.paths <--
> org.apache.sentry.hdfs.MetastorePlugin.authzPaths <-- Java Local@7fe4fe84e030
> (org.apache.sentry.hdfs.MetastorePlugin)
> {code}
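Dividing each overhead figure in the breakdown above by its instance count gives a rough per-collection cost, which shows why millions of empty or 1-element collections add up. A small Java sketch of that arithmetic, assuming jxray's "K" means 1024 bytes; the results are averages of the overhead jxray attributes to each collection, not independently measured object sizes.

```java
/**
 * Back-of-envelope arithmetic on the jxray "BAD COLLECTIONS" figures:
 * average overhead attributed to each collection instance.
 * Assumption: jxray's "K" means 1024 bytes.
 */
public class CollectionOverhead {

  /** Average overhead in bytes per instance, rounded to the nearest byte. */
  static long bytesPerInstance(long overheadK, long instances) {
    return Math.round(overheadK * 1024.0 / instances);
  }

  public static void main(String[] args) {
    // Figures copied from the jxray report: label, overhead (K), instance count.
    String[] labels = {
      "1-elem HashSet  TPathEntry.children    ",
      "empty  HashSet  TPathEntry.children    ",
      "1-elem HashMap  HMSPaths$Entry.children",
      "empty  HashMap  HMSPaths$Entry.children",
      "1-elem TreeSet  TPathEntry.authzObjs   ",
      "empty  TreeSet  HMSPaths$Entry.authzObjs",
    };
    long[] overheadK = {979_537L, 552_919L, 898_546L, 414_689L, 969_148L, 487_690L};
    long[] instances = {3_388_670L, 5_897_806L, 3_382_765L, 5_897_806L, 5_907_188L, 4_801_877L};

    for (int i = 0; i < labels.length; i++) {
      long bytes = bytesPerInstance(overheadK[i], instances[i]);
      System.out.printf("%s ~%d bytes each%n", labels[i], bytes);
    }
  }
}
```

Run as-is, this prints about 296, 96, 272, 72, 168 and 104 bytes respectively, so even an empty collection carries tens of bytes of attributed overhead per instance.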
> Additionally, a significant share of memory is wasted on duplicate
> strings:
> {code}
> 7. DUPLICATE STRINGS
> Total strings: 29,986,017 Unique strings: 9,640,413 Duplicate values:
> 4,897,743 Overhead: 2,570,746K (9.4%)
> {code}
> Of this overhead, more than a third comes from Sentry:
> {code}
> 917,331K (3.3%), 10517636 dup strings (498477 unique), 10517636 dup backing
> arrays:
> <-- org.apache.sentry.hdfs.service.thrift.TPathEntry.pathElement <--
> {j.u.HashMap}.values <--
> org.apache.sentry.hdfs.service.thrift.TPathsDump.nodeMap <--
> org.apache.sentry.hdfs.service.thrift.TPathsUpdate.pathsDump <-- Java
> Local@7fea0851c360 (org.apache.sentry.hdfs.service.thrift.TPathsUpdate)
> {code}
> These duplicates can be eliminated by calling String.intern() at the
> appropriate places.
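A minimal sketch of how interning collapses duplicate path elements. `InternDemo` and `internPathElement` are hypothetical names; where exactly such a call would go in the Sentry/Thrift code (for example, wherever `TPathEntry.pathElement` values are created) is not specified here. `new String(...)` is used only to simulate distinct String objects with equal contents, as produced when decoding the same path element many times.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: interning path elements so equal values share one String object. */
public class InternDemo {

  /** Hypothetical hook called wherever a decoded path element is stored. */
  static String internPathElement(String decoded) {
    // String.intern() returns the canonical instance from the JVM string pool.
    return decoded == null ? null : decoded.intern();
  }

  public static void main(String[] args) {
    List<String> elements = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      // Each iteration creates a distinct String object with the same contents.
      String decoded = new String("warehouse");
      elements.add(internPathElement(decoded));
    }
    // All stored references now point to one shared instance.
    System.out.println(elements.get(0) == elements.get(1)); // true
    System.out.println(elements.get(1) == elements.get(2)); // true

    // Without interning, equal strings remain separate objects.
    System.out.println(new String("warehouse") == new String("warehouse")); // false
  }
}
```

`String.intern()` is the approach proposed in this issue; a weak interner such as Guava's `Interners.newWeakInterner()` would be a swapped-in alternative with the same deduplicating effect, not something the issue itself proposes.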
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)