[
https://issues.apache.org/jira/browse/SENTRY-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151161#comment-16151161
]
Alexander Kolbasov commented on SENTRY-1909:
--------------------------------------------
From code review discussions:
[[email protected]] says:
HashSets use a lot more memory than ArrayLists because of the difference in
their internal design. If you are interested, I can explain more, but for now
take a look at the small test below. It creates the same number of HashSets and
then ArrayLists, with the same workload, and measures how much memory they take
each time. Here are the results:
{code}
$ java TestMem
Used memory by sets: 1240 MB
Used memory by lists: 156 MB
{code}
Here is the code:
{code}
import java.util.ArrayList;
import java.util.HashSet;

public class TestMem {
  public static final int NUM_OBJS = 1000 * 1000;
  public static final int NUM_STRS = 30;
  public static final String[] STRINGS = new String[NUM_STRS];

  public static void main(String[] args) {
    // Fill the strings array once, so every collection shares the same String objects
    for (int i = 0; i < NUM_STRS; i++) {
      STRINGS[i] = Integer.toString(i);
    }

    // Holds strong references so the collections stay reachable while we measure
    Object[] listsOrSets = new Object[NUM_OBJS];

    System.gc();
    // Workload 1: NUM_OBJS HashSets, each holding the same NUM_STRS strings
    for (int i = 0; i < NUM_OBJS; i++) {
      HashSet<String> set = new HashSet<>(NUM_STRS);
      for (int j = 0; j < NUM_STRS; j++) {
        set.add(STRINGS[j]);
      }
      listsOrSets[i] = set;
    }
    reportMemory("sets");

    System.gc();
    // Workload 2: overwrite each slot with an ArrayList, making the sets collectible
    for (int i = 0; i < NUM_OBJS; i++) {
      ArrayList<String> list = new ArrayList<>(NUM_STRS);
      for (int j = 0; j < NUM_STRS; j++) {
        list.add(STRINGS[j]);
      }
      listsOrSets[i] = list;
    }
    reportMemory("lists");
  }

  // Requests a GC, then prints the heap currently in use (total minus free) in MB
  private static void reportMemory(String s) {
    System.gc();
    Runtime r = Runtime.getRuntime();
    long usedMemInMB = (r.totalMemory() - r.freeMemory()) / 1024 / 1024;
    System.out.println("Used memory by " + s + ": " + usedMemInMB + " MB");
  }
}
{code}
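For a rough sense of where the gap may come from, the sketch below estimates the per-collection footprint by hand. All size constants are assumptions: a 64-bit HotSpot JVM with compressed references and 8-byte alignment, and the OpenJDK 8+ field layout of HashSet, HashMap, its internal node class, and ArrayList. The class and method names are made up for illustration. The idea is that a HashSet wraps a HashMap, which allocates one node object per element plus a bucket table, while an ArrayList holds a single reference array.

```java
// Back-of-the-envelope estimate, NOT a measurement. Every constant below is an
// assumption about a 64-bit HotSpot JVM with compressed references.
class OverheadEstimate {
  static final int HEADER = 12;        // assumed object header size
  static final int ARRAY_HEADER = 16;  // assumed array header size (includes length)
  static final int REF = 4;            // assumed compressed reference size
  static final int INT = 4;

  // Objects are assumed to be padded to a multiple of 8 bytes
  static long align(long bytes) {
    return (bytes + 7) / 8 * 8;
  }

  static long refArray(int length) {
    return align(ARRAY_HEADER + (long) REF * length);
  }

  // Smallest power of two >= n, which is how HashMap sizes its bucket table
  static int tableSizeFor(int n) {
    int cap = 1;
    while (cap < n) {
      cap <<= 1;
    }
    return cap;
  }

  static long hashSetBytes(int initialCapacity, int elements) {
    int table = tableSizeFor(initialCapacity);
    // The table doubles once the size exceeds 0.75 * capacity (default load factor)
    while (elements > table * 0.75) {
      table <<= 1;
    }
    long node = align(HEADER + INT + 3 * REF);  // hash, key, value, next
    long hashMap = align(HEADER
        + 2 * REF    // keySet, values (inherited from AbstractMap)
        + 2 * REF    // table, entrySet
        + 3 * INT    // size, modCount, threshold
        + 4);        // float loadFactor
    long hashSet = align(HEADER + REF);         // single reference to the backing map
    return hashSet + hashMap + refArray(table) + elements * node;
  }

  static long arrayListBytes(int capacity) {
    long arrayList = align(HEADER + INT + REF + INT);  // modCount, elementData, size
    return arrayList + refArray(capacity);
  }

  public static void main(String[] args) {
    System.out.println("Estimated bytes per HashSet:   " + hashSetBytes(30, 30));
    System.out.println("Estimated bytes per ArrayList: " + arrayListBytes(30));
  }
}
```

Under those assumptions this comes to about 1296 bytes per 30-element HashSet against about 160 bytes per ArrayList, so one million of each would be roughly 1236 MB and 153 MB. That is in the same ballpark as the measured numbers above once the one-million-slot holder array is counted. A different JVM, or one running without compressed references, would give different figures.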
> Improvements for memory usage when full path snapshot is sent from Sentry to
> NN
> -------------------------------------------------------------------------------
>
> Key: SENTRY-1909
> URL: https://issues.apache.org/jira/browse/SENTRY-1909
> Project: Sentry
> Issue Type: Improvement
> Components: Sentry
> Affects Versions: 2.0.0
> Reporter: Alexander Kolbasov
> Assignee: Alexander Kolbasov
> Attachments: SENTRY-1909.01.patch
>
>
> While looking at SENTRY-1907 I noticed another thing.
> {{sentryStore.retrieveFullPathsImage()}} uses {{Map<String, Set<String>>}} as
> a snapshot representation. This isn't needed since {{Map<String,
> Collection<String>>}} is sufficient and we can represent it using ArrayList.
> This is a much more efficient representation.