[ 
https://issues.apache.org/jira/browse/SENTRY-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151161#comment-16151161
 ] 

Alexander Kolbasov commented on SENTRY-1909:
--------------------------------------------

From code review discussions:

[[email protected]] says:
HashSets use a lot more memory than ArrayLists because of the difference in 
their internal design. If you are interested, I can explain more, but for now 
take a look at the small test below. It creates the same number of HashSets and 
then ArrayLists, with the same workload, and measures how much memory they take 
each time. Here are the results:

{code}
$ java TestMem
Used memory by  sets: 1240 MB
Used memory by  lists: 156 MB
{code}

Here is the code:

{code}
import java.util.ArrayList;
import java.util.HashSet;

public class TestMem {
  public static final int NUM_OBJS = 1000 * 1000;
  public static final int NUM_STRS = 30;
  public static final String[] STRINGS = new String[NUM_STRS];

  public static void main(String[] args) {
    // Fill the strings array once; every collection shares these strings
    for (int i = 0; i < NUM_STRS; i++) {
      STRINGS[i] = Integer.toString(i);
    }

    // Keeps the collections reachable while memory is measured
    Object[] listsOrSets = new Object[NUM_OBJS];

    System.gc();

    for (int i = 0; i < NUM_OBJS; i++) {
      HashSet<String> set = new HashSet<>(NUM_STRS);
      for (int j = 0; j < NUM_STRS; j++) {
        set.add(STRINGS[j]);
      }
      listsOrSets[i] = set;
    }

    reportMemory("sets");

    System.gc();

    // Overwriting each slot makes the corresponding set unreachable
    for (int i = 0; i < NUM_OBJS; i++) {
      ArrayList<String> list = new ArrayList<>(NUM_STRS);
      for (int j = 0; j < NUM_STRS; j++) {
        list.add(STRINGS[j]);
      }
      listsOrSets[i] = list;
    }

    reportMemory("lists");
  }

  // Requests a GC, then prints the heap currently in use, in MB
  private static void reportMemory(String s) {
    System.gc();
    Runtime r = Runtime.getRuntime();
    long usedMemInMB = (r.totalMemory() - r.freeMemory()) / 1024 / 1024;
    System.out.println("Used memory by  " + s + ": " + usedMemInMB + " MB");
  }
}
{code}
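
The gap between the two numbers can be roughly reproduced on paper. The sketch below is only an estimate under an assumed object layout (64-bit HotSpot with compressed references: 12-byte object headers, 16-byte array headers, 4-byte references, 8-byte alignment). The class {{MemEstimate}} and its methods are hypothetical helpers written for this illustration. It relies on the fact that a HashSet is backed by a HashMap, which allocates one node object per element plus a power-of-two table, while an ArrayList holds only a single reference array.

```java
final class MemEstimate {
  // Assumed layout (not measured): 64-bit HotSpot, compressed references
  static final int HEADER = 12;        // plain object header
  static final int ARRAY_HEADER = 16;  // array header including length
  static final int REF = 4;            // compressed reference
  static final int INT = 4;

  // Round up to the assumed 8-byte object alignment
  static long align(long bytes) {
    return (bytes + 7) / 8 * 8;
  }

  // ArrayList created with capacity n and filled with n shared elements
  static long arrayListBytes(int n) {
    long list = align(HEADER + 2 * INT + REF);          // modCount, size, elementData
    long array = align(ARRAY_HEADER + (long) n * REF);  // Object[n]
    return list + array;
  }

  // HashSet created with the given requested capacity, holding n shared elements
  static long hashSetBytes(int requested, int n) {
    int cap = 1;
    while (cap < requested) {
      cap <<= 1;                 // table size is a power of two
    }
    while (n > cap * 3 / 4) {
      cap <<= 1;                 // table doubles once size exceeds 0.75 * capacity
    }
    long set = align(HEADER + REF);                     // reference to the backing map
    long map = align(HEADER + 4 * REF + 4 * INT);       // 4 references, 3 ints, 1 float
    long table = align(ARRAY_HEADER + (long) cap * REF);
    long nodes = (long) n * align(HEADER + INT + 3 * REF); // hash, key, value, next
    return set + map + table + nodes;
  }

  public static void main(String[] args) {
    long mb = 1024 * 1024;
    long count = 1000 * 1000;
    System.out.println("Estimated sets:  " + hashSetBytes(30, 30) * count / mb + " MB");
    System.out.println("Estimated lists: " + arrayListBytes(30) * count / mb + " MB");
  }
}
```

Under these assumptions each 30-element HashSet comes to about 1296 bytes and each ArrayList to about 160 bytes, which lands in the same range as the measured results above. The exact figures depend on the JVM and its settings.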

> Improvements for memory usage when full path snapshot is sent from Sentry to 
> NN
> -------------------------------------------------------------------------------
>
>                 Key: SENTRY-1909
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1909
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Sentry
>    Affects Versions: 2.0.0
>            Reporter: Alexander Kolbasov
>            Assignee: Alexander Kolbasov
>         Attachments: SENTRY-1909.01.patch
>
>
> While looking at SENTRY-1907 I noticed another thing. 
> {{sentryStore.retrieveFullPathsImage()}} uses {{Map<String, Set<String>>}} as 
> a snapshot representation. This isn't needed since {{Map<String, 
> Collection<String>>}} is sufficient and we can represent it using ArrayList. 
> This is a much more efficient representation.
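
A minimal sketch of the representation described above, with ArrayList values behind the {{Collection<String>}} type. The class {{PathSnapshotSketch}}, the method {{addPath}} and the sample keys and paths are hypothetical names chosen for illustration; this is not the actual Sentry code or the attached patch. Unlike a Set, an ArrayList does not remove duplicates, so the sketch assumes the paths for each key are already unique.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class PathSnapshotSketch {
  // Hypothetical helper: appends a path to the list kept for the given key.
  // Assumes callers never add the same path twice for one key.
  static void addPath(Map<String, Collection<String>> snapshot,
                      String key, String path) {
    snapshot.computeIfAbsent(key, k -> new ArrayList<>()).add(path);
  }

  public static void main(String[] args) {
    Map<String, Collection<String>> snapshot = new HashMap<>();
    addPath(snapshot, "db1.tbl1", "/warehouse/db1/tbl1");
    addPath(snapshot, "db1.tbl1", "/warehouse/db1/tbl1_part");
    addPath(snapshot, "db2", "/warehouse/db2");

    for (Map.Entry<String, Collection<String>> e : snapshot.entrySet()) {
      System.out.println(e.getKey() + " has " + e.getValue().size() + " path(s)");
    }
  }
}
```

Consumers that only iterate over the paths are unaffected by the change of value type. Code that relied on the Set to reject duplicates would need that guarantee from the producer instead.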



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
