> On Sept. 5, 2017, 9:44 p.m., Vamsee Yarlagadda wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryHDFSServiceProcessor.java
> > Lines 103 (patched)
> > <https://reviews.apache.org/r/62096/diff/1/?file=1815913#file1815913line103>
> >
> > At this point, we doubled the memory requirement as permUpdates and
> > retPermUpdates are both in memory.
> > Is there anything we can do to reduce this footprint? E.g., use an
> > iterator to traverse and proactively call remove() for every next() so that
> > we only use permUpdates.size() units of memory?

It's not 2x the size. If you look at what update.toThrift() does, it only
returns the TPermissionsUpdate object already referenced by the
PermissionsUpdate, and we build a list of those already-referenced objects.
This does not create a copy of each object, right? Let me know if this is
incorrect. The number of list entries doubles, but each entry is only a
reference, so the objects themselves do not take additional memory.

Anyway, using an iterator does not help for full images, because
getAllPermsUpdatesFrom() returns a singleton list when a full image is
retrieved.

I think we could improve the memory footprint if getAllPermsUpdatesFrom()
returned TPermissionsUpdate instead of PermissionsUpdate, and perhaps by
making the HDFS->Sentry requests in batches (for deltas only) as well. Full
images are more complicated, but we could query the DB for the
MAuthzPathsMapping in batches, iterate over the retrieved batches, and create
the TPermissionsUpdate from them. Doing so requires more time and testing.

What do you think?

- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62096/#review184587
-----------------------------------------------------------


On Sept. 5, 2017, 9:11 p.m., Sergio Pena wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62096/
> -----------------------------------------------------------
>
> (Updated Sept. 5, 2017, 9:11 p.m.)
>
>
> Review request for sentry, Alexander Kolbasov, Brian Towles, Na Li, and
> Vamsee Yarlagadda.
>
>
> Bugs: sentry-1919
>     https://issues.apache.org/jira/browse/sentry-1919
>
>
> Repository: sentry
>
>
> Description
> -------
>
> The patch uses an atomic boolean flag that prevents multiple HDFS requests
> from getting the paths updates (either deltas or full images) by returning an
> empty paths updates list if the flag is set as not available.
>
> The reason to prevent this for deltas as well is that the code change was
> much easier to do in one place (SentryHDFSServiceProcessor). We shouldn't get
> any issues if we do it this way (unless the reviewer thinks otherwise).
>
>
> Diffs
> -----
>
>   sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryHDFSServiceProcessor.java 6221f3d01b2d86ec257bcec290c9b3b0527a6e34
>
>
> Diff: https://reviews.apache.org/r/62096/diff/1/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Sergio Pena
>
>
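
To illustrate the point made in the reply at the top of this message (a second list of already-referenced objects adds list slots, not object copies), here is a minimal sketch. The classes ThriftPayload and Wrapper and the method toThriftList() are hypothetical stand-ins, not the real TPermissionsUpdate and PermissionsUpdate, and the sketch assumes toThrift() simply returns the wrapped field as described above.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedReferenceSketch {
    // Hypothetical stand-in for the Thrift object; holds the bulk of the memory.
    static final class ThriftPayload {
        final byte[] data = new byte[1024];
    }

    // Hypothetical stand-in for the wrapper that owns one payload.
    static final class Wrapper {
        private final ThriftPayload payload = new ThriftPayload();

        // Assumption: returns the existing reference, no copy is made.
        ThriftPayload toThrift() {
            return payload;
        }
    }

    // Builds the second list; only references are added to it.
    static List<ThriftPayload> toThriftList(List<Wrapper> updates) {
        List<ThriftPayload> result = new ArrayList<>(updates.size());
        for (Wrapper update : updates) {
            result.add(update.toThrift());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Wrapper> updates = new ArrayList<>();
        updates.add(new Wrapper());
        List<ThriftPayload> thrift = toThriftList(updates);
        // Same object identity: the payload exists once, referenced from two places.
        System.out.println(thrift.get(0) == updates.get(0).toThrift());
    }
}
```

Under these assumptions the extra cost is one reference per element plus the list overhead, which is small next to the payloads themselves.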

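
The atomic boolean guard from the quoted Description could look roughly like the sketch below. This is not the code in the patch: the class SingleRequestGuardSketch, the method fetchOrEmpty(), and the loader parameter are hypothetical names, and the sketch assumes the flag is released once the request that took it finishes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

public class SingleRequestGuardSketch {
    // true means no request is currently retrieving updates.
    private final AtomicBoolean available = new AtomicBoolean(true);

    // Runs the loader only if no other request holds the flag;
    // otherwise returns an empty list right away.
    <T> List<T> fetchOrEmpty(Supplier<List<T>> loader) {
        if (!available.compareAndSet(true, false)) {
            return Collections.emptyList();
        }
        try {
            return loader.get();
        } finally {
            // Release the flag even if the loader throws.
            available.set(true);
        }
    }

    public static void main(String[] args) {
        SingleRequestGuardSketch guard = new SingleRequestGuardSketch();
        List<String> updates = guard.fetchOrEmpty(() -> Arrays.asList("delta-1", "delta-2"));
        System.out.println(updates.size());
    }
}
```

compareAndSet() makes the check and the claim a single atomic step, so two requests arriving together cannot both start the expensive retrieval.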