> On Sept. 5, 2017, 9:44 p.m., Vamsee Yarlagadda wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryHDFSServiceProcessor.java
> > Lines 103 (patched)
> > <https://reviews.apache.org/r/62096/diff/1/?file=1815913#file1815913line103>
> >
> >     At this point, we doubled the memory requirement as permUpdates and retPermUpdates are both in memory.
> >     Is there anything we can do to reduce this footprint? E.g., use an iterator to traverse and proactively call remove() for every next() so that we only use permUpdates.size() units of memory?
>
> Sergio Pena wrote:
>     It's not 2x size. If you look at what update.toThrift() does, it only returns the TPermissionsUpdate object referenced by the PermissionsUpdate, and we create a list of those already referenced objects. This does not create a copy of the object, right? Let me know if this is incorrect. The list size would be 2x, but the list items themselves are not duplicated, so they do not increase the memory.
>
>     Anyway, an iterator does not help for full images because getAllPermsUpdatesFrom() returns a singleton list when a full image is retrieved. I think we could improve the memory footprint if we return TPermissionsUpdate instead of PermissionsUpdate from getAllPermsUpdatesFrom(), and perhaps also make HDFS->Sentry requests in batches (for deltas only).
>
>     For full images it is more complicated, but we could query the MAuthzPathsMapping from the DB in batches, iterate over the retrieved batches, and build the TPermissionsUpdate from them. Doing so requires more time and testing.
>
>     What do you think?
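The shallow-copy argument and the iterator alternative can be illustrated with a minimal Java sketch. `ThriftUpdate`, `WrappedUpdate`, and `UpdateConversion` are hypothetical stand-ins, not Sentry's actual `PermissionsUpdate`/`TPermissionsUpdate` classes; the sketch only assumes, as described in the thread, that `toThrift()` returns the object the wrapper already holds rather than a copy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a Thrift-generated message (not Sentry's real TPermissionsUpdate).
final class ThriftUpdate {
  final long seqNum;

  ThriftUpdate(long seqNum) { this.seqNum = seqNum; }
}

// Hypothetical stand-in for a wrapper such as PermissionsUpdate:
// toThrift() hands back the same reference it holds, no copy is made.
final class WrappedUpdate {
  private final ThriftUpdate tUpdate;

  WrappedUpdate(ThriftUpdate tUpdate) { this.tUpdate = tUpdate; }

  ThriftUpdate toThrift() { return tUpdate; }
}

final class UpdateConversion {
  // Shallow conversion: the new list holds references to the existing
  // ThriftUpdate objects, so the extra memory is only the list's reference array.
  static List<ThriftUpdate> toThriftShallow(List<WrappedUpdate> updates) {
    List<ThriftUpdate> result = new ArrayList<>(updates.size());
    for (WrappedUpdate u : updates) {
      result.add(u.toThrift());
    }
    return result;
  }

  // Draining conversion (the iterator idea from the review): each wrapper is
  // removed from the source list right after conversion, so wrappers become
  // unreachable early. Requires a list whose iterator supports remove().
  static List<ThriftUpdate> toThriftDraining(List<WrappedUpdate> updates) {
    List<ThriftUpdate> result = new ArrayList<>(updates.size());
    Iterator<WrappedUpdate> it = updates.iterator();
    while (it.hasNext()) {
      result.add(it.next().toThrift());
      it.remove();
    }
    return result;
  }
}
```

Removing from the head of an `ArrayList` through its iterator shifts the remaining elements each time, so the draining variant is quadratic on an `ArrayList`; a `LinkedList` avoids that. Either way, draining only frees the wrapper objects, not the Thrift payloads, and it does not help for full images, where `getAllPermsUpdatesFrom()` returns a singleton list.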
As long as we are creating a shallow copy, I am fine with the existing approach. Thanks for the explanation.


- Vamsee


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62096/#review184587
-----------------------------------------------------------


On Sept. 5, 2017, 10:41 p.m., Sergio Pena wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62096/
> -----------------------------------------------------------
>
> (Updated Sept. 5, 2017, 10:41 p.m.)
>
>
> Review request for sentry, Alexander Kolbasov, Brian Towles, Na Li, and Vamsee Yarlagadda.
>
>
> Bugs: sentry-1919
>     https://issues.apache.org/jira/browse/sentry-1919
>
>
> Repository: sentry
>
>
> Description
> -------
>
> The patch uses an atomic boolean flag that prevents multiple HDFS requests from getting the paths updates (either deltas or full images) at the same time: while the flag is marked as not available, a request gets an empty paths updates list.
>
> The reason to guard deltas as well is that the code change was much easier to do in one place (SentryHDFSServiceProcessor). We shouldn't get any issues if we do it this way (unless the reviewer thinks otherwise).
>
>
> Diffs
> -----
>
>   sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryHDFSServiceProcessor.java 6221f3d01
>
>
> Diff: https://reviews.apache.org/r/62096/diff/2/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Sergio Pena
>
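The guard in the description can be sketched with `AtomicBoolean.compareAndSet`. This is a minimal, hypothetical illustration, not the code in the patch: `SingleFlightUpdates` and the `fetcher` callback are invented names, and the flag is released in a `finally` block so that a failing fetch cannot leave it stuck as unavailable.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongFunction;

// Hypothetical guard: only one request at a time may compute path updates;
// overlapping requests get an empty list instead of blocking.
final class SingleFlightUpdates<T> {
  private final AtomicBoolean available = new AtomicBoolean(true);
  // Hypothetical callback that fetches deltas or a full image starting at a sequence number.
  private final LongFunction<List<T>> fetcher;

  SingleFlightUpdates(LongFunction<List<T>> fetcher) { this.fetcher = fetcher; }

  List<T> getUpdates(long fromSeqNum) {
    // Atomically flip available -> false; if it was already false,
    // another request is in flight, so return an empty list.
    if (!available.compareAndSet(true, false)) {
      return Collections.emptyList();
    }
    try {
      return fetcher.apply(fromSeqNum);
    } finally {
      // Release the flag even if the fetch throws.
      available.set(true);
    }
  }
}
```

In this sketch a caller cannot tell "another request is in progress" apart from "no new updates", so it assumes the caller simply asks again on a later poll rather than waiting on a lock.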
