> On Sept. 5, 2017, 9:44 p.m., Vamsee Yarlagadda wrote:
> > sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryHDFSServiceProcessor.java
> > Lines 103 (patched)
> > <https://reviews.apache.org/r/62096/diff/1/?file=1815913#file1815913line103>
> >
> >     At this point, we have doubled the memory requirement, since permUpdates
> > and retPermUpdates are both in memory.
> >     Is there anything we can do to reduce this footprint? E.g., use an
> > iterator to traverse and proactively call remove() after every next() so that
> > we only use permUpdates.size() units of memory?
> 
> Sergio Pena wrote:
>     It's not 2x the size. If you look at what update.toThrift() does, it only
> returns the TPermissionsUpdate object referenced by the PermissionsUpdate, and
> we create a list of those already-referenced objects. This does not create a
> copy of the object, right? Let me know if this is incorrect. The number of
> list entries would be 2x, but each entry is only a reference, so the list
> items themselves do not add to the memory.
>     
>     Anyway, using an iterator does not help for full images because
> getAllPermsUpdatesFrom() returns a singleton list when a full image is
> retrieved. I think we could improve the memory footprint if we return
> TPermissionsUpdate instead of PermissionsUpdate from getAllPermsUpdatesFrom()
> and perhaps make HDFS->Sentry requests in batches (for deltas only) as well.
>     
>     For full images it is more complicated, but we could use batches
> for the MAuthzPathsMapping when querying the DB, iterate over the
> retrieved batches, and create the TPermissionsUpdate. Doing so requires more
> time and testing.
>     
>     What do you think?

As long as we are creating a shallow copy, I am fine with the existing 
approach. Thanks for the explanation.
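
For the record, here is a minimal sketch of why the second list is only a
shallow copy. The classes below are hypothetical stand-ins that borrow the
names from this thread; they are not the real Sentry classes, and toThrift()
is modelled only as described above (it returns the already-referenced
object).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the classes discussed in this review; the real
// Sentry classes have more fields and different constructors.
class ShallowCopySketch {

    /** Stand-in for the Thrift-generated TPermissionsUpdate. */
    static class TPermissionsUpdate {
        final long seqNum;

        TPermissionsUpdate(long seqNum) {
            this.seqNum = seqNum;
        }
    }

    /** Stand-in for PermissionsUpdate, which wraps a Thrift object. */
    static class PermissionsUpdate {
        private final TPermissionsUpdate tPermUpdate;

        PermissionsUpdate(TPermissionsUpdate tPermUpdate) {
            this.tPermUpdate = tPermUpdate;
        }

        // Returns the wrapped reference; no new Thrift object is allocated.
        TPermissionsUpdate toThrift() {
            return tPermUpdate;
        }
    }

    /** Builds the second list: a new list holding the same element objects. */
    static List<TPermissionsUpdate> toThriftList(List<PermissionsUpdate> permUpdates) {
        List<TPermissionsUpdate> retPermUpdates = new ArrayList<>(permUpdates.size());
        for (PermissionsUpdate update : permUpdates) {
            retPermUpdates.add(update.toThrift());
        }
        return retPermUpdates;
    }
}
```

In this sketch the extra cost of retPermUpdates is one reference per entry
plus the list itself, not a second copy of each update.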


- Vamsee


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62096/#review184587
-----------------------------------------------------------


On Sept. 5, 2017, 10:41 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62096/
> -----------------------------------------------------------
> 
> (Updated Sept. 5, 2017, 10:41 p.m.)
> 
> 
> Review request for sentry, Alexander Kolbasov, Brian Towles, Na Li, and 
> Vamsee Yarlagadda.
> 
> 
> Bugs: sentry-1919
>     https://issues.apache.org/jira/browse/sentry-1919
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> The patch uses an atomic boolean flag that prevents multiple HDFS requests
> from getting the paths updates (either deltas or full images) by returning an
> empty paths updates list if the flag is set to not available.
> 
> The reason to apply this to deltas as well is that the code change was much
> easier to do in one place (SentryHDFSServiceProcessor). We shouldn't get any
> issues if we do it this way (unless the reviewer thinks otherwise).
> 
> 
> Diffs
> -----
> 
>   
> sentry-hdfs/sentry-hdfs-service/src/main/java/org/apache/sentry/hdfs/SentryHDFSServiceProcessor.java
>  6221f3d01 
> 
> 
> Diff: https://reviews.apache.org/r/62096/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>
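
The flag-based guard in the quoted description could look roughly like the
sketch below. This is not the code from the patch: the class name, the method
name, and the Supplier-based loader are all hypothetical, and the sketch
assumes the intent is to let one request at a time retrieve the updates while
any overlapping request gets an empty list.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch; names do not come from SentryHDFSServiceProcessor.
class PathsUpdateGuardSketch {

    // true while no request is currently retrieving paths updates
    private final AtomicBoolean available = new AtomicBoolean(true);

    /**
     * Runs the loader only if no other call is in progress; otherwise
     * returns an empty list without touching the loader.
     */
    <T> List<T> fetchIfAvailable(Supplier<List<T>> loader) {
        // compareAndSet lets exactly one caller flip the flag from true to false.
        if (!available.compareAndSet(true, false)) {
            return Collections.emptyList();
        }
        try {
            return loader.get();
        } finally {
            // Always release the flag, even if the loader throws.
            available.set(true);
        }
    }
}
```

Using compareAndSet rather than a separate get() followed by set() avoids the
window in which two requests could both see the flag as available.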
