Vamsee Yarlagadda created SENTRY-1779:
-----------------------------------------

             Summary: HDFS full snapshot should limit to a set of path prefixes 
                 Key: SENTRY-1779
                 URL: https://issues.apache.org/jira/browse/SENTRY-1779
             Project: Sentry
          Issue Type: Improvement
          Components: Hdfs Plugin
    Affects Versions: 1.5.1, sentry-ha-redesign
            Reporter: Vamsee Yarlagadda


Currently, when the cluster starts up, HDFS requests a full snapshot from 
Sentry, and Sentry returns the complete list of all privileges and permissions 
to the HDFS plugin. Upon receiving the data, the plugin filters it down to the 
subset that matches the path prefixes. This happens on every HDFS service 
restart and whenever the snapshot expires (every 24 hours). Each time, Sentry 
does the heavy lifting of loading all of the metadata into memory to send the 
full snapshot, even though HDFS might not care about most of it. Sentry's 
memory requirement spikes while this happens, and it could hit an OOM error as 
the metadata grows over time.

A better option would be for the plugin to request a full snapshot for a list 
of prefixes only. Sentry would then query the database for permissions, 
filtering by the supplied paths. This would reduce both Sentry's memory usage 
and the amount of data transferred to HDFS.
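
As a rough illustration of the proposed server-side filtering, here is a 
minimal sketch in Python using an in-memory SQLite database. It is not Sentry 
code: the table path_privileges, its columns, and the functions 
snapshot_for_prefixes and build_demo_db are all hypothetical names chosen for 
this example. The point is only that the prefix filter moves into the database 
query, so rows outside the requested prefixes are never loaded into memory.

```python
import sqlite3


def snapshot_for_prefixes(conn, prefixes):
    """Return (path, role, action) rows located under any requested prefix.

    Hypothetical schema: path_privileges(path, role, action).
    A row matches when its path equals the prefix or starts with
    prefix + "/", so "/a/b" does not match "/a/bc".
    """
    clauses, params = [], []
    for prefix in prefixes:
        p = prefix.rstrip("/")
        # substr() comparison avoids LIKE wildcard and case-folding pitfalls.
        clauses.append("(path = ? OR substr(path, 1, ?) = ?)")
        params += [p, len(p) + 1, p + "/"]
    if not clauses:
        return []  # no prefixes requested, nothing to send
    sql = (
        "SELECT path, role, action FROM path_privileges WHERE "
        + " OR ".join(clauses)
        + " ORDER BY path"
    )
    return conn.execute(sql, params).fetchall()


def build_demo_db():
    """Create a small in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE path_privileges (path TEXT, role TEXT, action TEXT)"
    )
    conn.executemany(
        "INSERT INTO path_privileges VALUES (?, ?, ?)",
        [
            ("/user/hive/warehouse/db1.db/t1", "analyst", "select"),
            ("/user/hive/warehouse2/x", "etl", "all"),
            ("/data/external/t2", "etl", "all"),
            ("/tmp/scratch", "dev", "all"),
        ],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    for row in snapshot_for_prefixes(
        demo, ["/user/hive/warehouse", "/data/external/"]
    ):
        print(row)
```

With the sample rows above, only the entries under /user/hive/warehouse and 
/data/external are returned; /user/hive/warehouse2/x is excluded because the 
match respects path-component boundaries. A real implementation would likely 
also want an index that supports prefix lookups on the path column.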



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
