Vamsee Yarlagadda created SENTRY-1779:
-----------------------------------------
Summary: HDFS full snapshot should limit to a set of path prefixes
Key: SENTRY-1779
URL: https://issues.apache.org/jira/browse/SENTRY-1779
Project: Sentry
Issue Type: Improvement
Components: Hdfs Plugin
Affects Versions: 1.5.1, sentry-ha-redesign
Reporter: Vamsee Yarlagadda
Currently, when the cluster starts up, HDFS requests a full snapshot from
Sentry, and Sentry returns the complete list of all privileges and permissions
to the HDFS plugin. Only after receiving the data does the plugin filter it
down to the subset that matches the prefixes. This happens on every HDFS
service restart and whenever the snapshot expires (every 24hrs). Each time,
Sentry does the heavy lifting of loading all the metadata into memory to send
the full snapshot to HDFS, even though HDFS might not care about most of the
data. Sentry's memory requirement spikes during this time and could hit OOM as
the metadata grows over time.
A better option would be for the plugin to request a full snapshot restricted
to a list of prefixes. Sentry would then query the database for permissions,
filtering by the supplied paths. This would reduce Sentry's memory usage and
also the amount of data transferred to HDFS.
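
To make the proposal concrete, here is a minimal sketch of pushing the prefix
filter into the database query instead of filtering after a full load. It is
illustrative only and is not Sentry's code or schema: the table
path_privileges, its columns, and the function fetch_snapshot are hypothetical
names, and an in-memory SQLite database stands in for the Sentry store.

```python
import sqlite3


def fetch_snapshot(conn, prefixes):
    """Return (path, role) rows located at or under any of the given prefixes.

    Hypothetical helper: the filtering happens in SQL, so rows outside the
    requested prefixes are never loaded into the caller's memory.
    """
    if not prefixes:
        return []
    clauses = []
    params = []
    for prefix in prefixes:
        prefix = prefix.rstrip("/")
        # Match the prefix itself, or anything below it. Comparing against
        # prefix + "/" keeps "/a/b" from matching "/a/bc".
        clauses.append("(path = ? OR substr(path, 1, ?) = ?)")
        params.extend([prefix, len(prefix) + 1, prefix + "/"])
    sql = (
        "SELECT path, role FROM path_privileges WHERE "
        + " OR ".join(clauses)
        + " ORDER BY path"
    )
    return conn.execute(sql, params).fetchall()


# Demo data (made up for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE path_privileges (path TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO path_privileges VALUES (?, ?)",
    [
        ("/user/hive/warehouse", "admin"),
        ("/user/hive/warehouse/sales", "analyst"),
        ("/user/hive/warehouse2/tmp", "etl"),
        ("/tmp/scratch", "etl"),
    ],
)

# Only rows under the requested prefix come back from the database.
rows = fetch_snapshot(conn, ["/user/hive/warehouse"])
```

A real implementation would also need the plugin to send its prefix list with
the snapshot request; that part of the protocol is not shown here.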