Kalyan, thank you for the detailed explanation. I missed the cache. And yeah, we're running into SENTRY-2422. We'll monitor and wait.
Thanks again! Cheers, Lars On Mon, Nov 19, 2018 at 4:18 PM Kalyan Kumar Kalvagadda <kkal...@cloudera.com.invalid> wrote: > Lars, > > > 1. SENTRY-2461 <https://issues.apache.org/jira/browse/SENTRY-2461> > should > definitely be fixed. Current approach is not efficient. > 2. > 3. Sentry currently checks if the notification is already processed. > It is in HiveNotificationFetcher.java > > > > 1. > > > > try { > > if (cache.contains(hash) || > sentryStore.isNotificationProcessed(hash)) { > > cache.add(hash); > > > > LOGGER.debug("Ignoring HMS notification already processed: ID = > {}", id); > > return false; > > } > > > > > > But there is issue(SENTRY-2422 > <https://issues.apache.org/jira/browse/SENTRY-2422>) in this check. > "isNotificationProcessed" > checks if the notification is processed by looking at SENTRY_PATH_CHANGE > table. > This is where the issue exists as sentry doesn't record all the > notifications in SENTRY_PATH_CHANGE table. Let's say if there is alter > table where the name and location is not change, sentry doesn't record it. > > Simple solution for this is to use SENTRY_HMS_NOTIFICATION_ID table instead > of SENTRY_PATH_CHANGE to verify if the notification is processed. > This will introduce new issues with the Hive version 2.3.3. In this version > of Hive, event id generated in NOTIFICATION_LOG table could be out-of-order > and even have duplicate event-id's. That is why we are working on > bumping up the hive version which the fix before making changes in Sentry. > > 1. > 1. > 2. SENTRY-2422 <https://issues.apache.org/jira/browse/SENTRY-2422> > will > be fixed after bumping up the hive version. For now, as a work around > you > can reduce the purge time by providing the configuration " > > sentry.store.clean.period.seconds" to avoid OOM memory issue. > > > > *Thanks,Kalyan Kumar Kalvagadda* | Software Engineer > t. (469) 279- <0000000000>5732 > cloudera.com <https://www.cloudera.com> > > [image: Cloudera] <https://www.cloudera.com/> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera > on LinkedIn] <https://www.linkedin.com/company/cloudera> > ------------------------------ > > > On Mon, Nov 19, 2018 at 5:04 AM Lars Francke <lars.fran...@gmail.com> > wrote: > > > Hi, > > > > we are/were facing an issue where we were running into an OOM error in > > Sentry. This happened right after startup and it is because of > > SENTRY-2461[1]. > > > > We're trying to understand how this is supposed to work. > > > > We see HmsFollower waking up every 500ms. > > It looks at the current notification Id we have stored > > in SENTRY_HMS_NOTIFICATION_ID (e.g. 100). > > It then subtracts 1 (e.g. 99) and retrieves any notifications newer than > > that (e.g. "100"). > > HmsFollower then calls NotificationProcessor#processNotificationEvent to > > process the event. > > If that method returns false we explicitly persist the notification Id > > > > There's a few things that we find weird about this: > > * We see hundreds of duplicates being processed because we never check if > > we already have processed the same notification id. Now I do see comments > > in the code that Hive reuses those ids so this might be intentional? > > > > * But we also store these duplicate notification ids over and over. In > the > > normal scenario where the id hasn't changed in the last 500ms we just > store > > it again and because the table does not have a constraint on duplicates > it > > stores tens of thousands of duplicates for us at least until the purging > > process kicks in 12h later > > > > I guess our question is: Is this normal behavior? Should we see thousands > > of duplicates and should Sentry reprocess the last notification over and > > over? > > > > Thank you very much! > > > > Cheers, > > Lars > > > > [1] <https://issues.apache.org/jira/browse/SENTRY-2461> > > >