[
https://issues.apache.org/jira/browse/OOZIE-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570838#comment-14570838
]
Jaydeep Vishwakarma commented on OOZIE-2179:
--------------------------------------------
[~rkanter], I don't see an assignee on this issue. Are you working on it?
> Use HDFS INotify to track HDFS data dependencies instead of polling
> -------------------------------------------------------------------
>
> Key: OOZIE-2179
> URL: https://issues.apache.org/jira/browse/OOZIE-2179
> Project: Oozie
> Issue Type: New Feature
> Components: coordinator
> Reporter: Robert Kanter
>
> Instead of polling the NameNode every minute for Coordinators, we should look
> into using the new INotify feature from HDFS-6634, which exposes a stream of
> events from HDFS. Internally, it still uses a polling mechanism for now, but
> even so, it would likely be more efficient and less heavy-handed than what
> we're doing.
> We'd probably still have to check if the directory exists when a coordinator
> action starts in case we missed the event, but while waiting for an HDFS
> dependency to be available, we can use INotify.
> For HCat dependencies we still fall back to polling every 10 minutes in case a
> JMS message is missed or lost. I don't think we'll need to do this for
> INotify because you can view past events as long as you keep track of the
> event ID. For example, if you restart Oozie and we kept track of the last ID
> Oozie looked at, we could resume from there without losing anything.
> The INotify stream is asynchronous, so we won't receive a notification
> immediately. We should look into the guarantees of how long it can take for
> the notification to show up.
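
For illustration only, the "keep track of the last event ID and resume from
there" idea in the description could look roughly like the sketch below. It
does not use the HDFS client API: Event, EventSource, Checkpoint, Tracker and
listSource are all hypothetical stand-ins, and the in-memory checkpoint
assumes Oozie would persist the last ID somewhere durable in practice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InotifyCheckpointSketch {

    /** Hypothetical stand-in for one HDFS event: a monotonically increasing ID and a created path. */
    static final class Event {
        final long id;
        final String path;
        Event(long id, String path) { this.id = id; this.path = path; }
    }

    /** Hypothetical event source: returns, in order, every event whose ID is greater than afterId. */
    interface EventSource {
        List<Event> eventsAfter(long afterId);
    }

    /** Hypothetical checkpoint; a real one would have to survive an Oozie restart. */
    static final class Checkpoint {
        private long lastId = 0L;
        long load() { return lastId; }
        void save(long id) { lastId = id; }
    }

    /** Builds an EventSource backed by an in-memory list, for demonstration. */
    static EventSource listSource(List<Event> log) {
        return afterId -> {
            List<Event> out = new ArrayList<>();
            for (Event e : log) {
                if (e.id > afterId) {
                    out.add(e);
                }
            }
            return out;
        };
    }

    /** Matches incoming events against the paths that coordinator actions are waiting on. */
    static final class Tracker {
        private final EventSource source;
        private final Checkpoint checkpoint;
        private final Set<String> waitingFor;

        Tracker(EventSource source, Checkpoint checkpoint, Collection<String> waitingFor) {
            this.source = source;
            this.checkpoint = checkpoint;
            this.waitingFor = new HashSet<>(waitingFor);
        }

        /** Reads everything after the checkpoint and returns the dependencies that became available. */
        List<String> drain() {
            List<String> satisfied = new ArrayList<>();
            for (Event e : source.eventsAfter(checkpoint.load())) {
                if (waitingFor.remove(e.path)) {
                    satisfied.add(e.path);
                }
                // Advance the checkpoint only after the event has been handled.
                checkpoint.save(e.id);
            }
            return satisfied;
        }
    }

    public static void main(String[] args) {
        List<Event> log = new ArrayList<>();
        log.add(new Event(1L, "/data/a"));
        log.add(new Event(2L, "/data/b"));
        Checkpoint checkpoint = new Checkpoint();

        Tracker beforeRestart = new Tracker(listSource(log), checkpoint, Arrays.asList("/data/a", "/data/c"));
        System.out.println("before restart: " + beforeRestart.drain());

        // Simulated restart: an event arrives while no tracker is running,
        // then a new tracker resumes from the saved ID.
        log.add(new Event(3L, "/data/c"));
        Tracker afterRestart = new Tracker(listSource(log), checkpoint, Arrays.asList("/data/c"));
        System.out.println("after restart: " + afterRestart.drain());
    }
}
```

Saving the ID after each handled event means a crash can at worst replay an
event rather than skip one, which fits the description's point that a check
for the directory at action start would still be needed as a safety net.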
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)