[ https://issues.apache.org/jira/browse/HDFS-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005158#comment-13005158 ]
Tsz Wo (Nicholas), SZE commented on HDFS-1742: ---------------------------------------------- > Cluster admins or, more likely, Hadoop developers. ... Sounds good. I suggest you to update the description for avoiding confusion. > It depends on how do you define "internally". ... I actually mean the same as your suggested, i.e. not for end users to provide a {{JobInProgressListener}} class. > Provide hooks / callbacks to execute some code based on events happening in > HDFS (file / directory creation, opening, closing, etc) > ----------------------------------------------------------------------------------------------------------------------------------- > > Key: HDFS-1742 > URL: https://issues.apache.org/jira/browse/HDFS-1742 > Project: Hadoop HDFS > Issue Type: New Feature > Components: name-node > Reporter: Mikhail Yakshin > Labels: features, polling > > We're working on a system that runs various Hadoop job continuously, based on > the data that appears in HDFS: for example, we have a job that works on day's > worth of data and creates output in {{/output/YYYY/MM/DD}}. For input, it > should wait for directory with externally uploaded data as > {{/input/YYYY/MM/DD}} to appear, and also wait for previous day's data to > appear, i.e. {{/output/YYYY/MM/DD-1}}. > Obviously, one of the possible solutions is polling once in a while for > files/directories we're waiting for, but generally it's a bad solution. The > better one is something like file alteration monitor or [inode activity > notifiers|http://en.wikipedia.org/wiki/Inotify], such as ones implemented in > Linux filesystems. > Basic idea is that one can specify (inject) some code that will be executed > on every major event happening in HDFS, such as: > * File created / open > * File closed > * File deleted > * Directory created > * Directory deleted > I see simplistic implementation as following: NN defines some interfaces that > implement callback/hook mechanism - i.e. something like: > {code} > interface NameNodeCallback { > public void onFileCreate(SomeFileInformation f); > public void onFileClose(SomeFileInformation f); > public void onFileDelete(SomeFileInformation f); > ... > } > {code} > A user creates a class that implements this method and loads it somehow (for > example, using an extra jar in classpath) in NameNode's JVM. NameNode > includes a configuration option that specifies names of such class(es) - then > NameNode instantiates them and calls methods from them (in a separate thread) > on every valid event happening. > This would allow systems such as I've described in the beginning to be > implemented without polling. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira