[
https://issues.apache.org/jira/browse/HDFS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345544#comment-15345544
]
Zhe Zhang commented on HDFS-8940:
---------------------------------
This is a very useful optimization. Thanks [~mingma] for proposing the ideas
and design.
Just wanted to quickly follow up and see whether there's any interest in
continuing the work. We are facing a similar issue where the NN suffers from
large listing operations.
Based on some initial requirements collection, I think Kafka is a better fit in
our environment. This is because some workloads do need to monitor a
potentially very large number of events, e.g. an ETL service monitoring all
newly generated files under a subtree and syncing them to another cluster. If
the extra dependency is a concern, maybe this optimization could be a
standalone project that depends on both HDFS and Kafka? It seems the current
design uses the inotify APIs in a pretty clean way.
We should also think of a way to quantify the potential benefit -- e.g.
assuming we have the audit log from a large cluster, how would we estimate the
reduction in lock contention if all listing ops are eliminated (and inotify ops
added)? IIUC inotify calls don't take the namespace {{readLock}}.
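One rough way to start on that estimate is to measure what share of audited
operations are listings. The sketch below is only an illustration, not part of
the proposed design: the class and method names (AuditLogEstimate, countByCmd,
listingFraction) are hypothetical, and it assumes each audit log line carries a
cmd=<operation> field with listings logged as cmd=listStatus. The share of
listStatus calls is only a first-order proxy for lock contention, since it
ignores how long each call holds the lock.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: summarizes an HDFS audit log by command name.
class AuditLogEstimate {
    // Assumes each audit line contains a whitespace-delimited cmd=<op> field.
    private static final Pattern CMD = Pattern.compile("\\bcmd=(\\S+)");

    /** Counts audit log entries per command name; lines without cmd= are skipped. */
    static Map<String, Long> countByCmd(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            Matcher m = CMD.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Fraction of all counted operations that are listStatus calls. */
    static double listingFraction(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        return counts.getOrDefault("listStatus", 0L) / (double) total;
    }

    public static void main(String[] args) throws IOException {
        // Reads the whole file into memory; fine for a sample, not for a full day of logs.
        Map<String, Long> counts = countByCmd(Files.readAllLines(Paths.get(args[0])));
        System.out.printf("listStatus share of audited ops: %.2f%%%n",
                100 * listingFraction(counts));
    }
}
```

Run over a sample of the audit log, this gives the share of audited operations
that are listings, which bounds how many namespace operations the optimization
could remove. Weighting each listing by the size of the directory listed would
be a natural refinement, since large listings hold the lock longer.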
> Support for large-scale multi-tenant inotify service
> ----------------------------------------------------
>
> Key: HDFS-8940
> URL: https://issues.apache.org/jira/browse/HDFS-8940
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ming Ma
> Attachments: Large-Scale-Multi-Tenant-Inotify-Service.pdf
>
>
> HDFS-6634 provides the core inotify functionality. We would like to extend
> that to provide a large-scale service that tens of thousands of clients can
> subscribe to.