Wei Zhou commented on HDFS-7343:
Thanks [~anu] for reviewing the design document and great comments!
For your comments:
1. Is this service attempting to become the answer for all administrative
problems of HDFS? In other words, is this service trying to be a catch-all?
I am not able to see what is common between caching and file placement, or
between running distcp for remote replication and running balancer and disk
balancer.
In the long run, SSM aims to provide users an end-to-end storage management
automation solution, and any facility that serves that goal can be used in this
project. The use cases and examples listed in the document just illustrate
possible scenarios where SSM can be used and what SSM can do. SSM can
help from different angles by using these facilities.
But then in the latter parts of the document we drag in distcp, disk balancer
and balancer issues. SSM might be the right place for these in the long run,
but my suggestion is to focus on the core parts of the service and then extend
it to other things once the core is stabilized.
You are absolutely right. We have to focus on implementing the core
part/module first, rather than taking on too much at the same time; the core is
the basis of the other functions.
2. Do we need a new rules language? Would you please consider using a language
which admins already know, for example by writing these rules in it? Every
time I have to configure Kerberos rules, I have to look up the mini-regex
meanings. I am worried that this little rule language will blow up, and once it
is in, it is something that we will need to support for the long term.
Yes, it's a very good question and we did think about it before. We aim to
provide administrators/users a simple, specific rule language that does not
require learning much beyond the rule logic itself. In fact, a rule is very
simple: it only has to declare when and which action is to be applied to some
objects (a file, a node, etc.). A general-purpose language like Python or
JavaScript may be too heavy for defining a rule.
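To make the discussion concrete, a rule in such a DSL might look like the
following. This is purely hypothetical syntax for illustration (path, condition
names, and the action are all invented here), not taken from the design
document:

```
# Hypothetical SSM rule: for every file under /log not accessed in 30 days,
# move it to ARCHIVE storage; evaluate the condition once a day.
file.path matches "/log/*" : accessedAgo > 30d | every 1d | archive
```

The point is that a rule only names a set of objects, a condition, a schedule,
and an action, which is far less surface area than a general-purpose language.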
3. In this model we are proposing a push model where the datanode and namenode
push data to some Kafka endpoint. I would prefer if the namenode and datanode
were not aware of this service at all. This service can easily connect to the
namenode and read almost all the data which is needed. If you need extra RPCs
to be added to the datanode and namenode, that would be an option too. But
creating a dependency from the namenode and all datanodes in a cluster seems to
be something that you want to do only after very careful consideration. If we
move to a pull model, you might not even need the Kafka service to be running
in the initial versions.
Good point! This is also a very good way to implement SSM.
With the pull model, the advantages are:
(1) No dependency on the Kafka service, which makes development, testing and
deployment much easier.
(2) A closer relationship with HDFS, which may make it possible to support
features that cannot be done in the model described in the design document.
The disadvantages are:
(1) Potential performance issues. SSM has to receive messages in a timely
manner in order to work effectively, so to keep the overhead of message
collection low while staying current, it would have to query the NameNode at a
very high frequency all the time. It is also very hard for SSM to query
DataNodes one by one in a large-scale cluster.
(2) Message collection and management become harder. If SSM is stopped by the
user, or crashes while the HDFS cluster is still working, messages from the
nodes will be lost without Kafka, and it is not easy for SSM to collect
historical data.
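To illustrate disadvantage (2), here is a minimal sketch of the pull model with
a bounded in-memory event buffer on the NameNode side. The class and buffer
size are hypothetical (this is not HDFS code): the point is that if the poller
stops for a while, older events are evicted and lost, whereas a durable log
such as Kafka would retain them for SSM to catch up on.

```python
from collections import deque

class NameNodeEventBuffer:
    """Hypothetical in-memory event buffer on the NameNode side.

    Holds at most `capacity` events; when full, the oldest events are
    evicted, so a slow or stopped poller loses them.
    """
    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)

    def emit(self, event):
        self.events.append(event)

    def poll(self):
        """Drain and return all buffered events (the pull model)."""
        drained = list(self.events)
        self.events.clear()
        return drained

buffer = NameNodeEventBuffer(capacity=100)

# While SSM polls regularly, nothing is lost.
for i in range(50):
    buffer.emit(f"event-{i}")
received = buffer.poll()          # 50 events, none lost

# If SSM is down while 300 more events arrive, only the last 100 survive.
for i in range(50, 350):
    buffer.emit(f"event-{i}")
after_outage = buffer.poll()      # 100 events; 200 were silently evicted

print(len(received), len(after_outage), after_outage[0])
```

With a durable log in between, the outage case would instead replay all 300
events from the consumer's last committed offset.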
Both models are workable, so we may need more discussion on this. What's your
opinion?
> HDFS smart storage management
> Key: HDFS-7343
> URL: https://issues.apache.org/jira/browse/HDFS-7343
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kai Zheng
> Assignee: Wei Zhou
> Attachments: HDFS-Smart-Storage-Management.pdf
> As discussed in HDFS-7285, it would be better to have a comprehensive and
> flexible storage policy engine considering file attributes, metadata, data
> temperature, storage type, EC codec, available hardware capabilities,
> user/application preference, etc.
> Modified the title for re-purpose.
> We'd extend this effort a bit and aim to work on a comprehensive solution
> to provide a smart storage management service, for convenient, intelligent
> and effective use of erasure coding or replicas, the HDFS cache facility,
> HSM offerings, and all kinds of tools (balancer, mover, disk balancer and
> so on) in a large cluster.