[
https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805521#comment-15805521
]
Anu Engineer commented on HDFS-7343:
------------------------------------
[~zhouwei] Thank you for addressing all the issues I had. The updated design
doc looks excellent. If there are any JIRAs that you need help with, please let
me know and I will be happy to chip in. I am almost at a +1. However, I had
some questions and comments. Please treat the following sections as questions
(things that I don't understand and would like to know), comments (subjective
remarks; please feel free to ignore them), and nitpicks (completely ignorable;
written down to avoid someone else asking the same question later).
h4. Questions
{noformat}
NameNode then stores the data into database. In this way, SSM has no need to
maintain state checkpoints.
{noformat}
1. I would like to understand the technical trade-offs that were considered in
making this choice. Other applications that do this, like Ambari, choose to
store this data in a database maintained within the application. However, for
SSM you are choosing to store it in the NameNode. In fact, when I look at the
architecture diagram (it is very well drawn, thank you), it looks to me that it
is trivial to have the LevelDB on the SSM side instead of the NameNode. So I am
wondering what advantage we gain by maintaining more metadata on the NameNode side.
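Just to make the alternative concrete, here is a rough sketch of an SSM-side
store built on the leveldbjni bindings that Hadoop already ships; the class
name and key layout are entirely made up for illustration:
{code:java}
import static org.fusesource.leveldbjni.JniDBFactory.bytes;
import static org.fusesource.leveldbjni.JniDBFactory.factory;

import java.io.File;
import java.io.IOException;

import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

// Hypothetical SSM-side metadata store. Nothing here has to live in the
// NameNode; SSM owns the LevelDB instance and its lifecycle.
public class SsmLocalStore implements AutoCloseable {
  private final DB db;

  public SsmLocalStore(File dir) throws IOException {
    this.db = factory.open(dir, new Options().createIfMissing(true));
  }

  // Persist per-file state (access counts, temperature, etc.) keyed by path.
  public void putFileState(String path, byte[] state) {
    db.put(bytes("file:" + path), state);
  }

  public byte[] getFileState(String path) {
    return db.get(bytes("file:" + path));
  }

  @Override
  public void close() throws IOException {
    db.close();
  }
}
{code}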
2. In the Rule Engine section, can we throttle the number of times a
particular rule is executed in a time window? What I am trying to prevent is
some kind of flapping where two opposite rules are triggered continuously.
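For the throttling, I am imagining something as simple as a per-rule counter
over a sliding time window; a minimal sketch (all names hypothetical):
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical throttle: allow at most maxExecutions of a rule per window.
// Bounding how often each rule fires keeps two opposite rules from flapping.
public class RuleThrottle {
  private final int maxExecutions;
  private final long windowMillis;
  private final Deque<Long> recentRuns = new ArrayDeque<>();

  public RuleThrottle(int maxExecutions, long windowMillis) {
    this.maxExecutions = maxExecutions;
    this.windowMillis = windowMillis;
  }

  // Returns true if the rule may execute now, false if it is throttled.
  public synchronized boolean tryAcquire() {
    long now = System.currentTimeMillis();
    // Evict executions that have fallen out of the window.
    while (!recentRuns.isEmpty() && now - recentRuns.peekFirst() > windowMillis) {
      recentRuns.pollFirst();
    }
    if (recentRuns.size() >= maxExecutions) {
      return false;
    }
    recentRuns.addLast(now);
    return true;
  }
}
{code}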
3. There is also a reference: {noformat} SSM also stores some data (for example,
rules) into database through NameNode. {noformat} Do we need to store the rules
inside the NameNode? Would it make more sense to store them in SSM itself? The
reason I am asking is that in the future I can see this platform being
leveraged by Hive or HBase. If that is the case, having an independent rule
store might be more interesting than a pure HDFS one.
4. {noformat} HA supporting will be considered later {noformat} I am all for
postponing HA support, but I am not able to understand how we are going to
store rules in the NameNode and ignore HA. Are we going to say SSM will not
work if HA is enabled? Most clusters we see are HA enabled. However, if we
avoid dependencies on the NameNode, SSM might work with an HA-enabled cluster.
I cannot see anything in SSM that inherently prevents it from working with an
HA-enabled cluster.
5. {noformat} SSM also exports interface (for example, through RESTful API) for
administrator and client to manage SSM or query information.{noformat} Maybe
something for later, but how do you intend to protect this endpoint? Is it
going to be Kerberos? Usually I would never ask a service to consider security
until the core modules are in place, but in this case the potential for abuse
is very high. I understand that we might not have it in the initial releases of
SSM, but it might be good to think about it.
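For what it is worth, if the endpoint ends up behind Hadoop's HTTP server, the
stock SPNEGO AuthenticationFilter can be driven entirely by configuration
along these lines (whether SSM reuses this machinery is, of course, an open
question; the principal and keytab path below are placeholders):
{code:xml}
<!-- Sketch only: standard hadoop.http.authentication.* keys. -->
<property>
  <name>hadoop.http.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
{code}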
6. How do we prevent a run-away rule? Let me give you an example: recently I
was cleaning a cluster and decided to run "rm -rf" on the data directories.
Each datanode had more than 12 million files, and even a normal file system
operation was taking forever. So a rule like *file.path matches "/fooA/*.dat"*
might run forever (I am looking at you, Hive). Are you planning to provide a
timeout on the execution of a rule, or will a rule run until it reaches the
end of processing? If we don't have timeouts, it might be hard to honor
other rules which want to run at a specific time. Even with multiple threads,
you might not be able to make much progress, since most of these rules are
going to run against the NameNode and you will have limited bandwidth to
work with.
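A per-rule timeout could be as simple as running each rule action under a
bounded Future; a sketch, with the SSM types stubbed out as assumptions:
{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical runner that bounds how long any single rule may execute,
// so a run-away rule cannot starve rules scheduled for specific times.
public class BoundedRuleRunner {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  public void run(Callable<Void> ruleAction, long timeout, TimeUnit unit)
      throws Exception {
    Future<Void> result = pool.submit(ruleAction);
    try {
      result.get(timeout, unit);
    } catch (TimeoutException e) {
      // Interrupt the rule and free the thread; the rule could be marked
      // for resumption from a checkpoint on its next scheduled run.
      result.cancel(true);
    }
  }
}
{code}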
7. On the HDFS client querying SSM before writing: what happens if SSM is
down? Will the client wait and retry, potentially making I/O slower, and
eventually bypass SSM? Have you considered using Storage Policy Satisfier,
HDFS-10285? Even if SSM is down or a client does not talk to SSM, we could rely
on SPS to move the data to the right location. Some of your storage manager
functionality could leverage what is being done in Storage Policy Satisfier. So
can you please clarify how you will handle clients that do not talk to SSM,
and what happens to I/O when SSM is down?
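To be concrete about the failure mode I am worried about, this is roughly the
client-side behavior I would expect; SsmClient and its method are made up for
illustration, and the fallback could just as well be SPS (HDFS-10285):
{code:java}
import java.io.IOException;

// Stand-in for whatever RPC/REST client the design doc ends up specifying.
interface SsmClient {
  String queryStoragePolicy(String path) throws IOException;
}

// Hypothetical wrapper: consult SSM for placement advice before a write,
// but never block the write if SSM is unreachable.
public class SsmAwareWriter {
  private static final int MAX_RETRIES = 2;
  private final SsmClient ssm;

  public SsmAwareWriter(SsmClient ssm) {
    this.ssm = ssm;
  }

  public String choosePolicy(String path) {
    for (int i = 0; i < MAX_RETRIES; i++) {
      try {
        return ssm.queryStoragePolicy(path);
      } catch (IOException e) {
        // SSM down or slow: retry briefly, then give up.
      }
    }
    // Bypass SSM: fall back to the cluster default policy and let SPS
    // (HDFS-10285) satisfy the intended placement asynchronously later.
    return "DEFAULT";
  }
}
{code}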
h4. Comments
1. I still share some of the concerns voiced by [~andrew.wang]. It is going to
be challenging to create a set of static rules for the changing conditions of a
cluster, especially when workloads are different. But sometimes we learn
surprising things by doing rather than talking. I would love to learn how this
works out in real-world clusters. If you have any data to share, I would
appreciate it.
h4. Nitpick
{noformat}
HSM, Cache, SPS, DataNode Disk Balancer(HDFS-1312) and EC to do the actual
data manipulation work.
{noformat}
I think we have accidentally omitted a reference to our classic balancer here.
> HDFS smart storage management
> -----------------------------
>
> Key: HDFS-7343
> URL: https://issues.apache.org/jira/browse/HDFS-7343
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kai Zheng
> Assignee: Wei Zhou
> Attachments: HDFS-Smart-Storage-Management-update.pdf,
> HDFS-Smart-Storage-Management.pdf
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and
> flexible storage policy engine considering file attributes, metadata, data
> temperature, storage type, EC codec, available hardware capabilities,
> user/application preference, etc.
> Modified the title for re-purpose.
> We'd extend this effort a bit and aim to work on a comprehensive solution
> to provide a smart storage management service, for convenient, intelligent
> and effective utilization of erasure coding or replicas, the HDFS cache
> facility, HSM offerings, and all kinds of tools (balancer, mover, disk
> balancer and so on) in a large cluster.