[
https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849385#comment-15849385
]
Andrew Wang commented on HDFS-7343:
-----------------------------------
Hi Wei, thanks for posting the new doc. At a high level, I think the scope of
this project is really big. The doc doesn't go into enough implementation
detail for me to understand what's involved in the first development
phase. Splitting this into further phases would help. One possible staging:
* Define triggers and the data required to implement them
* Data collection from HDFS
* Implement the different actions
* The rules syntax and definition
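To make the staging concrete, here's a rough sketch of how those pieces might be factored apart into separate contracts (all names here are illustrative, not from the design doc):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class RuleSketch {
    /** Phase 1: decides *when* a rule should be evaluated. */
    interface Trigger {
        boolean shouldFire(long nowMillis);
    }

    /** Phase 2: matches objects using metrics collected from HDFS. */
    interface Condition<T> extends Predicate<T> {}

    /** Phase 3: a side-effecting operation such as cache, move, or EC convert. */
    interface Action<T> {
        void apply(T object);
    }

    /** Phase 4: the rule parser would compile rule text into this assembled form. */
    static <T> List<T> evaluate(List<T> objects, Condition<T> cond) {
        return objects.stream().filter(cond).collect(Collectors.toList());
    }
}
```

Defining these contracts first would let the data-collection and action work proceed independently of the rules-language design.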
Other comments:
* Could you describe how you will satisfy use cases 4 and 5 in more detail?
* Please describe the complete set of NameNode changes required, particularly
the use of LevelDB and additional state
* The lack of HA means this will be a non-starter for many production
deployments, could you comment on the difficulty of implementing HA? This
should really be covered in the initial design.
* Why are the StorageManager and CacheManager treated as separate components?
If the StorageManager incorporates storage policies, EC, S3, etc, it already
seems quite general
* Why are ChangeStoragePolicy and EnforceStoragePolicy separate actions? Is
there a use case for changing the SP but not moving the data?
Metric collection:
* What set of metrics do you plan to collect from HDFS?
* Right now we don't have centralized read statistics, which would obviously
be useful for implementing a caching policy. Is there a plan to implement this?
Triggers:
* Could you provide a complete description of the trigger syntax? Notably, I
don't see a way to "hash" the time in the examples.
* How often does the SSM wake up to check rules?
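To illustrate what I mean by "hashing" the time: derive a stable per-object offset within the check interval, so that a rule covering many files spreads its actions out instead of acting on everything in the same wake-up. A minimal sketch (the class and method names are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class TimeHash {
    /**
     * Spread a rule's firing time for a given path across the check
     * interval. Returns a stable offset in [0, intervalMillis) so that
     * the same file always gets the same slot, but different files are
     * scattered across the interval.
     */
    public static long offsetFor(String path, long intervalMillis) {
        CRC32 crc = new CRC32();
        crc.update(path.getBytes(StandardCharsets.UTF_8));
        return Long.remainderUnsigned(crc.getValue(), intervalMillis);
    }
}
```

Without something like this, a rule over a million files would try to move all of them at the top of the hour.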
Conditions:
* Could you provide a complete list of conditions that are planned?
* How do you plan to implement accessCount over a time range?
* Any other new metrics or information you plan to add to HDFS as part of this
work?
* I'd prefer we use atime or ctime rather than "age", since they're more specific
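For accessCount over a time range, one standard approach is bucketed sliding-window counters: record accesses into fixed-size time buckets and sum the buckets that fall inside the queried window. A self-contained sketch of that idea (illustrative only, not a proposed implementation):

```java
import java.util.Arrays;

public class SlidingAccessCount {
    private final long bucketMillis;
    private final long[] counts;
    private final long[] bucketIds;   // which absolute bucket each slot holds

    public SlidingAccessCount(long bucketMillis, int numBuckets) {
        this.bucketMillis = bucketMillis;
        this.counts = new long[numBuckets];
        this.bucketIds = new long[numBuckets];
        Arrays.fill(bucketIds, -1);
    }

    public void recordAccess(long nowMillis) {
        long bucket = nowMillis / bucketMillis;
        int slot = (int) (bucket % counts.length);
        if (bucketIds[slot] != bucket) {  // slot holds an older bucket; reset it
            bucketIds[slot] = bucket;
            counts[slot] = 0;
        }
        counts[slot]++;
    }

    /** Accesses within [nowMillis - windowMillis, nowMillis]. */
    public long countSince(long nowMillis, long windowMillis) {
        long newest = nowMillis / bucketMillis;
        long oldest = (nowMillis - windowMillis) / bucketMillis + 1;
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            if (bucketIds[i] >= oldest && bucketIds[i] <= newest) {
                total += counts[i];
            }
        }
        return total;
    }
}
```

The memory cost is bounded (one slot per bucket per file), which matters if this state has to be kept for a large namespace; it'd be good to see the doc address where this state lives and how it's bounded.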
Object matching:
* Could you provide a complete definition of the object matching syntax?
* Do rules support basic boolean operators like AND, OR, NOT for objects and
conditions?
* Is there a reason you chose to implement regex matching rather than file
globbing for path matching? Are these regexes on the full path, or per path
component?
* Aren't many of these matches going to require listing the complete
filesystem? Or are you planning to use HDFS inotify?
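For context on the glob vs. regex question: glob wildcards are per path component by default (`*` stops at `/`, `**` crosses it), which tends to match admin expectations, while a regex applies to the whole string you feed it. The JDK's own PathMatcher shows the difference (this is just an illustration, not a claim about the proposed syntax):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class PathMatchDemo {
    // Glob semantics: '*' stays within one path component, '**' crosses
    // component boundaries.
    static boolean glob(String pattern, String path) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        return m.matches(Paths.get(path));
    }

    // Regex on the full path: '.*' happily crosses '/' unless the pattern
    // explicitly forbids it.
    static boolean regex(String pattern, String path) {
        return Pattern.matches(pattern, path);
    }
}
```

A user writing `/data/*.log` with regex semantics would be surprised that it also matches files in subdirectories, so the doc should pin this down.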
Actions:
* The "cache" action is underspecified, what cache pool is used?
* How often does the SSM need to poll the NN to get information, and how much
information each time? Some triggers might require listing a lot of the
namespace.
* Can actions happen concurrently? Is there a way of limiting concurrency?
* Can you run multiple actions in a rule? Is there a syntax for defining
"functions"?
* Are there substitutions that can be used to reference the filename, e.g.
"${file}"? Same for DN objects, the diskbalancer needs the DN host:port.
Operational questions:
* Is there an audit log for actions taken by the SSM?
* Is there a way to see when each action started, stopped, and its status?
* How are errors and logs from actions exposed?
* What metrics are exposed by the SSM?
* Why are there configuration options to enable individual actions? Isn't this
behavior already defined by the rules file?
* Why does the SSM need a "dfs.ssm.enabled" config? Is there a use case for
having an SSM service started, but not enabled?
* Is the rules file dynamically refreshable?
* What do we do if the rules file is malformed? What do we do if there are
conflicting rules or multiple matches?
* dfs.ssm.msg.datanode.interval is described as the polling interval for the
NN, typo?
* What happens if multiple SSMs are accidentally started?
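On the last point: even without full HA, a crude mutual-exclusion guard would keep two accidentally-started SSMs from acting on the same cluster; ZooKeeper-based leader election would be the production-grade answer. A minimal sketch using an exclusive file lock (the class name and lock-file path are made up for illustration):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SingleInstanceGuard {
    /**
     * Try to become the sole SSM instance by taking an exclusive lock on
     * a well-known file. Returns the lock if we won, or null if another
     * process already holds it; the caller keeps the lock for its lifetime.
     */
    public static FileLock tryBecomeLeader(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = ch.tryLock();
        if (lock == null) {
            ch.close();  // someone else is the leader; release the channel
        }
        return lock;
    }
}
```

A local file lock only protects against duplicates on one host, which is another reason the HA story needs to be in the initial design.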
> HDFS smart storage management
> -----------------------------
>
> Key: HDFS-7343
> URL: https://issues.apache.org/jira/browse/HDFS-7343
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kai Zheng
> Assignee: Wei Zhou
> Attachments: HDFS-Smart-Storage-Management.pdf,
> HDFS-Smart-Storage-Management-update.pdf, move.jpg
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and
> flexible storage policy engine that considers file attributes, metadata, data
> temperature, storage type, EC codec, available hardware capabilities,
> user/application preference, etc.
> Modified the title for re-purposing.
> We'd extend this effort a bit and aim to work on a comprehensive solution
> providing a smart storage management service, for convenient, intelligent,
> and effective use of erasure coding or replicas, the HDFS cache facility,
> HSM offerings, and all kinds of tools (balancer, mover, disk balancer, and
> so on) in a large cluster.