[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849385#comment-15849385 ]

Andrew Wang commented on HDFS-7343:
-----------------------------------

Hi Wei, thanks for posting the new doc. At a high level, I think the scope of 
this project is really big. The doc doesn't go into enough implementation 
detail for me to understand what's involved in the first development phase. 
Splitting the work into further phases would help. One possible staging:

* Define triggers and the data required to implement them
* Data collection from HDFS
* Implement the different actions
* The rules syntax and definition

Other comments:

* Could you describe how you will satisfy use cases 4 and 5 in more detail?
* Please describe the complete set of NameNode changes required, particularly 
the use of LevelDB and additional state
* The lack of HA means this will be a non-starter for many production 
deployments; could you comment on the difficulty of implementing HA? This 
should really be covered in the initial design.
* Why are the StorageManager and CacheManager treated as separate components? 
If the StorageManager incorporates storage policies, EC, S3, etc., it already 
seems quite general.
* Why are ChangeStoragePolicy and EnforceStoragePolicy separate actions? Is 
there a use case for changing the SP but not moving the data? (A sketch of 
the distinction follows this list.)
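
To make the distinction concrete, here's a rough sketch using the existing 
setStoragePolicy API (the path below is just an example). Setting the policy 
is metadata-only; existing replicas don't move until the Mover runs:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StoragePolicyExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/data/archive");  // example path

        // "ChangeStoragePolicy": metadata-only. New writes honor COLD,
        // but existing blocks stay on their current storage types.
        fs.setStoragePolicy(dir, "COLD");

        // "EnforceStoragePolicy" would additionally migrate existing
        // blocks, i.e. what `hdfs mover -p /data/archive` does today.
      }
    }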

Metric collection:
* What set of metrics do you plan to collect from HDFS?
* Right now we don't have centralized read statistics, which would obviously 
be useful for implementing a caching policy. Is there a plan to implement this?

Triggers:
* Could you provide a complete description of the trigger syntax? Notably, I 
don't see a way to "hash" the time in the examples. (See the sketch after 
this list for what I mean.)
* How often does the SSM wake up to check rules?
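
For reference, here's my guess at what time "hashing" could mean: stagger 
each rule's firing offset by a hash of its id so everything doesn't run at 
once. The rule ids and interval below are made up for illustration:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class HashedTriggers {
      public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        long intervalSec = 300;  // how often the SSM wakes up (the open question)
        for (String ruleId : new String[] {"hot-to-cache", "cold-to-archive"}) {
          // hash the rule id into [0, intervalSec) to spread out firings
          long offsetSec = Math.floorMod(ruleId.hashCode(), intervalSec);
          scheduler.scheduleAtFixedRate(
              () -> System.out.println("evaluating rule " + ruleId),
              offsetSec, intervalSec, TimeUnit.SECONDS);
        }
      }
    }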

Conditions:
* Could you provide a complete list of conditions that are planned?
* How do you plan to implement accessCount over a time range? (One possible 
approach is sketched after this list.)
* Any other new metrics or information you plan to add to HDFS as part of this 
work?
* I'd prefer we use atime or ctime rather than "age", since they're more specific
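
For the accessCount question, one plausible implementation is fixed-width 
time buckets per file, summed over the queried window. Everything below 
(names, bucket sizes) is a sketch, not the SSM design:

    import java.util.ArrayDeque;
    import java.util.Deque;

    public class AccessWindow {
      private static final long BUCKET_MS = 60_000;  // 1-minute buckets
      private static final int MAX_BUCKETS = 60;     // retain 1 hour of history

      // each element is {bucketStartMs, count}, oldest first
      private final Deque<long[]> buckets = new ArrayDeque<>();

      public synchronized void recordAccess(long nowMs) {
        long start = nowMs - nowMs % BUCKET_MS;
        long[] last = buckets.peekLast();
        if (last != null && last[0] == start) {
          last[1]++;                          // same bucket, bump the count
        } else {
          buckets.addLast(new long[] {start, 1});
          if (buckets.size() > MAX_BUCKETS) {
            buckets.removeFirst();            // evict the oldest bucket
          }
        }
      }

      // approximate accesses in [nowMs - rangeMs, nowMs]
      public synchronized long accessCount(long nowMs, long rangeMs) {
        long cutoff = nowMs - rangeMs;
        long total = 0;
        for (long[] b : buckets) {
          if (b[0] + BUCKET_MS > cutoff) {
            total += b[1];
          }
        }
        return total;
      }
    }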

Object matching:
* Could you provide a complete definition of the object matching syntax?
* Do rules support basic boolean operators like AND, OR, NOT for objects and 
conditions?
* Is there a reason you chose to implement regex matches rather than file 
globbing for path matching? Are these regexes applied to the full path, or 
per path component?
* Aren't many of these matches going to require listing the complete 
filesystem? Or are you planning to use HDFS inotify? (A minimal inotify 
sketch follows this list.)
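
On the inotify point, a minimal sketch of tailing the NameNode edit stream 
via HdfsAdmin#getInotifyEventStream (the URI is a placeholder for a real 
cluster):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
    import org.apache.hadoop.hdfs.client.HdfsAdmin;
    import org.apache.hadoop.hdfs.inotify.Event;
    import org.apache.hadoop.hdfs.inotify.EventBatch;

    public class InotifyTail {
      public static void main(String[] args) throws Exception {
        HdfsAdmin admin = new HdfsAdmin(
            URI.create("hdfs://namenode:8020"), new Configuration());
        DFSInotifyEventInputStream stream = admin.getInotifyEventStream();
        while (true) {
          EventBatch batch = stream.take();  // blocks until the next batch
          for (Event event : batch.getEvents()) {
            // react to namespace changes instead of re-listing everything
            System.out.println(event.getEventType());
          }
        }
      }
    }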

Actions:
* The "cache" action is underspecified, what cache pool is used?
* How often does the SSM need to poll the NN to get information? How much 
information is fetched each time? Some triggers might require listing a lot 
of the namespace.
* Can actions happen concurrently? Is there a way of limiting concurrency?
* Can you run multiple actions in a rule? Is there a syntax for defining 
"functions"?
* Are there substitutions that can be used to reference the filename, e.g. 
"${file}"? The same applies to DN objects; the disk balancer needs the DN 
host:port. (A substitution sketch follows this list.)

Operational questions:
* Is there an audit log for actions taken by the SSM?
* Is there a way to see when each action started, stopped, and its status?
* How are errors and logs from actions exposed?
* What metrics are exposed by the SSM?
* Why are there configuration options to enable individual actions? Isn't this 
behavior already defined by the rules file?
* Why does the SSM need a "dfs.ssm.enabled" config? Is there a use case for 
having an SSM service started, but not enabled?
* Is the rules file dynamically refreshable?
* What do we do if the rules file is malformed? What do we do if there are 
conflicting rules or multiple matches?
* dfs.ssm.msg.datanode.interval is described as the polling interval for the 
NN; is that a typo?
* What happens if multiple SSMs are accidentally started?

> HDFS smart storage management
> -----------------------------
>
>                 Key: HDFS-7343
>                 URL: https://issues.apache.org/jira/browse/HDFS-7343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>         Attachments: HDFS-Smart-Storage-Management.pdf, 
> HDFS-Smart-Storage-Management-update.pdf, move.jpg
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and 
> flexible storage policy engine considering file attributes, metadata, data 
> temperature, storage type, EC codec, available hardware capabilities, 
> user/application preference, etc.
> Modified the title for re-purpose.
> We'd extend this effort a bit and aim to work on a comprehensive solution 
> to provide a smart storage management service for convenient, intelligent, 
> and effective use of erasure coding or replicas, the HDFS cache facility, 
> HSM offerings, and all kinds of tools (balancer, mover, disk balancer, and 
> so on) in a large cluster.


