[
https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941287#comment-15941287
]
Lei (Eddy) Xu commented on HDFS-7343:
-------------------------------------
Hi, [~drankye] and [~zhouwei]
Thanks for the updated design docs. It is a very good write-up.
I have a few questions.
* IIUC, SSM needs to maintain stats for each file / block in NN, which adds O(n)
overhead to the JVM heap, where n is {# of blocks or # of files}. Additionally,
when SSM pulls these stats from NN, what kind of overhead are we expecting?
If SSM pulls the full block map / file namespace, would that be a performance
hit to NN?
* I think that in the general design, we should work on defining a good
interface for a time series store for metrics, instead of specifying RocksDB.
RocksDB might be a good implementation for now.
* In Figure 1 of the general design, it is not clear to me why both SSM and NN
need to persist metrics in two _separate_ RocksDB instances. If NN needs to
persist metrics to RocksDB, does that mean both ANN and SNN in an HA setup need
to persist them? What about the HA of SSM?
* Rules. How stable or extensible will the syntax be? Would the syntax be easy
for other applications to generate / parse / validate, etc.? What would be
your opinion of using JSON or YAML for the syntax? Would it be possible that,
when adding a rule, SSM can verify the rule against HSM/EC/SPS policies,
etc.?
* In the Phase 1 design, {{file.accessCount(interval)}}: how is the
accessCount updated? How many samples of accessCount are to be maintained
during an interval? What if SSM fails to pull the accessCount?
* Maybe we can define SSM rules as _soft_ rules, while making HSM/EC, etc.
rules hard rules?
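
To make the store-interface point above more concrete, here is a minimal sketch of what a storage-agnostic metrics interface might look like. All names here ({{MetricStore}}, {{InMemoryMetricStore}}, {{record}}, {{sum}}) are hypothetical and not taken from the design docs or from HDFS; the in-memory map only stands in for whatever backend is chosen, and a RocksDB-backed class could implement the same interface. Under these assumptions, something like {{file.accessCount(interval)}} could be answered by summing the samples that fall inside the interval.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical storage-agnostic interface; not part of HDFS or the SSM docs. */
interface MetricStore {
    /** Adds delta to the sample stored for key at timestampMs. */
    void record(String key, long timestampMs, long delta);

    /** Returns the sum of all samples for key with timestamp in [fromMs, toMs). */
    long sum(String key, long fromMs, long toMs);
}

/** In-memory stand-in used only for illustration. */
final class InMemoryMetricStore implements MetricStore {
    // One time-ordered series of samples per key (e.g. per file path).
    private final Map<String, NavigableMap<Long, Long>> series =
        new ConcurrentHashMap<>();

    @Override
    public void record(String key, long timestampMs, long delta) {
        series.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>())
              .merge(timestampMs, delta, Long::sum);
    }

    @Override
    public long sum(String key, long fromMs, long toMs) {
        NavigableMap<Long, Long> samples = series.get(key);
        if (samples == null) {
            return 0L; // no samples recorded for this key
        }
        long total = 0L;
        // Inclusive lower bound, exclusive upper bound.
        for (long value : samples.subMap(fromMs, true, toMs, false).values()) {
            total += value;
        }
        return total;
    }
}
```

With an interface along these lines, the questions about sample retention and failed pulls become properties of the implementation rather than of the rule syntax.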
Thanks!
> HDFS smart storage management
> -----------------------------
>
> Key: HDFS-7343
> URL: https://issues.apache.org/jira/browse/HDFS-7343
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kai Zheng
> Assignee: Wei Zhou
> Attachments: HDFSSmartStorageManagement-General-20170315.pdf,
> HDFS-Smart-Storage-Management.pdf,
> HDFSSmartStorageManagement-Phase1-20170315.pdf,
> HDFS-Smart-Storage-Management-update.pdf, move.jpg
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and
> flexible storage policy engine considering file attributes, metadata, data
> temperature, storage type, EC codec, available hardware capabilities,
> user/application preference, etc.
> Modified the title to repurpose the issue.
> We'd extend this effort a bit and aim to work on a comprehensive solution
> that provides a smart storage management service for convenient,
> intelligent and effective utilization of erasure coding or replicas, the HDFS
> cache facility, the HSM offering, and all kinds of tools (balancer, mover,
> disk balancer and so on) in a large cluster.