[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952175#comment-15952175 ]
Wei Zhou commented on HDFS-7343:
--------------------------------
Thanks [~eddyxu] and [~drankye] for the discussion on the SSM design. For file
{{accessCount}}, we plan to use a SQL database to store and track this data. As
shown in the chart:
- SSM polls {{accessCount}} data from the NN to get the file accesses that
occurred during the last time interval (for example, 5s).
- SSM creates a table to store this info and inserts the table name into the
table {{access_count_table}}.
- The file access count over a recent period can then be calculated by
accumulating the data in the tables whose {{start time}} and {{end time}} fall
within that period.
- To control the total amount of data, second-level {{accessCount}} tables
will be aggregated into minute-level, hour-level, day-level, month-level and
year-level tables. The further back in time the data is, the coarser the
aggregation granularity, so more accurate data is kept for the recent past
than for the distant past.
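The first three steps above could be sketched roughly as follows. This is only an illustration, not SSM code: it uses an in-memory SQLite database, and the per-interval table naming scheme, the column names ({{file_id}}, {{access_count}}, {{start_time}}, {{end_time}}) and the helper functions are all assumptions made for the example. Only the {{access_count_table}} name comes from the design above.

```python
import sqlite3


def new_db():
    """Open an in-memory database with the table that indexes interval tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_count_table ("
        "table_name TEXT PRIMARY KEY, start_time INTEGER, end_time INTEGER)"
    )
    return conn


def store_interval(conn, start, end, counts):
    """Create one table for the polled interval and register its name."""
    # Hypothetical naming scheme; start/end are forced to int before formatting.
    table = f"access_count_{int(start)}_{int(end)}"
    conn.execute(
        f'CREATE TABLE "{table}" '
        "(file_id INTEGER PRIMARY KEY, access_count INTEGER NOT NULL)"
    )
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', counts.items())
    conn.execute(
        "INSERT INTO access_count_table VALUES (?, ?, ?)", (table, start, end)
    )
    return table


def access_counts(conn, start, end):
    """Accumulate per-file counts over all tables lying within [start, end]."""
    rows = conn.execute(
        "SELECT table_name FROM access_count_table "
        "WHERE start_time >= ? AND end_time <= ?",
        (start, end),
    ).fetchall()
    totals = {}
    for (table,) in rows:
        for file_id, count in conn.execute(
            f'SELECT file_id, access_count FROM "{table}"'
        ):
            totals[file_id] = totals.get(file_id, 0) + count
    return totals


if __name__ == "__main__":
    conn = new_db()
    # Three polls of 5s each (times in milliseconds); values are made up.
    store_interval(conn, 0, 5000, {1: 3, 2: 1})
    store_interval(conn, 5000, 10000, {1: 2})
    store_interval(conn, 10000, 15000, {2: 4})
    print(access_counts(conn, 0, 10000))
```

Keeping one small table per polling interval makes a range query a matter of picking table names from {{access_count_table}} and summing their rows, at the cost of a growing number of tables, which is what the aggregation step is meant to bound.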
The attached Excel file gives more detail on the data and tables maintained in
SSM.
!access_count_tables.jpg!
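The roll-up of fine-grained tables into coarser ones might look roughly like the sketch below. It is an assumption-laden illustration rather than the SSM implementation: interval tables are modelled as plain {{(start, end, counts)}} tuples instead of SQL tables, and the function name {{roll_up}} and its {{max_age}} and {{bucket}} parameters are hypothetical. A bucket is only merged once it lies entirely in the past beyond {{max_age}}, so a coarse table never overlaps fine tables that are still kept.

```python
from collections import defaultdict


def roll_up(tables, now, max_age, bucket):
    """Merge old fine-grained tables into bucket-aligned coarse tables.

    tables  : list of (start, end, counts), counts maps file path -> accesses
    now     : current time, in the same unit as start/end
    max_age : only buckets that ended at least this long ago are merged
    bucket  : size of the coarser granularity (e.g. 60 for minute-level)
    """
    cutoff = now - max_age
    kept = []
    merged = defaultdict(lambda: defaultdict(int))
    for start, end, counts in tables:
        bucket_start = start - start % bucket
        bucket_end = bucket_start + bucket
        is_fine = (end - start) < bucket and end <= bucket_end
        if is_fine and bucket_end <= cutoff:
            # The whole bucket is old enough: fold this table into it.
            for path, count in counts.items():
                merged[bucket_start][path] += count
        else:
            # Too recent, already coarse, or straddling a bucket boundary.
            kept.append((start, end, counts))
    coarse = [(s, s + bucket, dict(c)) for s, c in merged.items()]
    return sorted(coarse + kept, key=lambda t: t[0])


if __name__ == "__main__":
    # Two minutes of 5s tables, each recording one access to a made-up path.
    second_level = [(t, t + 5, {"/a": 1}) for t in range(0, 120, 5)]
    for table in roll_up(second_level, now=130, max_age=60, bucket=60):
        print(table)
```

Applying the same function again with a larger {{bucket}} and {{max_age}} would produce the hour-level and coarser tables, since tables that are already at least {{bucket}} wide are left untouched.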
> HDFS smart storage management
> -----------------------------
>
> Key: HDFS-7343
> URL: https://issues.apache.org/jira/browse/HDFS-7343
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kai Zheng
> Assignee: Wei Zhou
> Attachments: access_count_tables.jpg,
> HDFSSmartStorageManagement-General-20170315.pdf,
> HDFS-Smart-Storage-Management.pdf,
> HDFSSmartStorageManagement-Phase1-20170315.pdf,
> HDFS-Smart-Storage-Management-update.pdf, move.jpg, tables_in_ssm.xlsx
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and
> flexible storage policy engine considering file attributes, metadata, data
> temperature, storage type, EC codec, available hardware capabilities,
> user/application preference, etc.
> Modified the title to reflect the new purpose.
> We'd extend this effort a bit and aim to work on a comprehensive solution
> that provides a smart storage management service for convenient,
> intelligent and effective use of erasure coding or replicas, the HDFS cache
> facility, HSM offerings, and all kinds of tools (balancer, mover, disk
> balancer and so on) in a large cluster.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)