[ 
https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952175#comment-15952175
 ] 

Wei Zhou commented on HDFS-7343:
--------------------------------

Thanks to [~eddyxu] and [~drankye] for the discussion on the SSM design. For file 
{{accessCount}}, we plan to use a SQL database to store and track the data. As 
shown in the chart:
- SSM polls {{accessCount}} data from the NN to get the file accesses that 
occurred during each time interval (for example, 5s).
- SSM creates a table to store this info and inserts the table name into the 
{{access_count_table}} table.
- The file access count over the last time interval can then be calculated by 
accumulating the data in the tables whose {{start time}} and {{end time}} fall 
within that interval.
- To control the total amount of data, the second-level {{accessCount}} tables 
will be aggregated into minute-level, hour-level, day-level, month-level and 
year-level tables. The further back in time, the coarser the aggregation 
granularity, so more accurate data is kept for recent periods than for older ones.
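
The steps above can be sketched roughly as follows. This is only an illustration and not the SSM implementation: it uses Python with an in-memory SQLite database, and apart from {{access_count_table}} every name in it (the functions, the {{file_path}} and {{access_count}} columns, the {{sec_}}/{{min_}} table name prefixes) is an assumption made for the example.

```python
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    # Index of per-interval tables; the column names are assumptions.
    conn.execute(
        "CREATE TABLE access_count_table ("
        "table_name TEXT PRIMARY KEY, start_time INTEGER, end_time INTEGER)"
    )
    return conn

def store_interval(conn, start, end, counts, level="sec"):
    """Create one table for the interval [start, end) and register it."""
    # Table names are generated internally, never taken from user input.
    name = f"{level}_{start}_{end}"
    conn.execute(
        f'CREATE TABLE "{name}" (file_path TEXT PRIMARY KEY, access_count INTEGER)'
    )
    conn.executemany(f'INSERT INTO "{name}" VALUES (?, ?)', counts.items())
    conn.execute(
        "INSERT INTO access_count_table VALUES (?, ?, ?)", (name, start, end)
    )
    return name

def tables_in(conn, start, end):
    """Names of tables whose start and end times both fall in [start, end]."""
    rows = conn.execute(
        "SELECT table_name FROM access_count_table "
        "WHERE start_time >= ? AND end_time <= ? ORDER BY start_time",
        (start, end),
    )
    return [row[0] for row in rows]

def access_counts(conn, start, end):
    """Accumulate per-file counts over all tables inside the interval."""
    totals = {}
    for name in tables_in(conn, start, end):
        query = f'SELECT file_path, access_count FROM "{name}"'
        for path, count in conn.execute(query):
            totals[path] = totals.get(path, 0) + count
    return totals

def aggregate(conn, start, end, level):
    """Merge the tables inside the interval into one coarser table."""
    old_tables = tables_in(conn, start, end)
    totals = access_counts(conn, start, end)
    for name in old_tables:
        conn.execute(f'DROP TABLE "{name}"')
        conn.execute(
            "DELETE FROM access_count_table WHERE table_name = ?", (name,)
        )
    return store_interval(conn, start, end, totals, level)
```

For example, after storing two 5s tables with {{store_interval(conn, 0, 5, ...)}} and {{store_interval(conn, 5, 10, ...)}}, {{access_counts(conn, 0, 10)}} sums both of them, and {{aggregate(conn, 0, 60, "min")}} replaces them with a single minute-level table covering the same range.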

The attached Excel file shows more detail about the data and tables maintained 
in SSM. 

!access_count_tables.jpg!

> HDFS smart storage management
> -----------------------------
>
>                 Key: HDFS-7343
>                 URL: https://issues.apache.org/jira/browse/HDFS-7343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>         Attachments: access_count_tables.jpg, 
> HDFSSmartStorageManagement-General-20170315.pdf, 
> HDFS-Smart-Storage-Management.pdf, 
> HDFSSmartStorageManagement-Phase1-20170315.pdf, 
> HDFS-Smart-Storage-Management-update.pdf, move.jpg, tables_in_ssm.xlsx
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and 
> flexible storage policy engine that considers file attributes, metadata, data 
> temperature, storage type, EC codec, available hardware capabilities, 
> user/application preferences, etc.
> Modified the title to repurpose the issue.
> We'd extend this effort a bit and aim to work on a comprehensive solution 
> that provides a smart storage management service for convenient, 
> intelligent and effective use of erasure coding or replicas, the HDFS cache 
> facility, HSM offerings, and all kinds of tools (balancer, mover, disk 
> balancer and so on) in a large cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
