[
https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958923#comment-15958923
]
Rakesh R commented on HDFS-7343:
--------------------------------
Thanks [~zhouwei] for the additional details about the data points.
bq. Create a table to store the info and insert the table name into table
access_count_table.
It looks like a lot of tables will be created to capture time-period details:
sec_1...sec_n, min_1...min_n, hour_1...hour_n, day_1...day_n,
month_1...month_12, etc. I hope these tables will be deleted after the
aggregation functions run. Even so, the DB may be exhausted by the growing
number of tables if the aggregation interval is long, right? Just a plain
thought to minimize the number of time-spec tables: how about capturing
{{access_time}} as a column and updating the {{access_time}} of the respective
{{fid}}? I think, using the {{access_time}} attribute, we would be able to
filter out the specific {{fid_access_count}} between a given {{start_time}} and
{{end_time}}.
Table {{seconds_level}} => composite key of {{access_time}} and {{fid}} to
uniquely identify each row in the table.
||access_time||fid||count||
|sec-2017-03-31-12-59-45|3|1|
|sec-2017-03-31-12-59-45|2|1|
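A minimal sketch of this idea (SQLite via Python here; only the table and
column names come from the comment — the column types, the string-encoded
{{access_time}} key, and the sample window are assumptions):

```python
import sqlite3

# Hypothetical schema for the proposed seconds_level table: the composite
# primary key (access_time, fid) uniquely identifies each row.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE seconds_level (
        access_time TEXT    NOT NULL,
        fid         INTEGER NOT NULL,
        count       INTEGER NOT NULL,
        PRIMARY KEY (access_time, fid)
    )
""")
conn.executemany(
    "INSERT INTO seconds_level VALUES (?, ?, ?)",
    [("sec-2017-03-31-12-59-45", 3, 1),
     ("sec-2017-03-31-12-59-45", 2, 1),
     ("sec-2017-03-31-12-59-46", 3, 1)],
)

# Filter per-fid access counts between a start_time and an end_time;
# the string keys sort lexicographically, so BETWEEN selects the window.
rows = conn.execute(
    """
    SELECT fid, SUM(count)
    FROM seconds_level
    WHERE access_time BETWEEN ? AND ?
    GROUP BY fid
    ORDER BY fid
    """,
    ("sec-2017-03-31-12-59-45", "sec-2017-03-31-12-59-46"),
).fetchall()
print(rows)  # [(2, 1), (3, 2)]
```

With {{access_time}} in the key, one table per unit of time suffices and no
per-period tables (sec_1...sec_n) need to be created or dropped.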
Again, for a faster aggregation function we could probably maintain separate
{{tables per unit of time}}, like below. After the aggregate function runs, we
could delete the rows that were used for the aggregation.
(1) seconds_level
(2) minutes_level
(3) hours_level
(4) days_level
(5) weeks_level
(6) months_level
(7) years_level
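The roll-up between two adjacent levels could look like the following sketch
(SQLite via Python; the string-truncation bucketing and the schema are my
assumptions — a real implementation would use proper timestamp types):

```python
import sqlite3

# Hypothetical seconds_level -> minutes_level roll-up: sum each fid's
# per-second counts into a per-minute bucket, then delete the aggregated rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seconds_level (access_time TEXT, fid INTEGER, count INTEGER,
                                PRIMARY KEY (access_time, fid));
    CREATE TABLE minutes_level (access_time TEXT, fid INTEGER, count INTEGER,
                                PRIMARY KEY (access_time, fid));
""")
conn.executemany("INSERT INTO seconds_level VALUES (?, ?, ?)",
                 [("sec-2017-03-31-12-59-45", 3, 1),
                  ("sec-2017-03-31-12-59-46", 3, 1),
                  ("sec-2017-03-31-12-59-45", 2, 1)])

# Derive the minute bucket by truncating the seconds suffix of the key:
# "sec-YYYY-MM-DD-hh-mm-ss" -> "min-YYYY-MM-DD-hh-mm".
conn.execute("""
    INSERT INTO minutes_level (access_time, fid, count)
    SELECT 'min-' || substr(access_time, 5, 16), fid, SUM(count)
    FROM seconds_level
    GROUP BY substr(access_time, 5, 16), fid
""")
# Drop the rows that have already been folded into the coarser level.
conn.execute("DELETE FROM seconds_level")

minute_rows = conn.execute(
    "SELECT access_time, fid, count FROM minutes_level ORDER BY fid"
).fetchall()
print(minute_rows)
# [('min-2017-03-31-12-59', 2, 1), ('min-2017-03-31-12-59', 3, 2)]
```

The same pattern repeats up the chain (minutes -> hours -> days -> ...), so
each level stays small: rows live only until the next coarser aggregation
consumes them.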
> HDFS smart storage management
> -----------------------------
>
> Key: HDFS-7343
> URL: https://issues.apache.org/jira/browse/HDFS-7343
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kai Zheng
> Assignee: Wei Zhou
> Attachments: access_count_tables.jpg,
> HDFSSmartStorageManagement-General-20170315.pdf,
> HDFS-Smart-Storage-Management.pdf,
> HDFSSmartStorageManagement-Phase1-20170315.pdf,
> HDFS-Smart-Storage-Management-update.pdf, move.jpg, tables_in_ssm.xlsx
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and
> flexible storage policy engine considering file attributes, metadata, data
> temperature, storage type, EC codec, available hardware capabilities,
> user/application preferences, etc.
> Modified the title for re-purpose.
> We'd extend this effort a bit and aim to work on a comprehensive solution
> to provide a smart storage management service for the convenient,
> intelligent, and effective use of erasure coding or replicas, the HDFS cache
> facility, HSM offerings, and all kinds of tools (balancer, mover, disk
> balancer, and so on) in a large cluster.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]