[
https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958923#comment-15958923
]
Rakesh R commented on HDFS-7343:
--------------------------------
Thanks [~zhouwei] for the additional details about the data points.
bq. Create a table to store the info and insert the table name into table
access_count_table.
It looks like a lot of tables will be created to capture time-period details:
sec_1...sec_n, min_1...min_n, hour_1...hour_n, day_1...day_n,
month_1...month_12, etc. I hope these tables will be deleted after the
aggregation functions run. Even so, the DB may be exhausted by the growing
number of tables if the aggregation interval is long, right? Just a plain
thought to minimize the number of time-spec tables: how about capturing
{{access_time}} as a column and updating the {{access_time}} of the respective
{{fid}}? I think, using the {{access_time}} attribute, we would be able to
filter out the specific {{fid_access_count}} between a given {{start_time}} and
{{end_time}}.
Table {{seconds_level}} => composite key of {{access_time}} and {{fid}} to
uniquely identify each row in the table.
||access_time||fid||count||
|sec-2017-03-31-12-59-45|3|1|
|sec-2017-03-31-12-59-45|2|1|
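A minimal sketch of this idea (SQLite via Python here; only the table and
column names come from the comment — the column types, the string-encoded
{{access_time}} key, and the sample window are assumptions):

```python
import sqlite3

# Hypothetical schema for the proposed seconds_level table: the composite
# primary key (access_time, fid) uniquely identifies each row.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE seconds_level (
        access_time TEXT    NOT NULL,
        fid         INTEGER NOT NULL,
        count       INTEGER NOT NULL,
        PRIMARY KEY (access_time, fid)
    )
""")
conn.executemany(
    "INSERT INTO seconds_level VALUES (?, ?, ?)",
    [("sec-2017-03-31-12-59-45", 3, 1),
     ("sec-2017-03-31-12-59-45", 2, 1),
     ("sec-2017-03-31-12-59-46", 3, 1)],
)

# Filter per-fid access counts between a start_time and an end_time;
# the string keys sort lexicographically, so BETWEEN selects the window.
rows = conn.execute(
    """
    SELECT fid, SUM(count)
    FROM seconds_level
    WHERE access_time BETWEEN ? AND ?
    GROUP BY fid
    ORDER BY fid
    """,
    ("sec-2017-03-31-12-59-45", "sec-2017-03-31-12-59-46"),
).fetchall()
print(rows)  # [(2, 1), (3, 2)]
```

With {{access_time}} in the key, one table per unit of time suffices and no
per-period tables (sec_1...sec_n) need to be created or dropped.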
Again, for a faster aggregation function we could probably maintain separate
{{tables per unit of time}}, like below. After the aggregate function runs, we
could delete the rows that were used for the aggregation.
(1) seconds_level
(2) minutes_level
(3) hours_level
(4) days_level
(5) weeks_level
(6) months_level
(7) years_level
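The roll-up between two adjacent levels could look like the following sketch
(SQLite via Python; the string-truncation bucketing and the schema are my
assumptions — a real implementation would use proper timestamp types):

```python
import sqlite3

# Hypothetical seconds_level -> minutes_level roll-up: sum each fid's
# per-second counts into a per-minute bucket, then delete the aggregated rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seconds_level (access_time TEXT, fid INTEGER, count INTEGER,
                                PRIMARY KEY (access_time, fid));
    CREATE TABLE minutes_level (access_time TEXT, fid INTEGER, count INTEGER,
                                PRIMARY KEY (access_time, fid));
""")
conn.executemany("INSERT INTO seconds_level VALUES (?, ?, ?)",
                 [("sec-2017-03-31-12-59-45", 3, 1),
                  ("sec-2017-03-31-12-59-46", 3, 1),
                  ("sec-2017-03-31-12-59-45", 2, 1)])

# Derive the minute bucket by truncating the seconds suffix of the key:
# "sec-YYYY-MM-DD-hh-mm-ss" -> "min-YYYY-MM-DD-hh-mm".
conn.execute("""
    INSERT INTO minutes_level (access_time, fid, count)
    SELECT 'min-' || substr(access_time, 5, 16), fid, SUM(count)
    FROM seconds_level
    GROUP BY substr(access_time, 5, 16), fid
""")
# Drop the rows that have already been folded into the coarser level.
conn.execute("DELETE FROM seconds_level")

minute_rows = conn.execute(
    "SELECT access_time, fid, count FROM minutes_level ORDER BY fid"
).fetchall()
print(minute_rows)
# [('min-2017-03-31-12-59', 2, 1), ('min-2017-03-31-12-59', 3, 2)]
```

The same pattern repeats up the chain (minutes -> hours -> days -> ...), so
each level stays small: rows live only until the next coarser aggregation
consumes them.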
> HDFS smart storage management
> -----------------------------
>
> Key: HDFS-7343
> URL: https://issues.apache.org/jira/browse/HDFS-7343
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kai Zheng
> Assignee: Wei Zhou
> Attachments: access_count_tables.jpg,
> HDFSSmartStorageManagement-General-20170315.pdf,
> HDFS-Smart-Storage-Management.pdf,
> HDFSSmartStorageManagement-Phase1-20170315.pdf,
> HDFS-Smart-Storage-Management-update.pdf, move.jpg, tables_in_ssm.xlsx
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and
> flexible storage policy engine considering file attributes, metadata, data
> temperature, storage type, EC codec, available hardware capabilities,
> user/application preferences, etc.
> Modified the title for re-purpose.
> We'd extend this effort a bit and aim to work on a comprehensive solution
> to provide a smart storage management service for the convenient,
> intelligent, and effective use of erasure coding or replicas, the HDFS cache
> facility, HSM offerings, and all kinds of tools (balancer, mover, disk
> balancer, and so on) in a large cluster.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]