Below is a write-up on tiering counters (bz 1275917). I give three options. I
think options (1) and (3) are doable; (2) is harder and would need more
discussion.
Currently the counters give limited information on tiering behavior. They are
just a raw count of the number of files moved in each direction, and the overall
feature is much less usable as a result.
In general, the counters should also work with future tiering use cases, e.g.
tiering according to location or some other policy.
The current output:
$ gluster volume tier vol1 status
Node             Promoted files    Demoted files    Status
---------        ---------         ---------        ---------
localhost        20                30               in progress
172.17.60.18     0                 0                in progress
172.17.60.19     0                 0                in progress
172.17.60.20     0                 0                in progress
(1)
Customers want to know the total number of files / MB on a tier at any one
time. I propose we query the database on the bricks of each tier to get a count
of the files on that tier.
$ gluster volume tier vol1 status
Node             Promoted files / hot count    Demoted files / cold count    Status
---------        ---------                     ---------                     ---------
localhost        20 / 500                      30 / 2000                     in progress
172.17.60.18     0                             0                             in progress
172.17.60.19     0                             0                             in progress
172.17.60.20     0                             0                             in progress
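
As a rough sketch of how the per-tier count in (1) could be gathered: query each brick's database and sum the results per tier. The table name `files`, its columns, and the helper names below are placeholders I made up for illustration; they are not the real brick schema, and the in-memory databases only stand in for real bricks.

```python
import sqlite3

# Hypothetical schema: one row per file tracked by a brick's database.
# Table and column names are placeholders, not the real brick schema.
SCHEMA = "CREATE TABLE IF NOT EXISTS files (gfid TEXT PRIMARY KEY, size_bytes INTEGER)"


def count_brick(conn):
    """Return (file_count, total_bytes) for one brick database."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(size_bytes), 0) FROM files"
    ).fetchone()
    return row[0], row[1]


def count_tier(brick_conns):
    """Sum the per-brick counts over all bricks that make up one tier."""
    files = 0
    size = 0
    for conn in brick_conns:
        n, b = count_brick(conn)
        files += n
        size += b
    return files, size


def make_brick(rows):
    """Build an in-memory stand-in for a brick database (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO files VALUES (?, ?)", rows)
    conn.commit()
    return conn
```

Calling `count_tier()` once with the hot bricks and once with the cold bricks would give the two "count" figures shown above. Note that a plain sum like this would count a file once per replica on a replicated tier, so a real implementation would probably need to query only one brick per replica set.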
(2)
People need to know the ratio of I/Os served by the hot tier to the cold tier.
For an administrator, if 90% of I/Os are served by the hot tier, this is good.
If only 20% are served by the hot tier, this is bad and suggests a
misconfiguration.
Something like this is what we want:
$ gluster volume tier vol1 status
Node         Promoted files    Demoted files    Read Hit Rate    Write Hit Rate    Status
---------    ---------         ---------        ---------        -------           --------
localhost    0                 0                80%              75%               in progress
The difficulty is how to capture that. When we read a large file, it is broken
up into multiple individual reads. Each piece is a single read FOP. Should we
consider each FOP individually? Or does only the first "hit" to the hot tier
count?
Also, when an FOP comes in, it will first look on one tier, and then the other
tier. The callback to the FOP checks success or failure. It is only when the
file is found on none of the subvolumes that the FOP returns an error. New code
needs to deal with this complexity. If there is failure on the cold tier but
success on the hot tier, the "hit count" should be bumped.
We probably do not want to update the "hit rate" on all FOPs.
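
To make the two counting choices concrete, here is a minimal sketch of the bookkeeping, not a proposal for the actual translator code. The class `HitCounter`, its `per_fop` switch, and the `record()` arguments are all hypothetical names; `hot_ok` and `cold_ok` stand for the per-tier results that the FOP callback would see.

```python
from dataclasses import dataclass, field


@dataclass
class HitCounter:
    """Tracks how many FOPs of one kind (read or write) each tier served."""
    per_fop: bool = True  # False: count only the first hit for each file
    hot: int = 0
    cold: int = 0
    _seen: set = field(default_factory=set)

    def record(self, file_id, hot_ok, cold_ok):
        """Record one completed FOP given the per-tier callback results."""
        if not (hot_ok or cold_ok):
            return  # found on neither tier: an error, not a hit
        if not self.per_fop:
            if file_id in self._seen:
                return  # this file was already counted once
            self._seen.add(file_id)
        if hot_ok:
            self.hot += 1  # cold failure plus hot success is a hot hit
        else:
            self.cold += 1

    def hot_rate(self):
        """Percentage of counted FOPs that were served by the hot tier."""
        total = self.hot + self.cold
        return 100.0 * self.hot / total if total else 0.0
```

With `per_fop=True`, a large file read as many chunks bumps the counter once per chunk; with `per_fop=False`, it is counted once. One instance for reads and one for writes would feed the two hit-rate columns. The `_seen` set grows with the number of files touched, so the first-hit variant would need some bound or reset in practice.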
(3)
A simpler new counter to implement is the number of MB promoted or demoted. I
think that could be done in a separate patch, and more quickly.
The output with (2) and (3) would look like this:
$ gluster volume tier vol1 status
Node         Promoted files/MB    Demoted files/MB    Read Hit Rate    Write Hit Rate    Status
---------    ---------            ---------           ---------        -------           --------
localhost    120/2033MB           50/1044MB           80%              75%               in progress
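
The files/MB columns for (3) need very little state: a file count and a byte total per direction, bumped whenever a migration completes. A small sketch, where `MigrationCounter` and its methods are hypothetical names and "MB" is assumed to mean 1024 * 1024 bytes, rounded down:

```python
MIB = 1024 * 1024  # assumption: "MB" in the status output means 2**20 bytes


class MigrationCounter:
    """Counts files and bytes moved in one direction (promote or demote)."""

    def __init__(self):
        self.files = 0
        self.bytes = 0

    def add(self, size_bytes):
        """Account for one successfully migrated file of the given size."""
        self.files += 1
        self.bytes += size_bytes

    def column(self):
        """Render the counter in the files/MB form used in the status table."""
        return f"{self.files}/{self.bytes // MIB}MB"
```

One instance would back the "Promoted files/MB" column and another the "Demoted files/MB" column. Keeping the total in bytes and converting only for display avoids losing small files to rounding.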