On 09/01/2017 10:57 AM, Amar Tumballi wrote:
Disclaimer: This email is long and took significant time to
write. Please take the time to read and review it and give feedback, so
we can get some metrics-related tasks done by Gluster 4.0.
---
*History:*
To understand what is happening inside a GlusterFS process, we have,
over the years, opened many bugs and written a few things around
statedump, and we have put some effort into the io-stats translator to
improve Gluster's monitoring capabilities.
But more is surely required! Some glimpses of it are captured
in [1], [2], [3] & [4]. I also sent an email to this group [5]
about possibilities for capturing this information.
*Current problem:*
When we talk about metrics or monitoring, we have to consider handing
the data to a tool that can preserve the readings periodically;
without a time axis, no metric makes sense! So the first challenge is
how to get the data out. Should getting the metrics out of each
process involve 'glusterd'? Or should we use signals? This is
*'challenge #1'.*
Next: should we depend on io-stats to do the reporting? If yes, how do
we get information from between any two layers? Should we insert
io-stats between all the nodes of the translator graph, or should we
use the STACK_WIND/UNWIND framework to get the details? This is our
*'challenge #2'*
Once the above decisions are taken, the next question is: "what about
'metrics' from other translators? Who gives them out (ie, dumps
them)? Why do we need something similar to statedump; can't we read
the info from statedump itself?". But when we say 'metrics', each
should be a key with a number associated with it; statedump contains a
lot more, and has no fixed format. If it is different from statedump,
then what is our answer for translator code to emit metrics? This is
our *'challenge #3'*
If we solve the above challenges, then I think we are in decent shape
for further development. Let's go through them one by one, in detail.
*Problems and proposed solutions:*
*a) how to dump metrics data?*
Currently, I propose the signal-handler way, as it gives us control
over which processes we capture information from, and it is much
faster than communicating through another tool. Also, considering we
need these metrics taken every 10 seconds or so, we need an efficient
way to get them out.
But even there we have challenges, because we have already claimed
both the USR1 and USR2 signal handlers, one for statedump and the
other for toggling latency monitoring, respectively. It makes sense
for statedump to keep USR1, but toggling options should technically
(and for correctness) be handled by glusterd volume-set options, and
there should be a better way to handle it in our 'reconfigure()'
framework during graph switch. Proposal sent in GitHub issue #303 [6].
If we are good with the above proposal, then we can use USR2 for the
metrics dump. The next issue is the format of the file itself, which
we will discuss at the end of this email.
How about using UDP to push data from Gluster processes?
- Signal handling is only required while reloading the volfile when
metrics are enabled/disabled
- Push to a pre-configured UDP address (socket file); if a listener
exists it will capture the metrics, else the UDP message is lost
- Will not affect I/O performance, since it is asynchronous and errors
are not checked by the sender
- If metrics are lost, no data/I/O is impacted. We don't need crash
consistency or high accuracy while collecting metrics.
- The receiver can receive data fast without blocking incoming data,
and can produce outputs in different formats asynchronously.
Usage:
- Enable metrics using a volume option
- Start the UDP server (metrics receiver), e.g. ./gluster-metrics-receiver
(the socket file should be predefined, say `/var/run/gluster/metrics.socket`)
Limitations:
Similar to the `strace` command, running two receivers at the same
time is not possible. Multiple receivers can be run by having a
different socket address for each process/pid, for example
`metrics.<pid>.socket`, with the receiver listening via
`./gluster-metrics-receiver -p <pid>`.
NOTE: The above approach is already implemented in the 'experimental'
branch, excluding the handling of [6].
*b) where to measure latency and fop counts?*
One possible way is to load io-stats between all the nodes, but that
has its own limitations. Mainly: how do we configure options in each
of these translators, and will having too many translators slow down
operations? (ie, it creates one extra 'frame' for every fop, so in a
graph of 20 xlators, that is 20 extra frame creations for a single fop).
I propose we handle this in the 'STACK_WIND/UNWIND' macros themselves,
and provide a placeholder to store all this data in the translator
structure itself. This is cleaner, and no changes are required in the
code base other than in 'stack.h' (and some in 'xlator.h').
Also, we can provide an 'option monitoring enable' (or disable) option
as a default option for every translator, and handle it at
xlator_init() time itself. (This is not a blocker for 4.0, but good to
have.) Idea proposed @ GitHub #304 [7].
NOTE: This approach is already working pretty well on the
'experimental' branch, excluding [7]. Depending on feedback, we can
improve it further.
*c) framework for xlators to provide private metrics*
One possible solution is to use the statedump functions. But to cause
the least disruption to existing code, I propose two new xlator
methods, 'dump_metrics()' and 'reset_metrics()', which can be
dlopen()'d into the xlator structure.
'dump_metrics()' dumps the private metrics in the expected format and
will be called from the global dump-metrics framework;
'reset_metrics()' would be called from a CLI command when someone
wants to restart metrics from 0 to check/validate a few things in a
running cluster. This helps debuggability.
Further feedback welcome.
NOTE: Sample code is already implemented in the 'experimental' branch,
and the protocol/server xlator uses this framework to dump metrics
from the rpc layer and client connections.
*d) format of the 'metrics' file.*
If you want data that can be plotted on a graph, you need a key
(a string) and a value (a number), collected over time. So this file
should output data for monitoring systems, not for debuggability;
we have 'statedump' for debuggability.
So, I propose a plain text file, where data would be dumped like below.
```
# anything starting from # would be treated as comment.
<key><space><value>
# anything after the value would be ignored.
```
Any better solutions are welcome. Ideally, we should keep this
friendly for external projects to consume, like Tendrl [8],
Graphite, Prometheus, etc. Also note that once we agree on the format,
it will be very hard to change, as external projects will depend on it.
I would like to hear feedback from people experienced with monitoring
systems here.
NOTE: The above format works fine with the 'glustermetrics' project [9]
and is working decently on the 'experimental' branch.
------
*Discussions:*
Let me know how you all want to take the discussion forward.
Should we move to GitHub and discuss each issue there? Should I rebase
and send the current patches from 'experimental' to the 'master'
branch and discuss them in our review system? Or should we continue
over email here?
Regards,
Amar
References:
[1] - https://github.com/gluster/glusterfs/issues/137
[2] - https://github.com/gluster/glusterfs/issues/141
[3] - https://github.com/gluster/glusterfs/issues/275
[4] - https://github.com/gluster/glusterfs/issues/168
[5] - http://lists.gluster.org/pipermail/maintainers/2017-August/002954.html (last email of the thread)
[6] - https://github.com/gluster/glusterfs/issues/303
[7] - https://github.com/gluster/glusterfs/issues/304
[8] - https://github.com/Tendrl
[9] - https://github.com/amarts/glustermetrics
--
Amar Tumballi (amarts)
_______________________________________________
maintainers mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/maintainers
--
regards
Aravinda VK
http://aravindavk.in