On Mon, Oct 26, 2020 at 8:21 PM JuanPablo AJ <jpabl...@gmail.com> wrote:
> Jesper,
> thanks a lot for your email, your answer was a hand in the dark forest of
> doubts.
>
> I will start trying the load generator wrk2.
>
> About "instrument, profile, observe", yes, I added the gops agent but
> until now I don't have any conclusion related to that information.

I'm a proponent of adding metrics to the code running in your production
systems. If the system has low load, you can certainly pay the overhead of
such metrics. If the system has high load, you can always sample and record
only a fraction of requests (say 1%). I'm happy to pay a cost of 5-10% on my
production systems if that sacrifice means I know what is going on.

Observability is formally defined as the ability to determine the state of a
system from its outputs[0]. If you emit metrics alongside your genuine
program output, you stand a far better chance of figuring out what is going
on inside the system. Metrics also tend to be proactive: problems often show
up in metrics long before the critical threshold of system failure is hit.

Here are some good algorithms and data structures which a metrics package
might build on, either directly or as a variant thereof:

* Vitter's algorithm R. It is related to a Fisher-Yates shuffle in a
  peculiar and interesting way. Note that you may have to drop or decay the
  reservoir unless you are measuring the whole window.

* Gil Tene's HdrHistogram. This essentially tracks a histogram of observed
  floating point numbers: if we regard the exponent as selecting a bucket,
  each containing a set of mantissa sub-buckets, we can increment a bucket
  quickly (a few nanoseconds). The exponent-based layout means we have high
  resolution close to 0 and less resolution away from 0. But this is often
  what one wants: if something takes 5 minutes, you usually don't care
  whether it was 5 minutes and 34 microseconds, so the approximation is
  sound.
  HdrHistogram also supports some nice algebraic properties, such as
  merging (it forms a commutative monoid with the empty histogram as the
  neutral element and merging as the composition operation).

* HyperLogLog-based data structure ideas: accept approximate values in
  exchange for much smaller data storage needs.

* Decay ideas: if you keep a pair of (value, timestamp), you can decay the
  value over time according to some curve you choose. Keep an array of these
  and you can track the most popular items efficiently. Periodically go
  through the array and weed out any value which has decayed below a noise
  floor, to keep the array small.

I'm not saying you should implement these things yourself. I'm saying that a
good metrics package will do this for you, which is why you should use one.
The key is to figure out which metrics your application needs, and then add
those. The SRE handbooks I linked earlier have some good starting points on
what to measure. But nothing beats knowledge of the internals of a system,
so you can add the better metrics yourself.

At-a-glance blackbox metrics are nice. However, they often only tell you
that something is wrong, not what. In general, descriptive statistics is the
tool you need to understand system behavior in the modern world;
infrastructures are simply too complex nowadays. For more pinpointed
understanding, a profiler might work really well, but the more concurrency a
system has, the harder it is to glean anything meaningful from a profile[1].

[0] Hat tip to Charity Majors for recognizing this from control theory.
[1] This is the same reason debuggers can have a hard time in a distributed
setting. Your program is halted, but half of the program lives behind an API
not under your control. And the timeout is lurking.

--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAGrdgiW3YS8VTstp7gx8WN4hEqN%3DioUbEBahb7g7jC%3DnXEGGKw%40mail.gmail.com.