Hi Enrico, Andor,

The PR has been updated: https://github.com/apache/zookeeper/pull/1698. Could you help review it so it can be merged?
Thanks,
Li

On Tue, May 18, 2021 at 5:01 PM Li Wang <li4w...@gmail.com> wrote:

> Hi Andor, Ted, Enrico,
>
> Thanks again for your discussions and inputs. The changes have been submitted: https://github.com/apache/zookeeper/pull/1698
>
> Would you mind reviewing the PR?
>
> Best,
>
> Li
>
> On Wed, May 5, 2021 at 10:00 AM Li Wang <li4w...@gmail.com> wrote:
>
>> Thanks Andor!
>>
>> Li
>>
>> On Tue, May 4, 2021 at 2:00 PM Andor Molnar <an...@apache.org> wrote:
>>
>>> Awesome work Li, indeed.
>>> I'm happy to review the PR. Thanks for the contribution.
>>>
>>> Andor
>>>
>>>> On 2021. May 4., at 3:49, Li Wang <li4w...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Thanks Ted and Enrico again for the inputs and discussions. I would like to share some updates from my side.
>>>>
>>>> 1. The main issue is that the Prometheus summary computation is expensive and locked/synchronized. To reduce the perf impact, I changed the PrometheusLabelledSummary.add() operation to be async, so it is handled by separate threads rather than the main thread.
>>>>
>>>> 2. Ran benchmarks against the changes. The perf of both read and write operations is significantly improved.
>>>>
>>>> a) For the read operation, the avg latency is reduced by about 50% and the avg throughput is increased by about 95%.
>>>> b) For the write operation, the avg latency is reduced by about 28% and the avg throughput is increased by about 20%.
>>>>
>>>> 3. I opened a JIRA for the issue and can submit a PR for the change.
>>>>
>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-4289
>>>>
>>>> Please let me know if you have any thoughts or questions.
>>>>
>>>> Thanks,
>>>>
>>>> Li
>>>>
>>>> On Wed, Apr 28, 2021 at 11:32 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>
>>>>> On Wed, Apr 28, 2021 at 9:48 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>
>>>>>> Hi Enrico,
>>>>>>
>>>>>> Please see my comments inline.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Li
>>>>>>
>>>>>> On Mon, Apr 26, 2021 at 10:42 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>>>>>>
>>>>>>> Sorry for top posting.
>>>>>>>
>>>>>>> I have never seen in other applications that Prometheus has such a significant impact.
>>>>>>
>>>>>> Please see my reply in the previous response.
>>>>>>
>>>>>>> The first things that come into my mind:
>>>>>>> - collect a couple of dumps with some perf tool and dig into the problem
>>>>>>
>>>>>> Heap dumps, thread dumps and CPU profiling with Prometheus enabled vs disabled have been collected. That's how we found the issue.
>>>>>>
>>>>>>> - verify that we have the latest version of the Prometheus client
>>>>>>
>>>>>> ZK uses 0.9.0 and the latest version of the Prometheus client is 0.10.0. I checked the change logs, and it doesn't look like there are major changes between these two: https://github.com/prometheus/client_java/releases We can upgrade the version and see if it makes any difference.
>>>>>>
>>>>>>> - tune the few knobs we have in the Prometheus client
>>>>>>
>>>>>> What are the knobs we can tune? Can you provide more info on this?
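For context on the question above, the Summary-level knobs the Prometheus Java client itself exposes look roughly like the sketch below (the metric name and the values are made up for illustration; they are not ZooKeeper's settings):

    import io.prometheus.client.CollectorRegistry;
    import io.prometheus.client.Summary;

    public class SummaryKnobsExample {
        public static void main(String[] args) {
            CollectorRegistry registry = new CollectorRegistry();
            // The tunables live on the Summary builder: which quantiles to track
            // (and with what error tolerance), how long the sliding time window is,
            // and how many rotating buckets the window is split into.
            Summary latency = Summary.build()
                    .name("example_read_latency_ms")
                    .help("Illustrative read latency summary")
                    .quantile(0.5, 0.05)   // p50, 5% error tolerance
                    .quantile(0.99, 0.01)  // p99, 1% error tolerance
                    .maxAgeSeconds(60)     // sliding window length (client default is 600s)
                    .ageBuckets(2)         // buckets within the window (client default is 5)
                    .register(registry);

            latency.observe(12.3);
        }
    }

Tracking fewer quantiles, shortening maxAgeSeconds, or using fewer ageBuckets all reduce the work done inside TimeWindowQuantiles, at the cost of coarser or shorter-lived percentile data.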
>>>>> Unfortunately there is nothing we can tune directly with the zookeeper configuration file.
>>>>>
>>>>> In ZK code we currently have only some basic configuration of summaries, and we are mostly using the default values:
>>>>> https://github.com/apache/zookeeper/blob/11c07921c15e2fb7692375327b53f26a583b77ca/zookeeper-metrics-providers/zookeeper-prometheus-metrics/src/main/java/org/apache/zookeeper/metrics/prometheus/PrometheusMetricsProvider.java#L340
>>>>> There is no way to fine tune the buckets and all of the structures.
>>>>>
>>>>> We should work on looking for optimal values.
>>>>>
>>>>>>> In Apache Pulsar and in Apache BookKeeper we have some customizations in the Prometheus metrics collectors; we could take a look and port those to ZooKeeper (initially I worked on the Prometheus integration based on the use cases I have with Pulsar, BookKeeper and other systems that already use Prometheus, but here in ZooKeeper we are using the basic Prometheus client).
>>>>>>
>>>>>> Are there any customizations for minimizing the performance impact? Is there anything that we can port to ZK to help the issue?
>>>>>
>>>>> Yes, there is much we can port. This is the Prometheus metrics provider in BookKeeper:
>>>>> https://github.com/apache/bookkeeper/tree/master/bookkeeper-stats-providers/prometheus-metrics-provider/src/main/java/org/apache/bookkeeper/stats/prometheus
>>>>>
>>>>> Enrico
>>>>>
>>>>>>> Enrico
>>>>>>>
>>>>>>> On Tue, Apr 27, 2021 at 6:35 AM Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Batching metrics reporting is very similar to option (c) but with locking like option (a). That can usually be made faster by passing a reference to the metrics accumulator to the reporting thread, which can do the batch update without locks. This usually requires ping-pong metrics accumulators so that a thread can accumulate in one accumulator for a bit, pass that to the merge thread and switch to using the alternate accumulator. Since all threads typically report at the same time, this avoids a stampede on the global accumulator.
>>>>>>>>
>>>>>>>> On Mon, Apr 26, 2021 at 9:30 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Batching metrics reporting can help. For example, in the CommitProcessor, increasing the maxCommitBatchSize helps improve the performance of write operations.
>>>>>>>>>
>>>>>>>>> On Mon, Apr 26, 2021 at 9:21 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, I am thinking of handling metrics reporting in a separate thread, so it doesn't impact the "main" thread.
>>>>>>>>>>
>>>>>>>>>> Not sure about the idea of merging at the end of a reporting period. Can you elaborate a bit on it?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Li
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 26, 2021 at 9:11 PM Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Would it help to keep per-thread metrics that are either reported independently or are merged at the end of a reporting period?
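To make the ping-pong accumulator idea above concrete, here is a rough, hypothetical sketch (all class and method names are invented; a real version would also need to handle the brief window where a writer still holds a swapped-out accumulator):

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.concurrent.atomic.AtomicReference;

    public class PingPongMetrics {

        // Per-thread accumulator; only its owner thread writes to it.
        static final class Accumulator {
            long count;
            double sum;
            void record(double value) { count++; sum += value; }
        }

        // Holder that lets the reporting thread swap the active accumulator.
        static final class ThreadSlot {
            final AtomicReference<Accumulator> active = new AtomicReference<>(new Accumulator());
            void record(double value) { active.get().record(value); }
            Accumulator swap() { return active.getAndSet(new Accumulator()); }
        }

        private final List<ThreadSlot> slots = new CopyOnWriteArrayList<>();
        private final ThreadLocal<ThreadSlot> localSlot = ThreadLocal.withInitial(() -> {
            ThreadSlot slot = new ThreadSlot();
            slots.add(slot);
            return slot;
        });

        // Hot path, called by worker threads: no shared lock is taken.
        public void record(double value) {
            localSlot.get().record(value);
        }

        // Called periodically by a single reporting thread: swap out ("ping-pong")
        // each thread's accumulator and merge the drained ones into one batch.
        public void report() {
            long totalCount = 0;
            double totalSum = 0;
            for (ThreadSlot slot : slots) {
                Accumulator drained = slot.swap();
                totalCount += drained.count;
                totalSum += drained.sum;
            }
            System.out.printf("window: count=%d sum=%.2f%n", totalCount, totalSum);
        }
    }

The point is that worker threads never contend on a shared, synchronized structure; only the reporting thread touches global state, once per reporting period.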
>>>>>>>>>>> On Mon, Apr 26, 2021 at 8:51 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Community,
>>>>>>>>>>>>
>>>>>>>>>>>> I've done further investigation on the issue and found the following:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. The perf of the read operation was decreased due to the lock contention in the Prometheus TimeWindowQuantiles APIs. 3 out of 4 CommitProcWorker threads were blocked on the TimeWindowQuantiles.insert() API while the test was running.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. The perf of the write operation was decreased because of the high CPU usage from the Prometheus Summary type of metrics. The CPU usage of the CommitProcessor increased by about 50% with Prometheus enabled compared to disabled (46% vs 80% with 4 CPUs, 63% vs 99% with 12 CPUs).
>>>>>>>>>>>>
>>>>>>>>>>>> Prometheus integration is a great feature; however, the negative performance impact is very significant. I wonder if anyone has any thoughts on how to reduce the perf impact.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Li
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 6, 2021 at 12:33 PM Li Wang <li4w...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to reach out to the community to see if anyone has some insights or experience with the performance impact of enabling Prometheus metrics.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have done load comparison tests with Prometheus enabled vs disabled and found the performance is reduced by about 40%-60% for both read and write operations (i.e. getData, getChildren and createNode).
>>>>>>>>>>>>>
>>>>>>>>>>>>> The load test was done with ZooKeeper 3.7, a cluster of 5 participants and 5 observers, each ZK server with a 10G heap and 4 CPUs, and 500 concurrent users sending requests.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The performance impact is quite significant. I wonder if this is expected and what we can do to have ZK perform the same while leveraging the new Prometheus metrics feature.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Li
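Finally, a minimal sketch of the async-reporting direction discussed above: hand each sample off to a dedicated thread so that request-processing threads never block inside TimeWindowQuantiles.insert(). The class name, executor wiring and add() signature are illustrative assumptions, not the actual ZOOKEEPER-4289 change:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import io.prometheus.client.Summary;

    public class AsyncSummary {
        private final Summary summary;

        // Single daemon thread that performs the (potentially contended) observe() calls.
        private final ExecutorService reporter =
                Executors.newSingleThreadExecutor(r -> {
                    Thread t = new Thread(r, "metrics-reporter");
                    t.setDaemon(true);
                    return t;
                });

        public AsyncSummary(Summary summary) {
            this.summary = summary;
        }

        // Hot path: enqueue the sample and return immediately.
        public void add(long valueMs) {
            reporter.execute(() -> summary.observe(valueMs));
        }

        public void shutdown() {
            reporter.shutdown();
        }
    }

The trade-off is an extra hand-off per sample and an unbounded queue in this naive form; a production version would likely want a bounded queue with a drop or backpressure policy so the reporter thread cannot fall arbitrarily far behind.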