OK from my side.
A few more details about the Tracing SPI updates, based on the
above-mentioned discussion with Nikolay and Nikita.

Tracing provides enough data for a performance profiling tool; in fact,
only root spans are required. However, according to Nikita, root-span
tracing causes a 7-8% performance drop, compared to the 1-2% drop of the
performance profiling tool. This is the main reason to keep the tool as it
is for now. To reuse the TracingSpi for the profiling tool internals, a few
modifications should be made to improve tracing performance:

   - Add support for non-string tags and log points: primitives, etc.
   - Add the ability to postpone adding span tags and log points until the
   very end of span tree creation.
   - Some sort of tag caching could probably also help.
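
To illustrate the first two points, here is a minimal Java sketch. It is not
the actual TracingSpi API: `DeferredTagSpan` and its methods are hypothetical
names. It assumes numeric tags, keeps them as primitive `long` values in
parallel arrays while the span is open, and converts them to strings only
once, in `end()`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical span that defers tag-to-string conversion; not part of Ignite. */
public class DeferredTagSpan {
    /** Span name. */
    private final String name;

    /** Tag keys, stored in insertion order. */
    private String[] keys = new String[4];

    /** Primitive tag values: no boxing and no string conversion on the hot path. */
    private long[] vals = new long[4];

    /** Number of tags recorded so far. */
    private int cnt;

    public DeferredTagSpan(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }

    /** Records a primitive tag; grows the arrays when they are full. */
    public DeferredTagSpan addTag(String key, long val) {
        if (cnt == keys.length) {
            keys = Arrays.copyOf(keys, cnt * 2);
            vals = Arrays.copyOf(vals, cnt * 2);
        }

        keys[cnt] = key;
        vals[cnt] = val;
        cnt++;

        return this;
    }

    /** Materializes string tags once, when the span is finished. */
    public Map<String, String> end() {
        Map<String, String> res = new LinkedHashMap<>();

        for (int i = 0; i < cnt; i++)
            res.put(keys[i], Long.toString(vals[i]));

        return res;
    }
}
```

For example, `new DeferredTagSpan("tx").addTag("duration", 42L).end()` builds
the string form of the tag only at the end. Whether this actually closes the
gap to the profiling tool would need to be benchmarked.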

Best regards,
Alexander

Mon, Dec 14, 2020 at 12:48, Nikolay Izhikov <nizhi...@apache.org>:

> Hello, Igniters.
>
> We discussed this feature privately with Alexander and Nikita.
> Here are the results we want to share with the community:
>
> 0. In the end, both the performance statistics tool and tracing should use
> the same API.
> 1. We should improve the Tracing API so that it can be used for gathering
> information about all operations without a significant performance drop.
>
> I propose to go as follows:
>
> 1. Merge current PR as is after final review. My intention is to provide a
> tool for users that can be used in the real-world production environment.
> 2. Improve the Tracing API.
> 3. Combine both tools under the same API.
>
> > On Dec 14, 2020, at 10:42, Alexander Lapin <lapin1...@gmail.com> wrote:
> >
> > Hello Igniters,
> >
> >> The tracing causes a 52% performance drop [4], so it cannot be
> >> used for collecting statistics about all queries in production
> >> deployments. The performance drop of the profiling tool is less than
> >> 2% and it can be used in production. I have benchmarked the tracing
> >> and got the results:
> >>
> >> -2% when configured OpenCensusTracingSpi and all scopes disabled
> >> -52% for TX scope (IgnitePutTxBenchmark)
> >> -58% for SQL scope  (IgniteSqlQueryBenchmark)
> >>
> >> Such a performance drop is significant enough to rule out using the
> >> tracing in production.
> >>
> > We've rerun the tracing benchmarks on more realistic scenarios and got a
> > 10-15% performance drop with sampling-rate 1 (all transactions were
> > traced). "More realistic" means that we don't test tracing performance
> > while the system is in an overdraft state, but add some micro throttling
> > (1 millisecond) between operations, transactions in our case.
> > *IgnitePutTxBenchmark*
> >
> > Green: Case 1: NoopTracingSpi
> >
> > Blue: Case 2: OpenCensusTracingSpi (disabled)
> >
> > Red: Case 3: OpenCensusTracingSpi, --scope TX --sampling-rate 0.1
> >
> > Black: Case 5: *ControlCenter* + OpenCensusTracingSpi, --scope TX
> > --sampling-rate 0.1
> >
> > Violet: Case 4: OpenCensusTracingSpi, --scope TX --sampling-rate 1
> > Yellow: Case 6: ControlCenter + OpenCensusTracingSpi, --scope TX
> > --sampling-rate
> >
> >> I have considered the possibility to reuse the tracing API. If
> >> statistics collecting will be implemented with the TracingSpi then we
> >> get a performance drop due to:
> >> - Transferring tracing context over the network.
> >> - Using ThreadLocal for spans
> >> - Converting primitives and objects to string and vice versa. (API
> >> supports only String-based tags and values)
> >> - Generating span objects
> >>
> > @Nikita Amelchev Could you please share numbers?
> >
> > Best regards,
> > Alexander
> >
> > Mon, Dec 7, 2020 at 17:24, Nikolay Izhikov <nizhi...@apache.org>:
> >
> >> Hello, Nikita.
> >>
> >> Makes sense.
> >>
> >> I will take a look.
> >>
> >>> On Dec 7, 2020, at 15:29, Nikita Amelchev <nsamelc...@gmail.com> wrote:
> >>>
> >>> Hello, Igniters.
> >>>
> >>> I have implemented the profiling tool [1, 2]. It writes duration and
> >>> other parameters of user operations (scan, SQL query, transactions,
> >>> tasks, jobs, CQ, etc) to a local file. This info can be used in
> >>> various cases. The main goal is to build the performance report to
> >>> analyze the count and duration of user queries [3].
> >>>
> >>> We already have the tracing with similar functionality but I think
> >>> Ignite should have both tools - tracing and profiling.
> >>>
> >>> The tracing causes a 52% performance drop [4], so it cannot be
> >>> used for collecting statistics about all queries in production
> >>> deployments. The performance drop of the profiling tool is less than
> >>> 2% and it can be used in production. I have benchmarked the tracing
> >>> and got the results:
> >>>
> >>> -2% when configured OpenCensusTracingSpi and all scopes disabled
> >>> -52% for TX scope (IgnitePutTxBenchmark)
> >>> -58% for SQL scope  (IgniteSqlQueryBenchmark)
> >>>
> >>> Such a performance drop is significant enough to rule out using the
> >>> tracing in production.
> >>>
> >>> I have considered the possibility to reuse the tracing API. If
> >>> statistics collecting will be implemented with the TracingSpi then we
> >>> get a performance drop due to:
> >>> - Transferring tracing context over the network.
> >>> - Using ThreadLocal for spans
> >>> - Converting primitives and objects to string and vice versa. (API
> >>> supports only String-based tags and values)
> >>> - Generating span objects
> >>>
> >>> I have benchmarked implementations on the yardstick’s
> >>> IgniteGetBenchmark. The tracing context transferring over the network
> >>> was disabled. The results:
> >>> - Tracing API implementation - 8% performance drop.
> >>> - Proposed implementation - 2% performance drop.
> >>>
> >>> I think this is a significant drop, and an implementation that reuses
> >>> the tracing API should not be used. The cluster profiling should have
> >>> as little performance drop as possible to be used in production. The
> >>> tracing will be used for detailed investigation.
> >>>
> >>> WDYT?
> >>>
> >>> The tool is ready to be reviewed [3, 5].
> >>>
> >>> [1] https://issues.apache.org/jira/browse/IGNITE-12666
> >>> [2] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool
> >>> [3] https://github.com/apache/ignite-extensions/pull/16
> >>> [4] https://issues.apache.org/jira/secure/attachment/13016636/Tracing%20benchmarks.docx
> >>> [5] https://github.com/apache/ignite/pull/7693
> >>>
> >>> Wed, Jun 24, 2020 at 23:31, Saikat Maitra <saikat.mai...@gmail.com>:
> >>>>
> >>>> Hi Nikita,
> >>>>
> >>>> The changes in this PR look good.
> >>>>
> >>>> https://github.com/apache/ignite-extensions/pull/16
> >>>>
> >>>> Regards,
> >>>> Saikat
> >>>>
> >>>> On Mon, Jun 22, 2020 at 12:03 PM Nikolay Izhikov <nizhi...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hello, Igniters.
> >>>>>
> >>>>> I think that inside Ignite core we should name this feature as
> >>>>> «performance statistics»
> >>>>> We already have «cache statistics».
> >>>>> Data that is collected by performance statistics can be used not only
> >>>>> for profiling but also to solve other tasks.
> >>>>>
> >>>>>
> >>>>>> On Jun 22, 2020, at 14:00, Nikita Amelchev <nsamelc...@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi, guys.
> >>>>>>
> >>>>>> I have mentioned components under the MIT license in the LICENSE
> >>>>>> file.
> >>>>>>
> >>>>>> Saikat, I have fixed the PR according to your suggestions. Thanks for
> >>>>>> taking a look.
> >>>>>
> >>>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best wishes,
> >>> Amelchev Nikita
> >>
> >>
>
>
