OK from my side. A few more details about the tracing SPI updates, based on the above-mentioned discussion with Nikolay and Nikita.
Tracing provides enough data for a performance profiling tool; in fact, only root spans are required. However, according to Nikita, root-span tracing causes a 7-8% performance drop, compared with a 1-2% drop for the performance profiling tool. That is the main reason to keep the tool as it is for now. In order to reuse TracingSpi for the profiling tool internals, a few modifications should be made to improve tracing performance:
- Add support for non-string tags and log points: primitives, etc.
- Add the ability to postpone adding span tags and log points until the very end of span tree creation.
- Some sort of tag caching would probably also help.

Best regards,
Alexander

Mon, Dec 14, 2020 at 12:48, Nikolay Izhikov <nizhi...@apache.org>:

> Hello, Igniters.
>
> We discussed this feature privately with Alexander and Nikita.
> Here are the results we want to share with the community:
>
> 0. In the end, both the performance statistics tool and tracing should
> use the same API.
> 1. We should improve the Tracing API, so it is able to be used for
> gathering information about all operations without a significant
> performance drop.
>
> I propose to go as follows:
>
> 1. Merge current PR as is after final review. My intention is to provide a
> tool for users that can be used in the real-world production environment.
> 2. Improve the Tracing API.
> 3. Combine both tools under the same API.
>
> > On Dec 14, 2020, at 10:42, Alexander Lapin <lapin1...@gmail.com> wrote:
> >
> > Hello Igniters,
> >
> >> Because the tracing causes performance drop 52% [4] and can not be
> >> used for collecting statistics about all queries in production
> >> deployments. The performance drop of the profiling tool is less than
> >> 2% and it can be used in production.
> >> I have benchmarked the tracing
> >> and got the results:
> >>
> >> -2% when configured OpenCensusTracingSpi and all scopes disabled
> >> -52% for TX scope (IgnitePutTxBenchmark)
> >> -58% for SQL scope (IgniteSqlQueryBenchmark)
> >>
> >> Such a performance drop is significant to not use the tracing in
> >> production.
> >>
> > We've rerun tracing benchmarks based on more realistic scenarios and got
> > a 10-15% performance drop in case of sampling-rate 1 (all transactions
> > were traced). More realistic scenarios means that we don't test tracing
> > performance if the system is in overdraft state but add some sort of
> > micro throttling (1 millisecond) between operations, transactions in our
> > case.
> >
> > *IgnitePutTxBenchmark*
> >
> > Green: Case 1: NoopTracingSpi
> > Blue: Case 2: OpenCensusTracingSpi (disabled)
> > Red: Case 3: OpenCensusTracingSpi, --scope TX --sampling-rate 0.1
> > Black: Case 5: *ControlCenter* + OpenCensusTracingSpi, --scope TX
> > --sampling-rate 0.1
> > Violet: Case 4: OpenCensusTracingSpi, --scope TX --sampling-rate 1
> > Yellow: Case 6: ControlCenter + OpenCensusTracingSpi, --scope TX
> > --sampling-rate
> >
> >> I have considered the possibility to reuse the tracing API. If
> >> statistics collecting will be implemented with the TracingSpi then we
> >> get a performance drop due to:
> >> - Transferring tracing context over the network.
> >> - Using ThreadLocal for spans
> >> - Converting primitives and objects to string and vice versa. (API
> >> supports only String-based tags and values)
> >> - Generating span objects
> >>
> > @Nikita Amelchev Could you please share numbers?
> >
> > Best regards,
> > Alexander
> >
> > Mon, Dec 7, 2020 at 17:24, Nikolay Izhikov <nizhi...@apache.org>:
> >
> >> Hello, Nikita.
> >>
> >> Makes sense.
> >>
> >> I will take a look.
> >>
> >>> On Dec 7, 2020, at 15:29, Nikita Amelchev <nsamelc...@gmail.com> wrote:
> >>>
> >>> Hello, Igniters.
> >>>
> >>> I have implemented the profiling tool [1, 2]. It writes duration and
> >>> other parameters of user operations (scan, SQL query, transactions,
> >>> tasks, jobs, CQ, etc) to a local file. This info can be used in
> >>> various cases. The main goal is to build the performance report to
> >>> analyze the count and duration of user queries [3].
> >>>
> >>> We already have the tracing with similar functionality but I think
> >>> Ignite should have both tools - tracing and profiling.
> >>>
> >>> Because the tracing causes performance drop 52% [4] and can not be
> >>> used for collecting statistics about all queries in production
> >>> deployments. The performance drop of the profiling tool is less than
> >>> 2% and it can be used in production. I have benchmarked the tracing
> >>> and got the results:
> >>>
> >>> -2% when configured OpenCensusTracingSpi and all scopes disabled
> >>> -52% for TX scope (IgnitePutTxBenchmark)
> >>> -58% for SQL scope (IgniteSqlQueryBenchmark)
> >>>
> >>> Such a performance drop is significant to not use the tracing in
> >>> production.
> >>>
> >>> I have considered the possibility to reuse the tracing API. If
> >>> statistics collecting will be implemented with the TracingSpi then we
> >>> get a performance drop due to:
> >>> - Transferring tracing context over the network.
> >>> - Using ThreadLocal for spans
> >>> - Converting primitives and objects to string and vice versa. (API
> >>> supports only String-based tags and values)
> >>> - Generating span objects
> >>>
> >>> I have benchmarked implementations on the yardstick's
> >>> IgniteGetBenchmark. The tracing context transferring over the network
> >>> was disabled. The results:
> >>> - Tracing API implementation - 8% performance drop.
> >>> - Proposed implementation - 2% performance drop.
> >>>
> >>> I think this is a significant drop and implementation with reuse
> >>> tracing API should not be used.
> >>> The cluster profiling should have as
> >>> little performance drop as possible to be used in production. The
> >>> tracing will be used for the detailed investigation.
> >>>
> >>> WDYT?
> >>>
> >>> The tool is ready to be reviewed [3, 5].
> >>>
> >>> [1] https://issues.apache.org/jira/browse/IGNITE-12666
> >>> [2] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool
> >>> [3] https://github.com/apache/ignite-extensions/pull/16
> >>> [4] https://issues.apache.org/jira/secure/attachment/13016636/Tracing%20benchmarks.docx
> >>> [5] https://github.com/apache/ignite/pull/7693
> >>>
> >>> Wed, Jun 24, 2020 at 23:31, Saikat Maitra <saikat.mai...@gmail.com>:
> >>>>
> >>>> Hi Nikita,
> >>>>
> >>>> The changes in this PR look good.
> >>>>
> >>>> https://github.com/apache/ignite-extensions/pull/16
> >>>>
> >>>> Regards,
> >>>> Saikat
> >>>>
> >>>> On Mon, Jun 22, 2020 at 12:03 PM Nikolay Izhikov <nizhi...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hello, Igniters.
> >>>>>
> >>>>> I think that inside Ignite core we should name this feature as
> >>>>> «performance statistics».
> >>>>> We already have «cache statistics».
> >>>>> Data that is collected by performance statistics can be used not only
> >>>>> for profiling but to solve other tasks.
> >>>>>
> >>>>>
> >>>>>> On Jun 22, 2020, at 14:00, Nikita Amelchev <nsamelc...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> Hi, guys.
> >>>>>>
> >>>>>> I have mentioned components under the MIT license in the LICENSE
> >>>>>> file.
> >>>>>>
> >>>>>> Saikat, I have fixed PR according to your suggestions. Thanks for
> >>>>>> taking a look.
> >>>>>
> >>>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best wishes,
> >>> Amelchev Nikita
> >>
> >>
> >