[ 
https://issues.apache.org/jira/browse/IGNITE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078100#comment-16078100
 ] 

ASF GitHub Bot commented on IGNITE-5473:
----------------------------------------

GitHub user AMashenkov opened a pull request:

    https://github.com/apache/ignite/pull/2262

    IGNITE-5473: Create ignite troubleshooting logger.

    Partial fix.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-5473

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/2262.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2262
    
----
commit 6abe8ec92f016a18883332de2bc177fbab30a4c1
Author: Alexey Goncharuk <alexey.goncha...@gmail.com>
Date:   2017-06-14T18:37:54Z

    WIP.

commit 4ab6d52cfa2f0dda6170ad1dff80d4a42c2a0706
Author: Andrey V. Mashenkov <andrey.mashen...@gmail.com>
Date:   2017-06-30T11:45:18Z

    WIP.

----


> Create ignite troubleshooting logger
> ------------------------------------
>
>                 Key: IGNITE-5473
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5473
>             Project: Ignite
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 2.0
>            Reporter: Alexey Goncharuk
>            Priority: Critical
>              Labels: important, observability
>             Fix For: 2.2
>
>
> Currently, we have two extremes of logging: either INFO, which logs almost
> nothing, or DEBUG, which pollutes the logs with overly verbose messages.
> We should create a 'troubleshooting' logger, which should be easily enabled 
> (via a system property, for example) and log all stability-critical node and 
> cluster events:
>  * Connection events (both communication and discovery), handshake status
>  * ALL ignored messages and skipped actions (even those we assume are safe to 
> ignore)
>  * Partition exchange stages and timings
>  * Verbose discovery state changes (this should make it easy to understand 
> the reason for 'Node has not been connected to the topology')
>  * Transaction failover stages and actions
>  * All unlogged exceptions
>  * Responses that took more than N milliseconds when they would normally
> return right away
>  * Long discovery SPI message processing times
>  * Managed service deployment stages
>  * Marshaller mappings registration and notification
>  * Binary metadata registration and notification
>  * Continuous query registration / notification
> (add more)
> The amount of logging should be chosen carefully so that it is safe to
> enable this logger in production clusters.
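
To illustrate the idea in the description above, here is a minimal sketch of a dedicated logger that is switched on by a system property and that also flags responses slower than N milliseconds. It is not the code from the pull request and not an Ignite API: the class name `TroubleshootingLog`, the property names `EXAMPLE_TROUBLESHOOTING_LOG` and `EXAMPLE_SLOW_RESPONSE_MS`, and the 500 ms default are all assumptions made for illustration. It uses only `java.util.logging` from the JDK.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of a troubleshooting logger; not part of Ignite. */
class TroubleshootingLog {
    /** Hypothetical system property that enables the logger. */
    static final String ENABLED_PROP = "EXAMPLE_TROUBLESHOOTING_LOG";

    /** Hypothetical system property holding the slow-response threshold N, in ms. */
    static final String THRESHOLD_PROP = "EXAMPLE_SLOW_RESPONSE_MS";

    /** Separate logger category, so it can be routed independently of INFO/DEBUG output. */
    private final Logger log = Logger.getLogger("troubleshooting");

    private final boolean enabled;
    private final long slowThresholdMs;

    TroubleshootingLog(boolean enabled, long slowThresholdMs) {
        this.enabled = enabled;
        this.slowThresholdMs = slowThresholdMs;
    }

    /** Reads both settings from JVM system properties (-Dname=value). */
    static TroubleshootingLog fromSystemProperties() {
        return new TroubleshootingLog(
            Boolean.getBoolean(ENABLED_PROP),
            Long.getLong(THRESHOLD_PROP, 500L)); // 500 ms default is an assumption
    }

    boolean isEnabled() {
        return enabled;
    }

    /** Logs a stability-critical event; returns true if it was actually logged. */
    boolean event(String category, String msg) {
        if (!enabled)
            return false;

        log.info("[" + category + "] " + msg);

        return true;
    }

    /** Logs a response only if it took longer than the threshold; returns true if logged. */
    boolean slowResponse(String what, long elapsedMs) {
        if (!enabled || elapsedMs <= slowThresholdMs)
            return false;

        log.warning("[slow-response] " + what + " took " + elapsedMs
            + " ms (threshold " + slowThresholdMs + " ms)");

        return true;
    }
}
```

Under these assumptions, a node started with `-DEXAMPLE_TROUBLESHOOTING_LOG=true` would call `TroubleshootingLog.fromSystemProperties()` once and then report events such as `event("discovery", "handshake failed")`. Because the threshold check runs before any message is built, fast responses cost only a comparison, which is in the spirit of keeping the logger safe to enable in production.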



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
