[ 
https://issues.apache.org/jira/browse/IGNITE-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-5473:
-------------------------------------
    Description: 
Currently, we have two extremes of logging - either INFO, which logs almost 
nothing, or DEBUG, which pollutes the logs with overly verbose messages.

We should create a 'troubleshooting' logger, which should be easily enabled 
(via a system property, for example) and log all stability-critical node and 
cluster events:
 * Connection events (both communication and discovery), handshake status
 * ALL ignored messages and skipped actions (even those we assume are safe to 
ignore)
 * Partition exchange stages and timings
 * Verbose discovery state changes (this should make it easy to understand the 
reason for 'Node has not been connected to the topology')
 * Transaction failover stages and actions
 * All unlogged exceptions
 * Responses that took more than N milliseconds when normally they should 
return right away
 * Long discovery SPI messages processing times
 * Managed service deployment stages
(add more)

The amount of logging should be chosen carefully so that it is safe to 
enable this logger in production clusters.
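
The idea above could be sketched roughly as follows. This is only an 
illustration under stated assumptions, not an existing Ignite API: the class 
name TroubleshootingLog, the system property name ignite.troubleshooting.log, 
and the use of java.util.logging are all hypothetical choices made for the 
example. It shows a logger that stays silent unless enabled, plus a check 
that reports only responses slower than a threshold of N milliseconds.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of a troubleshooting logger; not part of Ignite. */
final class TroubleshootingLog {
    /** Assumed system property name, chosen for illustration only. */
    static final String ENABLE_PROP = "ignite.troubleshooting.log";

    private final Logger log = Logger.getLogger("troubleshooting");
    private final boolean enabled;
    private final long slowThresholdMs;

    TroubleshootingLog(boolean enabled, long slowThresholdMs) {
        this.enabled = enabled;
        this.slowThresholdMs = slowThresholdMs;
    }

    /** Reads the on/off switch from the (assumed) system property. */
    static TroubleshootingLog fromSystemProperties(long slowThresholdMs) {
        return new TroubleshootingLog(Boolean.getBoolean(ENABLE_PROP), slowThresholdMs);
    }

    /** Logs a stability-critical event; returns true if it was logged. */
    boolean event(String category, String msg) {
        if (!enabled)
            return false;

        log.info("[" + category + "] " + msg);

        return true;
    }

    /** Logs a response only if it took longer than the threshold. */
    boolean slowResponse(String operation, long elapsedMs) {
        if (!enabled || elapsedMs <= slowThresholdMs)
            return false;

        log.warning("[slow-response] " + operation + " took " + elapsedMs
            + " ms (threshold " + slowThresholdMs + " ms)");

        return true;
    }
}
```

With this sketch, starting the JVM with -Dignite.troubleshooting.log=true 
would turn the logger on. Keeping every call behind the enabled flag and a 
threshold is one way to keep the volume low enough for production use.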

  was:
Currently, we have two extremes of logging - either INFO, which logs almost 
nothing, or DEBUG, which pollutes the logs with overly verbose messages.

We should create a 'troubleshooting' logger, which should be easily enabled 
(via a system property, for example) and log all stability-critical node and 
cluster events:
 * Connection events (both communication and discovery), handshake status
 * ALL ignored messages and skipped actions (even those we assume are safe to 
ignore)
 * Partition exchange stages and timings
 * Verbose discovery state changes (this should make it easy to understand the 
reason for 'Node has not been connected to the topology')
 * Transaction failover stages and actions
 * All unlogged exceptions
 * Responses that took more than N milliseconds when normally they should 
return right away
(add more)

The amount of logging should be chosen carefully so that it is safe to 
enable this logger in production clusters.


> Create ignite troubleshooting logger
> ------------------------------------
>
>                 Key: IGNITE-5473
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5473
>             Project: Ignite
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 2.0
>            Reporter: Alexey Goncharuk
>            Priority: Critical
>              Labels: important
>             Fix For: 2.2



