[ 
https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416781#comment-17416781
 ] 

Stefan Miklosovic commented on CASSANDRA-13460:
-----------------------------------------------

Hi [~mck], I have implemented logging of diagnostic events into Chronicle 
queues in this branch (1). It is quite a big patch and it is not fully 
finished yet, but I think it is enough for a first evaluation, and I would 
like to discuss it early to avoid any communication and expectation issues.

The main "work" is done in DiagnosticEventService and 
DiagnosticEventPersistence. DiagnosticEventPersistence is based on "consumers" 
which are used for subscription. Implementation-wise, before this patch, there 
was already a consumer which was putting everything into memory. I implemented 
the diagnostic event logger on Chronicle queues in such a way that it is just 
another consumer, but when it consumes these events it puts them into a 
Chronicle queue instead of some in-memory structure. Upon disabling this 
diagnostic logger, this consumer is simply unsubscribed.
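
To illustrate the consumer approach, here is a minimal sketch. The class names 
are made up for this comment and are not the ones used in the patch, events 
are plain strings, and an ordinary list stands in for the Chronicle queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical stand-in for the diagnostic event service; not the Cassandra API.
class SketchEventService {
    private final List<Consumer<String>> consumers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> consumer) { consumers.add(consumer); }

    void unsubscribe(Consumer<String> consumer) { consumers.remove(consumer); }

    // Every published event is handed to all currently subscribed consumers.
    void publish(String event) {
        for (Consumer<String> consumer : consumers) {
            consumer.accept(event);
        }
    }
}

// Hypothetical logger: enabling subscribes one more consumer, disabling unsubscribes it.
class SketchDiagnosticLogger {
    private final SketchEventService service;
    // Assumption: a plain list stands in for the Chronicle queue.
    final List<String> persisted = new ArrayList<>();
    private final Consumer<String> consumer = persisted::add;
    private boolean enabled;

    SketchDiagnosticLogger(SketchEventService service) { this.service = service; }

    void enable() {
        if (!enabled) {
            service.subscribe(consumer);
            enabled = true;
        }
    }

    void disable() {
        if (enabled) {
            service.unsubscribe(consumer);
            enabled = false;
        }
    }

    boolean isEnabled() { return enabled; }
}
```

The in-memory consumer that existed before would simply be another subscriber 
of the same service, so it is not affected when the logger is disabled.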

From the user's point of view, the diagnostic events functionality has to be 
enabled in order to be able to enable diagnostic logging. Logging into 
Chronicle queues is not possible if the diagnostic framework is disabled. On 
the other hand, diagnostic logging into Chronicle queues can be enabled and 
disabled on demand, similarly to how it is done for audit logging. However, 
regardless of whether diagnostic logging into Chronicle queues is enabled or 
disabled, events are always put into memory as before. There is a JMX method 
via which a user may read these events on demand, but they cannot be read on 
demand from an arbitrary position in the Chronicle queue once they are written 
to disk. Hence a user can still inspect these events on the fly from the 
in-memory buffer, as before, and they are additionally persisted to disk if 
the user chooses so.
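
The rules above can be modelled roughly as follows. This is only a sketch of 
the intended behaviour with hypothetical names, not code from the branch, and 
two lists stand in for the in-memory buffer and the Chronicle queue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the enable/disable rules; names are illustrative only.
class DiagnosticTogglesSketch {
    private final boolean diagnosticEventsEnabled; // framework-level switch
    private boolean queueLoggingEnabled;           // toggled on demand
    final List<String> inMemory = new ArrayList<>();
    final List<String> onDisk = new ArrayList<>(); // stands in for a Chronicle queue

    DiagnosticTogglesSketch(boolean diagnosticEventsEnabled) {
        this.diagnosticEventsEnabled = diagnosticEventsEnabled;
    }

    void enableQueueLogging() {
        // Logging cannot be enabled while the diagnostic framework is disabled.
        if (!diagnosticEventsEnabled) {
            throw new IllegalStateException("diagnostic events must be enabled first");
        }
        queueLoggingEnabled = true;
    }

    void disableQueueLogging() { queueLoggingEnabled = false; }

    void onEvent(String event) {
        if (!diagnosticEventsEnabled) {
            return; // no events are published at all
        }
        inMemory.add(event); // always buffered in memory, as before
        if (queueLoggingEnabled) {
            onDisk.add(event); // additionally persisted when logging is enabled
        }
    }
}
```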

I have also extracted the common parts of BinLogger into a separate abstract 
class and created the org.apache.cassandra.log package where it is located. 
Audit logging and diagnostic logging are very similar and I found myself 
repeating a lot of code in order to implement this, so I simplified it a lot. 
I have also extracted the common parts of the options.

I have also implemented a diagnosticlogviewer tool, similar to auditlogviewer. 
My question here is whether we also want to make some "generic" tool which 
the audit and diagnostic viewers would extend, because right now they are 
basically the same except for a few changes which are mostly cosmetic. Hence I 
would like to know if you think it makes sense to try to extract the common 
parts.

I have also implemented nodetool commands for enabling and disabling 
diagnostic logging and for checking its status, similar to the audit log.

I would love to hear your feedback here, especially about the overall 
high-level implementation, so that I am not doing something which might 
eventually be rejected because of different expectations.

(1) https://github.com/instaclustr/cassandra/tree/CASSANDRA-13460-2

> Diag. Events: Add local persistency
> -----------------------------------
>
>                 Key: CASSANDRA-13460
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13460
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Legacy/Observability
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.x
>
>         Attachments: 0001-Add-persistency-for-events-to-system-keyspace.patch
>
>
> Some generated events will be rather less frequent but very useful for 
> retroactive troubleshooting. E.g. all events related to bootstrapping and 
> gossip would probably be worth saving, as they might provide valuable 
> insights and will consume very little resources in low quantities. Imagine if 
> we could e.g. in case of CASSANDRA-13348 just ask the user to -run a tool 
> like {{./bin/diagdump BootstrapEvent}} on each host, to get us a detailed log 
> of all relevant events-  provide a dump of all events as described in the 
> [documentation|https://github.com/spodkowinski/cassandra/blob/WIP-13460/doc/source/operating/diag_events.rst].
>  
> This could be done by saving events white-listed in cassandra.yaml to a local 
> table. Maybe using a TTL.


