[ https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918251#comment-13918251 ]
Lyuben Todorov commented on CASSANDRA-5483:
-------------------------------------------
Are the latest 3 patches supposed to be incrementally added onto
{{trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch}}
and
{{trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch}}?
As in
{noformat}
1 - apply trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch
2 - apply trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch
3 - apply one of the three latest patches (v3, v4 or v5)
{noformat}
v5 does a lot of refactoring that I think is outside the scope of this ticket
(though it might be worth its own ticket, as the idea is good), so my vote is
for v3. However, I'm getting a NoSuchMethodException with it. Can you post a
branch with all the patches applied onto trunk (for v3)?
The exception:
{noformat}
java.lang.NoSuchMethodException: forceRepairAsync(java.lang.String, boolean, java.util.Collection, java.util.Collection, boolean, boolean, boolean, [Ljava.lang.String;)
	at com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:168)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:135)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
	at sun.rmi.transport.Transport$1.run(Transport.java:177)
	at sun.rmi.transport.Transport$1.run(Transport.java:174)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{noformat}
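One plausible cause, assuming the nodetool side and the server were built from differently patched trees: JMX resolves an operation by its name and its full parameter-type signature, so a client that sends a signature the server's MBean interface doesn't declare fails in PerInterface.noSuchMethod, as above. The self-contained reproduction below is only a sketch; RepairMBean, Repair and the example:type=Repair object name are hypothetical stand-ins, not Cassandra's actual StorageServiceMBean.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;
import javax.management.StandardMBean;

public class JmxSignatureMismatch {
    // Hypothetical stand-in for the server-side MBean interface (not Cassandra's real one).
    public interface RepairMBean {
        int forceRepairAsync(String keyspace, boolean isSequential);
    }

    public static class Repair implements RepairMBean {
        @Override
        public int forceRepairAsync(String keyspace, boolean isSequential) {
            return 1; // pretend this is the repair command number
        }
    }

    private static final String OBJECT_NAME = "example:type=Repair";

    // Invokes forceRepairAsync with the given JMX signature and reports the outcome.
    public static String invokeWithSignature(String[] signature, Object[] params) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(OBJECT_NAME);
        if (!server.isRegistered(name)) {
            server.registerMBean(new StandardMBean(new Repair(), RepairMBean.class), name);
        }
        try {
            return "ok:" + server.invoke(name, "forceRepairAsync", params, signature);
        } catch (ReflectionException e) {
            // JMX matches operations on name plus parameter types; an unknown
            // signature surfaces as a ReflectionException wrapping NoSuchMethodException.
            return e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws Exception {
        // Client and server agree on the signature: the call succeeds.
        System.out.println(invokeWithSignature(
                new String[] {"java.lang.String", "boolean"},
                new Object[] {"ks", true}));
        // Client sends one more boolean than the server's interface declares.
        System.out.println(invokeWithSignature(
                new String[] {"java.lang.String", "boolean", "boolean"},
                new Object[] {"ks", true, false}));
    }
}
```

Running main prints ok:1 and then NoSuchMethodException, which is why a single branch with all the v3 patches applied on both ends would help rule this out.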
bq. I am thinking of calling the new table something generic like
system_traces.trace_logs. I also assume, that like system_traces.events
I'd say events is already pretty generic; the new table's name should make
clear that its traces aren't query-related, unlike those in events. If we are
going to add new tables to the system_traces keyspace, it's worth thinking about
refactoring events into something more specific and giving the new tables names
that carry meaning. Another possible solution is to add a "command" field to
system_traces.events, which would let users retrieve data about specific kinds
of events. [~jbellis], WDYT? For example:
{noformat}
SELECT * FROM system_traces.events;
session_id | ... | thread | command
--------------------------------------+ ... +-----------+---------
09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1 | REPAIR
29084f90-a2f3-11e3-9f04-7d9e3709bf93 | ... | Thrift:1 | QUERY
(2 rows)
SELECT * FROM system_traces.events WHERE command='REPAIR';
session_id | ... | thread | command
--------------------------------------+ ... +-----------+---------
09d48eb0-a2f1-11e3-9f04-7d9e3709bf93 | ... | Thrift:1 | REPAIR
(1 rows)
{noformat}
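To make the proposed semantics concrete, here is a minimal in-memory sketch of filtering trace rows on the suggested command field. The Command enum and TraceEvent class are hypothetical; the session IDs are the ones from the example output. In Cassandra itself, a WHERE clause on a non-primary-key column like command would most likely need a secondary index on that column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class TraceCommandFilter {
    // Hypothetical values for the proposed "command" column.
    public enum Command { QUERY, REPAIR }

    // Minimal stand-in for a row of system_traces.events plus the proposed column.
    public static final class TraceEvent {
        public final UUID sessionId;
        public final String thread;
        public final Command command;

        public TraceEvent(UUID sessionId, String thread, Command command) {
            this.sessionId = sessionId;
            this.thread = thread;
            this.command = command;
        }
    }

    // In spirit: SELECT * FROM system_traces.events WHERE command = ?
    public static List<TraceEvent> byCommand(List<TraceEvent> events, Command command) {
        List<TraceEvent> out = new ArrayList<>();
        for (TraceEvent e : events) {
            if (e.command == command) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<TraceEvent> events = Arrays.asList(
                new TraceEvent(UUID.fromString("09d48eb0-a2f1-11e3-9f04-7d9e3709bf93"), "Thrift:1", Command.REPAIR),
                new TraceEvent(UUID.fromString("29084f90-a2f3-11e3-9f04-7d9e3709bf93"), "Thrift:1", Command.QUERY));
        for (TraceEvent e : byCommand(events, Command.REPAIR)) {
            System.out.println(e.sessionId + " | " + e.thread + " | " + e.command);
        }
    }
}
```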
bq. the rows in this table should expire, though perhaps not as fast as 24
hours.
+1. Repairs can take a very long time, so this should be configurable, with the
default perhaps around 30 days. With incremental repairs (in 2.1) it will end up
logging a lot of data, but that is still a better choice than users who run
regular repairs missing out on information.
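Since CQL TTLs are expressed in seconds, a configurable retention in days needs a unit conversion that is easy to get wrong. A minimal sketch, assuming a hypothetical setting in days with the 30-day default suggested above (DEFAULT_TTL_DAYS and ttlSeconds are illustrative names, not existing Cassandra configuration):

```java
import java.util.concurrent.TimeUnit;

public class RepairTraceTtl {
    // Hypothetical default: keep repair traces for about 30 days.
    static final int DEFAULT_TTL_DAYS = 30;

    // Converts a configured retention (in days, or null for "use the default")
    // into the TTL in seconds that a CQL "USING TTL" clause expects.
    public static int ttlSeconds(Integer configuredDays) {
        int days = (configuredDays == null) ? DEFAULT_TTL_DAYS : configuredDays;
        if (days <= 0) {
            throw new IllegalArgumentException("TTL must be positive, got " + days + " days");
        }
        return (int) TimeUnit.DAYS.toSeconds(days);
    }

    public static void main(String[] args) {
        System.out.println(ttlSeconds(null)); // 2592000 (30 days)
        System.out.println(ttlSeconds(1));    // 86400 (24 hours)
    }
}
```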
bq. One last thing I wanted to ask is about the possibility of trace log
levels. What is the minimum amount of trace log information you would find
useful, the next amount, and so on? Should it just follow the loglevel?
Tracing is supposed to give as much useful information as possible and tends to
be used for debugging problems, e.g. slow queries or, in this case, repairs
taking too long, so it's important to include what helps without spamming the
logs with every detail. Different log levels might be useful, but the aim of
this ticket is to track the progress of repairs, so logging the completion of
each repair command should be sufficient.
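If trace levels were added later, the "minimum amount first" idea could be a simple ordered threshold. This is only a sketch under that assumption; the TraceLevel values, RepairTraceLevels class and the trace messages are all hypothetical, not part of the posted patches.

```java
import java.util.ArrayList;
import java.util.List;

public class RepairTraceLevels {
    // Hypothetical levels, ordered from least to most verbose.
    public enum TraceLevel { MINIMAL, NORMAL, DETAILED }

    private final TraceLevel threshold;
    private final List<String> recorded = new ArrayList<>();

    public RepairTraceLevels(TraceLevel threshold) {
        this.threshold = threshold;
    }

    // Records the message only if its level is no more verbose than the threshold.
    public void trace(TraceLevel level, String message) {
        if (level.compareTo(threshold) <= 0) {
            recorded.add(message);
        }
    }

    public List<String> recorded() {
        return recorded;
    }

    public static void main(String[] args) {
        RepairTraceLevels tracer = new RepairTraceLevels(TraceLevel.MINIMAL);
        // Completion of each repair command is the minimum worth keeping.
        tracer.trace(TraceLevel.MINIMAL, "repair #1 completed");
        // Per-range detail is only kept at higher verbosity.
        tracer.trace(TraceLevel.DETAILED, "received merkle tree for range (0,100]");
        System.out.println(tracer.recorded()); // [repair #1 completed]
    }
}
```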
> Repair tracing
> --------------
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Yuki Morishita
> Assignee: Ben Chan
> Priority: Minor
> Labels: repair
> Attachments: test-5483-system_traces-events.txt,
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
> v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results in a system keyspace,
> the way query tracing stores its traces. With it, you don't have to look up each
> log file to see what the status was and how the repair you invoked performed.
> Instead, you can query the repair log by session ID to see the state and
> stats of all nodes involved in that repair session.
--
This message was sent by Atlassian JIRA
(v6.2#6252)