[jira] [Commented] (CASSANDRA-5483) Repair tracing

Ben Chan (JIRA) Mon, 10 Nov 2014 14:43:07 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205451#comment-14205451
 ]


Ben Chan commented on CASSANDRA-5483:
-------------------------------------

Sorry for the delay; SMART and reallocated sectors, which only really account 
for about 2-3 days of it. But just to say that it wasn't merely a case of a 
hangover from too much Halloween candy.

----

Updated https://github.com/usrbincc/cassandra/tree/5483-review (currently at 
commit 202a2e2e5e602).

Merges cleanly with trunk (at commit d286ac7d072fe), and builds and tests 
cleanly.

I decided to be a little opinionated and did some refactoring along the lines 
of my Oct 23 message.
- Used a TraceState#waitActivity function instead of TraceState#isDone 
(waitActivity gets closer to doing only what it says on the tin; makes it less 
hairy to comment).
- Moved all exponential backoff timeout code to StorageService#createQueryThread.

In addition, I renamed TraceState#enableNotifications to 
TraceState#enableActivityNotification to attempt (naming is hard) to avoid 
confusion with TraceState#setNotificationHandle, which is entirely unrelated.

Note: beyond having made this opinionated edit, I'm not planning to be 
particularly opinionated about advocating for it. All of that code should 
eventually go away once there is some way to get notified about table updates 
instead of having to do all that messy polling.
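
For anyone skimming, here is a rough, self-contained sketch of the general 
shape of that polling: wait for activity with a timeout, reset the timeout 
when something happens, and double it (up to a cap) when nothing does. This is 
not the code from the patch. The class and method names (BackoffSketch, 
ActivitySignal, nextTimeout, pollTimeouts) and the millisecond values are 
assumptions made up for illustration; only the idea of a waitActivity-style 
call combined with exact-doubling exponential backoff comes from the 
discussion above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these names exist in Cassandra.
class BackoffSketch {
    /** Hypothetical stand-in for an object that can signal trace activity. */
    static class ActivitySignal {
        private boolean activity = false;

        /** Called by whoever produces new trace events. */
        synchronized void signal() {
            activity = true;
            notifyAll();
        }

        /**
         * Waits up to timeoutMillis for activity. Returns true if activity
         * was signalled (and clears the flag), false if the wait timed out.
         */
        synchronized boolean waitActivity(long timeoutMillis) throws InterruptedException {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (!activity) {
                long remaining = (deadline - System.nanoTime()) / 1_000_000L;
                if (remaining <= 0)
                    return false;
                wait(remaining); // loop guards against spurious wakeups
            }
            activity = false;
            return true;
        }
    }

    /** Exact doubling, capped at maxMillis. */
    static long nextTimeout(long currentMillis, long maxMillis) {
        return Math.min(currentMillis * 2, maxMillis);
    }

    /**
     * Runs a fixed number of polling rounds and returns the timeout used in
     * each round, so the backoff behaviour can be inspected.
     */
    static List<Long> pollTimeouts(ActivitySignal signal, long minMillis,
                                   long maxMillis, int rounds) throws InterruptedException {
        List<Long> used = new ArrayList<>();
        long timeout = minMillis;
        for (int i = 0; i < rounds; i++) {
            used.add(timeout);
            boolean active = signal.waitActivity(timeout);
            // A real poller would query the trace table at this point.
            timeout = active ? minMillis : nextTimeout(timeout, maxMillis);
        }
        return used;
    }
}
```

With no activity at all, pollTimeouts(signal, 1, 8, 5) uses timeouts of 1, 2, 
4, 8 and 8 ms; a call to signal() drops the next timeout back to the minimum.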

Extra note: Cassandra triggers seem to be very close to what is needed, if only 
they could be specified to run on a given node (i.e. the node that is being 
repaired). The last time I checked on this, this wasn't possible.

----

Unfiltered traces:

- The extra traces are generic message send-receive traces that existed prior 
to this patch. They were originally there for query tracing, which benefits 
from more detailed tracing.
- These extra traces were filtered out for repair up until v16 of this patch. 
This means that any discussions of trace messages prior to that point are 
referring to the filtered traces.

But I can't say that they're doing any real harm. I mean, it's only 3x the 
traces (estimated), and not an order of magnitude or more.

It's probably fine as it is. I certainly can't unequivocally state that there's 
no use for those extra traces. Besides, extra information can always be 
filtered out at a higher level (assuming it's tagged appropriately).

> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>             Fix For: 3.0
>
>         Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
