[
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027905#comment-14027905
]
Ben Chan edited comment on CASSANDRA-5483 at 6/11/14 5:37 PM:
--------------------------------------------------------------
I'm sorry I can't think this through very deeply at the moment, so please allow
a little slack if I say something incorrect. I'm writing this during a small
window of time (where I can't do anything else) in a crisis I'm dealing with.
bq. Why is command part of events instead of sessions?
It was a somewhat arbitrary design decision. I can move it over to sessions.
The only real reason was historical (see the Mar 3 comment); it was a
proof-of-concept that never got followed up on until just now.
bq. Also: should use an enum internally. Logging as string representation is
fine.
Just to be clear: you mean the Java code should work with an enum, and the
actual CQL table column is fine as a string?
The code actually does use an enum of sorts (not an Enum proper): the
traceType. The traceType is passed to Tracing#getCommandName to look up the
command-name String.
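If we go with a proper Enum, I imagine something roughly like this (a sketch only; the names are illustrative, loosely borrowed from the existing traceType / Tracing#getCommandName pair, not actual patch code):
{noformat}
// Hypothetical sketch of "enum internally, string in the table": Java code
// passes the enum around, and only the logged column is text.
public enum TraceType
{
    QUERY,
    REPAIR;

    // What gets written to the command column of system_traces.
    public String getCommandName()
    {
        return name().toLowerCase();
    }
}
{noformat}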
bq. It makes people grumpy when we break JMX signatures. Can we add a new
overload instead, preserving the old? This should cut down on some of the code
churn in StorageService as well.
I will admit that I didn't really consider the entire ecosystem of tools that
use JMX to control Cassandra (though note that I did mention the JMX API change
in a Mar 8 comment: "... the server isn't going to work with old versions of
nodetool. And a newer nodetool still won't work with older server versions.").
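An overload seems straightforward enough; something like this, I assume (a sketch with a deliberately simplified, hypothetical signature; the real StorageService methods take more parameters):
{noformat}
// Sketch only (simplified, hypothetical signature): keep the old JMX method
// and delegate to the new one, so existing tools see an unchanged interface.
public int forceRepairAsync(String keyspace, boolean isSequential, boolean primaryRange)
{
    // Old callers get the old behaviour: tracing off.
    return forceRepairAsync(keyspace, isSequential, primaryRange, false /* trace */);
}
{noformat}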
bq. It's a minor thing to get hung up on, but I'm not wild about all the work
needed to propagate TTLs around. Is it really super important to persist repair
traces much longer than queries? If so, what if we used a separate table and
just allowed users to modify the default ttl? (The original trace code predates
default TTLs, I think, or we would have made use of it there.)
I guess the question is how many different types of things (bootstrap, repair,
query, decommission, ... anything else?) might eventually end up being traced.
If n is small, then having n tables may be fine.
The logic was this (see the Mar 1-3 discussion): repair can take a long time,
so 24 hours may be too short a TTL.
I recall reading about problematic repairs taking several days, which wouldn't
mix well with a 24-hour TTL.
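For what it's worth, if the enum suggestion above goes in, the TTL could hang off the trace type itself instead of being threaded through every call (a sketch only; the numbers and names are made up, and the yaml override is hypothetical):
{noformat}
// Sketch only: per-trace-type TTLs, so nothing needs to be propagated through
// the call chain. The REPAIR value is a placeholder that a cassandra.yaml
// setting could override.
public enum TraceType
{
    QUERY (24 * 3600),          // the existing 24-hour trace TTL
    REPAIR(7 * 24 * 3600);      // repairs can run for days

    public final int ttlSeconds;

    TraceType(int ttlSeconds)
    {
        this.ttlSeconds = ttlSeconds;
    }
}
{noformat}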
bq. Also have a nagging feeling that the notify/wait logic in StorageService +
TraceState is more complex than necessary.
If there is guaranteed to only ever be one consumer of notifications at a
time, then the updated v15 logic seems fine. But if two threads ever poll the
same TraceState, notifications could be dropped. The problem (I believe) is
that not every consumer may be in a wait state at the moment notifyAll is
called, which means a notification could be missed, right? I'm not experienced
in Java concurrency, and this isn't the best time for me to slowly think things
through, so it's quite possible I'm wrong here.
But it does seem reasonable that there will only ever be one polling thread for
any given tracing session, so the v15 code should work fine in that respect, as
far as my cursory inspection goes.
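To illustrate the race I mean (a sketch only; the class and method names are hypothetical, not the v15 code): if notifyAll() fires while a consumer is between checking its condition and calling wait(), that wake-up is simply lost. A counter checked inside the wait loop makes the notification "sticky":
{noformat}
// Sketch only; names are hypothetical. The counter ensures that even if
// notifyAll() fires while a consumer is outside wait(), the consumer still
// observes the changed count on its next check instead of blocking forever.
class StickyNotify
{
    private int count = 0;

    public synchronized void signal()
    {
        count++;
        notifyAll();
    }

    // Returns the new count once it advances past lastSeen,
    // or lastSeen if the timeout expires first.
    public synchronized int await(int lastSeen, long timeoutMillis) throws InterruptedException
    {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (count == lastSeen)
        {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0)
                return lastSeen;
            wait(remaining);
        }
        return count;
    }
}
{noformat}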
Note, however, that the polling in this case is only a heuristic: a
notification means an external trace *likely* happened somewhere around that
time, plus or minus (as far as I know, there is no way in Cassandra to be
notified of cf updates). The actual trace may only arrive *after* the
notification, so for two notifications ~maxWait seconds apart, your logging of
events might be maxWait seconds late:
{noformat}
time  action                result
----  ------                ------
   0  receive notification  no events found
  10  event A
1000  receive notification  sendNotification(event A)
{noformat}
This is why I had an exponential backoff: I wanted to poll at high frequency
for a likely event, polling less and less often as the notification recedes
into the past. There are, of course, endless tweaks possible to this basic
algorithm; it depends on what you can assume about the likely time distribution
of the events hinted at by the notification.
{noformat}
time  action                result
----  ------                ------
   0  receive notification  no events found
   2  poll                  no events found
   4  poll                  no events found
   8  poll                  no events found
  10  event A
  16  poll                  sendNotification(event A)
  32  poll                  no events found
1000  receive notification  no events found
{noformat}
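In rough code, the loop I have in mind looks like this (a sketch only; the helper methods are stand-ins, not actual patch code, and it reuses the StickyNotify sketch from above):
{noformat}
// Sketch of the backoff schedule in the table above: poll fast right after a
// notification, double the interval while nothing turns up, and reset the
// interval whenever a new notification arrives.
private static final long MIN_INTERVAL_MS = 2;
private static final long MAX_INTERVAL_MS = 1024;

void pollLoop(StickyNotify notifications) throws InterruptedException
{
    long interval = MIN_INTERVAL_MS;
    int seen = 0;
    while (!sessionFinished())                  // stand-in: has the repair ended?
    {
        if (fetchNewEvents())                   // stand-in: SELECT on system_traces.events
        {
            sendNotificationsForNewEvents();    // stand-in: JMX notification to nodetool
            interval = MIN_INTERVAL_MS;         // recent activity: poll fast again
        }
        else
        {
            interval = Math.min(interval * 2, MAX_INTERVAL_MS);
        }
        int now = notifications.await(seen, interval);
        if (now != seen)
        {
            seen = now;                         // new notification: reset the backoff
            interval = MIN_INTERVAL_MS;
        }
    }
}
{noformat}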
bq. I should note I've made no attempt to corroborate this behaviour is
sensible; I've only simplified it.
Any feedback would be welcome. As I've said before, heuristics are messy. I
talked about the reasoning behind my design decisions, and a possibility for an
alternate implementation (with attendant tradeoffs), in a Mar 17 comment. I
honestly thought I'd get more comments on it at the time, but it's possible the
patch had already gotten into TL;DR territory even then.
---
Okay my short break from reality is over. Time to panic.
---
Edit: some more dead time (hurry up and wait). Thought some more about trace
notifications.
> Repair tracing
> --------------
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Yuki Morishita
> Assignee: Ben Chan
> Priority: Minor
> Labels: repair
> Attachments: 5483-full-trunk.txt,
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch,
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch,
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch,
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
> 5483-v07-08-Fix-brace-style.patch,
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
> 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch,
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch,
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch,
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch,
> 5483-v08-14-Poll-system_traces.events.patch,
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch,
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch,
> 5483-v10-17-minor-bugfixes-and-changes.patch,
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch,
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch,
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch,
> 5483-v14-01-squashed.patch, 5483-v15.patch, ccm-repair-test,
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff,
> test-5483-system_traces-events.txt,
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
> [email protected],
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
> v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results the way query
> tracing stores traces to the system keyspace. With it, you don't have to look
> through each log file to see what the status was and how the repair you
> invoked performed. Instead, you can query the repair log with a session ID to
> see the state and stats of all nodes involved in that repair session.