[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Chan updated CASSANDRA-5483:
--------------------------------

    Attachment: cqlsh-left-justify-text-columns.patch

Public TODO list; please comment if any of these should not be TODOs:
* Trace streaming and/or lack thereof (I think hooking {{Differencer#run}} and 
related threads should be enough).
* Maybe exclude {{system_traces}} from repair while a repair trace is in 
progress; otherwise there seems to be a feedback loop that triggers multiple 
repair commands.
* Maybe add a placeholder row with a null {{duration}} for ongoing repair 
sessions; that makes it easier to find the {{session_id}} for queries (see the 
sketch after this list). Update the row with the final duration at the end.
* Populate {{started_at}}, {{request}}, etc in {{system_traces.sessions}}.
* Send the {{session_id}} back to nodetool.
* Shorten/simplify trace messages.
* Verbose option; dump all traces to nodetool.
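
For the placeholder-row item, here is a minimal sketch of the idea, written 
against the DataStax Python driver purely for illustration (an assumption on 
my part; the real change would live in Cassandra's tracing code, not in a 
client):
{noformat}
# Illustration only: the placeholder-row idea, via the DataStax Python
# driver. In Cassandra itself the tracing code would do these writes.
import time, uuid
from datetime import datetime
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('system_traces')
sid = uuid.uuid1()  # stand-in for the repair trace's session_id

# At repair start: a row with a null duration marks the session as
# ongoing, so its session_id is easy to find with a plain SELECT.
session.execute(
    "INSERT INTO sessions (session_id, request, started_at) "
    "VALUES (%s, %s, %s)",
    (sid, 'repair s1', datetime.utcnow()))

start = time.time()
# ... repair runs ...
elapsed_micros = int((time.time() - start) * 1e6)

# At repair end: fill in the final duration.
session.execute(
    "UPDATE sessions SET duration = %s WHERE session_id = %s",
    (elapsed_micros, sid))
{noformat}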

Implementation thoughts follow; please warn me of potential problems.

---

Verbose option:

To send local traces back to nodetool, adding a parallel {{sendNotification}} 
is easy enough. Getting the remote traces seems like it would involve 
monitoring updates to {{system_traces.events}}.

At first I thought of triggers, but the docs say that triggers run on the 
coordinator node, which is not necessarily the node being repaired. That 
leaves polling the table, with heuristics that are hopefully good enough to 
keep the extra work down (sketch below).
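
A minimal sketch of the polling approach, again using the DataStax Python 
driver for illustration (assumptions: the heuristic is just "remember the 
last {{event_id}} relayed and page forward from it"; the real code would be 
Java on the nodetool/JMX side):
{noformat}
# Illustration only: tail system_traces.events for a given session.
# event_id is the clustering column, so we can page forward from the
# last event relayed instead of re-reading the whole partition.
import time
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('system_traces')

def follow_trace(session_id, poll_interval=1.0):
    last_seen = None
    while True:
        if last_seen is None:
            rows = session.execute(
                "SELECT event_id, source, activity FROM events "
                "WHERE session_id = %s", (session_id,))
        else:
            rows = session.execute(
                "SELECT event_id, source, activity FROM events "
                "WHERE session_id = %s AND event_id > %s",
                (session_id, last_seen))
        for row in rows:
            print '%s %s' % (row.source, row.activity)
            last_seen = row.event_id
        time.sleep(poll_interval)
{noformat}
The obvious downside is that the poll interval trades latency for load, and 
the loop never terminates on its own; a real version would stop once the 
session's {{duration}} gets filled in.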

---

Simplify trace messages:

Skipping to the point of difference:

It looks like each sub-RepairSession has its own unique session id (a 
timeuuid, distinct from both {{session_id}} and {{event_id}}). Here is a 
section of the select above, aligned and simplified to increase the SNR; the 
redacted parts are identical.
{noformat}
[repair #fedc3790-...] Received merkle tree for events from /127.0.0.1
[repair #fef40550-...] new session: will sync /127.0.0.1, /127.0.0.2 on range 
(3074457345618258602,-9223372036854775808] for system_traces.[sessions, events]
[repair #fef40550-...] requesting merkle trees for sessions (to [/127.0.0.2, 
/127.0.0.1])
[repair #fedc3790-...] session completed successfully
[repair #fef40550-...] Sending completed merkle tree to /127.0.0.1 for 
system_traces/sessions
{noformat}
In the example above you can see the traces of two sub-sessions interleave, 
so the sub-session_id (so to speak) has some use in distinguishing them. 
Since this sub-session_id only has to be unique within a particular repair 
session, maybe it would be worth mapping each one to a small integer (sketch 
below)?
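
A sketch of that mapping, in Python for brevity (assumption: the real version 
would be a small synchronized map in the Java tracing code):
{noformat}
# Illustration only: number each sub-session's timeuuid as it is first
# seen, so traces read "[repair #2] ..." instead of
# "[repair #fef40550-...] ...". The number only has to be unique
# within one repair trace, which is all that is needed here.
sub_ids = {}

def short_id(sub_session_uuid):
    if sub_session_uuid not in sub_ids:
        sub_ids[sub_session_uuid] = len(sub_ids) + 1
    return sub_ids[sub_session_uuid]
{noformat}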

For convenience, I attached a small, not-very-pretty patch that left-justifies 
columns of type text in cqlsh (makes it easier to read the traces).
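
The patch itself is attached above; independent of cqlsh internals, the idea 
is just that numbers read best right-justified while text reads best 
left-justified (a toy illustration, not the patch):
{noformat}
# Toy illustration of the justification idea, not the attached patch.
def pad(value, width, is_text):
    # str.ljust/rjust pad with spaces out to the column width.
    return value.ljust(width) if is_text else value.rjust(width)

print '|' + pad('session completed successfully', 34, True) + '|'
print '|' + pad('12345', 34, False) + '|'
{noformat}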

---

Trace streaming:

Is there a simple way to create a situation where a repair requires streaming? 
Here is what I'm currently doing, but it doesn't work.

{noformat}
#!/bin/sh
ccm create $(mktemp -u 5483-XXX) &&
ccm populate -n 3 &&
ccm updateconf --no-hinted-handoff &&
ccm start &&
ccm node1 cqlsh <<"E"
CREATE SCHEMA s1
WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

CREATE TABLE s1.users (
  user_id varchar PRIMARY KEY,
  first varchar,
  last varchar,
  age int)
WITH read_repair_chance = 0.0;

INSERT INTO s1.users (user_id, first, last, age)
  VALUES ('jsmith', 'John', 'Smith', 42);
E

ccm node1 stop &&
python - <<"E" | ccm node2 cqlsh
import random as r
fs=["John","Art","Skip","Doug","Koala"]
ls=["Jackson","Jacobs","Jefferson","Smythe"]
for (f, l) in [(f,l) for f in fs for l in ls]:
  print (
    "insert into s1.users (user_id, age, first, last) "
    "values('%s', %d, '%s', '%s');"
  ) % ((f[0]+l).lower(), r.randint(10,100), f, l)
E
ccm node2 cqlsh <<"E"
select count(*) from s1.users;
E
ccm node1 start
ccm node1 cqlsh <<"E"
select count(*) from s1.users;
E
nodetool -p $(ccm node1 show | awk -F= '/jmx_port/{print $2}') repair -tr s1
{noformat}

The problem is that despite disabling hinted handoff and setting 
{{read_repair_chance}} to 0, the endpoints are still reported as consistent 
in {{Differencer#run}}. Yet node1 is clearly missing some rows before the 
repair and has them afterward, so the streaming repair must be happening 
somewhere other than {{Differencer#run}}. Is some sort of handoff still being 
done somewhere? I'm sure it's something simple that I'm missing.


> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>         Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch, 
> 5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch, 
> 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch, 
> trunk@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> [email protected], 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch, 
> v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results the way query 
> tracing stores traces to the system keyspace. With it, you don't have to 
> look up each log file to see what the status was and how the repair you 
> invoked performed. Instead, you can query the repair log with the session 
> ID to see the state and stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
