[
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030076#comment-14030076
]
Ben Chan commented on CASSANDRA-5483:
-------------------------------------
{quote}
the exponential backoff is maintained, it's only defined more simply (as far as
I could read you doubled the wait time every second timeout, which is what
happens now also).
{quote}
Sorry, I was speed-reading the code: I saw {{wait();}} (i.e. with no timeout),
so I immediately concluded there couldn't be a timeout, exponential or otherwise,
and I mostly ignored anything that wouldn't affect execution (stupid mental code optimizer).
If you have {{wait(wait);}} and reset {{wait}} (the {{long}}) to
{{minWaitMillis}} inside {{if (hasNotifications)}}, then that should be about right.
I've attached a patch that does this
([^5483-v15-02-Hook-up-exponential-backoff-functionality.patch]).
I apologize for reading code while distracted and, as a result, going on about
the exponential backoff. I'll try to learn from this experience.
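For concreteness, here is roughly the shape I have in mind. This is a minimal,
self-contained sketch, not the patch itself; {{minWaitMillis}} and
{{hasNotifications}} come from the discussion above, while {{maxWaitMillis}},
the class name, and the method names are purely illustrative:
{noformat}
// Sketch only: a timed wait whose timeout doubles while there is no activity
// and resets to the minimum as soon as there is.
public class ActivityWaiter
{
    private static final long minWaitMillis = 125;
    private static final long maxWaitMillis = 128 * 1000L; // illustrative cap

    private boolean hasNotifications;
    private long wait = minWaitMillis;

    public synchronized void activity()
    {
        hasNotifications = true;
        notifyAll();
    }

    /** Returns true if activity was seen before the timed wait expired. */
    public synchronized boolean waitActivity() throws InterruptedException
    {
        if (!hasNotifications)
            wait(wait); // wait(wait), not wait(): a timed wait
        if (hasNotifications)
        {
            hasNotifications = false;
            wait = minWaitMillis; // reset the backoff on activity
            return true;
        }
        wait = Math.min(wait * 2, maxWaitMillis); // no activity: double exactly, up to a cap
        return false;
    }
}
{noformat}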
---
I liked your exponential backoff calculation code; I took the liberty of
tweaking it to double exactly in
[^5483-v15-03-Exact-doubling-for-exponential-backoff.patch].
---
{quote}
As the code is defined now we explicitly create a repair session and spawn a
specific thread to process it, so we have a guarantee there's only one thread.
It doesn't matter if the consumer is in its method when notifyAll is called; if
it isn't it will receive the notification as soon as it next enters the method.
{quote}
I agree with you.
When I said "only ever" before, I was talking about future code, not the code
in its current state. Had I considered the alternate interpretation ("edge case
in the current code"), I would have been clearer, but there we are.
Just to reiterate: I'm aware that, as the code stands, only a single thread
accesses the notification for a given TraceState, and that this allows for a
simpler implementation.
All I was trying to say in my previous comment was that notification upon
activity is a general capability; should future code ever have multiple threads
waiting on the same activity notification, the implementation will have to
change. (The same holds should a use case ever require something other than
exponential backoff.)
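To illustrate what I mean (purely hypothetical; none of these names exist in
the patch): a multi-consumer version couldn't rely on a single cleared-on-read
flag, because one waiter would consume the notification before the others saw
it. It would need something like a per-waiter high-water mark instead:
{noformat}
// Hypothetical multi-consumer variant, not something this ticket needs:
// each waiter tracks how many events it has already seen.
public class MultiConsumerActivity
{
    private long events = 0;

    public synchronized void activity()
    {
        events++;
        notifyAll(); // every waiter gets a chance to observe the new count
    }

    /** Blocks until the event count exceeds lastSeen or the timeout elapses. */
    public synchronized long awaitNewerThan(long lastSeen, long timeoutMillis) throws InterruptedException
    {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (events <= lastSeen)
        {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0)
                break;
            wait(remaining);
        }
        return events; // the caller keeps this as its new lastSeen
    }
}
{noformat}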
Our differences turn out to be philosophical only, and I'm not all that
attached to mine.
I've already accepted [^5483-v15.patch] and have built my fixes for (some of)
the Jun 10 comments on top of it. I need some more clarification before I can
be sure what is required in order to resolve the remaining issues.
---
Test code, modified and simplified because of the updated system_traces.*
schemas (I may upload an updated {{ccm-repair-test}} if we have many more patch
iterations).
{noformat}
# Build "bin/nodetool -h <host> -p <jmx port>" from the output of `ccm <node> show`.
read -r JMXGET <<E
/jmx_port/{p=\$2;} \
/binary/{split(\$2,a,/\047/);h=a[2];} \
END{printf("bin/nodetool -h %s -p %s\n",h,p);}
E
ccm_nodetool() {
  local N=$1
  shift
  $(ccm $N show | awk -F= "$JMXGET") "$@"
}
# git checkout 85956ae
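# Fetch the patches if not already present, apply them, rebuild, and run a traced repair.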
W=https://issues.apache.org/jira/secure/attachment
for url in \
$W/12649785/5483-v15.patch \
$W/12650175/5483-v15-02-Hook-up-exponential-backoff-functionality.patch \
$W/12650174/5483-v15-03-Exact-doubling-for-exponential-backoff.patch \
$W/12650173/5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch \
$W/12650172/5483-v15-05-Move-command-column-to-system_traces.sessions.patch
do
{ [ -e $(basename $url) ] || curl -sO $url; } && git apply $(basename $url)
done &&
ant clean && ant &&
./ccm-repair-test -kR &&
ccm node1 stop &&
ccm node1 clear &&
ccm node1 start &&
ccm_nodetool node1 repair -tr
{noformat}
> Repair tracing
> --------------
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Yuki Morishita
> Assignee: Ben Chan
> Priority: Minor
> Labels: repair
> Attachments: 5483-full-trunk.txt,
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch,
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch,
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch,
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
> 5483-v07-08-Fix-brace-style.patch,
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
> 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch,
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch,
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch,
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch,
> 5483-v08-14-Poll-system_traces.events.patch,
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch,
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch,
> 5483-v10-17-minor-bugfixes-and-changes.patch,
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch,
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch,
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch,
> 5483-v14-01-squashed.patch,
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch,
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch,
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch,
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch,
> 5483-v15.patch, ccm-repair-test, cqlsh-left-justify-text-columns.patch,
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt,
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
> [email protected],
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
> v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing
> stores traces to system keyspace. With it, you don't have to lookup each log
> file to see what was the status and how it performed the repair you invoked.
> Instead, you can query the repair log with session ID to see the state and
> stats of all nodes involved in that repair session.