[ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954746#comment-13954746
 ] 

Ben Chan edited comment on CASSANDRA-5483 at 3/30/14 5:24 PM:
--------------------------------------------------------------

Ouch. After running a before-and-after test, I'm 99% sure this was the problem.
There was some obviously wrong code in {{waitActivity}} (an older version used 
0 instead of -1 to signify "done"; I apparently forgot to update everything 
when I changed this).

Sorry about removing the previous patch. It didn't have the correct {{git diff 
-p}} parameters.

For convenience:

{noformat}
# Fetch the patch unless it is already present, then apply it and rebuild.
W=https://issues.apache.org/jira/secure/attachment
for url in \
  "$W/12637720/5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch"
do [ -e "$(basename "$url")" ] || curl -sO "$url"; done &&
git apply 5483-v09-*.patch &&
ant clean && ant
{noformat}

Here's what I used to test with. With the "before" code, repairs get slower
and slower and hang on the 5th repair; with the "after" code, repairs
consistently take about 10 seconds.

{noformat}
cat > ccm-nodetool <<"EE"
#!/bin/sh

# ccm doesn't let us call nodetool with options, but we still need to get the
# host and port config from it.
read -r JMXGET <<E
/jmx_port/{p=\$2;} \
/binary/{split(\$2,a,/\047/);h=a[2];} \
END{printf("bin/nodetool -h %s -p %s\n",h,p);}
E

NODETOOL=$(ccm $1 show | awk -F= "$JMXGET")
shift
$NODETOOL "$@"
EE

chmod +x ccm-nodetool

for x in $(seq 3); do 
  for y in $(seq 2); do
    echo repair node$x \#$y
    ./ccm-nodetool node$x repair -tr
  done
done
{noformat}
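
To get numbers instead of eyeballing the slowdown, each call could be wrapped in a small timing helper. This is only a sketch: the {{timed}} function is a hypothetical name (not part of the patch or of ccm), and it assumes a {{date}} that supports {{+%s}}, which is common but not guaranteed everywhere.

```shell
#!/bin/sh

# Hypothetical helper: run a command, then report the elapsed wall-clock time
# in whole seconds along with the command's exit status.
timed() {
  start=$(date +%s)          # assumes date(1) supports +%s
  status=0
  "$@" || status=$?          # capture the exit status without aborting
  end=$(date +%s)
  echo "elapsed: $((end - start))s (exit $status): $*"
  return "$status"
}

# Stand-in command for illustration only.
timed true
```

In the test loop above, the call would become {{timed ./ccm-nodetool node$x repair -tr}}, so a growing repair time shows up directly in the output.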

edit: minor awk code cleanup, properly nest heredocs.




> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>         Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, ccm-repair-test, 
> cqlsh-left-justify-text-columns.patch, prerepair-vs-postbuggedrepair.diff, 
> test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> tr...@8ebeee1-5483-v01-002-simple-repair-tracing.txt, 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results the same way query
> tracing stores traces in the system keyspace. With it, you don't have to look
> up each log file to see what the status was and how the repair you invoked
> performed. Instead, you can query the repair log by session ID to see the
> state and stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
