[jira] [Commented] (CASSANDRA-5483) Repair tracing

Joshua McKenzie (JIRA) Wed, 22 Oct 2014 12:29:54 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180384#comment-14180384
 ]


Joshua McKenzie commented on CASSANDRA-5483:
--------------------------------------------

Initial observations / thoughts:

90 days on default repair trace TTL seems long to me, especially w/incremental 
being new default.

DebuggableThreadPoolExecutor:
- 'createWithCachedTheadpool' is misspelled -> needs an r in Thread
- Do we intend to create w/Integer.MAX_VALUE as 'size' on this method?  If so, 
name should reflect it like other methods in file.

MessageOut:
- traceParameters isn't used elsewhere so logic can be nested inside 
constructor where it's called and remove unnecessary method call indirection.

OutboundTcpConnection:
- Why do we assume that if TraceType is null the type is QUERY?  We should 
probably be explicitly setting that for all tracing types going forward if we 
plan on expanding the use of the enum

StorageService:
- nit: declaring queryThread in createRepairTask but aren't doing anything with 
it -> we should just call the method w/out declaration if we don't intend to 
use it
- usage of bitwise ops to toggle si for indexing into seen array is 
unnecessarily complex.  A distinct 0/1 and toggle is more clear - there's no 
need for inline index flipping w/bitwise ops as it's not gaining us any 
performance or code clarity.
- nit: If not intending to use Object result from allSessions.get(), just 
remove the declaration and call the method
- Swallowing Throwable on line 2828 w/out any logging or comment as to why is 
unclear.  Using exception handling as control flow is step back from the 
sameThreadExecutor error handling approach taken prior as we lose naming and 
intention information.  Does this gain us something that I'm not aware of?

TraceState
- Right now isDone is only called by the query thread but there's nothing in 
the method signature or comments of that method documenting the 
doubling-every-second-run algorithm it's using.
- the implementation of isDone looks like a premature optimization.  I would 
prefer a more clear implementation of 'double every second run' than:
{code}
            else if (haveWaited)
            {
                wait = Math.min((wait ^ 1) << (wait & 1), maxWaitMillis);
                return false;
            }
            else
            {
                haveWaited = true;
                try
                {
                    wait(wait & ~1);
                }
{code}

Now that I'm more familiar with the changes here I'm going to need to re-read 
the notes on this ticket and test out / inspect output.  The above is just from 
my first read-through of the code yesterday.

> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>             Fix For: 3.0
>
>         Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> [email protected], 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-5483) Repair tracing

Reply via email to