[
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180384#comment-14180384
]
Joshua McKenzie commented on CASSANDRA-5483:
--------------------------------------------
Initial observations / thoughts:
90 days on default repair trace TTL seems long to me, especially w/incremental
being new default.
DebuggableThreadPoolExecutor:
- 'createWithCachedTheadpool' is misspelled -> needs an r in Thread
- Do we intend to create w/Integer.MAX_VALUE as 'size' on this method? If so,
name should reflect it like other methods in file.
MessageOut:
- traceParameters isn't used elsewhere so logic can be nested inside
constructor where it's called and remove unnecessary method call indirection.
OutboundTcpConnection:
- Why do we assume that if TraceType is null the type is QUERY? We should
probably be explicitly setting that for all tracing types going forward if we
plan on expanding the use of the enum
StorageService:
- nit: declaring queryThread in createRepairTask but aren't doing anything with
it -> we should just call the method w/out declaration if we don't intend to
use it
- usage of bitwise ops to toggle si for indexing into seen array is
unnecessarily complex. A distinct 0/1 and toggle is more clear - there's no
need for inline index flipping w/bitwise ops as it's not gaining us any
performance or code clarity.
- nit: If not intending to use Object result from allSessions.get(), just
remove the declaration and call the method
- Swallowing Throwable on line 2828 w/out any logging or comment as to why is
unclear. Using exception handling as control flow is step back from the
sameThreadExecutor error handling approach taken prior as we lose naming and
intention information. Does this gain us something that I'm not aware of?
TraceState
- Right now isDone is only called by the query thread but there's nothing in
the method signature or comments of that method documenting the
doubling-every-second-run algorithm it's using.
- the implementation of isDone looks like a premature optimization. I would
prefer a more clear implementation of 'double every second run' than:
{code}
else if (haveWaited)
{
wait = Math.min((wait ^ 1) << (wait & 1), maxWaitMillis);
return false;
}
else
{
haveWaited = true;
try
{
wait(wait & ~1);
}
{code}
Now that I'm more familiar with the changes here I'm going to need to re-read
the notes on this ticket and test out / inspect output. The above is just from
my first read-through of the code yesterday.
> Repair tracing
> --------------
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Yuki Morishita
> Assignee: Ben Chan
> Priority: Minor
> Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt,
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch,
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch,
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch,
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
> 5483-v07-08-Fix-brace-style.patch,
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
> 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch,
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch,
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch,
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch,
> 5483-v08-14-Poll-system_traces.events.patch,
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch,
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch,
> 5483-v10-17-minor-bugfixes-and-changes.patch,
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch,
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch,
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch,
> 5483-v14-01-squashed.patch,
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch,
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch,
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch,
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch,
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch,
> ccm-repair-test, cqlsh-left-justify-text-columns.patch,
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt,
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
> [email protected],
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
> v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing
> stores traces to system keyspace. With it, you don't have to lookup each log
> file to see what was the status and how it performed the repair you invoked.
> Instead, you can query the repair log with session ID to see the state and
> stats of all nodes involved in that repair session.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)