[
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180613#comment-14180613
]
Ben Chan commented on CASSANDRA-5483:
-------------------------------------
I'm going to dissapoint you and say that a lot of the reasons for how things
are is historical, or due to code transformations intended to precisely
maintain some aspect of the old behavior, without trying to question the logic
behind that particular behavior.
In other words, I've just been trying to add this one particular behavior
(repair tracing) without otherwise changing program behavior. This is probably
most obvious around the trace calls, where (hopefully) the log messages are
unchanged, even when they're different from the trace messages. I really need
to go back through that with a fine-tooth comb, because I think there were some
places where I may have changed capitalization. Or worse.
----
Most of your comments were straightforward. Just need some clarification on
these:
{quote}
DebuggableThreadPoolExecutor:
Do we intend to create w/Integer.MAX_VALUE as 'size' on this method?
{quote}
Yes, in order to preserve the behavior of the previous code (see my 2014-03-08
comment and the discussion surrounding it).
{quote}
If so, name should reflect it like other methods in file.
{quote}
Just to confirm (though I'm mostly sure this is what you mean), Do you mean
that the function name should be changed to something like
{{createCachedThreadpoolWithMaximumPoolSize}}?
{quote}
OutboundTcpConnection:
Why do we assume that if TraceType is null the type is QUERY?
{quote}
If an older server version ever initiates a trace, then traceType could be
null. And if it *is* null, that means that it's a QUERY traceType (since that's
the only kind of trace available in previous versions). If a newer server
version ever sends a null traceType, that should probably be an error, though
nothing like that is being done yet.
{quote}
StorageService:
Swallowing Throwable on line 2828 w/out any logging or comment as to why is
unclear. Using exception handling as control flow is step back from the
sameThreadExecutor error handling approach taken prior as we lose naming and
intention information. Does this gain us something that I'm not aware of?
{quote}
Before saying anything else: yes, I probably should have commented the
exception-swallowing.
I touched on this code change earlier in the paragraph that begins "Note that I
switched some code". In a nutshell: preserving behavior of previous code.
Whether I succeeded or not is a different issue, but at least let me run
through my thought process so that you can help me find the holes in it. Here
is a redacted version of the code:
{noformat}
// Previous version:
Futures.addCallback(allSessions, new FutureCallback<Object>()
{
public void onSuccess(@Nullable Object result)
{
// onSuccess stuff ...
repairComplete();
}
public void onFailure(Throwable t)
{
// onFailure stuff (empty, except for repairComplete) ...
repairComplete();
}
private void repairComplete()
{
// repairComplete stuff ...
}
}, MoreExecutors.sameThreadExecutor());
{noformat}
Here are my assumptions:
- {{// onSuccess stuff ...}} does not throw any exceptions out of its code
block.
- {{// repairComplete stuff ...}} works the same no matter which thread it runs
in (assuming it's called at the proper time, of course).
- The guava library used corresponds to the source code found at
{{build/lib/sources/guava-16.0.1-sources.jar}}.
{noformat}
// Review version:
try
{
allSessions.get();
// onSuccess stuff ...
}
catch (Throwable t)
{
// onFailure stuff (empty, except for repairComplete) ...
}
// repairComplete stuff ...
{noformat}
Since {{allSessions.get()}} is the only thing that can throw to the catch
block, then its success or failure should (almost, see below) be the only way
to trigger the onSuccess and onFailure "branches" that correspond to the
original code.
Intended difference:
- {{// repairComplete stuff ...}} should now run in the main repair thread,
instead of in the thread of the last exiting subtask in allSessions.
Unintended differences:
- Futures#addCallback uses Uninterruptibles#getUninterruptibly instead of a
plain Future#get, the main difference being that it will keep retrying
Future#get even in the face of InterruptedException. This may be important, so
I could do this also.
{quote}
TraceState
Right now isDone is only called by the query thread but there's nothing in the
method signature or comments of that method documenting the
doubling-every-second-run algorithm it's using.
{quote}
The doubling behavior used to be contained in the query thread; "isDone" used
to be a (more or less) simple wait with timeout.
Agree on needing comments. In the interests of not gratuitously rearranging
code, I'll merely see about adding better comments and making the code less
magical.
----
I can't get to any code-editing at the moment, but probably sometime tomorrow.
The patch in its current form should still be fine if you're reviewing the
functionality, though.
> Repair tracing
> --------------
>
> Key: CASSANDRA-5483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Yuki Morishita
> Assignee: Ben Chan
> Priority: Minor
> Labels: repair
> Fix For: 3.0
>
> Attachments: 5483-full-trunk.txt,
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch,
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch,
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch,
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
> 5483-v07-08-Fix-brace-style.patch,
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
> 5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch,
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch,
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch,
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch,
> 5483-v08-14-Poll-system_traces.events.patch,
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch,
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch,
> 5483-v10-17-minor-bugfixes-and-changes.patch,
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch,
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch,
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch,
> 5483-v14-01-squashed.patch,
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch,
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch,
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch,
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch,
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch,
> ccm-repair-test, cqlsh-left-justify-text-columns.patch,
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt,
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch,
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
> tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt,
> [email protected],
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch,
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
> v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing
> stores traces to system keyspace. With it, you don't have to lookup each log
> file to see what was the status and how it performed the repair you invoked.
> Instead, you can query the repair log with session ID to see the state and
> stats of all nodes involved in that repair session.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)