[jira] [Commented] (CASSANDRA-5483) Repair tracing

Ben Chan (JIRA) Wed, 22 Oct 2014 15:05:07 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180613#comment-14180613
 ]


Ben Chan commented on CASSANDRA-5483:
-------------------------------------

I'm going to dissapoint you and say that a lot of the reasons for how things 
are is historical, or due to code transformations intended to precisely 
maintain some aspect of the old behavior, without trying to question the logic 
behind that particular behavior.

In other words, I've just been trying to add this one particular behavior 
(repair tracing) without otherwise changing program behavior. This is probably 
most obvious around the trace calls, where (hopefully) the log messages are 
unchanged, even when they're different from the trace messages. I really need 
to go back through that with a fine-tooth comb, because I think there were some 
places where I may have changed capitalization. Or worse.

----

Most of your comments were straightforward. Just need some clarification on 
these:

{quote}
DebuggableThreadPoolExecutor:

Do we intend to create w/Integer.MAX_VALUE as 'size' on this method?
{quote}

Yes, in order to preserve the behavior of the previous code (see my 2014-03-08 
comment and the discussion surrounding it).

{quote}
If so, name should reflect it like other methods in file.
{quote}

Just to confirm (though I'm mostly sure this is what you mean), Do you mean 
that the function name should be changed to something like 
{{createCachedThreadpoolWithMaximumPoolSize}}?

{quote}
OutboundTcpConnection:

Why do we assume that if TraceType is null the type is QUERY?
{quote}

If an older server version ever initiates a trace, then traceType could be 
null. And if it *is* null, that means that it's a QUERY traceType (since that's 
the only kind of trace available in previous versions). If a newer server 
version ever sends a null traceType, that should probably be an error, though 
nothing like that is being done yet.

{quote}
StorageService:

Swallowing Throwable on line 2828 w/out any logging or comment as to why is 
unclear. Using exception handling as control flow is step back from the 
sameThreadExecutor error handling approach taken prior as we lose naming and 
intention information. Does this gain us something that I'm not aware of?
{quote}

Before saying anything else: yes, I probably should have commented the 
exception-swallowing.

I touched on this code change earlier in the paragraph that begins "Note that I 
switched some code". In a nutshell: preserving behavior of previous code.

Whether I succeeded or not is a different issue, but at least let me run 
through my thought process so that you can help me find the holes in it. Here 
is a redacted version of the code:

{noformat}
// Previous version:
Futures.addCallback(allSessions, new FutureCallback<Object>()
{
    public void onSuccess(@Nullable Object result)
    {
        // onSuccess stuff ...
        repairComplete();
    }

    public void onFailure(Throwable t)
    {
        // onFailure stuff (empty, except for repairComplete) ...
        repairComplete();
    }

    private void repairComplete()
    {
        // repairComplete stuff ...
    }
}, MoreExecutors.sameThreadExecutor());
{noformat}

Here are my assumptions:
- {{// onSuccess stuff ...}} does not throw any exceptions out of its code 
block.
- {{// repairComplete stuff ...}} works the same no matter which thread it runs 
in (assuming it's called at the proper time, of course).
- The guava library used corresponds to the source code found at 
{{build/lib/sources/guava-16.0.1-sources.jar}}.

{noformat}
// Review version:
try
{
    allSessions.get();
    // onSuccess stuff ...
}
catch (Throwable t)
{
    // onFailure stuff (empty, except for repairComplete) ...
}
// repairComplete stuff ...
{noformat}

Since {{allSessions.get()}} is the only thing that can throw to the catch 
block, then its success or failure should (almost, see below) be the only way 
to trigger the onSuccess and onFailure "branches" that correspond to the 
original code.

Intended difference:
- {{// repairComplete stuff ...}} should now run in the main repair thread, 
instead of in the thread of the last exiting subtask in allSessions.

Unintended differences:
- Futures#addCallback uses Uninterruptibles#getUninterruptibly instead of a 
plain Future#get, the main difference being that it will keep retrying 
Future#get even in the face of InterruptedException. This may be important, so 
I could do this also.

{quote}
TraceState

Right now isDone is only called by the query thread but there's nothing in the 
method signature or comments of that method documenting the 
doubling-every-second-run algorithm it's using.
{quote}

The doubling behavior used to be contained in the query thread; "isDone" used 
to be a (more or less) simple wait with timeout.

Agree on needing comments. In the interests of not gratuitously rearranging 
code, I'll merely see about adding better comments and making the code less 
magical.

----

I can't get to any code-editing at the moment, but probably sometime tomorrow.

The patch in its current form should still be fine if you're reviewing the 
functionality, though.


> Repair tracing
> --------------
>
>                 Key: CASSANDRA-5483
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5483
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Yuki Morishita
>            Assignee: Ben Chan
>            Priority: Minor
>              Labels: repair
>             Fix For: 3.0
>
>         Attachments: 5483-full-trunk.txt, 
> 5483-v06-04-Allow-tracing-ttl-to-be-configured.patch, 
> 5483-v06-05-Add-a-command-column-to-system_traces.events.patch, 
> 5483-v06-06-Fix-interruption-in-tracestate-propagation.patch, 
> 5483-v07-07-Better-constructor-parameters-for-DebuggableThreadPoolExecutor.patch,
>  5483-v07-08-Fix-brace-style.patch, 
> 5483-v07-09-Add-trace-option-to-a-more-complete-set-of-repair-functions.patch,
>  5483-v07-10-Correct-name-of-boolean-repairedAt-to-fullRepair.patch, 
> 5483-v08-11-Shorten-trace-messages.-Use-Tracing-begin.patch, 
> 5483-v08-12-Trace-streaming-in-Differencer-StreamingRepairTask.patch, 
> 5483-v08-13-sendNotification-of-local-traces-back-to-nodetool.patch, 
> 5483-v08-14-Poll-system_traces.events.patch, 
> 5483-v08-15-Limit-trace-notifications.-Add-exponential-backoff.patch, 
> 5483-v09-16-Fix-hang-caused-by-incorrect-exit-code.patch, 
> 5483-v10-17-minor-bugfixes-and-changes.patch, 
> 5483-v10-rebased-and-squashed-471f5cc.patch, 5483-v11-01-squashed.patch, 
> 5483-v11-squashed-nits.patch, 5483-v12-02-cassandra-yaml-ttl-doc.patch, 
> 5483-v13-608fb03-May-14-trace-formatting-changes.patch, 
> 5483-v14-01-squashed.patch, 
> 5483-v15-02-Hook-up-exponential-backoff-functionality.patch, 
> 5483-v15-03-Exact-doubling-for-exponential-backoff.patch, 
> 5483-v15-04-Re-add-old-StorageService-JMX-signatures.patch, 
> 5483-v15-05-Move-command-column-to-system_traces.sessions.patch, 
> 5483-v15.patch, 5483-v17-00.patch, 5483-v17-01.patch, 5483-v17.patch, 
> ccm-repair-test, cqlsh-left-justify-text-columns.patch, 
> prerepair-vs-postbuggedrepair.diff, test-5483-system_traces-events.txt, 
> trunk@4620823-5483-v02-0001-Trace-filtering-and-tracestate-propagation.patch, 
> trunk@4620823-5483-v02-0002-Put-a-few-traces-parallel-to-the-repair-logging.patch,
>  tr...@8ebeee1-5483-v01-001-trace-filtering-and-tracestate-propagation.txt, 
> [email protected], 
> v02p02-5483-v03-0003-Make-repair-tracing-controllable-via-nodetool.patch, 
> v02p02-5483-v04-0003-This-time-use-an-EnumSet-to-pass-boolean-repair-options.patch,
>  v02p02-5483-v05-0003-Use-long-instead-of-EnumSet-to-work-with-JMX.patch
>
>
> I think it would be nice to log repair stats and results like query tracing 
> stores traces to system keyspace. With it, you don't have to lookup each log 
> file to see what was the status and how it performed the repair you invoked. 
> Instead, you can query the repair log with session ID to see the state and 
> stats of all nodes involved in that repair session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-5483) Repair tracing

Reply via email to