[
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054502#comment-15054502
]
Anubhav Kale commented on CASSANDRA-10580:
------------------------------------------
Thanks for the interesting suggestion. I actually considered going down that
route when I started working on this, but I wasn't sure what the rationale /
design philosophy behind adding new metrics was, so I took a simpler route.
Glad to see your feedback.
I have attached 10580-Metrics.patch and will open a separate JIRA for doing
this on a per-CF basis. I am using the ApproximateTime class wherever the
timestamp does not take part in the decision to drop the mutation and is used
only for logging. I hope that makes sense.
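The split between the two clocks can be sketched roughly as follows. This is
only an illustration: CoarseClock and DropCheck are hypothetical names, not
Cassandra classes, and CoarseClock is a stand-in for a cheap cached clock
rather than the real ApproximateTime implementation.

```java
// Hypothetical stand-in for a coarse, cached clock. It is cheap to read but
// may lag behind the real time until refresh() is called again.
final class CoarseClock {
    private volatile long cachedMillis = System.currentTimeMillis();

    // Assumed to be called periodically by some background task.
    void refresh() {
        cachedMillis = System.currentTimeMillis();
    }

    long currentTimeMillis() {
        return cachedMillis;
    }
}

// Hypothetical helper showing which clock feeds which code path.
final class DropCheck {
    // The drop decision takes a precise "now" supplied by the caller, so a
    // stale cached value can never cause a mutation to be dropped or kept.
    static boolean shouldDrop(long constructionMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - constructionMillis > timeoutMillis;
    }

    // The log line only needs an approximate age, so the coarse clock is enough.
    static String describe(long constructionMillis, CoarseClock clock) {
        long approxAgeMillis = clock.currentTimeMillis() - constructionMillis;
        return "MUTATION dropped after ~" + approxAgeMillis + " ms";
    }
}
```

With this shape, an error in the coarse clock can only affect the number that
is printed, never whether the mutation is dropped.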
I can clean up the methods in MessagingService a bit more if you like (a couple
of them print the same message). I wanted to send this out first to make
sure I was on the right path.
Also, a question: it appears that Timer.Update appends entries to the metric
(which is what we want). Do you know at what point it starts dropping new
entries or gives up? If there is a huge number of dropped mutations, could
the timeTaken metric become unreliable?
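For background on that question, timer-style metrics are commonly backed by a
fixed-size sample rather than an ever-growing list. Which reservoir the timers
in this patch end up using has not been checked here, so the sketch below only
illustrates the general technique (uniform reservoir sampling, Algorithm R).
FixedSizeReservoir is a hypothetical class, not part of any metrics library.

```java
import java.util.Random;

// Hypothetical fixed-size sample using uniform reservoir sampling (Algorithm R).
// Memory stays bounded by the capacity regardless of how many values are recorded.
final class FixedSizeReservoir {
    private final long[] samples;
    private final Random random;
    private long count; // total number of values ever offered

    FixedSizeReservoir(int capacity, long seed) {
        this.samples = new long[capacity];
        this.random = new Random(seed);
    }

    synchronized void update(long value) {
        count++;
        if (count <= samples.length) {
            // Still filling the reservoir: keep every value.
            samples[(int) (count - 1)] = value;
        } else {
            // Pick a slot uniformly in [0, count); the new value replaces an
            // old one only if that slot falls inside the reservoir.
            long slot = (long) (random.nextDouble() * count);
            if (slot < samples.length) {
                samples[(int) slot] = value;
            }
        }
    }

    // Number of values currently retained (never more than the capacity).
    synchronized int size() {
        return (int) Math.min(count, samples.length);
    }

    // Number of values ever recorded, including those no longer retained.
    synchronized long count() {
        return count;
    }
}
```

Under this scheme a flood of dropped mutations does not make the metric stop
accepting updates. The total count keeps increasing, while percentiles are
estimated from the retained sample.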
To make this work per CF, I will probably pass the mutation to
MessagingService.LogDroppedMessages (maybe through an overload) and update the
metrics on the appropriate CF. Does that make sense?
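One possible shape for such an overload is sketched below. All names here are
hypothetical (DroppedMutationStats, recordDropped, the "keyspace.table" string
keys); the sketch assumes the caller can extract the affected column family
names from the mutation, and it uses plain counters instead of the real
metrics classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical holder for global and per-column-family dropped-mutation counts.
final class DroppedMutationStats {
    private final LongAdder totalDropped = new LongAdder();
    private final LongAdder totalTimeTakenMillis = new LongAdder();
    private final Map<String, LongAdder> droppedPerCf = new ConcurrentHashMap<>();

    // Existing-style entry point: only the global numbers are updated.
    void recordDropped(long timeTakenMillis) {
        totalDropped.increment();
        totalTimeTakenMillis.add(timeTakenMillis);
    }

    // Overload: also attributes the drop to every column family the mutation
    // touched (assumed to be supplied by the caller as "keyspace.table" names).
    void recordDropped(long timeTakenMillis, Iterable<String> cfNames) {
        recordDropped(timeTakenMillis);
        for (String cf : cfNames) {
            droppedPerCf.computeIfAbsent(cf, k -> new LongAdder()).increment();
        }
    }

    long totalDropped() {
        return totalDropped.sum();
    }

    long totalTimeTakenMillis() {
        return totalTimeTakenMillis.sum();
    }

    long droppedFor(String cf) {
        LongAdder adder = droppedPerCf.get(cf);
        return adder == null ? 0L : adder.sum();
    }
}
```

Keeping the original single-argument method means callers that have no
mutation at hand (other dropped message types, for example) are unaffected.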
If this change looks good, I would prefer to make this work per CF before
preparing patches for the older branches. Let me know if that's okay.
I appreciate your time and feedback!
> On dropped mutations, more details should be logged.
> ----------------------------------------------------
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Environment: Production
> Reporter: Anubhav Kale
> Assignee: Anubhav Kale
> Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 10580-Metrics.patch, 10580.patch,
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations.
> At a minimum, we should print how long the thread took to get scheduled,
> since that delay is what causes the mutation to be dropped. (We should also
> print the Message / Mutation so that it helps in figuring out which column
> family was affected.) This will help in tuning write_timeout_in_ms.
> The change is small and is in StorageProxy.java and MessagingTask.java. I
> will submit a patch shortly.