[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054502#comment-15054502
 ] 

Anubhav Kale commented on CASSANDRA-10580:
------------------------------------------

Thanks for the interesting suggestion. I actually considered going down that 
route when I started working on this, but I wasn't sure what the rationale / 
design philosophy behind adding new metrics was, so I took a simpler route. 
Glad to see your feedback.

I have attached 10580-metrics.patch and will open a separate JIRA for doing 
this on a per-CF basis. I am using the ApproximateTime class wherever the time 
is not part of the decision to drop the mutation and is used only for logging. 
I hope that makes sense.
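
To illustrate the split between the two clocks, here is a minimal sketch. It 
is not the code in the patch: the class and method names are hypothetical, and 
the timestamps are passed in as plain longs. In the real code the approximate 
value would come from ApproximateTime and the precise one from the system 
clock.

```java
// Hypothetical helper, for illustration only; not part of Cassandra.
final class DroppedMutationTiming {

    // The drop decision must use a precise timestamp, because an error of a
    // few milliseconds could wrongly drop (or keep) a mutation near the limit.
    static boolean shouldDrop(long constructionTimeMillis,
                              long preciseNowMillis,
                              long timeoutMillis) {
        return preciseNowMillis - constructionTimeMillis > timeoutMillis;
    }

    // The log line only needs a rough figure, so a cheaper, coarser
    // timestamp is acceptable here.
    static String describeDrop(long constructionTimeMillis,
                               long approxNowMillis) {
        // Clamp at zero in case the coarse clock lags the construction time.
        long waited = Math.max(0, approxNowMillis - constructionTimeMillis);
        return "Mutation dropped after waiting ~" + waited + " ms";
    }
}
```

The point of the split is that a slightly stale timestamp can only affect the 
number printed in the log, never whether the mutation is dropped.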

I can clean up the methods in MessagingService a bit more if you like (a couple 
of them print the same message). I wanted to send this out first to make 
sure I was on the right path.

Also, a question: it appears that Timer.Update appends entries to the metric 
(which is what we want). Do you know at what point it starts dropping new 
entries or gives up? If there is a huge number of dropped mutations, will the 
timeTaken metric become unreliable?

To make this work per CF, I will probably pass the mutation to 
MessagingService.LogDroppedMessages (maybe through an overload) and update the 
metrics on the appropriate CF. Does that make sense?
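
Roughly, the overload idea could look like the sketch below. This is only an 
assumption about the shape, not the actual MessagingService code: the class, 
the method names, and the keyspace / CF string key are all hypothetical, and 
plain counters stand in for the real metric types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the dropped-message bookkeeping.
final class DroppedMutationMetrics {

    // Count of dropped mutations and the total time they waited.
    static final class Stats {
        private final LongAdder dropped = new LongAdder();
        private final LongAdder totalWaitMillis = new LongAdder();

        void record(long waitMillis) {
            dropped.increment();
            totalWaitMillis.add(waitMillis);
        }

        long count() {
            return dropped.sum();
        }

        double meanWaitMillis() {
            long n = dropped.sum();
            return n == 0 ? 0.0 : (double) totalWaitMillis.sum() / n;
        }
    }

    private final Stats global = new Stats();
    private final Map<String, Stats> perColumnFamily = new ConcurrentHashMap<>();

    // Existing-style entry point: only the node-wide numbers are updated.
    void logDroppedMutation(long waitMillis) {
        global.record(waitMillis);
    }

    // Overload: also attributes the drop to the affected column family.
    void logDroppedMutation(long waitMillis, String keyspace, String columnFamily) {
        logDroppedMutation(waitMillis);
        perColumnFamily
            .computeIfAbsent(keyspace + "." + columnFamily, k -> new Stats())
            .record(waitMillis);
    }

    Stats global() {
        return global;
    }

    // Returns empty stats for a column family that has seen no drops.
    Stats forColumnFamily(String keyspace, String columnFamily) {
        return perColumnFamily.getOrDefault(keyspace + "." + columnFamily, new Stats());
    }
}
```

Callers that do not know the CF keep using the short form, so the node-wide 
numbers stay complete either way.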

If this change looks good, I would prefer to make this work per CF 
before preparing patches for the older branches. Let me know if that's okay.

Appreciate your time and feedback!

> On dropped mutations, more details should be logged.
> ----------------------------------------------------
>
>                 Key: CASSANDRA-10580
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>         Environment: Production
>            Reporter: Anubhav Kale
>            Assignee: Anubhav Kale
>            Priority: Minor
>             Fix For: 3.2, 2.2.x
>
>         Attachments: 10580-Metrics.patch, 10580.patch, 
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
