[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054502#comment-15054502 ]
Anubhav Kale commented on CASSANDRA-10580:
------------------------------------------

Thanks for the interesting suggestion. I actually considered going down that route when I started working on it, but I wasn't sure what the rationale / design philosophy behind adding new metrics was, so I took a simpler route. Glad to see your feedback.

I have attached 10580-Metrics.patch and will open a separate JIRA for doing this on a per-CF basis. I am using the ApproximateTime class wherever the timestamp does not take part in the decision to drop the mutation and is only used for logging. I hope that makes sense. I can clean up the methods in MessagingService a bit more if you like (a couple of them print the same message). I wanted to send this out first to make sure I was on the right path.

Also, a question: it appears that Timer.update appends entries to the metric (which is what we want). Do you know at what point it starts discarding new entries or gives up? If there is a huge number of dropped mutations, will the timeTaken metric be thrown off?

To make this work per CF, I will probably pass the mutation to MessagingService.logDroppedMessages (maybe through an overload) and update the metrics on the appropriate CF. Does that make sense?

If this change looks good, I would rather make this work per CF before preparing patches for the older branches. Let me know if that's okay.

Appreciate your time and feedback!

> On dropped mutations, more details should be logged.
> ----------------------------------------------------
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Environment: Production
> Reporter: Anubhav Kale
> Assignee: Anubhav Kale
> Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 10580-Metrics.patch, 10580.patch, CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations.
> At a minimum, we should print how long the thread took to get scheduled, since that delay is what caused the mutation to be dropped (we should also print the Message / Mutation, which helps in figuring out which column family was affected). This will help find the right value for write_request_timeout_in_ms.
> The change is small and is in StorageProxy.java and MessagingTask.java. I will submit a patch shortly.
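The proposal (record how long a mutation waited before a thread picked it up, and which table it belonged to) can be sketched in plain Java. This is a minimal, hypothetical illustration, not Cassandra's StorageProxy or MessagingService code: DroppedMutationTracker, TableStats, and onDequeue are invented names. It assumes timestamps are passed in explicitly (for example from System.nanoTime()) so the drop decision uses a precise clock, and the per-table counters stand in for the per-CF metrics discussed in the comment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names come from the Cassandra code base.
final class DroppedMutationTracker {

    // Per-table aggregates of dropped mutations (stand-in for a per-CF metric).
    static final class TableStats {
        final AtomicLong dropped = new AtomicLong();
        final AtomicLong totalWaitMillis = new AtomicLong();
        final AtomicLong maxWaitMillis = new AtomicLong();
    }

    private final long timeoutNanos;
    private final Map<String, TableStats> statsByTable = new ConcurrentHashMap<>();

    DroppedMutationTracker(long timeout, TimeUnit unit) {
        this.timeoutNanos = unit.toNanos(timeout);
    }

    /**
     * Called when a worker thread finally picks up a mutation.
     * Returns a log line if the mutation is dropped, or null if it should be applied.
     */
    String onDequeue(String table, long enqueuedAtNanos, long nowNanos) {
        long waitedNanos = nowNanos - enqueuedAtNanos;
        if (waitedNanos <= timeoutNanos) {
            return null; // still within the timeout: apply the mutation
        }
        long waitedMillis = TimeUnit.NANOSECONDS.toMillis(waitedNanos);
        TableStats stats = statsByTable.computeIfAbsent(table, t -> new TableStats());
        stats.dropped.incrementAndGet();
        stats.totalWaitMillis.addAndGet(waitedMillis);
        stats.maxWaitMillis.accumulateAndGet(waitedMillis, Math::max);
        // The log line names the affected table and how long the mutation waited.
        return String.format("Dropped mutation on %s after waiting %d ms (timeout %d ms)",
                table, waitedMillis, TimeUnit.NANOSECONDS.toMillis(timeoutNanos));
    }

    long droppedCount(String table) {
        TableStats s = statsByTable.get(table);
        return s == null ? 0 : s.dropped.get();
    }

    long maxWaitMillis(String table) {
        TableStats s = statsByTable.get(table);
        return s == null ? 0 : s.maxWaitMillis.get();
    }
}
```

For example, with a 2000 ms timeout, a mutation on ks.users dequeued 2500 ms after it was enqueued is dropped and yields "Dropped mutation on ks.users after waiting 2500 ms (timeout 2000 ms)", while one dequeued after 1500 ms is applied. Keeping the counters keyed by table is one way to get the per-CF view without changing the drop decision itself.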