[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-------------------------------------
    Description: 
In our production cluster, we are seeing a large number of dropped mutations. 
At a minimum, we should log how long the thread waited to be scheduled before 
the mutation was dropped. This will help in choosing the right value for 
write_timeout_in_ms. 

The change will need to be done in StorageProxy.java and MessagingTask.java. It 
is easy, and I will submit a patch shortly.
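
A minimal sketch of the idea, not the actual patch: a task records when it was 
created and, when it finally runs, reports how long it waited if that wait 
exceeded the timeout. The class name DelayLoggingTask, the injectable clock, and 
the log wording are assumptions made for illustration; none of them are 
Cassandra APIs.

```java
import java.util.function.LongSupplier;

// Hypothetical wrapper: measures the time between construction and execution.
class DelayLoggingTask implements Runnable {
    private final Runnable delegate;
    private final long timeoutMillis;
    private final LongSupplier clock;      // injectable so the delay can be simulated
    private final long constructionTime;
    private long lastDelayMillis = -1;
    private boolean dropped;

    DelayLoggingTask(Runnable delegate, long timeoutMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.constructionTime = clock.getAsLong();
    }

    @Override
    public void run() {
        lastDelayMillis = clock.getAsLong() - constructionTime;
        if (lastDelayMillis > timeoutMillis) {
            // Too late: report the scheduling delay and skip the work.
            dropped = true;
            System.err.println("Dropping task: waited " + lastDelayMillis
                    + " ms to be scheduled (timeout " + timeoutMillis + " ms)");
            return;
        }
        delegate.run();
    }

    long lastDelayMillis() { return lastDelayMillis; }

    boolean dropped() { return dropped; }
}
```

In production code the clock would be something like System::currentTimeMillis, 
and the message would go through the regular logger instead of System.err.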



  was:
In our production cluster, we are seeing a large number of dropped mutations. 
It would be really helpful to see which column families are really affected by 
this (either through logs or through a dedicated counter for every column 
family).

I have made a hack in StorageProxy (below) to help us with this. I am happy to 
extend it into a better solution (log the affected CF via logger.debug and then 
grep for it manually) if experts agree this additional detail would be helpful 
in general. Any other suggestions are welcome.

    private static abstract class LocalMutationRunnable implements Runnable
    {
        private final long constructionTime = System.currentTimeMillis();

        private IMutation mutation;

        public final void run()
        {
            if (System.currentTimeMillis() > constructionTime + 2000L)
            {
                long timeTaken = System.currentTimeMillis() - constructionTime;
                logger.warn("Anubhav LocalMutationRunnable thread ran after " + timeTaken);

                try
                {
                    for (ColumnFamily family : this.mutation.getColumnFamilies())
                    {
                        if (family.toString().toLowerCase().contains("udsuserdailysnapshot"))
                        {
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.USERDAILY);
                        }
                        else if (family.toString().toLowerCase().contains("udsuserhourlysnapshot"))
                        {
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.USERHOURLY);
                        }
                        else if (family.toString().toLowerCase().contains("udstenantdailysnapshot"))
                        {
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.TENANTDAILY);
                        }
                        else if (family.toString().toLowerCase().contains("udstenanthourlysnapshot"))
                        {
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.TENANTHOURLY);
                        }
                        else if (family.toString().toLowerCase().contains("userdatasetraw"))
                        {
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.USERDSRAW);
                        }
                        else if (family.toString().toLowerCase().contains("tenants"))
                        {
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.TENANTS);
                        }
                        else if (family.toString().toLowerCase().contains("users"))
                        {
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.USERS);
                        }
                        else if (family.toString().toLowerCase().contains("tenantactivity"))
                        {
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.TENANTACTIVITY);
                        }
                        else if (family.getKeySpaceName().toLowerCase().contains("system"))
                        {
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.SYSTEMKS);
                        }
                        else
                        {
                            logger.warn("Anubhav LocalMutationRunnable updating mutations for " + family.toString().toLowerCase());
                            MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.OTHERTBL);
                        }
                    }
                }
                catch (Exception e)
                {
                    logger.error("Anubhav LocalMutationRunnable Exception ", e);
                }

                MessagingService.instance().incrementDroppedMessages(MessagingService.Verb.MUTATION);

                HintRunnable runnable = new HintRunnable(FBUtilities.getBroadcastAddress())
                {
                    protected void runMayThrow() throws Exception
                    {
                        LocalMutationRunnable.this.runMayThrow();
                    }
                };
                submitHint(runnable);
                return;
            }

            try
            {
                runMayThrow();
            }
            catch (Exception e)
            {
                throw new RuntimeException(e);
            }
        }

        public LocalMutationRunnable(IMutation mutation)
        {
            this.mutation = mutation;
        }

        abstract protected void runMayThrow() throws Exception;
    }
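
The hack above hard-codes one Verb per table. A more general form of the 
"dedicated counter for every column family" idea could keep a map of counters 
keyed by keyspace and column family name. The sketch below is only an 
illustration under that assumption: the class DroppedMutationCounters, its 
method names, and the keyspace name "ks1" used in the usage note are all 
hypothetical and not part of Cassandra.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-column-family counter of dropped mutations.
class DroppedMutationCounters {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Record one dropped mutation for the given keyspace and column family.
    void recordDrop(String keyspace, String columnFamily) {
        counts.computeIfAbsent(key(keyspace, columnFamily), k -> new LongAdder()).increment();
    }

    // Number of drops recorded so far; 0 if none were recorded.
    long dropped(String keyspace, String columnFamily) {
        LongAdder adder = counts.get(key(keyspace, columnFamily));
        return adder == null ? 0L : adder.sum();
    }

    private static String key(String keyspace, String columnFamily) {
        return keyspace + "." + columnFamily;
    }
}
```

For example, calling recordDrop("ks1", "users") twice and then 
dropped("ks1", "users") would return 2, with no per-table code changes needed 
when a new column family is added.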




> When mutations are dropped, the column family should be printed / have a 
> counter per column family
> --------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10580
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>         Environment: Production
>            Reporter: Anubhav Kale
>            Priority: Minor
>             Fix For: 2.1.x
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should log how long the thread waited to be scheduled before 
> the mutation was dropped. This will help in choosing the right value for 
> write_timeout_in_ms. 
> The change will need to be done in StorageProxy.java and MessagingTask.java. 
> It is easy, and I will submit a patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
