Thanks, Eric Stevens, for your detailed reply!! I got your points.
On Thu, Jun 16, 2016 at 11:49 PM, Eric Stevens <migh...@gmail.com> wrote:

> Are you executing all queries with tracing enabled? If so, that introduces
> overhead you probably don't want. Most people probably don't see this log
> very often because querying with tracing enabled is the exception, not the
> rule (it's a diagnostic thing usually turned on only when troubleshooting
> another problem).
>
> The trace messages being dropped aren't in and of themselves a problem; it
> just means that your traces would be incomplete. However, this indicates a
> cluster that's either unhealthy or getting close to it.
>
> Since you have a single node, run nodetool tpstats (if you had a cluster
> you'd want to do this on each node). Look at the bottom section; ideally
> it looks like this:
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> BINARY                       0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
>
> If you're seeing MUTATIONs or COUNTER_MUTATIONs dropped, your data
> integrity is at risk and you need to reduce pressure in some way, as your
> cluster is falling behind. That might mean expanding your node count
> *above* your RF, or tuning your application to reduce read or write
> pressure. (Note that expanding a cluster puts read pressure on the
> existing nodes, so to successfully expand your cluster you probably *also*
> need to reduce pressure from the application, because the act of growing
> your cluster will cause you to fall even further behind.)
>
> On a single node, like yours, a dropped mutation is data lost. On a
> cluster with RF > 1, there's just a *chance* that you have lost data (the
> same mutation needs to be dropped by all replicas for it to be
> irrecoverably lost). Dropped mutations are always bad.
>
> In this scenario you probably also want to double-check nodetool
> compactionstats; ideally you have < 4 or 5 pending compactions.
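The tpstats check described above is easy to script. A minimal sketch of the parsing, run here against a sample of the "Dropped" section (the `_TRACE` count of 928 is a hypothetical value for illustration; in practice you would pipe real `nodetool tpstats` output into the awk filter):

```shell
# Flag any non-zero counters in the "Dropped" section of nodetool tpstats.
# Real usage (assumed invocation): nodetool tpstats | awk '...'
# Here we feed a sample instead, so the snippet is self-contained.
tpstats_sample='Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                     928
MUTATION                     0
COUNTER_MUTATION             0
REQUEST_RESPONSE             0'

# NR > 1 skips the header row; $2+0 forces a numeric comparison.
echo "$tpstats_sample" | awk 'NR > 1 && $2+0 > 0 { print $1, "dropped:", $2 }'
# prints: _TRACE dropped: 928
```

On a healthy node the filter prints nothing, which makes it convenient to run from cron or a monitoring check.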
> Lower is always better; 4 or 5 is getting worrying if it remains in that
> range. If you're above 10, you're falling behind on compaction too, and
> your read performance will be suffering.
>
> Finally, re: your JVM settings, with a 30GB node I think you could turn
> that up a bit if you convert to G1GC. JVM tuning is definitely not my
> strong point; there are others on this list who will be able to help you
> do a better job of it. If you're seeing big GC pauses, then you do need
> to work on this, even if they're not the direct cause of the dropped
> traces. With that column family count you'll be under more GC pressure
> than you would be with a lower CF count (there is a fixed memory cost per
> CF). Reconsider your data model; this many column families usually
> suggests dynamically creating CFs (e.g. to solve multi-tenancy). If your
> CF count will grow steadily over time at any appreciable rate, that's an
> anti-pattern.
>
> On Thu, Jun 16, 2016 at 2:40 AM Varun Barala <varunbaral...@gmail.com>
> wrote:
>
>> Thanks, Eric Stevens, for your reply.
>>
>> We have the following JVM settings:
>> ---------------------------------------------
>> memtable_offheap_space_in_mb: 15360  (found in cassandra.yaml)
>> MAX_HEAP_SIZE="16G"                  (found in cassandra-env.sh)
>> ---------------------------------------------
>>
>> I also found big GCs in the log, but the dropped-message lines and the
>> big GCs were logged at different times in system.log; after reading your
>> reply I was expecting them to happen at the same time. I also manually
>> triggered a GC, but messages were not dropped.
>>
>> Is a *TRACE message drop* harmful, or is it okay to neglect them?
>>
>> Thank you!!
>>
>>
>> On Wed, Jun 15, 2016 at 8:45 PM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> This is better kept to the User groups.
>>>
>>> What are your JVM memory settings for Cassandra, and have you seen big
>>> GCs in your logs?
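For reference, switching a node like this to G1GC is done in cassandra-env.sh. A hedged sketch of what that fragment might look like; the heap size and pause target below are assumptions to be tuned against your own GC logs, not recommendations from this thread:

```shell
# cassandra-env.sh -- illustrative G1GC configuration (all values are assumptions to tune)

# With ~30 GB RAM, leave headroom for off-heap memtables (15 GB here) and the OS page cache.
MAX_HEAP_SIZE="16G"

JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"   # soft pause-time target; verify against GC logs

# G1 sizes its own young generation, so do not also set HEAP_NEWSIZE / -Xmn
# when using it; fixing the young gen defeats G1's pause-time heuristics.
```

Whatever values you pick, the GC log (pause times and frequency) is the feedback loop for tuning, which ties back to checking whether the big GCs correlate with the dropped messages.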
>>>
>>> The reason I ask is that that's a large number of column families,
>>> which produces memory pressure, and at first blush that strikes me as
>>> a likely cause.
>>>
>>>
>>> On Wed, Jun 15, 2016 at 3:23 AM Varun Barala <varunbaral...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Can anyone tell me all the possible reasons for the log line below?
>>>>
>>>> "INFO [ScheduledTasks:1] 2016-06-14 06:27:39,498
>>>> MessagingService.java:929 - _TRACE messages were dropped in last 5000
>>>> ms: 928 for internal timeout and 0 for cross node timeout"
>>>>
>>>> I searched online for the same and found some possible reasons:
>>>>
>>>> * The disk is not able to keep up with your ingest
>>>> * Resources are not able to support all parallel running tasks
>>>> * A large hint replay after other nodes have been down
>>>> * Heavy workload
>>>>
>>>> But in those cases other kinds of messages (mutation, read, write,
>>>> etc.) should also be dropped by C*, and that doesn't happen here.
>>>>
>>>> -----------------------------
>>>> Cluster Specifications
>>>> -----------------------------
>>>> number of nodes = 1
>>>> total number of CF = 2000
>>>>
>>>> -----------------------------
>>>> Machine Specifications
>>>> -----------------------------
>>>> RAM: 30 GB
>>>> hard disk: SSD
>>>> Ubuntu 14.04
>>>>
>>>>
>>>> Thanks in advance!!
>>>>
>>>> Regards,
>>>> Varun Barala
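To see how often these drop lines occur and which message types they affect (useful for correlating them with GC pauses, as discussed above), the log can be scanned with standard tools. A minimal sketch, run here against the sample line from the thread rather than a real system.log; the log path in the comment is an assumption:

```shell
# Extract the message type and drop count from "messages were dropped" log lines.
# Real usage might be (path assumed):
#   grep 'messages were dropped' /var/log/cassandra/system.log | sed -n '...'
log_sample='INFO [ScheduledTasks:1] 2016-06-14 06:27:39,498 MessagingService.java:929 - _TRACE messages were dropped in last 5000 ms: 928 for internal timeout and 0 for cross node timeout'

# Capture the message type (e.g. _TRACE, MUTATION) and the internal-timeout count.
echo "$log_sample" | sed -n \
  's/.* - \([A-Z_]*\) messages were dropped in last [0-9]* ms: \([0-9]*\) .*/\1 \2/p'
# prints: _TRACE 928
```

Feeding the whole log through this and piping into `sort | uniq -c` gives a quick per-type tally, which helps answer whether only _TRACE is being dropped or mutations are affected too.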