Are you executing all queries with tracing enabled?  If so, that introduces
overhead you probably don't want.  Most people rarely see this log line,
because querying with tracing enabled is the exception, not the rule (it's a
diagnostic feature usually turned on only while troubleshooting another
problem).
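
Tracing is normally something you toggle ad hoc and then turn back off.  For
example, in cqlsh (the table name below is just a placeholder):

  cqlsh> TRACING ON;
  cqlsh> SELECT * FROM my_ks.my_table WHERE id = 1;
  cqlsh> TRACING OFF;

Most drivers also let you enable tracing per statement if you only need to
trace a handful of queries.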

The trace messages being dropped aren't a problem in and of themselves; it
just means your traces will be incomplete.  However, it does indicate a
cluster that's either unhealthy or getting close to it.

Since you have a single node, run nodetool tpstats (if you had a cluster
you'd want to do this on each node).  Look at the bottom section; ideally it
looks like this:
Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
MUTATION                     0
COUNTER_MUTATION             0
BINARY                       0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
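
If you want to keep an eye on this over time, a quick sketch (run it on each
node; adjust the grep to taste):

  watch -n 5 'nodetool tpstats | grep -E "Dropped|MUTATION|_TRACE"'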

If you're seeing MUTATIONs or COUNTER_MUTATIONs dropped, your data integrity
is at risk and you need to reduce pressure in some way, because your cluster
is falling behind.  That might mean expanding your node count *above* your
RF, or it might mean tuning your application to reduce read or write
pressure (note that expanding a cluster puts extra read and streaming
pressure on the existing nodes, so to successfully expand you probably
*also* need to reduce pressure from the application, because the act of
growing your cluster will temporarily push you even further behind).
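
If you do expand, you can also soften the impact of the bootstrap on the
existing nodes by capping streaming and compaction throughput while the new
node joins.  A rough sketch (the numbers are placeholders; if I remember
right, streaming is in megabits/s, compaction in MB/s, and 0 removes the
cap):

  nodetool setstreamthroughput 50
  nodetool setcompactionthroughput 16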

On a single node like yours, a dropped mutation is data lost.  On a cluster
with RF > 1, there's just a *chance* that you have lost data (the same
mutation needs to be dropped by all replicas for it to be irrecoverably
lost).  Dropped mutations are always bad.
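
With RF > 1, a rolling repair afterwards is how you get the surviving
replicas to reconcile the copies that dropped the write, e.g. on each node
(keyspace name is a placeholder):

  nodetool repair -pr my_keyspace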

In this scenario you probably also want to double check nodetool
compactionstats; ideally you have fewer than 4 or 5 pending compactions.
Lower is always better; 4 or 5 is getting worrying if it stays in that
range.  If you're above 10, you're falling behind on compaction too, and
your read performance will suffer.
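
For reference, the number to watch is the pending tasks line at the top of
the output, something like:

  $ nodetool compactionstats
  pending tasks: 2
  ...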

Finally, re: your JVM settings, with a 30 GB node I think you could turn the
heap up a bit if you convert to G1GC.  JVM tuning is definitely not my
strong point; there are others on this list who will be able to help you do
a better job of it.  If you're seeing big GC pauses, then you do need to
work on this, even if they're not the direct cause of the dropped traces.
With that column family count you'll be under more GC pressure than you
would be with a lower CF count (there is a fixed memory cost per CF).
Reconsider your data model; this many column families usually suggests
dynamically creating CFs (e.g. to solve multi-tenancy).  If your CF count
will grow steadily over time at any appreciable rate, that's an
anti-pattern.
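
As a very rough sketch only (again, not my strong point), switching to G1 in
cassandra-env.sh usually means removing the default CMS flags and adding
something along the lines of:

  MAX_HEAP_SIZE="16G"
  # leave HEAP_NEWSIZE unset; G1 sizes the young generation itself
  JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
  JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"

The pause target there is a placeholder; others here will have
better-tested values for a 30 GB box.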

On Thu, Jun 16, 2016 at 2:40 AM Varun Barala <varunbaral...@gmail.com>
wrote:

> Thanks, Eric Stevens, for your reply!
>
> We have the following JVM settings:
> ---------------------------------------------
> memtable_offheap_space_in_mb: 15360  (found in cassandra.yaml)
> MAX_HEAP_SIZE="16G"  (found in cassandra-env.sh)
> ---------------------------------------------
>
> And I also found big GCs in the log.  But the dropped messages and the big
> GCs were logged at different times in system.log; after reading your reply
> I was expecting them to happen at the same time.  I also manually
> triggered GC, but messages were not dropped.
>
> Is a *TRACE message drop* harmful, or is it okay to neglect them?
>
> Thank you!!
>
>
> On Wed, Jun 15, 2016 at 8:45 PM, Eric Stevens <migh...@gmail.com> wrote:
>
>> This is better kept to the User groups.
>>
>> What are your JVM memory settings for Cassandra, and have you seen big
>> GC's in your logs?
>>
>> The reason I ask is because that's a large number of column families,
>> which produces memory pressure, and at first blush that strikes me as a
>> likely cause.
>>
>>
>> On Wed, Jun 15, 2016 at 3:23 AM Varun Barala <varunbaral...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
> >>> Can anyone tell me what all the possible reasons are for the below log:
>>>
>>>
>>> *"INFO  [ScheduledTasks:1] 2016-06-14 06:27:39,498
>>> MessagingService.java:929 - _TRACE messages were dropped in last 5000 ms:
>>> 928 for internal timeout and 0 for cross node timeout".*
>>> I searched online for the same and found some reasons like:-
>>>
>>> * Disk is not able to keep up with your ingest
>>> * Resources are not able to support all parallel running tasks
>>> * If other nodes are down then due to large hint replay
>>> * Heavy workload
>>>
> >>> But in that case other kinds of messages (mutation, read, write, etc.)
> >>> should also be dropped by C*, and that doesn't happen.
>>>
>>> -----------------------------
>>> Cluster Specifications
>>> ------------------------------
>>> number of nodes = 1
>>> total number of CF = 2000
>>>
>>> -----------------------------
>>> Machine Specifications
>>> ------------------------------
>>> RAM 30 GB
>>> hard disk SSD
>>> ubuntu 14.04
>>>
>>>
>>> Thanks in advance!!
>>>
>>> Regards,
>>> Varun Barala
>>>
>>
>
