Thanks Eric Stevens for your detailed reply!!

I got your points.


On Thu, Jun 16, 2016 at 11:49 PM, Eric Stevens <migh...@gmail.com> wrote:

> Are you executing all queries with tracing enabled?  If so that introduces
> overhead you probably don't want.  Most people probably don't see this log
> very often because it's the exception to query with tracing enabled, and
> not the rule (it's a diagnostic thing usually turned on only when
> troubleshooting another problem).
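>
> (A quick sketch of the usual ways tracing gets toggled, in case it helps;
> adjust to however it was enabled on your side:)
>
>     # per-session tracing in cqlsh:
>     cqlsh> TRACING ON;
>     cqlsh> TRACING OFF;
>
>     # node-wide probabilistic tracing; setting it to 0 disables it:
>     nodetool settraceprobability 0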
>
> The trace messages being dropped aren't in and of themselves a problem; it
> just means that your traces will be incomplete.  However, this indicates a
> cluster that's either unhealthy or getting close to it.
>
> Since you have a single node, run nodetool tpstats (if you had a cluster
> you'd want to do this on each node).  Look at the bottom section, ideally
> it looks like this:
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> BINARY                       0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
>
> If you're seeing MUTATIONs or COUNTER_MUTATIONs dropped, your data
> integrity is at risk, you need to reduce pressure in some way, as your
> cluster is falling behind.  That might be expanding your node count
> *above* your RF, or it might be tuning your application to reduce read or
> write pressure (note that expanding a cluster involves read pressure on the
> existing nodes, so to successfully expand your cluster you probably *also*
> need to reduce pressure from the application, because the act of growing your
> cluster will cause you to get even further behind).
>
> On a single node like yours, dropped mutations are lost data.  On a
> cluster with RF > 1, there's just a *chance* that you have lost data (the
> same mutation needs to be dropped by all replicas for it to be
> irrecoverably lost).  Dropped mutations are always bad.
>
> In this scenario you probably also want to double check nodetool
> compactionstats; ideally you have fewer than 4 or 5 pending compactions.
> Lower is always better; 4 or 5 is getting worrying if it stays in that
> range.  If you're above 10, you're falling behind on compaction too, and
> your read performance will suffer.
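>
> (Illustrative output only; the exact columns vary by version, but the
> number to watch is the "pending tasks" line:)
>
>     $ nodetool compactionstats
>     pending tasks: 2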
>
> Finally, regarding your JVM settings: with a 30 GB node, I think you could
> turn the heap up a bit if you convert to G1GC.  JVM tuning is definitely not
> my strong point; there are others on this list who will be able to help you
> do a better job of it.  If you're seeing big GC pauses, then you do need to
> work on this, even if they're not the direct cause of the dropped traces.
> With that column family count you'll be under more GC pressure than you
> would be with a lower CF count (there is a fixed memory cost per CF).
> Reconsider your data model; this many column families usually suggests
> dynamically creating CFs (e.g. to solve multi-tenancy).  If your CF count
> will grow steadily over time at any appreciable rate, that's an
> anti-pattern.
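>
> (Very rough sketch of the G1GC switch in cassandra-env.sh; the exact lines
> depend on your Cassandra version, and the heap size below is only an
> example, a starting point rather than a recommendation:)
>
>     # comment out the CMS-specific JVM_OPTS lines, then add:
>     JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
>     JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
>     MAX_HEAP_SIZE="20G"   # example value; G1 generally copes better with larger heaps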
>
> On Thu, Jun 16, 2016 at 2:40 AM Varun Barala <varunbaral...@gmail.com>
> wrote:
>
>> Thanks, Eric Stevens, for your reply!!
>>
>> We have the following JVM settings:
>> ---------------------------------------------
>> memtable_offheap_space_in_mb: 15360   (found in cassandra.yaml)
>> MAX_HEAP_SIZE="16G"                   (found in cassandra-env.sh)
>> ---------------------------------------------
>>
>> I also found big GCs in the log, but the dropped-message warnings and the
>> big GCs were logged at different times in system.log. After reading your
>> reply I expected them to happen at the same time. I also triggered a GC
>> manually, but no messages were dropped.
>>
>> Is the *_TRACE message drop* harmful, or is it okay to ignore it?
>>
>> Thank you!!
>>
>>
>> On Wed, Jun 15, 2016 at 8:45 PM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> This is better kept to the User groups.
>>>
>>> What are your JVM memory settings for Cassandra, and have you seen big
>>> GC's in your logs?
>>>
>>> The reason I ask is because that's a large number of column families,
>>> which produces memory pressure, and at first blush that strikes me as a
>>> likely cause.
>>>
>>>
>>> On Wed, Jun 15, 2016 at 3:23 AM Varun Barala <varunbaral...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Can anyone tell me all the possible reasons for the log below:
>>>>
>>>>
>>>> *"INFO  [ScheduledTasks:1] 2016-06-14 06:27:39,498
>>>> MessagingService.java:929 - _TRACE messages were dropped in last 5000 ms:
>>>> 928 for internal timeout and 0 for cross node timeout".*
>>>> I searched online and found some possible reasons (ways to check them
>>>> are sketched after the list):
>>>>
>>>> * Disk is not able to keep up with the ingest
>>>> * Resources are not able to support all the tasks running in parallel
>>>> * Large hint replay because other nodes were down
>>>> * Heavy overall workload
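>>>>
>>>> (For reference, these can usually be checked with the standard tools;
>>>> the commands below are only illustrative:)
>>>>
>>>>     nodetool tpstats            # dropped-message counts and pending tasks
>>>>     nodetool compactionstats    # compaction backlog
>>>>     iostat -x 1                 # disk utilisation and await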
>>>>
>>>> But in those cases other kinds of messages (mutation, read, write, etc.)
>>>> should be dropped by C* as well, but that doesn't happen.
>>>>
>>>> -----------------------------
>>>> Cluster Specifications
>>>> ------------------------------
>>>> number of nodes = 1
>>>> total number of CF = 2000
>>>>
>>>> -----------------------------
>>>> Machine Specifications
>>>> ------------------------------
>>>> RAM 30 GB
>>>> hard disk SSD
>>>> ubuntu 14.04
>>>>
>>>>
>>>> Thanks in advance!!
>>>>
>>>> Regards,
>>>> Varun Barala
>>>>
>>>
>>
