Re: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-08-15 Thread Erick Ramirez
Mutations get dropped when a node can't keep up with writes. In the
Cassandra write path, a write is ACKed once the mutation has been appended
to the commitlog and applied to the memtable, which is why writes are
normally very fast.

Knowing that, dropped mutations mean that the disk is not able to keep up
with the IO. Another word for it is "overloaded".

So what do you do? Use fast SSDs (HDDs and EBS won't do) and size your
cluster correctly to deal with the peak load, i.e. add more nodes. Cheers!

On Fri, Jul 21, 2017 at 1:27 AM, ZAIDI, ASAD A  wrote:

> Hello Folks –
>
> I’m using apache-cassandra 2.2.8.
>
> I see many messages like below in my system.log file. In Cassandra.yaml
> file [ cross_node_timeout: true] is set and NTP server is also running
> correcting clock drift on 16node cluster. I do not see pending or blocked
> HintedHandoff  in tpstats output though there are bunch of MUTATIONS
> dropped observed.
>
> INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946
> - MUTATION messages were dropped in last 5000 ms: 822 for internal timeout
> and 2152 for cross node timeout
>
> I’m seeking help here if you please let me know what I need to check in
> order to address these cross node timeouts.
>
> Thank you,
> Asad


Re: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-08-04 Thread Akhil Mehra
Glad I could be of help :)

Hopefully the partition size resize goes smoothly.

Regards,
Akhil

> On 4/08/2017, at 5:41 AM, ZAIDI, ASAD A <az1...@att.com> wrote:
> 
> Hi Akhil,
>  
> Thank you for your reply.
>  
> I kept testing different timeout numbers over last week and eventually 
> settled at setting *_request_timeout_in_ms parameters at 1.5minutes for 
> coordinator wait time. That is the number where I donot see any dropped 
> mutations. 
>  
> Also asked developers to tweak data model where we saw bunch of tables with 
> really large partition size , some are ranging  Partition-key size around 
> ~6.6GB.. we’re now working to reduce the partition size of the tables. I am 
> hoping corrected data model will help reduce coordinator wait time (get back 
> to default number!)  again.
>  
> Thank again/Asad
>  
> From: Akhil Mehra [mailto:akhilme...@gmail.com] 
> Sent: Friday, July 21, 2017 4:24 PM
> To: user@cassandra.apache.org
> Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
> timeout
>  
> Hi Asad,
>  
> The 5000 ms is not configurable 
> (https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/net/MessagingService.java#L423).
>  This just the time after which the number of dropped messages are reported. 
> Thus dropped messages are reported every 5000ms. 
>  
> If you are looking to tweak the number of ms after which a message is 
> considered dropped then you need to use the write_request_timeout_in_ms.  The 
> write_request_timeout_in_ms 
> (http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html)
>  can be used to increase the mutation timeout. By default it is set to 2000ms.
>  
> I hope that helps.
>  
> Regards,
> Akhil
>  
>  
> On 22/07/2017, at 2:46 AM, ZAIDI, ASAD A <az1...@att.com> wrote:
>  
> Hi Akhil,
>  
> Thank you for your reply. Previously, I did ‘tune’ various timeouts – 
> basically increased them a bit but none of those parameter listed in the link 
> matches with that “were dropped in last 5000 ms”.
> I was wondering from where that [5000ms] number is coming from when,  like I 
> mentioned before, none of any timeout parameter settings matches that #!
>  
> Load is intermittently high but again cpu queue length never goes beyond 
> medium depth. I wonder if there is some internal limit that I’m still not 
> aware of.
>  
> Thanks/Asad
>  
>  
> From: Akhil Mehra [mailto:akhilme...@gmail.com]
> Sent: Thursday, July 20, 2017 3:47 PM
> To: user@cassandra.apache.org
> Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
> timeout
>  
> Hi Asad,
>  
> http://cassandra.apache.org/doc/latest/faq/index.html#why-message-dropped
>  
> As mentioned in the link above this is a load shedding mechanism used by 
> Cassandra.
>  
> Is you cluster under heavy load?
>  
> Regards,
> Akhil
>  
>  
> On 21/07/2017, at 3:27 AM, ZAIDI, ASAD A <az1...@att.com> wrote:
>  
> Hello Folks –
>  
> I’m using apache-cassandra 2.2.8.
>  
> I see many messages like below in my system.log file. In Cassandra.yaml file 
> [ cross_node_timeout: true] is set and NTP server is also running correcting 
> clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
> in tpstats output though there are bunch of MUTATIONS dropped observed.
>  
> 
> INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
> MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
> 2152 for cross node timeout
> 
>  
> I’m seeking help here if you please let me know what I need to check in order 
> to address these cross node timeouts.
>  
> Thank you,
> Asad



RE: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-08-03 Thread ZAIDI, ASAD A
Hi Akhil,

Thank you for your reply.

I kept testing different timeout values over the last week and eventually 
settled on setting the *_request_timeout_in_ms parameters to 1.5 minutes for 
coordinator wait time. That is the value at which I do not see any dropped 
mutations.

I also asked the developers to tweak the data model, since we saw a bunch of 
tables with really large partitions, some around ~6.6GB. We’re now working to 
reduce the partition size of those tables. I am hoping the corrected data model 
will help bring the coordinator wait time back down to the default value.

Thanks again/Asad

From: Akhil Mehra [mailto:akhilme...@gmail.com]
Sent: Friday, July 21, 2017 4:24 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout

Hi Asad,

The 5000 ms is not configurable 
(https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/net/MessagingService.java#L423).
 This just the time after which the number of dropped messages are reported. 
Thus dropped messages are reported every 5000ms.

If you are looking to tweak the number of ms after which a message is 
considered dropped then you need to use the write_request_timeout_in_ms.  The 
write_request_timeout_in_ms 
(http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html)
 can be used to increase the mutation timeout. By default it is set to 2000ms.

I hope that helps.

Regards,
Akhil


On 22/07/2017, at 2:46 AM, ZAIDI, ASAD A <az1...@att.com> wrote:

Hi Akhil,

Thank you for your reply. Previously, I did ‘tune’ various timeouts – basically 
increased them a bit but none of those parameter listed in the link matches 
with that “were dropped in last 5000 ms”.
I was wondering from where that [5000ms] number is coming from when,  like I 
mentioned before, none of any timeout parameter settings matches that #!

Load is intermittently high but again cpu queue length never goes beyond medium 
depth. I wonder if there is some internal limit that I’m still not aware of.

Thanks/Asad


From: Akhil Mehra [mailto:akhilme...@gmail.com]
Sent: Thursday, July 20, 2017 3:47 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout

Hi Asad,

http://cassandra.apache.org/doc/latest/faq/index.html#why-message-dropped

As mentioned in the link above this is a load shedding mechanism used by 
Cassandra.

Is you cluster under heavy load?

Regards,
Akhil


On 21/07/2017, at 3:27 AM, ZAIDI, ASAD A <az1...@att.com> wrote:

Hello Folks –

I’m using apache-cassandra 2.2.8.

I see many messages like below in my system.log file. In Cassandra.yaml file [ 
cross_node_timeout: true] is set and NTP server is also running correcting 
clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
in tpstats output though there are bunch of MUTATIONS dropped observed.


INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
2152 for cross node timeout


I’m seeking help here if you please let me know what I need to check in order 
to address these cross node timeouts.

Thank you,
Asad



Re: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-07-21 Thread Akhil Mehra
Hi Asad,

The 5000 ms is not configurable 
(https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/net/MessagingService.java#L423).
It is just the interval at which the number of dropped messages is reported, 
so dropped messages are reported every 5000 ms.

If you are looking to tweak the number of ms after which a message is 
considered dropped, then you need to use write_request_timeout_in_ms 
(http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html),
which can be used to increase the mutation timeout. By default it is set to 
2000 ms.
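
To make the two numbers in the log message easier to watch while you tune 
the timeout, here is a minimal Python sketch that pulls the counters out of 
the log line quoted in this thread and averages them over the reporting 
window. The regular expression and the function name parse_dropped are my 
own assumptions based on that one sample line, not something shipped with 
Cassandra, and other versions may word the message differently.

```python
import re

# Pattern modelled on the single log line quoted in this thread; this is an
# assumption, other Cassandra versions may word the message differently.
DROPPED_RE = re.compile(
    r"(?P<verb>[A-Z_]+) messages were dropped in last (?P<window_ms>\d+) ms: "
    r"(?P<internal>\d+) for internal timeout and "
    r"(?P<cross_node>\d+) for cross node timeout"
)


def parse_dropped(line):
    """Return the drop counters from one log line, or None if it does not match."""
    match = DROPPED_RE.search(line)
    if match is None:
        return None
    window_ms = int(match.group("window_ms"))
    internal = int(match.group("internal"))
    cross_node = int(match.group("cross_node"))
    total = internal + cross_node
    return {
        "verb": match.group("verb"),
        "window_ms": window_ms,
        "internal": internal,
        "cross_node": cross_node,
        "total": total,
        # Drops per second, averaged over the reporting window.
        "per_second": total * 1000.0 / window_ms,
    }


# The log line from the original post, joined onto one line.
SAMPLE = (
    "INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 "
    "- MUTATION messages were dropped in last 5000 ms: 822 for internal timeout "
    "and 2152 for cross node timeout"
)

if __name__ == "__main__":
    print(parse_dropped(SAMPLE))
```

For the line Asad posted this gives 2974 dropped mutations in the 5000 ms 
window, i.e. roughly 595 per second.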

I hope that helps.

Regards,
Akhil


> On 22/07/2017, at 2:46 AM, ZAIDI, ASAD A <az1...@att.com> wrote:
> 
> Hi Akhil,
>  
> Thank you for your reply. Previously, I did ‘tune’ various timeouts – 
> basically increased them a bit but none of those parameter listed in the link 
> matches with that “were dropped in last 5000 ms”.
> I was wondering from where that [5000ms] number is coming from when,  like I 
> mentioned before, none of any timeout parameter settings matches that #!
>  
> Load is intermittently high but again cpu queue length never goes beyond 
> medium depth. I wonder if there is some internal limit that I’m still not 
> aware of.
>  
> Thanks/Asad
>  
>  
> From: Akhil Mehra [mailto:akhilme...@gmail.com] 
> Sent: Thursday, July 20, 2017 3:47 PM
> To: user@cassandra.apache.org
> Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
> timeout
>  
> Hi Asad,
>  
> http://cassandra.apache.org/doc/latest/faq/index.html#why-message-dropped
>  
> As mentioned in the link above this is a load shedding mechanism used by 
> Cassandra.
>  
> Is you cluster under heavy load?
>  
> Regards,
> Akhil
>  
>  
> On 21/07/2017, at 3:27 AM, ZAIDI, ASAD A <az1...@att.com> wrote:
>  
> Hello Folks –
>  
> I’m using apache-cassandra 2.2.8.
>  
> I see many messages like below in my system.log file. In Cassandra.yaml file 
> [ cross_node_timeout: true] is set and NTP server is also running correcting 
> clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
> in tpstats output though there are bunch of MUTATIONS dropped observed.
>  
> 
> INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
> MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
> 2152 for cross node timeout
> 
>  
> I’m seeking help here if you please let me know what I need to check in order 
> to address these cross node timeouts.
>  
> Thank you,
> Asad



RE: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-07-21 Thread Anuj Wadehra
Hi Asad, 
You can increase memtable_flush_writers by 2 at a time. For example, if it's 
currently 2, try increasing it to 4 and retest. 
We flush 5-6 tables at a time and use 3 memtable_flush_writers. It works 
great!! There were dropped mutations when it was set to one. The idea is to 
make sure that writes are not blocked. 
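
A minimal sketch of that step-and-retest loop in Python, assuming you have 
some way to apply the setting, run a test load and count dropped mutations 
afterwards. The callable measure_dropped, the step of 2 and the upper bound 
are illustrative assumptions, not a formula from Cassandra.

```python
def find_flush_writers(measure_dropped, start=2, step=2, max_writers=16):
    """Raise the writer count by `step` until a test run drops no mutations.

    `measure_dropped(writers)` is a hypothetical callable supplied by you:
    it should apply the setting, run your test load, and return the number
    of dropped mutations observed. Returns the first setting with zero
    drops, or None if none was found up to `max_writers`.
    """
    writers = start
    while writers <= max_writers:
        if measure_dropped(writers) == 0:
            return writers
        writers += step
    return None


if __name__ == "__main__":
    # Fake measurement for demonstration only: pretend drops stop at 4 writers.
    fake_results = {2: 150, 4: 0}
    print(find_flush_writers(lambda w: fake_results.get(w, 0)))
```
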
Thanks
Anuj

Sent from Yahoo Mail on Android 
 
  On Fri, 21 Jul 2017 at 20:04, ZAIDI, ASAD A<az1...@att.com> wrote:   
Thank you for your reply. I’ll increase memTable_flush_writes and report back 
if it helps.

Is there any formula that we can use to arrive at correct number of 
memTable_flush_writers ? or the exercise would windup be like “try and error” 
taking much time to arrive at some number that may not be optimal.

Thank you again.
 
  
 
  
 
  
 
From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: Thursday, July 20, 2017 12:17 PM
To: ZAIDI, ASAD A <az1...@att.com>; user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout
 
  
 
Hi Asad,

You can do following things:

1.Increase memtable_flush_writers especially if you have a write heavy load.

2.Make sure there are no big gc pauses on your nodes. If yes,  go for heap 
tuning.

Please let us know whether above measures fixed your problem or not.

Thanks
Anuj

Sent from Yahoo Mail on Android

On Thu, 20 Jul 2017 at 20:57, ZAIDI, ASAD A <az1...@att.com> wrote:

Hello Folks –

I’m using apache-cassandra 2.2.8.

I see many messages like below in my system.log file. In Cassandra.yaml file [ 
cross_node_timeout: true] is set and NTP server is also running correcting 
clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
in tpstats output though there are bunch of MUTATIONS dropped observed.

INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
2152 for cross node timeout

I’m seeking help here if you please let me know what I need to check in order 
to address these cross node timeouts.

Thank you,
Asad
 
 
 
  


RE: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-07-21 Thread ZAIDI, ASAD A
Hi Akhil,

Thank you for your reply. Previously, I did ‘tune’ various timeouts – basically 
increased them a bit, but none of the parameters listed in the link matches 
the “were dropped in last 5000 ms” in the message.
I was wondering where that 5000 ms number comes from when, as I mentioned 
before, none of my timeout parameter settings matches it.

Load is intermittently high, but the CPU queue length never goes beyond medium 
depth. I wonder if there is some internal limit that I’m still not aware of.

Thanks/Asad


From: Akhil Mehra [mailto:akhilme...@gmail.com]
Sent: Thursday, July 20, 2017 3:47 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout

Hi Asad,

http://cassandra.apache.org/doc/latest/faq/index.html#why-message-dropped

As mentioned in the link above this is a load shedding mechanism used by 
Cassandra.

Is you cluster under heavy load?

Regards,
Akhil


On 21/07/2017, at 3:27 AM, ZAIDI, ASAD A <az1...@att.com> wrote:

Hello Folks –

I’m using apache-cassandra 2.2.8.

I see many messages like below in my system.log file. In Cassandra.yaml file [ 
cross_node_timeout: true] is set and NTP server is also running correcting 
clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
in tpstats output though there are bunch of MUTATIONS dropped observed.


INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
2152 for cross node timeout


I’m seeking help here if you please let me know what I need to check in order 
to address these cross node timeouts.

Thank you,
Asad



RE: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-07-21 Thread ZAIDI, ASAD A
Thanks for your reply, Subroto – I’ll try your suggestions to see if they help. 
I’ll report back with results.


From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID]
Sent: Thursday, July 20, 2017 12:22 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout

In a cloud environment, cross_node_timeout = true can cause issues; we had this 
issue in our environment and it is set to false now.
Dropped messages is an another issue

Subroto

On Jul 20, 2017, at 8:27 AM, ZAIDI, ASAD A <az1...@att.com> wrote:
Hello Folks –

I’m using apache-cassandra 2.2.8.

I see many messages like below in my system.log file. In Cassandra.yaml file [ 
cross_node_timeout: true] is set and NTP server is also running correcting 
clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
in tpstats output though there are bunch of MUTATIONS dropped observed.


INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
2152 for cross node timeout


I’m seeking help here if you please let me know what I need to check in order 
to address these cross node timeouts.

Thank you,
Asad



RE: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-07-21 Thread ZAIDI, ASAD A
Thank you for your reply. I’ll increase memtable_flush_writers and report back 
if it helps.

Is there any formula that we can use to arrive at the correct number of 
memtable_flush_writers? Or would the exercise wind up being trial and error, 
taking a lot of time to arrive at a number that may not be optimal?

Thank you again.



From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: Thursday, July 20, 2017 12:17 PM
To: ZAIDI, ASAD A <az1...@att.com>; user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout

Hi Asad,

You can do following things:

1.Increase memtable_flush_writers especially if you have a write heavy load.

2.Make sure there are no big gc pauses on your nodes. If yes,  go for heap 
tuning.


Please let us know whether above measures fixed your problem or not.


Thanks
Anuj

Sent from Yahoo Mail on Android

On Thu, 20 Jul 2017 at 20:57, ZAIDI, ASAD A <az1...@att.com> wrote:

Hello Folks –

I’m using apache-cassandra 2.2.8.

I see many messages like below in my system.log file. In Cassandra.yaml file [ 
cross_node_timeout: true] is set and NTP server is also running correcting 
clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
in tpstats output though there are bunch of MUTATIONS dropped observed.

INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
2152 for cross node timeout

I’m seeking help here if you please let me know what I need to check in order 
to address these cross node timeouts.

Thank you,
Asad




Re: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-07-20 Thread Akhil Mehra
Hi Asad,

http://cassandra.apache.org/doc/latest/faq/index.html#why-message-dropped 


As mentioned in the link above this is a load shedding mechanism used by 
Cassandra.

Is your cluster under heavy load?

Regards,
Akhil


> On 21/07/2017, at 3:27 AM, ZAIDI, ASAD A  wrote:
> 
> Hello Folks –
>  
> I’m using apache-cassandra 2.2.8.
>  
> I see many messages like below in my system.log file. In Cassandra.yaml file 
> [ cross_node_timeout: true] is set and NTP server is also running correcting 
> clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
> in tpstats output though there are bunch of MUTATIONS dropped observed.
>  
> 
> INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
> MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
> 2152 for cross node timeout
> 
>  
> I’m seeking help here if you please let me know what I need to check in order 
> to address these cross node timeouts.
>  
> Thank you,
> Asad



Re: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-07-20 Thread Subroto Barua
In a cloud environment, cross_node_timeout = true can cause issues; we had this 
issue in our environment and it is set to false now.
Dropped messages are another issue.
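
One way cross_node_timeout can bite is clock offset between nodes. The sketch 
below is a deliberately simplified Python model of the drop decision, not 
Cassandra's actual code; the function name is_expired and all of the numbers 
are made up for illustration.

```python
def is_expired(sent_at_ms, received_at_ms, now_ms, timeout_ms, cross_node_timeout):
    """Simplified model of the drop decision on the receiving node.

    With cross_node_timeout enabled the message age is measured from the
    sender's timestamp, so it includes network time and any clock offset
    between the two nodes. With it disabled the age is measured from the
    local arrival time only.
    """
    start_ms = sent_at_ms if cross_node_timeout else received_at_ms
    return now_ms - start_ms > timeout_ms


if __name__ == "__main__":
    # Illustrative numbers: the sender's clock runs about 1600 ms behind the
    # receiver's, and the message waits 490 ms locally before it is processed.
    sent_at = 8_400       # timestamp stamped by the sender (sender's clock)
    received_at = 10_010  # arrival time on the receiver's clock
    now = 10_500          # time the receiver gets around to processing it
    timeout = 2_000       # write_request_timeout_in_ms default quoted in this thread

    print(is_expired(sent_at, received_at, now, timeout, cross_node_timeout=True))
    print(is_expired(sent_at, received_at, now, timeout, cross_node_timeout=False))
```

In this made-up case the same message counts as expired with the setting on 
and as still valid with it off, purely because of the clock offset.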

Subroto 

> On Jul 20, 2017, at 8:27 AM, ZAIDI, ASAD A  wrote:
> 
> Hello Folks –
>  
> I’m using apache-cassandra 2.2.8.
>  
> I see many messages like below in my system.log file. In Cassandra.yaml file 
> [ cross_node_timeout: true] is set and NTP server is also running correcting 
> clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
> in tpstats output though there are bunch of MUTATIONS dropped observed.
>  
> 
> INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
> MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
> 2152 for cross node timeout
> 
>  
> I’m seeking help here if you please let me know what I need to check in order 
> to address these cross node timeouts.
>  
> Thank you,
> Asad
>  


Re: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-07-20 Thread Anuj Wadehra
Hi Asad, 
You can do the following things:
1. Increase memtable_flush_writers, especially if you have a write-heavy load. 
2. Make sure there are no big GC pauses on your nodes. If there are, go for 
heap tuning. 

Please let us know whether the above measures fixed your problem or not. 

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
  On Thu, 20 Jul 2017 at 20:57, ZAIDI, ASAD A wrote:

Hello Folks –

I’m using apache-cassandra 2.2.8.

I see many messages like below in my system.log file. In Cassandra.yaml file [ 
cross_node_timeout: true] is set and NTP server is also running correcting 
clock drift on 16node cluster. I do not see pending or blocked HintedHandoff  
in tpstats output though there are bunch of MUTATIONS dropped observed.

INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
2152 for cross node timeout

I’m seeking help here if you please let me know what I need to check in order 
to address these cross node timeouts.

Thank you,
Asad