RE: MUTATION messages were dropped in last 5000 ms for cross node timeout

2017-08-03 Thread ZAIDI, ASAD A
Hi Akhil,

Thank you for your reply.

I kept testing different timeout numbers over the last week and eventually settled
on setting the *_request_timeout_in_ms parameters to 1.5 minutes of coordinator
wait time. That is the number at which I do not see any dropped mutations.
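
For anyone wanting to try the same, this is roughly what the change looks like in
cassandra.yaml; the 90000 ms values below are just my 1.5 minutes expressed in
milliseconds, not a recommendation:

write_request_timeout_in_ms: 90000      # coordinator wait for writes/mutations
read_request_timeout_in_ms: 90000       # coordinator wait for reads
range_request_timeout_in_ms: 90000      # coordinator wait for range scans
request_timeout_in_ms: 90000            # default timeout for other request types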

I also asked the developers to tweak the data model: we saw a bunch of tables with
really large partitions, some with partition sizes around ~6.6GB. We're now working
to reduce the partition size of those tables. I am hoping the corrected data model
will help reduce coordinator wait time and get us back to the default numbers again.
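
In case it is useful to others, we are spotting the oversized partitions with plain
nodetool output (keyspace/table names below are placeholders):

# per-table compacted partition minimum/maximum/mean sizes
nodetool cfstats my_keyspace.my_table | grep -i partition

# partition size percentiles for a single table
nodetool cfhistograms my_keyspace my_table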

Thanks again/Asad

From: Akhil Mehra [mailto:akhilme...@gmail.com]
Sent: Friday, July 21, 2017 4:24 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout

Hi Asad,

The 5000 ms is not configurable 
(https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/net/MessagingService.java#L423).
It is just the interval at which the number of dropped messages is reported; in
other words, dropped messages are reported every 5000 ms.
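
If it helps, the same counters can also be read on demand from nodetool tpstats,
which reports drops accumulated since the node started rather than per 5000 ms
window:

# the dropped-message table is at the bottom of the output
nodetool tpstats | grep -A 20 'Message type'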

If you are looking to tweak the number of milliseconds after which a message is 
considered dropped, then you need to use write_request_timeout_in_ms 
(http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html).
It can be used to increase the mutation timeout; by default it is set to 2000 ms.

I hope that helps.

Regards,
Akhil


On 22/07/2017, at 2:46 AM, ZAIDI, ASAD A wrote:

Hi Akhil,

Thank you for your reply. Previously, I did ‘tune’ various timeouts – basically 
increased them a bit – but none of the parameters listed in the link matches 
that “were dropped in last 5000 ms”.
I was wondering where that [5000ms] number comes from when, like I mentioned 
before, no timeout parameter setting matches it!

Load is intermittently high, but again the CPU run queue length never goes beyond 
a moderate depth. I wonder if there is some internal limit that I’m still not aware of.

Thanks/Asad


From: Akhil Mehra [mailto:akhilme...@gmail.com]
Sent: Thursday, July 20, 2017 3:47 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout

Hi Asad,

http://cassandra.apache.org/doc/latest/faq/index.html#why-message-dropped

As mentioned in the link above this is a load shedding mechanism used by 
Cassandra.

Is your cluster under heavy load?

Regards,
Akhil


On 21/07/2017, at 3:27 AM, ZAIDI, ASAD A wrote:

Hello Folks –

I’m using apache-cassandra 2.2.8.

I see many messages like the one below in my system.log file. In the cassandra.yaml 
file, [cross_node_timeout: true] is set, and an NTP server is running to correct 
clock drift on the 16-node cluster. I do not see pending or blocked HintedHandoff 
in tpstats output, though a bunch of dropped MUTATIONs are observed.


INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
2152 for cross node timeout


I’m seeking help here if you please let me know what I need to check in order 
to address these cross node timeouts.

Thank you,
Asad



Re: Replacing a Seed Node

2017-08-03 Thread Oleksandr Shulgin
On Thu, Aug 3, 2017 at 3:00 PM, Fd Habash  wrote:

> Hi all …
>
> I know there is plenty of docs on how to replace a seed node, but some
> steps are contradictory, e.g. the need to remove the node from the seed list
> for the entire cluster.
>
>
>
> My cluster has 6 nodes with 3 seeds running C* 2.8. One seed node was
> terminated by AWS.
>

Hi,

First of all -- are you using instance storage or EBS?  If the latter: is
it attached with a setting to delete the volume on instance termination?
In other words: do you still have the data files from that node?

If you still have that EBS volume, you can start a replacement instance
with that volume attached and with the same private IP address (unless it was
taken by some other EC2 instance in the meantime).  This would be the preferred
way, since the node just comes back UP without bootstrapping and only needs to
replay hints or be repaired (if it was down longer than max_hint_window, which
is 3 hours by default).
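
As a rough sketch of that route (all IDs below are placeholders, and your
provisioning tooling may do this differently):

# launch the replacement with the dead node's private IP, if it is still free
aws ec2 run-instances --image-id ami-xxxxxxxx --subnet-id subnet-xxxxxxxx \
    --private-ip-address 172.31.xx.yyy ...

# attach the surviving data volume, then mount it at the old data directory
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx \
    --device /dev/xvdf

# start Cassandra: the node comes back with its old identity, no bootstrap needed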

I came up with this procedure. Did I miss anything …
>
>
>
>1. Remove the node (decomm or removenode) based on its current status
>2. Remove the node from its own seed list
>   1. No need to remove it from other nodes. My cluster has 3 seeds
>3. Restart C* with auto_bootstrap = true
>4. Once autobootstrap is done, re-add the node as seed in its own
>Cassandra.yaml again
>5. Restart C* on this node
>6. No need to restart other nodes in the cluster
>
You won't be able to decommission if the node is not up.  At the same
time, you can avoid changing topology (first removing the dead node, then
bootstrapping a new one) by using -Dcassandra.replace_address=172.31.xx.yyy,
i.e. the address of that dead node.  If your Cassandra version supports it,
use replace_address_first_boot.
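
For reference, a sketch of how the flag is typically passed, via cassandra-env.sh
on the replacement node only (drop it again once the node has joined):

# cassandra-env.sh on the replacement node (use replace_address on older versions)
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=172.31.xx.yyy"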

This should bootstrap the node by streaming exactly the data your dead seed
node was responsible for previously.  After this is done, you still need to
do a rolling restart of all nodes, updating their seed list.  You should
remove the IP address of the dead seed and add the address of any currently
healthy node, not necessarily this freshly bootstrapped one: consider
balancing Availability Zones, so that you have a seed node in each AZ.
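
The seed list itself is the seed_provider section of cassandra.yaml on every node;
a sketch with placeholder addresses, one healthy node per AZ:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "172.31.10.11,172.31.20.12,172.31.30.13"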

Regards,
--
Alex


Replacing a Seed Node

2017-08-03 Thread Fd Habash
Hi all …
I know there is plenty of docs on how to replace a seed node, but some steps are 
contradictory, e.g. the need to remove the node from the seed list for the entire 
cluster.

My cluster has 6 nodes with 3 seeds running C* 2.8. One seed node was 
terminated by AWS. 

I came up with this procedure. Did I miss anything …

1) Remove the node (decomm or removenode) based on its current status
2) Remove the node from its own seed list
a. No need to remove it from other nodes. My cluster has 3 seeds
3) Restart C* with auto_bootstrap = true
4) Once autobootstrap is done, re-add the node as seed in its own 
Cassandra.yaml again
5) Restart C* on this node
6) No need to restart other nodes in the cluster



Thank you



Re: Bootstrapping a new Node with Consistency=ONE

2017-08-03 Thread Daniel Hölbling-Inzko
That makes sense. Thank you so much for pointing that out, Alex.
So, long story short: once I am up to the RF I actually want (RF 3 per DC)
and am just adding nodes for capacity, joining the ring will work correctly
and no inconsistencies will exist.
If I just change the RF, the nodes don't have the data yet, so a repair needs
to be run.
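
For the archives, I assume the repair itself is something along these lines, run
on each node after raising the RF (keyspace name is just an example; whether you
need -full depends on which repair mode your version defaults to):

nodetool repair -full my_keyspace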

Awesome - thanks so much.

greetings Daniel

On Thu, 3 Aug 2017 at 09:56 Oleksandr Shulgin wrote:

> On Thu, Aug 3, 2017 at 9:33 AM, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
>> No, I set auto_bootstrap to true and the node was UN in nodetool status
>> but when doing a select on the node with ONE I got incomplete data.
>>
>
> What I think is happening here is not related to the new node being added.
>
> When you increase Replication Factor, that does not automatically
> redistribute the existing data.  It just makes other nodes responsible for
> portions of the data they might not really have yet.  So I would expect
> that all your nodes show some inconsistencies, before you run a full repair
> of the ring.
>
> I can fairly easily reproduce it locally with ccm[1], 3 nodes, version
> 3.0.13.
>
> $ ccm status
> Cluster: 'v3013'
> 
> node1: UP
> node3: UP
> node2: UP
>
> $ ccm node1 cqlsh
> cqlsh> create keyspace test_rf WITH replication = {'class':
> 'NetworkTopologyStrategy', 'datacenter1': 1};
> cqlsh> create table test_rf.t1(id int, data text, primary key(id));
> cqlsh> insert into test_rf.t1(id, data) values(1, 'one');
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>   1 |  one
>
> (1 rows)
>
> At this point selecting from t1 works correctly on any of the nodes with
> the default CL=ONE.
>
> If we now increase the RF and try reading again, something surprising
> happens:
>
> cqlsh> alter keyspace test_rf WITH replication = {'class':
> 'NetworkTopologyStrategy', 'datacenter1': 2};
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>
> (0 rows)
>
> And in my test this happens on all nodes at the same time.  Explanation is
> fairly simple: now a different node is responsible for the data that was
> written to only one other node previously.
>
> A repair in this tiny test is trivial:
> cqlsh> CONSISTENCY ALL;
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>   1 |  one
>
> (1 rows)
>
> And now the data can be read from any node again, since we did a "full
> repair".
>
> --
> Alex
>
> [1] https://github.com/pcmanus/ccm
>
>


Re: Bootstrapping a new Node with Consistency=ONE

2017-08-03 Thread Oleksandr Shulgin
On Thu, Aug 3, 2017 at 9:33 AM, Daniel Hölbling-Inzko <
daniel.hoelbling-in...@bitmovin.com> wrote:

> No, I set auto_bootstrap to true and the node was UN in nodetool status but
> when doing a select on the node with ONE I got incomplete data.
>

What I think is happening here is not related to the new node being added.

When you increase Replication Factor, that does not automatically
redistribute the existing data.  It just makes other nodes responsible for
portions of the data they might not really have yet.  So I would expect
that all your nodes show some inconsistencies, before you run a full repair
of the ring.

I can fairly easily reproduce it locally with ccm[1], 3 nodes, version
3.0.13.
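
(The cluster itself was created with something along these lines; exact flags may
differ across ccm versions:)

$ ccm create v3013 -v 3.0.13 -n 3 -s   # 3 local nodes on 3.0.13, started right away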

$ ccm status
Cluster: 'v3013'

node1: UP
node3: UP
node2: UP

$ ccm node1 cqlsh
cqlsh> create keyspace test_rf WITH replication = {'class':
'NetworkTopologyStrategy', 'datacenter1': 1};
cqlsh> create table test_rf.t1(id int, data text, primary key(id));
cqlsh> insert into test_rf.t1(id, data) values(1, 'one');
cqlsh> select * from test_rf.t1;

 id | data
----+------
  1 |  one

(1 rows)

At this point selecting from t1 works correctly on any of the nodes with
the default CL=ONE.

If we now increase the RF and try reading again, something surprising
happens:

cqlsh> alter keyspace test_rf WITH replication = {'class':
'NetworkTopologyStrategy', 'datacenter1': 2};
cqlsh> select * from test_rf.t1;

 id | data
----+------

(0 rows)

And in my test this happens on all nodes at the same time.  Explanation is
fairly simple: now a different node is responsible for the data that was
written to only one other node previously.

A repair in this tiny test is trivial:
cqlsh> CONSISTENCY ALL;
cqlsh> select * from test_rf.t1;

 id | data
----+------
  1 |  one

(1 rows)

And now the data can be read from any node again, since we did a "full
repair".

--
Alex

[1] https://github.com/pcmanus/ccm


Re: Bootstrapping a new Node with Consistency=ONE

2017-08-03 Thread Daniel Hölbling-Inzko
No, I set auto_bootstrap to true and the node was UN in nodetool status but
when doing a select on the node with ONE I got incomplete data.
Jeff Jirsa wrote on Thu, 3 Aug 2017 at 09:02:

> "nodetool status" shows node as UN (up normal) instead of UJ (up joining)
>
> What you're describing really sounds odd. Something isn't adding up to me
> but I'm not sure why. You shouldn't be able to query it directly until it's
> bootstrapped as far as I know
>
> Are you sure you're not joining as a seed node? Or with auto bootstrap set
> to false?
>
>
> --
> Jeff Jirsa
>
>
> On Aug 2, 2017, at 11:52 PM, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
> Thanks Jeff. How do I determine that bootstrap is finished? Haven't seen
> that anywhere so far.
>
> Reads via storage would be ok as every query would be checked by another
> node too. I was only seeing inconsistencies since clients went directly to
> the node with Consistency ONE
>
> Greetings
> Jeff Jirsa wrote on Wed, 2 Aug 2017 at 16:01:
>
>> By the time bootstrap is complete it should be as consistent as the
>> source node - you can change start_native_transport to false to avoid
>> serving clients directly (tcp/9042), but it'll still serve reads via the
>> storage service (tcp/7000), but the guarantee is that data should be
>> consistent by the time bootstrap finishes
>>
>>
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> > On Aug 2, 2017, at 1:53 AM, Daniel Hölbling-Inzko <
>> daniel.hoelbling-in...@bitmovin.com> wrote:
>> >
>> > Hi,
>> > It's probably a strange question but I have a heavily read-optimized
>> payload where data integrity is not a big deal. So to keep latencies low I
>> am reading with Consistency ONE from my Multi-DC Cluster.
>> >
>> > Now the issue I saw is that I needed to add another Cassandra node (for
>> redundancy reasons).
>> > Since I want this for redundancy I booted the node and then changed
>> the Replication of my Keyspace to include the new node (all nodes have 100%
>> of the data).
>> >
>> > The issue I was seeing is that clients that connected to the new Node
>> afterwards were seeing incomplete data - so the Key would already be
>> present, but the columns would all be null values.
>> > I expect this to die down once the node is fully replicated, but in the
>> meantime a lot of my connected clients were in trouble. (The application
>> can handle seeing old data - incomplete is another matter altogether)
>> >
>> > The total data in question is a negligible 500kb (so nothing that
>> should really take any amount of time in my opinion but it took a few
>> minutes for the data to replicate over and I am still not sure everything
>> is replicated correctly).
>> >
>> > Increasing the RF to something higher won't really help as the setup is
>> dc1: 3; dc2: 2 (I added the second node in dc2). So a LOCAL_QUORUM in dc2
>> would still be 2 nodes which means I just can't lose either of them.
>> Adding a third node is not really cost effective for the current workloads
>> these nodes need to handle.
>> >
>> > Any advice on how to avoid this in the future? Is there a way to start
>> up a node that does not serve client requests but does replicate data?
>> >
>> > greetings Daniel
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>


Re: Bootstrapping a new Node with Consistency=ONE

2017-08-03 Thread Jeff Jirsa
"nodetool status" shows node as UN (up normal) instead of UJ (up joining)

What you're describing really sounds odd. Something isn't adding up to me but 
I'm not sure why. You shouldn't be able to query it directly until it's 
bootstrapped, as far as I know.

Are you sure you're not joining as a seed node? Or with auto bootstrap set to 
false?


-- 
Jeff Jirsa


> On Aug 2, 2017, at 11:52 PM, Daniel Hölbling-Inzko 
>  wrote:
> 
> Thanks Jeff. How do I determine that bootstrap is finished? Haven't seen that 
> anywhere so far. 
> 
> Reads via storage would be ok as every query would be checked by another node 
> too. I was only seeing inconsistencies since clients went directly to the 
> node with Consistency ONE
> 
> Greetings 
> Jeff Jirsa wrote on Wed, 2 Aug 2017 at 16:01:
>> By the time bootstrap is complete it should be as consistent as the source 
>> node - you can change start_native_transport to false to avoid serving 
>> clients directly (tcp/9042), but it'll still serve reads via the storage 
>> service (tcp/7000), but the guarantee is that data should be consistent by 
>> the time bootstrap finishes
>> 
>> 
>> 
>> 
>> --
>> Jeff Jirsa
>> 
>> 
>> > On Aug 2, 2017, at 1:53 AM, Daniel Hölbling-Inzko 
>> >  wrote:
>> >
>> > Hi,
>> > It's probably a strange question but I have a heavily read-optimized 
>> > payload where data integrity is not a big deal. So to keep latencies low I 
>> > am reading with Consistency ONE from my Multi-DC Cluster.
>> >
>> > Now the issue I saw is that I needed to add another Cassandra node (for 
>> > redundancy reasons).
>> > Since I want this for redundancy I booted the node and then changed the 
>> > Replication of my Keyspace to include the new node (all nodes have 100% of 
>> > the data).
>> >
>> > The issue I was seeing is that clients that connected to the new Node 
>> > afterwards were seeing incomplete data - so the Key would already be 
>> > present, but the columns would all be null values.
>> > I expect this to die down once the node is fully replicated, but in the 
>> > meantime a lot of my connected clients were in trouble. (The application 
>> > can handle seeing old data - incomplete is another matter altogether)
>> >
>> > The total data in question is a negligible 500kb (so nothing that should 
>> > really take any amount of time in my opinion but it took a few minutes for 
>> > the data to replicate over and I am still not sure everything is 
>> > replicated correctly).
>> >
>> > Increasing the RF to something higher won't really help as the setup is 
>> > dc1: 3; dc2: 2 (I added the second node in dc2). So a LOCAL_QUORUM in dc2 
>> > would still be 2 nodes which means I just can't lose either of them. 
>> > Adding a third node is not really cost effective for the current workloads 
>> > these nodes need to handle.
>> >
>> > Any advice on how to avoid this in the future? Is there a way to start up 
>> > a node that does not serve client requests but does replicate data?
>> >
>> > greetings Daniel
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>> 


Re: Bootstrapping a new Node with Consistency=ONE

2017-08-03 Thread Daniel Hölbling-Inzko
Thanks Jeff. How do I determine that bootstrap is finished? Haven't seen
that anywhere so far.

Reads via storage would be ok as every query would be checked by another
node too. I was only seeing inconsistencies since clients went directly to
the node with Consistency ONE

Greetings
Jeff Jirsa wrote on Wed, 2 Aug 2017 at 16:01:

> By the time bootstrap is complete it should be as consistent as the source
> node - you can change start_native_transport to false to avoid serving
> clients directly (tcp/9042), but it'll still serve reads via the storage
> service (tcp/7000), but the guarantee is that data should be consistent by
> the time bootstrap finishes
>
>
>
>
> --
> Jeff Jirsa
>
>
> > On Aug 2, 2017, at 1:53 AM, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
> >
> > Hi,
> > It's probably a strange question but I have a heavily read-optimized
> payload where data integrity is not a big deal. So to keep latencies low I
> am reading with Consistency ONE from my Multi-DC Cluster.
> >
> > Now the issue I saw is that I needed to add another Cassandra node (for
> redundancy reasons).
> > Since I want this for redundancy I booted the node and then changed the
> Replication of my Keyspace to include the new node (all nodes have 100% of
> the data).
> >
> > The issue I was seeing is that clients that connected to the new Node
> afterwards were seeing incomplete data - so the Key would already be
> present, but the columns would all be null values.
> > I expect this to die down once the node is fully replicated, but in the
> meantime a lot of my connected clients were in trouble. (The application
> > can handle seeing old data - incomplete is another matter altogether)
> >
> > The total data in question is a negligible 500kb (so nothing that should
> really take any amount of time in my opinion but it took a few minutes for
> the data to replicate over and I am still not sure everything is replicated
> correctly).
> >
> > Increasing the RF to something higher won't really help as the setup is
> dc1: 3; dc2: 2 (I added the second node in dc2). So a LOCAL_QUORUM in dc2
> > would still be 2 nodes which means I just can't lose either of them.
> Adding a third node is not really cost effective for the current workloads
> these nodes need to handle.
> >
> > Any advice on how to avoid this in the future? Is there a way to start
> up a node that does not serve client requests but does replicate data?
> >
> > greetings Daniel
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>