Re: inter dc bandwidth calculation

2020-01-27 Thread Georg Brandemann
Hello,

just as a small addition: the numbers also depend on the consistency level
used for reads. The estimate below holds only if reads are served by local
nodes. If you do reads at ALL, QUORUM, EACH_QUORUM, etc., you also need to
include the read volume in the calculation.
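
For a rough sketch of how that changes the arithmetic, take Reid's revised
heuristic quoted below and assume a hypothetical 5 MB/s write volume plus
2 MB/s of reads served at a non-local consistency level (all numbers made
up for illustration):

    # writes only, per the (write volume) x 2 heuristic quoted below
    echo "5 * 2" | bc        # => 10 MB/s of inter-dc headroom
    # reads at ALL / QUORUM / EACH_QUORUM add the read volume on top
    echo "5 * 2 + 2" | bc    # => 12 MB/s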

Regards,
Georg

Am Mi., 15. Jan. 2020 um 19:35 Uhr schrieb Osman Yozgatlıoğlu <
osman.yozgatlio...@gmail.com>:

> Thank you. I have an insight now.
>
> Regards,
> Osman
>
> On Wed, 15 Jan 2020 at 19:18, Reid Pinchback 
> wrote:
> >
> > Oh, duh.  Revise that.  I was forgetting that multi-dc writes are sent
> > to a single node in the other dc and tagged to be forwarded to other
> > nodes within the dc.
> >
> > So your quick-and-dirty estimate would be more like (write volume) x 2
> > to leave headroom for random other mechanics.
> >
> > R
> >
> >
> > On 1/15/20, 11:07 AM, "Reid Pinchback" 
> wrote:
> >
> >
> > I would think that it would be largely driven by the replication
> > factor.  It isn't that the sstables are forklifted from one dc to
> > another, it's just that the writes being made to the memtables are also
> > shipped around by the coordinator nodes as the writes happen.
> > Operations at the sstable level, like compactions, are local to the
> > node.
> >
> > One potential wrinkle that I'm unclear on is related to repairs.  I
> > don't know if merkle trees are biased to mostly bounce around only
> > intra-dc, versus how often they are communicated inter-dc.  Note that
> > even queries can trigger some degree of repair traffic if you have a
> > usage pattern of trying to read data recently written, because at the
> > bleeding edge of the recent changes you'll have more cases of rows not
> > having had time to settle to a consistent state.
> >
> > If you want a quick-and-dirty heuristic, I'd probably take (write
> > volume) x (replication factor) x 2 as a guesstimate so you have some
> > headroom for C* and TCP mechanics, but then monitor to see what your
> > real use is.
> >
> > R
> >
> >
> > On 1/15/20, 4:14 AM, "Osman Yozgatlıoğlu" <
> osman.yozgatlio...@gmail.com> wrote:
> >
> >
> > Hello,
> >
> > Is there any way to calculate inter dc bandwidth requirements for
> > proper operation?
> > I can't find any info about this subject.
> > Can we say how much of the sstable data collected at one dc has to be
> > transferred to the other?
> > I could then calculate the bandwidth from the generated sstables.
> > I have TWCS with a one-hour window.
> >
> > Regards,
> > Osman
> >
> >
>  -


Re: How to read content of hints file and apply them manually?

2020-01-27 Thread Surbhi Gupta
We tried tuning sethintedhandoffthrottlekb to 100, 1024, and 10240, but
nothing helped.
Our hints-related parameters are as below; any parameter not listed is not
set in our environment and is at its default value.

max_hint_window_in_ms: 10800000 # 3 hours

hinted_handoff_enabled: true

hinted_handoff_throttle_in_kb: 100

max_hints_delivery_threads: 8

hints_directory: /var/lib/cassandra/hints

hints_flush_period_in_ms: 1

max_hints_file_size_in_mb: 128
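
In case it helps anyone else tuning this: the throttle can also be changed
at runtime with nodetool, so each value can be tried without a restart (a
sketch; runtime changes revert to the cassandra.yaml value after a
restart):

    nodetool sethintedhandoffthrottlekb 1024   # KB/s, takes effect immediately
    nodetool statushandoff                     # confirm handoff is running
    nodetool pausehandoff                      # halt delivery while investigating
    nodetool resumehandoff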

On Mon, 27 Jan 2020 at 18:34, Jeff Jirsa  wrote:

>
> The high cpu is probably the hints getting replayed slamming the write path
>
> Slowing it down with the hint throttle may help
>
> It’s not instant.
>
> On Jan 27, 2020, at 6:05 PM, Erick Ramirez  wrote:
>
> 
>
>> Increase the max_hint_window_in_ms setting in cassandra.yaml to more than
>> 3 hours, perhaps 6 hours. If the issue still persists, networking may
>> need to be tested for bandwidth issues.
>>
>
> Just a note of warning about bumping up the hint window without
> understanding the pros and cons. Be aware that doubling it means:
>
>    - you'll end up doubling the size of stored hints in the hints_directory
>    - there'll be twice as many hints to replay when node(s) come back online
>
> There's always 2 sides to fiddling with the knobs in C*. Cheers!
>
>


Re: How to read content of hints file and apply them manually?

2020-01-27 Thread Jeff Jirsa

The high cpu is probably the hints getting replayed slamming the write path

Slowing it down with the hint throttle may help

It’s not instant. 

> On Jan 27, 2020, at 6:05 PM, Erick Ramirez  wrote:
> 
> 
>> Increase the max_hint_window_in_ms setting in cassandra.yaml to more than 3
>> hours, perhaps 6 hours. If the issue still persists, networking may need to
>> be tested for bandwidth issues.
> 
> Just a note of warning about bumping up the hint window without
> understanding the pros and cons. Be aware that doubling it means:
> - you'll end up doubling the size of stored hints in the hints_directory
> - there'll be twice as many hints to replay when node(s) come back online
> There's always 2 sides to fiddling with the knobs in C*. Cheers!


Re: How to read content of hints file and apply them manually?

2020-01-27 Thread Erick Ramirez
>
> Increase the max_hint_window_in_ms setting in cassandra.yaml to more than
> 3 hours, perhaps 6 hours. If the issue still persists, networking may need
> to be tested for bandwidth issues.
>

Just a note of warning about bumping up the hint window without
understanding the pros and cons. Be aware that doubling it means:

   - you'll end up doubling the size of stored hints in the hints_directory
   - there'll be twice as many hints to replay when node(s) come back online

There's always 2 sides to fiddling with the knobs in C*. Cheers!
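
A simple way to watch that trade-off is to track the on-disk backlog while
the window is widened (a sketch, using the hints_directory path quoted
earlier in this thread):

    du -sh /var/lib/cassandra/hints    # expect roughly 2x growth if the window doubles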


Re: new node stops streaming..

2020-01-27 Thread Erick Ramirez
You can increase the max number of open files on the new node. We find that
65K is too low for most production clusters; you can bump it up to 100K or
200K. We generally recommend 1 million, but YMMV:

 - nofile 1048576
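
On a typical Linux install that looks something like the sketch below
(assuming Cassandra runs as the cassandra user and its command line matches
CassandraDaemon; verify both locally):

    # /etc/security/limits.d/cassandra.conf
    cassandra - nofile 1048576

    # confirm the limit the running process actually got
    cat /proc/$(pgrep -f CassandraDaemon)/limits | grep 'open files'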


On Tue, Jan 28, 2020 at 11:55 AM Eunsu Kim  wrote:

> Hi experts
>
> I had a problem adding a new node.
>
> The joining node in datacenterA stops streaming while joining, so it
> stays in the UJ state.
> (datacenterB is fine.)
>
> I try 'nodetool netstats' on a stopped node and it looks like this:
>
> Mode: JOINING
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
>
> When I try 'nodetool rebuild' it changes to the following, but no
> streaming occurs.
>
> Mode: JOINING
> Rebuild 1df64590-4166-11ea-86a0-4b3cc5e92e4a
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
>
> I think this is related to the number of open file descriptors.
>
>
> Incoming Streaming Bytes went to zero after the number of open file
> descriptors reached the host's MAX (65536).
> Since then, the number of open file descriptors has decreased, but
> streaming has not resumed.
>
> And when I stopped that joining process, it was automatically removed
> from the cluster.
>
> What should I do to add nodes to this data center in this case?
>
> Please advise.
>
> Thank you.
>


Re: How to read content of hints file and apply them manually?

2020-01-27 Thread Deepak Vohra
 
Surbhi,

The hints could be getting accumulated for one or both of the following
reasons:

- Some node is becoming unavailable very routinely, which is unlikely
- The hints are getting replayed very slowly due to network bandwidth
  issues, which is more likely

Increase the max_hint_window_in_ms setting in cassandra.yaml to more than 3
hours, perhaps 6 hours. If the issue still persists, networking may need to
be tested for bandwidth issues.
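
For reference, a 6 hour window would be set like this in cassandra.yaml (a
sketch; a restart is required, and note the warning elsewhere in this
thread that the stored hint volume grows with the window):

    max_hint_window_in_ms: 21600000    # 6 hours, up from the 3 hour (10800000) default
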
regards,
Deepak

On Tuesday, January 28, 2020, 01:01:51 a.m. UTC, Surbhi Gupta  wrote:
 
Why do we think it might be related to hints? Because if we truncate the
hints, the load goes back to normal on the nodes. FYI, we had to run a
repair after truncating the hints.
Any thoughts?

On Mon, 27 Jan 2020 at 15:27, Deepak Vohra  wrote:

 
Hints are a stopgap measure and not a fix to the underlying issue. Run a
full repair.

On Monday, January 27, 2020, 10:17:01 p.m. UTC, Surbhi Gupta  wrote:
 
Hi,

We are on open source 3.11. We have an issue in one of the clusters where
lots of hints get piled up and they don't get applied within the hinted
handoff period (3 hours in our case), and the load and CPU of the server go
very high. We see a lot of messages in system.log and debug.log. Our
read_repair_chance and dclocal_read_repair_chance are 0.1. Any pointers are
welcome.

ERROR [ReadRepairStage:83] 2020-01-27 13:08:43,695 CassandraDaemon.java:228 - 
Exception in thread Thread[ReadRepairStage:83,5,main]

org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.


DEBUG [ReadRepairStage:111] 2020-01-27 13:10:06,663 ReadCallback.java:242 - 
Digest mismatch:

org.apache.cassandra.service.DigestMismatchException: Mismatch for key
DecoratedKey(4759131696153881383, 9a21276d0af64de28d5d3023b69e)
(142a55e1e28de7daa2ddc34a361474a0 vs fcba30f022ef25f456914c341022963d)

Re: Uneven token distribution with allocate_tokens_for_keyspace

2020-01-27 Thread Anthony Grasso
Hi Leo,

The token assignment for each node in the cluster must be unique regardless
of the datacenter they are in. This is because the range of tokens
available to assign to nodes is per cluster. Token allocation is performed
per node at a global level. A datacenter helps define the way data is
replicated and has no influence on how tokens are assigned to nodes.

For example, if a new node is assigned one or more of the tokens already
owned by another node in the cluster, the new node will take ownership of
those tokens. This will happen regardless of which datacenter either node
is in.
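
One way to see this is to list the ring; every token in the cluster appears
in a single sequence regardless of datacenter (a sketch):

    nodetool ring | head -20    # one row per token; all datacenters share one ring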

Regards,
Anthony

On Sat, 25 Jan 2020 at 02:11, Léo FERLIN SUTTON 
wrote:

> Hi Anthony !
>
> I have a follow-up question :
>
> Check to make sure that no other node in the cluster is assigned any of
>> the four tokens specified above. If there is another node in the cluster
>> that is assigned one of the above tokens, increment the conflicting token
>> by values of one until no other node in the cluster is assigned that token
>> value. The idea is to make sure that these four tokens are unique to the
>> node.
>
>
> I don't understand this part of the process. Why do tokens conflict if the
> nodes owning them are in a different datacenter ?
>
> Regards,
>
> Leo
>
> On Thu, Dec 5, 2019 at 1:00 AM Anthony Grasso 
> wrote:
>
>> Hi Enrico,
>>
>> Glad to hear the problem has been resolved and thank you for the feedback!
>>
>> Kind regards,
>> Anthony
>>
>> On Mon, 2 Dec 2019 at 22:03, Enrico Cavallin 
>> wrote:
>>
>>> Hi Anthony,
>>> thank you for your hints, now the new DC is well balanced within 2%.
>>> I did read your article, but I thought it was needed only for new
>>> "clusters", not also for new "DCs"; but RF is per DC so it makes sense.
>>>
>>> You TLP guys are doing a great job for the Cassandra community.
>>>
>>> Thank you,
>>> Enrico
>>>
>>>
>>> On Fri, 29 Nov 2019 at 05:09, Anthony Grasso 
>>> wrote:
>>>
 Hi Enrico,

 This is a classic chicken and egg problem with the
 allocate_tokens_for_keyspace setting.

 The allocate_tokens_for_keyspace setting uses the replication factor
 of a DC keyspace to calculate the token allocation when a node is added to
 the cluster for the first time.

 Nodes need to be added to the new DC before we can replicate the
 keyspace over to it. Herein lies the problem. We are unable to use
 allocate_tokens_for_keyspace unless the keyspace is replicated to the
 new DC. In addition, as soon as you change the keyspace replication to the
 new DC, new data will start to be written to it. To work around this issue
 you will need to do the following.

   1. Decommission all the nodes in *dcNew*, one at a time.
   2. Once all the *dcNew* nodes are decommissioned, wipe the contents of
   the *commitlog*, *data*, *saved_caches*, and *hints* directories on
   these nodes.
   3. Make the first node to add into *dcNew* a seed node. Set the seed
   list of the first node with its IP address and the IP addresses of the
   other seed nodes in the cluster.
   4. Set the *initial_token* setting for the first node. You can
   calculate the values using the algorithm in my blog post:
   https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
   (see the sketch after this list). For convenience I have calculated
   them: *-9223372036854775808,-4611686018427387904,0,4611686018427387904*.
   Note, remove the *allocate_tokens_for_keyspace* setting from the
   *cassandra.yaml* file for this (seed) node.
   5. Check to make sure that no other node in the cluster is assigned any
   of the four tokens specified above. If there is another node in the
   cluster that is assigned one of the above tokens, increment the
   conflicting token by one until no other node in the cluster is assigned
   that token value. The idea is to make sure that these four tokens are
   unique to the node.
   6. Add the seed node to the cluster. Make sure it is listed in *dcNew*
   by checking nodetool status.
   7. Create a dummy keyspace in *dcNew* that has a replication factor
   of 2.
   8. Set the *allocate_tokens_for_keyspace* value to be the name of the
   dummy keyspace for the other two nodes you want to add to *dcNew*.
   Note, remove the *initial_token* setting for these other nodes.
   9. Set *auto_bootstrap* to *false* for the other two nodes you want to
   add to *dcNew*.
   10. Add the other two nodes to the cluster, one at a time.
11. If you are happy with the distribution, copy the data to *dcNew*
by running a rebuild.
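
A rough sketch of the calculation in step 4 and the check in step 5 (it
reproduces the four token values above; substitute your own num_tokens to
generalize):

    # step 4: evenly spaced Murmur3 tokens, i * (2^64 / num_tokens) - 2^63
    for i in 0 1 2 3; do echo "$i * (2^64 / 4) - 2^63" | bc; done
    # -9223372036854775808 -4611686018427387904 0 4611686018427387904

    # step 5: confirm no existing node already owns one of the tokens
    nodetool ring | grep -w -- '-9223372036854775808'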


 Hope this helps.

 Regards,
 Anthony

 On Fri, 29 Nov 2019 at 02:08, Enrico Cavallin <
 cavallin.enr...@gmail.com> wrote:

> Hi all,
> I have an old datacenter with 4 nodes and 256 

Re: How to read content of hints file and apply them manually?

2020-01-27 Thread Surbhi Gupta
Why do we think it might be related to hints? Because if we truncate the
hints, the load goes back to normal on the nodes.
FYI, we had to run a repair after truncating the hints.
Any thoughts?


On Mon, 27 Jan 2020 at 15:27, Deepak Vohra 
wrote:

>
> Hints are a stopgap measure and not a fix to the underlying issue. Run a
> full repair.
> On Monday, January 27, 2020, 10:17:01 p.m. UTC, Surbhi Gupta <
> surbhi.gupt...@gmail.com> wrote:
>
>
> Hi,
>
> We are on open source 3.11.
> We have an issue in one of the clusters where lots of hints get piled up
> and they don't get applied within the hinted handoff period (3 hours in
> our case).
> And the load and CPU of the server go very high.
> We see a lot of messages in system.log and debug.log. Our
> read_repair_chance and dclocal_read_repair_chance are 0.1. Any pointers
> are welcome.
>
> ERROR [ReadRepairStage:83] 2020-01-27 13:08:43,695
> CassandraDaemon.java:228 - Exception in thread
> Thread[ReadRepairStage:83,5,main]
>
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out
> - received only 0 responses.
>
> DEBUG [ReadRepairStage:111] 2020-01-27 13:10:06,663 ReadCallback.java:242
> - Digest mismatch:
>
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
> DecoratedKey(4759131696153881383, 9a21276d0af64de28d5d3023b69e)
> (142a55e1e28de7daa2ddc34a361474a0 vs fcba30f022ef25f456914c341022963d)
>


new node stops streaming..

2020-01-27 Thread Eunsu Kim
Hi experts

I had a problem adding a new node.

The joining node in datacenterA stops streaming while joining, so it stays in the UJ state.
(datacenterB is fine.)

I try 'nodetool netstats' on a stopped node and it looks like this:

Mode: JOINING
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0

When I try 'nodetool rebuild' it changes to the following, but no streaming
occurs.

Mode: JOINING
Rebuild 1df64590-4166-11ea-86a0-4b3cc5e92e4a
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0

I think this is related to the number of open file descriptors.



Incoming Streaming Bytes went to zero after the number of open file
descriptors reached the host's MAX (65536).
Since then, the number of open file descriptors has decreased, but
streaming has not resumed.

And when I stopped that joining process, it was automatically removed from
the cluster.

What should I do to add nodes to this data center in this case?

Please advise.

Thank you.

Re: How to read content of hints file and apply them manually?

2020-01-27 Thread Deepak Vohra
 
Hints are a stopgap measure and not a fix to the underlying issue. Run a
full repair.

On Monday, January 27, 2020, 10:17:01 p.m. UTC, Surbhi Gupta  wrote:
 
Hi,

We are on open source 3.11. We have an issue in one of the clusters where
lots of hints get piled up and they don't get applied within the hinted
handoff period (3 hours in our case), and the load and CPU of the server go
very high. We see a lot of messages in system.log and debug.log. Our
read_repair_chance and dclocal_read_repair_chance are 0.1. Any pointers are
welcome.

ERROR [ReadRepairStage:83] 2020-01-27 13:08:43,695 CassandraDaemon.java:228 - 
Exception in thread Thread[ReadRepairStage:83,5,main]

org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.


DEBUG [ReadRepairStage:111] 2020-01-27 13:10:06,663 ReadCallback.java:242 - 
Digest mismatch:

org.apache.cassandra.service.DigestMismatchException: Mismatch for key
DecoratedKey(4759131696153881383, 9a21276d0af64de28d5d3023b69e)
(142a55e1e28de7daa2ddc34a361474a0 vs fcba30f022ef25f456914c341022963d)

Re: How to read content of hints file and apply them manually?

2020-01-27 Thread Erick Ramirez
There isn't a tool that I'm aware of that's readily available to do that.
Your best bet is to run a regular repair.
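
For example, something along these lines on each node, one node at a time
(a sketch; the keyspace name is a placeholder):

    nodetool repair -pr my_keyspace    # primary-range repair; run on every node in turn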

But really, hints are just a side-issue of a much wider problem, and that
is that the nodes are overloaded. Is your application getting hit with much
higher than expected traffic? The screenshots you posted show that even
read-repairs aren't getting responses from replicas. You should really
address the overload issue. Cheers!



How to read content of hints file and apply them manually?

2020-01-27 Thread Surbhi Gupta
Hi,

We are on open source 3.11.
We have an issue in one of the clusters where lots of hints get piled up
and they don't get applied within the hinted handoff period (3 hours in our
case).
And the load and CPU of the server go very high.
We see a lot of messages in system.log and debug.log. Our
read_repair_chance and dclocal_read_repair_chance are 0.1. Any pointers are
welcome.

ERROR [ReadRepairStage:83] 2020-01-27 13:08:43,695 CassandraDaemon.java:228
- Exception in thread Thread[ReadRepairStage:83,5,main]

org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
received only 0 responses.

DEBUG [ReadRepairStage:111] 2020-01-27 13:10:06,663 ReadCallback.java:242 -
Digest mismatch:

org.apache.cassandra.service.DigestMismatchException: Mismatch for key
DecoratedKey(4759131696153881383, 9a21276d0af64de28d5d3023b69e)
(142a55e1e28de7daa2ddc34a361474a0 vs fcba30f022ef25f456914c341022963d)


Re: [EXTERNAL] Re: sstableloader & num_tokens change

2020-01-27 Thread Voytek Jarnot
Odd. Have you seen this behavior? I ran a test last week, loaded snapshots
from 4 nodes to 4 nodes (RF 3 on both ends) and did not notice a spike.
That's not to say that it didn't happen, but I think I'd have noticed as I
was loading approx 250GB x 4 (although sequentially rather than 4x
sstableloader in parallel).

Also, thanks to everyone for confirming no issue with num_tokens and
sstableloader; appreciate it.
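
For the archives, a per-node invocation for this kind of restore looks
along these lines (a sketch; the hosts, throttle value, and path are made
up for illustration):

    # run once per source node's snapshot; the target cluster works out placement
    sstableloader -d 10.0.0.1,10.0.0.2 --throttle 50 /backups/node1/my_ks/my_table/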


On Mon, Jan 27, 2020 at 9:02 AM Durity, Sean R 
wrote:

> I would suggest to be aware of potential data size expansion. If you load
> (for example) three copies of the data into a new cluster (because the RF
> of the origin cluster is 3), it will also get written to the RF of the new
> cluster (3 more times). So, you could see data expansion of 9x the original
> data size (or, origin RF * target RF), until compaction can run.
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Erick Ramirez 
> *Sent:* Friday, January 24, 2020 11:03 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: sstableloader & num_tokens change
>
>
>
>
>
> If I may just loop this back to the question at hand:
>
> I'm curious if there are any gotchas with using sstableloader to restore
> snapshots taken from 256-token nodes into a cluster with 32-token (or your
> preferred number of tokens) nodes (otherwise same # of nodes and same RF).
>
>
>
> No, there isn't. It will work as designed so you're good to go. Cheers!
>
>
>
>
>
>
>


RE: [EXTERNAL] Re: sstableloader & num_tokens change

2020-01-27 Thread Durity, Sean R
I would suggest to be aware of potential data size expansion. If you load (for 
example) three copies of the data into a new cluster (because the RF of the 
origin cluster is 3), it will also get written to the RF of the new cluster (3 
more times). So, you could see data expansion of 9x the original data size (or, 
origin RF * target RF), until compaction can run.


Sean Durity – Staff Systems Engineer, Cassandra

From: Erick Ramirez 
Sent: Friday, January 24, 2020 11:03 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: sstableloader & num_tokens change


If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore 
snapshots taken from 256-token nodes into a cluster with 32-token (or your 
preferred number of tokens) nodes (otherwise same # of nodes and same RF).

No, there isn't. It will work as designed so you're good to go. Cheers!







Re: sstableloader & num_tokens change

2020-01-27 Thread Jean Carlo
Hello

Concerning the original question, I agree with @eric_ramirez:
sstableloader is agnostic to the number of tokens per node.

Just for info @voytek, check this post out:
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
You may be interested to know whether your cluster is well balanced with 32
tokens. 32 tokens seems to be the future default value, but changing the
default number of vnode tokens is not so straightforward.
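
A quick way to check that balance (a sketch; the keyspace name is a
placeholder):

    nodetool status my_keyspace    # the Owns (effective) column should be roughly even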

cheers

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Sat, Jan 25, 2020 at 5:05 AM Erick Ramirez  wrote:

> On the subject of DSBulk, sstableloader is the tool of choice for this
> scenario.
>
> +1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader
> for CSV/JSON formats. Cheers!
>