How to avoid flush if the data can fit into memtable

2017-05-24 Thread preetika tyagi
Hi,

I'm running Cassandra with a very small dataset so that the data can exist
in the memtable only. Below are my configurations:

In jvm.options:

-Xms4G
-Xmx4G

In cassandra.yaml,

memtable_cleanup_threshold: 0.50
memtable_allocation_type: heap_buffers

As per the documentation in cassandra.yaml, memtable_heap_space_in_mb
and memtable_offheap_space_in_mb will each be set to 1/4 of the heap size, i.e. ~1000MB.

According to the documentation here (
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__memtable_cleanup_threshold),
a memtable flush will be triggered when the total size of the memtable(s) goes beyond
(1000+1000)*0.50 = 1000MB.
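
For reference, here is a minimal sketch of how I read these settings with the 4G
heap above (the explicit space values are just the computed defaults; I have not
set them myself):

memtable_heap_space_in_mb: 1000       # default: 1/4 of the 4G heap
memtable_offheap_space_in_mb: 1000    # same default
memtable_cleanup_threshold: 0.50
# expected flush trigger: (1000 + 1000) * 0.50 = 1000MB of memtable data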

Now when I perform several write requests that result in almost ~300MB of
data, the memtable still gets flushed, since I see SSTables being created on the
file system (Data.db etc.), and I don't understand why.

Could anyone explain this behavior and point out if I'm missing something
here?

Thanks,

Preetika


Re: Impact on latency with larger memtable

2017-05-24 Thread preetika tyagi
Thank you all for the responses. I figured out the root cause.
I thought all my data was in the memtable only, but the data was actually being
flushed to disk. That's why I was noticing the drop in throughput.

On Wed, May 24, 2017 at 9:42 AM, daemeon reiydelle 
wrote:

> You speak of an increase. Please provide your results, with specific examples,
> e.g. "a 25% increase in heap size results in an n% increase in latency". Also please
> include the number of nodes, size of the total keyspace, replication factor, etc.
>
> Hopefully this is a 6 node cluster with several hundred gig per keyspace,
> not some single node free tier box.
>
> “All men dream, but not equally. Those who dream by night in the dusty
> recesses of their minds wake up in the day to find it was vanity, but the
> dreamers of the day are dangerous men, for they may act their dreams with
> open eyes, to make it possible.” — T.E. Lawrence
>
> sent from my mobile
> Daemeon Reiydelle
> skype daemeon.c.m.reiydelle
> USA 415.501.0198
>
> On May 24, 2017 9:32 AM, "preetika tyagi"  wrote:
>
>> Hi,
>>
>> I'm experimenting with memtable/heap size on my Cassandra server to
>> understand how it impacts the latency/throughput for read requests.
>>
>> I vary the heap size (-Xms and -Xmx) in jvm.options, so the memtable will be 1/4 of
>> it. When I increase the heap size, and hence the memtable, I notice a drop
>> in throughput and an increase in latency. I'm also creating the database such
>> that its size doesn't exceed the size of the memtable. Therefore, all data
>> exists in the memtable, and I'm unable to see why a bigger memtable
>> results in higher latency/lower throughput.
>>
>> Since everything is in DRAM, shouldn't the throughput/latency remain the same in
>> all cases?
>>
>> Thanks,
>> Preetika
>>
>


Re: Replication issue with Multi DC setup in cassandra

2017-05-24 Thread daemeon reiydelle
Cqlsh looks at the cluster, not a single node.

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 16, 2017 2:42 PM, "suraj pasuparthy" 
wrote:

> So I thought the same.
> I see the data via cqlsh in both datacenters; consistency is set
> to LQ (LOCAL_QUORUM).
>
> thanks
> -Suraj
>
> On Tue, May 16, 2017 at 2:19 PM, Nitan Kainth  wrote:
>
>> Do you see data on other DC or just directory structure? Directory
>> structure would populate because it is DDL but inserts shouldn’t populate,
>> ideally.
>>
>> On May 16, 2017, at 3:19 PM, suraj pasuparthy 
>> wrote:
>>
>> elp me fig
>>
>>
>>
>
>
> --
> Suraj Pasuparthy
>
> cisco systems
> Software Engineer
> San Jose CA
>


Re: Replication issue with Multi DC setup in cassandra

2017-05-24 Thread Arvydas Jonusonis
Run *nodetool cleanup* on the *4.4.4.5* DC node(s). Changing the network
topology (replication settings) does not *remove* existing data - that is a manual task.

But the change should prevent new data from replicating over to the undesired DC.
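
For reference, a keyspace that should live only in the *4.4.4.4* DC would look
roughly like this (the keyspace name and DC name below are placeholders):

ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
-- only the DCs listed here get replicas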

Also make sure your load balancing policy is set to DCAwareRoundRobinPolicy,
with the *4.4.4.4* DC set as the *local* DC, e.g. as in the sketch below.
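
With the 3.x Java driver, that is roughly what follows (a sketch only; the
contact point and DC name are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class LocalDcClient {
    public static Cluster build() {
        return Cluster.builder()
                .addContactPoint("4.4.4.4")         // a node in the desired local DC
                .withLoadBalancingPolicy(
                        DCAwareRoundRobinPolicy.builder()
                                .withLocalDc("DC1") // placeholder: your local DC name
                                .build())
                .build();
    }
}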

Arvydas

On Wed, May 24, 2017 at 9:46 PM, daemeon reiydelle 
wrote:

> May I inquire if your configuration is actually data center aware? Do you
> understand the difference between LQ and replication?
>
>
>
>
>
> *Daemeon C.M. Reiydelle*
> *USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*
>
>
> *“All men dream, but not equally. Those who dream by night in the dusty
> recesses of their minds wake up in the day to find it was vanity, but the
> dreamers of the day are dangerous men, for they may act their dreams with
> open eyes, to make it possible.” — T.E. Lawrence*
>
>
> On Wed, May 24, 2017 at 12:03 PM, Igor Leão  wrote:
>
>> Did you run `nodetool repair` after changing the keyspace? (not sure if
>> it makes sense though)
>>
>> 2017-05-16 19:52 GMT-03:00 Nitan Kainth :
>>
>>> Strange. Anybody else might share something more important.
>>>
>>> Sent from my iPhone
>>>
>>> On May 16, 2017, at 5:23 PM, suraj pasuparthy <
>>> suraj.pasupar...@gmail.com> wrote:
>>>
>>> Yes, I see them in the datacenter's data directories. In fact, I see them
>>> even after I bring down the interface between the 2 DCs, which further
>>> confirms that a local copy is maintained in the DC that was not configured
>>> in the strategy.
>>> It's quite important that we block the info for this keyspace from
>>> replicating :( Not sure why this does not work.
>>>
>>> Thanks
>>> Suraj
>>>
>>> On Tue, May 16, 2017 at 3:06 PM Nitan Kainth  wrote:
>>>
 check for datafiles on filesystem in both DCs.

 On May 16, 2017, at 4:42 PM, suraj pasuparthy <
 suraj.pasupar...@gmail.com> wrote:

 So I thought the same.
 I see the data via cqlsh in both datacenters; consistency is
 set to LQ (LOCAL_QUORUM).

 thanks
 -Suraj

 On Tue, May 16, 2017 at 2:19 PM, Nitan Kainth 
 wrote:

> Do you see data on other DC or just directory structure? Directory
> structure would populate because it is DDL but inserts shouldn’t populate,
> ideally.
>
> On May 16, 2017, at 3:19 PM, suraj pasuparthy <
> suraj.pasupar...@gmail.com> wrote:
>
> elp me fig
>
>
>


 --
 Suraj Pasuparthy

 cisco systems
 Software Engineer
 San Jose CA






>>
>>
>> --
>> Igor Leão  Site Reliability Engineer
>>
>> Mobile: +55 81 99727-1083 
>> Skype: *igorvpcleao*
>> Office: +55 81 4042-9757 
>> Website: inlocomedia.com 
>>
>>
>>
>>
>>
>>
>>
>


Re: Replication issue with Multi DC setup in cassandra

2017-05-24 Thread daemeon reiydelle
May I inquire if your configuration is actually data center aware? Do you
understand the difference between LQ and replication?





*Daemeon C.M. Reiydelle | USA (+1) 415.501.0198 | London (+44) (0) 20 8144 9872*


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*


On Wed, May 24, 2017 at 12:03 PM, Igor Leão  wrote:

> Did you run `nodetool repair` after changing the keyspace? (not sure if it
> makes sense though)
>
> 2017-05-16 19:52 GMT-03:00 Nitan Kainth :
>
>> Strange. Anybody else might share something more important.
>>
>> Sent from my iPhone
>>
>> On May 16, 2017, at 5:23 PM, suraj pasuparthy 
>> wrote:
>>
>> Yes, I see them in the datacenter's data directories. In fact, I see them
>> even after I bring down the interface between the 2 DCs, which further
>> confirms that a local copy is maintained in the DC that was not configured
>> in the strategy.
>> It's quite important that we block the info for this keyspace from
>> replicating :( Not sure why this does not work.
>>
>> Thanks
>> Suraj
>>
>> On Tue, May 16, 2017 at 3:06 PM Nitan Kainth  wrote:
>>
>>> check for datafiles on filesystem in both DCs.
>>>
>>> On May 16, 2017, at 4:42 PM, suraj pasuparthy <
>>> suraj.pasupar...@gmail.com> wrote:
>>>
>>> So I thought the same.
>>> I see the data via cqlsh in both datacenters; consistency is set
>>> to LQ (LOCAL_QUORUM).
>>>
>>> thanks
>>> -Suraj
>>>
>>> On Tue, May 16, 2017 at 2:19 PM, Nitan Kainth  wrote:
>>>
 Do you see data on other DC or just directory structure? Directory
 structure would populate because it is DDL but inserts shouldn’t populate,
 ideally.

 On May 16, 2017, at 3:19 PM, suraj pasuparthy <
 suraj.pasupar...@gmail.com> wrote:

 elp me fig



>>>
>>>
>>> --
>>> Suraj Pasuparthy
>>>
>>> cisco systems
>>> Software Engineer
>>> San Jose CA
>>>
>>>
>>>
>>>
>>>
>>>
>
>
> --
> Igor Leão  Site Reliability Engineer
>
> Mobile: +55 81 99727-1083 
> Skype: *igorvpcleao*
> Office: +55 81 4042-9757 
> Website: inlocomedia.com 
>
>
>
>
>
>
>


Re: Replication issue with Multi DC setup in cassandra

2017-05-24 Thread Igor Leão
Did you run `nodetool repair` after changing the keyspace? (not sure if it
makes sense though)

2017-05-16 19:52 GMT-03:00 Nitan Kainth :

> Strange. Anybody else might share something more important.
>
> Sent from my iPhone
>
> On May 16, 2017, at 5:23 PM, suraj pasuparthy 
> wrote:
>
> Yes, I see them in the datacenter's data directories. In fact, I see them
> even after I bring down the interface between the 2 DCs, which further
> confirms that a local copy is maintained in the DC that was not configured
> in the strategy.
> It's quite important that we block the info for this keyspace from
> replicating :( Not sure why this does not work.
>
> Thanks
> Suraj
>
> On Tue, May 16, 2017 at 3:06 PM Nitan Kainth  wrote:
>
>> check for datafiles on filesystem in both DCs.
>>
>> On May 16, 2017, at 4:42 PM, suraj pasuparthy 
>> wrote:
>>
>> So I thought the same.
>> I see the data via cqlsh in both datacenters; consistency is set
>> to LQ (LOCAL_QUORUM).
>>
>> thanks
>> -Suraj
>>
>> On Tue, May 16, 2017 at 2:19 PM, Nitan Kainth  wrote:
>>
>>> Do you see data on other DC or just directory structure? Directory
>>> structure would populate because it is DDL but inserts shouldn’t populate,
>>> ideally.
>>>
>>> On May 16, 2017, at 3:19 PM, suraj pasuparthy <
>>> suraj.pasupar...@gmail.com> wrote:
>>>
>>> elp me fig
>>>
>>>
>>>
>>
>>
>> --
>> Suraj Pasuparthy
>>
>> cisco systems
>> Software Engineer
>> San Jose CA
>>
>>
>>
>>
>>
>>


-- 
Igor Leão  Site Reliability Engineer

Mobile: +55 81 99727-1083 
Skype: *igorvpcleao*
Office: +55 81 4042-9757 
Website: inlocomedia.com 


Re: Impact on latency with larger memtable

2017-05-24 Thread daemeon reiydelle
You speak of an increase. Please provide your results, with specific examples,
e.g. "a 25% increase in heap size results in an n% increase in latency". Also please
include the number of nodes, size of the total keyspace, replication factor, etc.

Hopefully this is a 6 node cluster with several hundred gig per keyspace,
not some single node free tier box.

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence

sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On May 24, 2017 9:32 AM, "preetika tyagi"  wrote:

> Hi,
>
> I'm experimenting with memtable/heap size on my Cassandra server to
> understand how it impacts the latency/throughput for read requests.
>
> I vary the heap size (-Xms and -Xmx) in jvm.options, so the memtable will be 1/4 of
> it. When I increase the heap size, and hence the memtable, I notice a drop
> in throughput and an increase in latency. I'm also creating the database such
> that its size doesn't exceed the size of the memtable. Therefore, all data
> exists in the memtable, and I'm unable to see why a bigger memtable
> results in higher latency/lower throughput.
>
> Since everything is in DRAM, shouldn't the throughput/latency remain the same in
> all cases?
>
> Thanks,
> Preetika
>


Re: Impact on latency with larger memtable

2017-05-24 Thread Nitan Kainth
Larger memtables mean more time during flushes, and a larger heap means longer GC
pauses. You can see both in the system log.
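
For example (a sketch; adjust the log path for your install):

grep -E "GCInspector|Completed flushing" /var/log/cassandra/system.log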

Sent from my iPhone

> On May 24, 2017, at 11:31 AM, preetika tyagi  wrote:
> 
> Hi,
> 
> I'm experimenting with memtable/heap size on my Cassandra server to 
> understand how it impacts the latency/throughput for read requests.
> 
> I vary the heap size (-Xms and -Xmx) in jvm.options, so the memtable will be 1/4 of
> it. When I increase the heap size, and hence the memtable, I notice a drop in
> throughput and an increase in latency. I'm also creating the database such that
> its size doesn't exceed the size of the memtable. Therefore, all data exists in
> the memtable, and I'm unable to see why a bigger memtable results
> in higher latency/lower throughput.
>
> Since everything is in DRAM, shouldn't the throughput/latency remain the same in
> all cases?
> 
> Thanks,
> Preetika

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Impact on latency with larger memtable

2017-05-24 Thread preetika tyagi
Hi,

I'm experimenting with memtable/heap size on my Cassandra server to
understand how it impacts the latency/throughput for read requests.

I vary the heap size (-Xms and -Xmx) in jvm.options, so the memtable will be 1/4 of
it. When I increase the heap size, and hence the memtable, I notice a drop
in throughput and an increase in latency. I'm also creating the database such
that its size doesn't exceed the size of the memtable. Therefore, all data
exists in the memtable, and I'm unable to see why a bigger memtable
results in higher latency/lower throughput.

Since everything is in DRAM, shouldn't the throughput/latency remain the same in
all cases?

Thanks,
Preetika


Re: How to find dataSize at client side?

2017-05-24 Thread Nicolas Guyomar
Hi,

The list is open:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user,
feel free to subscribe.

DataStax is the main maintainer of the Java driver, which is open source
(https://github.com/datastax/java-driver) and is not the same driver as
the DSE one: https://github.com/datastax/java-dse-driver
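
As for getting a rough estimate on the client side, one option is to sum the
serialized sizes of the bound values yourself. Below is only a sketch, assuming
the 3.x Java driver and a batch of prepared/bound statements; the server-side
threshold counts serialized mutation sizes, so this will not match the warning
byte for byte:

import java.nio.ByteBuffer;

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Statement;

public class BatchSizeEstimator {

    /** Rough estimate of the value payload of a batch (bound statements only). */
    public static long estimateDataSize(BatchStatement batch) {
        long size = 0;
        for (Statement s : batch.getStatements()) {
            if (!(s instanceof BoundStatement)) {
                continue; // this sketch only handles prepared/bound statements
            }
            BoundStatement bs = (BoundStatement) s;
            int variables = bs.preparedStatement().getVariables().size();
            for (int i = 0; i < variables; i++) {
                if (!bs.isSet(i)) {
                    continue; // unset values carry no payload
                }
                ByteBuffer bytes = bs.getBytesUnsafe(i); // raw serialized value
                if (bytes != null) {
                    size += bytes.remaining();
                }
            }
        }
        return size;
    }
}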



On 24 May 2017 at 10:53, techpyaasa .  wrote:

> Hi Nicolas
>
> I think only DataStax Enterprise (paid) C* users can ask questions / get
> support from DataStax :(
>
> On Tue, May 23, 2017 at 9:44 PM, techpyaasa . 
> wrote:
>
>> Thanks for your reply..
>>
>> On Tue, May 23, 2017 at 7:40 PM, Nicolas Guyomar <
>> nicolas.guyo...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> If you need to know the batch size on the client side to make sure it does
>>> not go above the 5kb limit, so that you can "limit the number of
>>> statements in a batch", I would suspect you do not need a batch at all, right?
>>> See https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
>>>
>>> As for your question, you might get an answer on the java driver ML :
>>> java-driver-u...@lists.datastax.com
>>>
>>>
>>> On 23 May 2017 at 15:25, techpyaasa .  wrote:
>>>

 * WARN [SharedPool-Worker-1] 2017-05-22 20:28:46,204
 BatchStatement.java (line 253) Batch of prepared statements for
 [site24x7.wm_rawstats_tb, site24x7.wm_rawstats] is of size 6122, exceeding
 specified threshold of 5120 by 1002*
 We are frequently getting this message in logs, so I wanted to restrict
 inserts at client side by calculating *dataSize* of insert/batch
 statements before sending it to c* servers.

 We are using datastax java drivers , how can I get dataSize here??


 Any ideas??

 Thanks in advance
 TechPyaasa

>>>
>>>
>>
>


Re: How to find dataSize at client side?

2017-05-24 Thread techpyaasa .
Hi Nicolas

I think only DataStax Enterprise (paid) C* users can ask questions / get
support from DataStax :(

On Tue, May 23, 2017 at 9:44 PM, techpyaasa .  wrote:

> Thanks for your reply..
>
> On Tue, May 23, 2017 at 7:40 PM, Nicolas Guyomar <
> nicolas.guyo...@gmail.com> wrote:
>
>> Hi,
>>
>> If you need to know the batch size on the client side to make sure it does
>> not go above the 5kb limit, so that you can "limit the number of
>> statements in a batch", I would suspect you do not need a batch at all, right?
>> See https://inoio.de/blog/2016/01/13/cassandra-to-batch-or-not-to-batch/
>>
>> As for your question, you might get an answer on the java driver ML :
>> java-driver-u...@lists.datastax.com
>>
>>
>> On 23 May 2017 at 15:25, techpyaasa .  wrote:
>>
>>>
>>> * WARN [SharedPool-Worker-1] 2017-05-22 20:28:46,204 BatchStatement.java
>>> (line 253) Batch of prepared statements for [site24x7.wm_rawstats_tb,
>>> site24x7.wm_rawstats] is of size 6122, exceeding specified threshold of
>>> 5120 by 1002*
>>> We are frequently getting this message in logs, so I wanted to restrict
>>> inserts at client side by calculating *dataSize* of insert/batch
>>> statements before sending it to c* servers.
>>>
>>> We are using datastax java drivers , how can I get dataSize here??
>>>
>>>
>>> Any ideas??
>>>
>>> Thanks in advance
>>> TechPyaasa
>>>
>>
>>
>


Re: Slowness in C* cluster after implementing multiple network interface configuration.

2017-05-24 Thread Carlos Rolo
It might be a bug.
Cassandra, AFAIK, scans those files for changes and updates the topology
(so you don't need a restart if you change the files). It might be the case
that the absence of the file is still noticed by Cassandra even if the file is
not really used.
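
(For context: GossipingPropertyFileSnitch itself reads cassandra-rackdc.properties,
something like the sketch below with placeholder values, and only falls back to
cassandra-topology.properties for compatibility.)

# cassandra-rackdc.properties (read by GossipingPropertyFileSnitch)
dc=DC1
rack=RAC1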

I can do a small test to confirm. If so, it is a question of "expected
behaviour" (as in, always leave the file there) vs. a bug (Cassandra shouldn't care
about files it doesn't use).

If you can always reproduce it, feel free to open a JIRA.

Thanks for the description.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
*linkedin.com/in/carlosjuzarterolo
*
Mobile: +351 918 918 100
www.pythian.com

On Wed, May 24, 2017 at 8:12 AM, Prakash Chauhan <
prakash.chau...@ericsson.com> wrote:

> Hi All,
>
>
>
> We have a new observation.
>
>
>
> Earlier, for implementing multiple network interfaces, we were deleting
> *cassandra-topology.properties* in the last step (the steps are mentioned
> in the mail trail).
>
> The rationale was that because we are using an altogether new
> endpoint_snitch, we don't require the cassandra-topology.properties file
> anymore.
>
> Now we have observed that if we don't delete cassandra-topology.properties,
> the slowness is not there in the cluster (even with multiple restarts).
>
> Is there some relationship between *GossipingPropertyFileSnitch* and
> *cassandra-topology.properties*?
>
> As per my knowledge, the *cassandra-topology.properties* file is only used
> as a fallback while doing snitch migration. If that's the case, why does
> Cassandra become slow over time (and after doing multiple restarts)
> after deleting cassandra-topology.properties?
>
>
>
>
>
>
>
>
>
> Regards,
>
> Prakash Chauhan.
>
>
>
> *From:* Cogumelos Maravilha [mailto:cogumelosmaravi...@sapo.pt]
> *Sent:* Wednesday, May 24, 2017 12:15 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Slowness in C* cluster after implementing multiple network
> interface configuration.
>
>
>
> Hi,
>
> I never used version 2.0.x but I think port 7000 isn't enough.
>
> Try enable:
>
> 7000 inter-node
>
> 7001 SSL inter-node
>
> 9042 CQL
>
> 9160 Thrift is enabled in that version
>
>
>
> And
>
> In Cassandra.yaml, add property “broadcast_address”.  = local ipv4
>
> In Cassandra.yaml, change “listen_address” to private IP. = local ipv4
>
>
>
> As a starting point.
>
>
>
> Cheers.
>
>
>
> On 22-05-2017 12:36, Prakash Chauhan wrote:
>
> Hi All ,
>
>
>
> Need Help !!!
>
>
>
> *Setup Details:*
>
> Cassandra 2.0.14
>
> Geo Red setup
>
> · DC1 - 3 nodes
>
> · DC2 - 3 nodes
>
>
>
>
>
> We were trying to implement multiple network interfaces with Cassandra
> 2.0.14
>
> After doing all the steps mentioned in DataStax doc
> http://docs.datastax.com/en/archived/cassandra/2.0/
> cassandra/configuration/configMultiNetworks.html, we observed that nodes
> were not able to see each other (checked using nodetool status).
>
>
>
> To resolve this issue, we followed the comment
> 
> mentioned in the JIRA : CASSANDRA-9748
> 
>
>
>
> Exact steps that we followed are :
>
> 
>
> *1.   *Stop Cassandra
>
> *2.   *Add rule to “iptables” to forward all packets on the public
> interface to the private interface.
>
>
>
> COMMAND: # iptables -t nat -A PREROUTING -p tcp -m tcp -d 
> --dport 7000 -j DNAT --to-destination :7000
>
>
>
> *3.   *In Cassandra.yaml, add property “broadcast_address”.
>
> *4.   *In Cassandra.yaml, change “listen_address” to private IP.
>
> *5.   *Clear the data from directory “peers”.
>
> *6.   *Change Snitch to GossipingPropertyFileSnitch.
>
> *7.   *Append following property to the file 
> “/etc/cassandra/conf/cassandra-env.sh”
> to purge gossip state.
>
> JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
>
>
>
> *8.   *Start Cassandra
>
> *9.   *After node has been started, remove following property from
> the file “/etc/cassandra/conf/cassandra-env.sh” (previously added in step
> 7)
>
> JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
>
> *10.   *Delete file “/etc/cassandra/conf/cassandra-topology.properties”
>
>
>
>
>
> Now We have an observation that after multiple restarts of Cassandra on
> multiple nodes, slowness is observed in the cluster.
>
> The problem gets resolved when we revert the steps mentioned above.
>
>
>
> *Do u think there is any step that can cause the problem ?*
>
> We are suspecting Step 2(iptable rule) but not very sure about it.
>
>
>
>
>
> Regards,
>
> Prakash Chauhan.
>
>
>


RE: Slowness in C* cluster after implementing multiple network interface configuration.

2017-05-24 Thread Prakash Chauhan
Hi All,

We have a new observation.

Earlier, for implementing multiple network interfaces, we were deleting
cassandra-topology.properties in the last step (the steps are mentioned in the mail
trail).
The rationale was that because we are using an altogether new endpoint_snitch,
we don't require the cassandra-topology.properties file anymore.

Now we have observed that if we don't delete cassandra-topology.properties,
the slowness is not there in the cluster (even with multiple restarts).

Is there some relationship between GossipingPropertyFileSnitch and
cassandra-topology.properties?

As per my knowledge, the cassandra-topology.properties file is only used as a
fallback while doing snitch migration. If that's the case, why does Cassandra
become slow over time (and after doing multiple restarts) after deleting
cassandra-topology.properties?




Regards,
Prakash Chauhan.

From: Cogumelos Maravilha [mailto:cogumelosmaravi...@sapo.pt]
Sent: Wednesday, May 24, 2017 12:15 AM
To: user@cassandra.apache.org
Subject: Re: Slowness in C* cluster after implementing multiple network 
interface configuration.


Hi,

I never used version 2.0.x but I think port 7000 isn't enough.

Try enable:

7000 inter-node

7001 SSL inter-node

9042 CQL

9160 Thrift is enabled in that version



And

In Cassandra.yaml, add property "broadcast_address".  = local ipv4

In Cassandra.yaml, change "listen_address" to private IP. = local ipv4



As a starting point.



Cheers.

On 22-05-2017 12:36, Prakash Chauhan wrote:
Hi All ,

Need Help !!!

Setup Details:
Cassandra 2.0.14
Geo Red setup

* DC1 - 3 nodes

* DC2 - 3 nodes


We were trying to implement multiple network interfaces with Cassandra 2.0.14
After doing all the steps mentioned in DataStax doc 
http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/configuration/configMultiNetworks.html,
 we observed that nodes were not able to see each other (checked using nodetool 
status).

To resolve this issue, we followed the 
comment
 mentioned in the JIRA : 
CASSANDRA-9748

Exact steps that we followed are :


1.   Stop Cassandra

2.   Add rule to "iptables" to forward all packets on the public interface 
to the private interface.


COMMAND: # iptables -t nat -A PREROUTING -p tcp -m tcp -d  --dport 
7000 -j DNAT --to-destination :7000



3.   In Cassandra.yaml, add the property "broadcast_address" (see the sketch after these steps).

4.   In Cassandra.yaml, change "listen_address" to private IP.

5.   Clear the data from directory "peers".

6.   Change Snitch to GossipingPropertyFileSnitch.

7.   Append following property to the file 
"/etc/cassandra/conf/cassandra-env.sh" to purge gossip state.

JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"



8.   Start Cassandra

9.   After node has been started, remove following property from the file 
"/etc/cassandra/conf/cassandra-env.sh" (previously added in step 7)

JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"

10.   Delete file "/etc/cassandra/conf/cassandra-topology.properties"
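
(For reference, steps 3, 4 and 6 end up looking roughly like this in
cassandra.yaml; the IPs below are placeholders, not our real addresses:)

listen_address: 10.0.0.5                       # private/internal IP (step 4)
broadcast_address: 203.0.113.5                 # public IP the other DC should use (step 3)
endpoint_snitch: GossipingPropertyFileSnitch   # step 6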


Now We have an observation that after multiple restarts of Cassandra on 
multiple nodes, slowness is observed in the cluster.
The problem gets resolved when we revert the steps mentioned above.

Do u think there is any step that can cause the problem ?
We are suspecting Step 2(iptable rule) but not very sure about it.


Regards,
Prakash Chauhan.



Re: EC2 instance recommendations

2017-05-24 Thread Cogumelos Maravilha
Exactly.


On 23-05-2017 23:55, Gopal, Dhruva wrote:
>
> By that, do you mean it's like bootstrapping a node if it fails or is
> shut down, and that with an RF of 2 or higher, data will get replicated
> when it's brought back up?
>
>  
>
> *From: *Cogumelos Maravilha 
> *Date: *Tuesday, May 23, 2017 at 1:52 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: EC2 instance recommendations
>
>  
>
> Yes, we can only reboot.
>
> But with RF=2 or higher, it's only a fresh node restart.
>
> EBS is a network-attached disk; spinning disk or SSD is almost the same.
>
> It's better to take the "risk" and use type i instances.
>
> Cheers.
>
>  
>
> On 23-05-2017 21:39, sfesc...@gmail.com  wrote:
>
> I think this is overstating it. If the instance ever stops you'll
> lose the data. That means if the server crashes for example, or if
> Amazon decides your instance requires maintenance.
>
>  
>
> On Tue, May 23, 2017 at 10:30 AM Gopal, Dhruva
> > wrote:
>
> Thanks! So, I assume that as long as we make sure we never
> explicitly “shut down” the instance, we are good. Are you also
> saying we won’t be able to snapshot a directory with ephemeral
> storage, and that is why EBS is better? We’re just finding that
> to get a reasonable amount of IOPS (gp2) out of EBS at a
> reasonable rate, it gets more expensive than an I3.
>
>  
>
> *From: *Jonathan Haddad  >
> *Date: *Tuesday, May 23, 2017 at 9:42 AM
> *To: *"Gopal, Dhruva" 
> , Matija Gobec
> >, Bhuvan
> Rawal >
> *Cc: *"user@cassandra.apache.org
> "  >
>
>
> *Subject: *Re: EC2 instance recommendations
>
>  
>
> > Oh, so all the data is lost if the instance is shutdown or
> restarted (for that instance)? 
>
>  
>
> When you restart the OS, you're technically not shutting down
> the instance.  As long as the instance isn't stopped /
> terminated, your data is fine.  I ran my databases on
> ephemeral storage for years without issue.  In general,
> ephemeral storage is going to give you lower latency since
> there's no network overhead.  EBS is generally cheaper than
> ephemeral, is persistent, and you can take snapshots easily.
>
>  
>
> On Tue, May 23, 2017 at 9:35 AM Gopal, Dhruva
> > wrote:
>
> Oh, so all the data is lost if the instance is shutdown or
> restarted (for that instance)? If we take a naïve approach
> to backing up the directory, and restoring it, if we ever
> have to bring down the instance and back up, will that
> work as a strategy? Data is only kept around for 2 days
> and is TTL’d after.
>
>  
>
> *From: *Matija Gobec  >
> *Date: *Tuesday, May 23, 2017 at 8:15 AM
> *To: *Bhuvan Rawal  >
> *Cc: *"Gopal, Dhruva" 
> ,
> "user@cassandra.apache.org
> "
> >
> *Subject: *Re: EC2 instance recommendations
>
>  
>
> We are running on I3s since they came out. NVMe SSDs are
> really fast and I managed to push them to 75k IOPs.
>
> As Bhuvan mentioned the i3 storage is ephemeral. If you
> can work around it and plan for failure recovery you are
> good to go.
>
>  
>
> I ran Cassandra on m4s before and had no problems with EBS
> volumes (gp2) even in low latency use cases. With the cost
> of M4 instances and EBS volumes that make sense in IOPs, I
> would recommend going with more i3s and working around the
> ephemeral issue (if its an issue).
>
>  
>
> Best,
>
> Matija
>
> On Tue, May 23, 2017 at 2:13 AM, Bhuvan Rawal
> > wrote:
>
> i3 instances will undoubtedly give you more meat for
> buck - easily 40K+ iops whereas on the other hand EBS
> maxes out at 20K 

Re: memtable_allocation_type on Cassandra 2.1.x

2017-05-24 Thread Akhil Mehra
Hi Varun,

Look at the recommendations for offheap_objects and for memtable flush writers
and readers in the following guide:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html. In the
guide and in cassandra.yaml, the defaults are suggested as a good starting point.

If you want to use the default, just omit the
memtable_heap_space_in_mb setting instead of setting it to 0. Note that in newer
versions of Cassandra, setting memtable_cleanup_threshold is deprecated;
the default value is said to be the only reasonable setting.
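
A minimal sketch of what that looks like in a 2.1-era cassandra.yaml (the flush
writer count below is just an example, not a recommendation):

memtable_allocation_type: offheap_objects
# memtable_heap_space_in_mb:    omitted -> defaults to 1/4 of the heap
# memtable_offheap_space_in_mb: omitted -> same default
# memtable_cleanup_threshold:   omitted -> defaults to 1 / (memtable_flush_writers + 1)
memtable_flush_writers: 2       # example only; size this to your disks/cores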

Regards,
Akhil


On Wed, May 24, 2017 at 1:41 PM, varun saluja  wrote:

> Thanks Akhil for the response.
>
> I have set memtable_allocation_type to off-heap, but Cassandra 2.1.x does
> not allow setting *memtable_heap_space_in_mb: 0*.
>
> It mentions that we need to assign some positive value to the heap space. In that
> case, will the memtable still use JVM heap space?
>
> Can anyone suggest values for the parameters below?
>
> memtable_flush_writers:
> memtable_cleanup_threshold:
>
>
> PS : We have high write intensive workload . 5Node cluster (12 Core , 62GB
> RAM and flash disk per node)
>
>
> Regards,
> Varun Saluja
>
> On 23 May 2017 at 03:26, Akhil Mehra  wrote:
>
>> I believe off-heap storage was reintroduced in 3.4 (
>> https://issues.apache.org/jira/browse/CASSANDRA-9472). It was removed
>> from 3.0 due to the refactoring of the storage engine.  Check out
>> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1 to
>> get an overview of the pros and cons of using off-heap storage.
>>
>> Regards,
>> Akhil Mehra
>>
>>
>> On Tue, May 23, 2017 at 12:32 AM, varun saluja 
>> wrote:
>>
>>> Hi Experts,
>>>
>>> I have some concerns regarding the memtable parameters for my current
>>> version, 2.1.8.
>>>
>>> As per the documentation, it's mentioned to have off-heap memtables in
>>> Cassandra 2.1, and in releases 3.2.0 and 3.2.1 the only option that works
>>> is heap_buffers.
>>> Can you please suggest what value should be used for the below parameter in
>>> 2.1.x:
>>> memtable_allocation_type :
>>>
>>> Regards,
>>> Varun Saluja
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>>
>>>
>>
>