Re: Reason for Trace Message Drop

2016-06-16 Thread Varun Barala
Thanks, Eric Stevens, for your detailed reply! I got your points. On Thu, Jun 16, 2016 at 11:49 PM, Eric Stevens wrote: > Are you executing all queries with tracing enabled? If so that introduces > overhead you probably don't want. Most people probably don't see this log >

Re: Multi-DC Cluster w/ non-replicated Keyspace

2016-06-16 Thread Jason J. W. Williams
It was my mistake. Before I checked the trace, I assumed it meant the data had also been replicated to the remote cluster, which is why it could answer the request. Thank you for responding so quickly and helping correct my misunderstanding. As long as the data isn't being replicated, everything

Re: Multi-DC Cluster w/ non-replicated Keyspace

2016-06-16 Thread Ben Slater
That’s the behaviour I would have expected. I’m not aware of any way to prevent this and would be surprised if there is one (but I’ve never tried to find one either, so it might be possible). Cheers Ben On Fri, 17 Jun 2016 at 12:02 Jason J. W. Williams wrote: > Hey

Re: Multi-DC Cluster w/ non-replicated Keyspace

2016-06-16 Thread Jason J. W. Williams
Hey Ben, Looks like just the schema. I was surprised that running SELECTs against the DC that should not have any data (because it's not specified in the NetworkTopologyStrategy) actually returned data. But looking at the query trace, it looks like it's forwarding the queries to the other DC. -J On
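For readers who want to reproduce what Jason is seeing, a cqlsh session trace makes the forwarding visible. A minimal sketch, with the keyspace, table, and key as placeholders:

    cqlsh> TRACING ON;
    cqlsh> SELECT * FROM my_ks.my_table WHERE id = 1;
    -- The trace printed after the rows lists every node that handled the
    -- request; with no local replicas, it shows the coordinator sending
    -- the message to replicas in the remote DC.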

Re: Multi-DC Cluster w/ non-replicated Keyspace

2016-06-16 Thread Ben Slater
Do you mean the data is getting replicated or just the schema? On Fri, 17 Jun 2016 at 11:48 Jason J. W. Williams wrote: > Hi Guys, > > We have a 2 DC cluster where the keyspaces are replicated between the 2. > Is it possible to add a keyspace to one of the DCs that

Multi-DC Cluster w/ non-replicated Keyspace

2016-06-16 Thread Jason J. W. Williams
Hi Guys, We have a 2 DC cluster where the keyspaces are replicated between the 2. Is it possible to add a keyspace to one of the DCs that won't be replicated to the other? Whenever we add a new keyspace it seems to get replicated even if we don't specify the other DC in the keyspace's
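What the thread converges on: naming only one DC in NetworkTopologyStrategy keeps replicas out of the other. A minimal sketch ('dc1' is a placeholder and must match a DC name reported by nodetool status):

    -- Replicas are placed only in dc1; the other DC gets no data:
    CREATE KEYSPACE local_only_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

Note that the schema itself still propagates to every node in the cluster, which is why Ben asks above whether it is the data or just the schema that appears.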

Re: Backup strategy

2016-06-16 Thread Dennis Lovely
Taking a snapshot flushes your memtables to disk, and you can then stream your sstables out. Incremental backups are, as far as I'm aware, the differences that have occurred since your last snapshot. Since it's fairly infeasible to constantly stream out full snapshots (depending on the density of
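A minimal sketch of the snapshot step described above (keyspace name and tag are placeholders):

    # Flush memtables, then hardlink the current sstables under a named tag:
    nodetool snapshot -t pre_backup my_ks
    # The hardlinks land in each table's snapshots directory, e.g.
    #   /var/lib/cassandra/data/my_ks/<table>/snapshots/pre_backup/
    # and can be streamed off-node from there.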

Re: Backup strategy

2016-06-16 Thread Dennis Lovely
Periodic snapshots + incremental backups are, I think, pretty good in terms of restoring to a point in time. But you must manage cleaning up your snapshots + incremental backups on your own. I believe that tablesnap (https://github.com/JeremyGrosser/tablesnap) is a pretty decent approach in terms
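On the cleanup point: neither snapshots nor incremental backups are ever removed automatically. A hedged sketch (tag and keyspace are placeholders):

    # Remove a snapshot once it has been shipped off-node:
    nodetool clearsnapshot -t pre_backup my_ks
    # Files in each table's backups/ directory (incremental backups) must
    # likewise be deleted manually after upload.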

Re: Backup strategy

2016-06-16 Thread Rakesh Kumar
On Thu, Jun 16, 2016 at 7:30 PM, Bhuvan Rawal wrote: > 2. Snapshotting: hardlinks of sstables will get created. This is a very > fast process, and the latest data is captured into sstables after flushing > memtables; snapshots will be created in the snapshots directory. But snapshot

Re: Backup strategy

2016-06-16 Thread Bhuvan Rawal
Also, if we talk about backup strategy for Cassandra data, there are essentially a couple of strategies that are adopted: 1. Incremental backups: hardlinks of sstables accumulate inside a backups directory and can be shipped to a storage location like AWS Glacier, etc. 2. Snapshotting: hardlinks of
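A sketch of strategy 1; incremental backups are off by default, and the paths and bucket below are placeholders:

    # In cassandra.yaml (takes effect on restart):
    #   incremental_backups: true
    # Hardlinks to newly flushed sstables then accumulate under
    #   /var/lib/cassandra/data/<ks>/<table>/backups/
    # and can be shipped to cold storage, e.g.:
    aws s3 sync /var/lib/cassandra/data/my_ks/my_table/backups/ \
        s3://my-backup-bucket/$(hostname)/my_ks/my_table/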

Re: Backup strategy

2016-06-16 Thread Bhuvan Rawal
What kind of data are we talking about here? Is it time-series data with only inserts and infrequent updates, or frequently updated data? How frequently is old data read? I ask this because your node size planning and compaction strategy will essentially depend on these. I have known people to go up to 3-5
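For the insert-only time-series case asked about here, the usual answer in the 2.1 era is a time-ordered compaction strategy; a hedged CQL sketch (keyspace and table names are placeholders):

    -- DateTieredCompactionStrategy groups sstables by write time, so old,
    -- cold data stops being recompacted:
    ALTER TABLE metrics.readings
      WITH compaction = {'class': 'DateTieredCompactionStrategy'};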

Re: Backup strategy

2016-06-16 Thread vasu . nosql
Bhuvan, Thanks for the info, but actually I'm not looking for a migration strategy; I just want backup strategy and retention policy best practices. Thanks, Vasu > On Jun 16, 2016, at 6:51 PM, Bhuvan Rawal wrote: > > Hi Vasu, > > Planet Cassandra has a documentation page

Re: Backup strategy

2016-06-16 Thread Bhuvan Rawal
Hi Vasu, Planet Cassandra has a documentation page with basic info about migrating to Cassandra from MySQL: what to expect and what not to. It can be found here. I had a look at this slide

Backup strategy

2016-06-16 Thread vasu . nosql
Hi, I'm from the relational world and recently started working on Cassandra. I'm just wondering what the backup best practices are for a DB around 100 TB with a multi-DC setup. Thanks, Vasu

Re: StreamCoordinator.ConnectionsPerHost set to 1

2016-06-16 Thread Paulo Motta
Increasing the number of threads alone won't help, because you need to add connectionsPerHost-awareness to StreamPlan.requestRanges (otherwise only a single connection per host is created), similar to what was done to StreamPlan.transferFiles by CASSANDRA-3668, but maybe a bit trickier. There's an

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Dennis Lovely
I believe you want to set memoryFraction higher, not lower. These two older threads seem to describe issues similar to the ones you are experiencing: https://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3CCAHUQ+_ZqaWFs_MJ=+V49bD2paKvjLErPKMEW5duLO1jAo4=d...@mail.gmail.com%3E

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Cassa L
Hi Dennis, On Wed, Jun 15, 2016 at 11:39 PM, Dennis Lovely wrote: > You could try tuning spark.shuffle.memoryFraction and > spark.storage.memoryFraction (both of which have been deprecated in 1.6), > but ultimately you need to find out where you are bottlenecked and address >

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Cassa L
On Thu, Jun 16, 2016 at 5:27 AM, Deepak Goel wrote: > What is your hardware configuration like which you are running Spark on? It is 24 cores, 128GB RAM.

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Cassa L
Hi, > > What do you see under Executors and Details for Stage (for the > affected stages)? Anything weird memory-related? > Under the Executors tab, the logs show these warnings: 16/06/16 20:45:40 INFO TorrentBroadcast: Reading broadcast variable 422145 took 1 ms 16/06/16 20:45:40 WARN MemoryStore:

StreamCoordinator.ConnectionsPerHost set to 1

2016-06-16 Thread Anubhav Kale
Hello, I noticed that StreamCoordinator.ConnectionsPerHost is always set to 1 (Cassandra 2.1.13). If I am reading the code correctly, this means there will always be just one socket (well, technically 2, one for each direction) between nodes when rebuilding, and thus the data will always be serialized.
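Not a fix for the single-connection serialization described here, but a related streaming knob that is tunable at runtime; a hedged sketch (the value is an example, the 2.1 default being 200 Mbit/s):

    # Raise the per-node outbound streaming throttle:
    nodetool setstreamthroughput 800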

Re: Reason for Trace Message Drop

2016-06-16 Thread Eric Stevens
Are you executing all queries with tracing enabled? If so that introduces overhead you probably don't want. Most people probably don't see this log very often because it's the exception to query with tracing enabled, and not the rule (it's a diagnostic thing usually turned on only when
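For reference, the sampled tracing Eric alludes to can be driven from nodetool; a minimal sketch (the rate is an example value, and the command must be run against each node):

    # Trace roughly 0.1% of requests instead of every query:
    nodetool settraceprobability 0.001
    # Turn sampling back off once done diagnosing:
    nodetool settraceprobability 0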

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Jacek Laskowski
Hi, What do you see under Executors and Details for Stage (for the affected stages)? Anything weird memory-related? What does your "I am reading data from Kafka into Spark and writing it into Cassandra after processing it" pipeline look like? Regards, Jacek Laskowski

Re: Streaming from 1 node only when adding a new DC

2016-06-16 Thread Fabien Rousseau
Thanks, created the issue: https://issues.apache.org/jira/browse/CASSANDRA-12015 2016-06-15 15:25 GMT+02:00 Paulo Motta: > For rebuild, replace and -Dcassandra.consistent.rangemovement=false in > general we currently pick the closest replica (as indicated by the

Re: Reason for Trace Message Drop

2016-06-16 Thread Varun Barala
Thanks, Eric Stevens, for your reply. We have the following JVM settings: memtable_offheap_space_in_mb: 15360 (found in cassandra.yaml), MAX_HEAP_SIZE="16G" (found in cassandra-env.sh). And I also

RE: how to force cassandra-stress to actually generate enough data

2016-06-16 Thread Peter Kovgan
Thank you, guys. I will try all the proposals. The limitation mentioned by Benedict is huge, but anyway, there is something to work around it. From: Peter Kovgan Sent: Wednesday, June 15, 2016 3:25 PM To: 'user@cassandra.apache.org' Subject: how to force cassandra-stress to actually generate enough
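One way to make cassandra-stress write genuinely distinct data rather than revisiting the same partitions is to match the population range to the operation count; a hedged sketch (the counts and thread count are example values):

    # 100M writes over 100M sequential keys: every write hits a fresh
    # partition, so the on-disk data set actually grows.
    cassandra-stress write n=100000000 -pop seq=1..100000000 -rate threads=200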

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-16 Thread Dennis Lovely
You could try tuning spark.shuffle.memoryFraction and spark.storage.memoryFraction (both of which have been deprecated in 1.6), but ultimately you need to find out where you are bottlenecked and address that, as adjusting memoryFraction will only be a stopgap. Both shuffle and storage
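A hedged sketch of the pre-1.6 settings named above (the values, class, and jar are placeholders; the two fractions trade off against each other, the defaults being 0.2 for shuffle and 0.6 for storage):

    spark-submit \
      --conf spark.shuffle.memoryFraction=0.3 \
      --conf spark.storage.memoryFraction=0.5 \
      --class com.example.MyStreamingJob my-job.jar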