Re: Bloom filter memory usage disparity

2016-05-17 Thread Jeff Jirsa
Even with the same data, bloom filters are built per sstable. If compaction behaves differently on 2 nodes than on the third, your bloom filter RAM usage may differ. From: Kai Wang Reply-To: "user@cassandra.apache.org" Date: Tuesday, May 17, 2016 at 8:02 PM To:
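To make the per-sstable point concrete, here is a rough Python sketch. It uses the textbook Bloom filter sizing formula (about -ln(p) / (ln 2)^2 bits per key for a false-positive chance p), not Cassandra's own sizing code, which differs in detail. The sstable key counts and the 0.01 false-positive chance are made-up illustration values. Because each sstable carries its own filter, a partition key that appears in several sstables is paid for once per sstable.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook optimal Bloom filter size, in bits per key, for a false-positive chance."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def node_filter_bytes(sstable_key_counts: list[int], fp_chance: float = 0.01) -> int:
    """Approximate total filter memory on one node.

    Each sstable gets its own filter sized for the keys it contains, so a
    partition key present in several sstables is counted several times.
    """
    bits = sum(math.ceil(n * bloom_bits_per_key(fp_chance)) for n in sstable_key_counts)
    return math.ceil(bits / 8)


# Hypothetical layouts holding the same 1M partition keys:
compacted = node_filter_bytes([1_000_000])      # one fully compacted sstable
fragmented = node_filter_bytes([600_000] * 4)   # four overlapping sstables
print(compacted, fragmented, round(fragmented / compacted, 2))
```

With these numbers the fragmented node needs about 2.4 times the filter memory of the compacted one, even though both hold the same partitions. Lowering `bloom_filter_fp_chance` raises the bits per key for every sstable on top of that.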

Re: Accessing Cassandra data from Spark Shell

2016-05-17 Thread Cassa L
Hi, I followed the instructions to run Spark Shell with Spark 1.6. It works fine. However, I need to use Spark 1.5.2, and with it, it does not work: I keep getting NoSuchMethodError exceptions. Is there any issue running Spark Shell for Cassandra with an older version of Spark? Regards, LCassa On Tue, May

About the data structure of partition index

2016-05-17 Thread Hiroyuki Yamada
Hi, I am wondering how many primary keys are stored in one partition index. As the following documents say, I understand that each

Re: Repair schedules for new clusters

2016-05-17 Thread Ben Slater
We’ve found with incremental repairs that more frequent repairs are generally better. Our current standard for incremental repairs is once per day. I imagine that the exact optimum frequency is dependent on the ratio of reads to writes in your cluster. Turning on incremental repairs from the

Restoring Incremental Backups without using sstableloader

2016-05-17 Thread Ravi Teja A V
Hi everyone, I am currently working with Cassandra 3.5. I would like to know if it is possible to restore backups without using sstableloader. I have been referring to the following pages in the DataStax documentation:

Re: Why simple replication strategy for system_auth ?

2016-05-17 Thread Jérôme Mainaud
Thank you for your answer. What I still don't understand is why auth data is not managed in the same way as schema metadata. Both must be accessible to the node for it to do its job. Both change very rarely. In a way, users are a kind of database object. I understand the choice for trace and

Re: SS Table File Names not containing GUIDs

2016-05-17 Thread Alain RODRIGUEZ
Hi, > I am wondering if there is any reason as to why the SS Table format doesn’t have a GUID. I don't know for sure, but what I can say is that GUIDs are often used to solve the increment problem in distributed systems. SSTables are stored on one node, so a local increment works. So I would say this worked

Re: MigrationManager.java:164 - Migration task failed to complete

2016-05-17 Thread Alain RODRIGUEZ
There is not that much context here, so I will give a standard answer too. If you have doubts about the data owned by a node, running repair takes some resources but should never break anything. It is an operation you can run as often as you want, so I would use it, just in case.

Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Eric Evans
On Mon, May 16, 2016 at 5:19 PM, Drew Kutcharian wrote: > > What’s the difference between the two “Community” repositories Apache > (http://www.apache.org/dist/cassandra/debian) and DataStax > (http://debian.datastax.com/community/)? Good question. All I can tell you is that

Re: Bloom filter memory usage disparity

2016-05-17 Thread Alain RODRIGUEZ
Hi, we would need more information here (if you have not solved it yet). What is your Cassandra version? Does this 3-node cluster use a Replication Factor of 3? Did you change the bloom_filter_fp_chance recently? > That table has about 16M keys and 140GB of data. Is that the total value or per

Repair schedules for new clusters

2016-05-17 Thread Ashic Mahtab
Hi all, My previous Cassandra clusters had moderate loads, and I'd simply schedule full repairs at different times in the week (but on the same day). That seemed to work OK, but was redundant. In my current project, I'm going to need to care about repair times a lot more, and was wondering what

RE: Data platform support

2016-05-17 Thread Ashic Mahtab
If Spark workers are installed on the same nodes as Cassandra, they can take advantage of data locality, greatly reducing the amount of network I/O in Spark jobs. If you use a separate Cloudera, Hortonworks, or EMR cluster, you won't be able to benefit from this. Other than the

Re: [C*3.0.3]lucene indexes not deleted and nodetool repair makes DC unavailable

2016-05-17 Thread Andres de la Peña
Hi Siddarth, Lucene doesn't immediately remove deleted documents from disk. Instead, it just marks them as deleted, and they are effectively removed during segments merge. This is quite similar to how C* manages deletions with tombstones and compactions. Regards, 2016-05-17 17:30 GMT+01:00

Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Drew Kutcharian
Thanks Eric. > On May 17, 2016, at 7:50 AM, Eric Evans wrote: > > On Mon, May 16, 2016 at 5:19 PM, Drew Kutcharian wrote: >> >> What’s the difference between the two “Community” repositories Apache >> (http://www.apache.org/dist/cassandra/debian)

Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Drew Kutcharian
BTW, the language on this page should probably change, since it currently sounds like the official repo is the DataStax one and Apache is only an “alternative”: http://wiki.apache.org/cassandra/DebianPackaging - Drew > On May 17, 2016, at 11:35 AM, Drew Kutcharian wrote: > >

Re: Cassandra Debian repos (Apache vs DataStax)

2016-05-17 Thread Drew Kutcharian
OK, to make things even more confusing, the “Release” files in the Apache repo say “Origin: Unofficial Cassandra Packages”!! i.e. http://dl.bintray.com/apache/cassandra/dists/35x/:Release > On May 17, 2016, at 12:11 PM, Drew Kutcharian wrote: > > BTW, the language on this

Applying TTL Change quickly

2016-05-17 Thread Anubhav Kale
Hello, we use STCS and DTCS on our tables and recently made a TTL change (reduced from 8 days to 2) on a table with large amounts of data. What is the best way to quickly purge old data? I am playing with tombstone_compaction_interval at the moment, but would like some suggestions on what

Re: Applying TTL Change quickly

2016-05-17 Thread Jeff Jirsa
Fastest way? Stop Cassandra, use sstablemetadata to find any files whose maxTimestamp is more than 2 days old, and remove them. Start Cassandra. This works better with some compaction strategies than others (you will probably find a few droppable sstables with either DTCS or STCS, but it's not perfect). Cleanest way? One by one (starting with
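The selection step of the "fastest way" can be sketched in Python. This is a hedged illustration, not a tool: it assumes each sstable's maximum timestamp has already been read (for example from sstablemetadata output) into a dictionary, that write timestamps are microseconds since the epoch (the usual Cassandra convention, though clients can supply their own), and the file names are hypothetical. It only lists candidates; deleting files still has to happen with the node stopped, and an sstable holding tombstones that shadow older data in other sstables may not be safe to drop.

```python
import time
from typing import Optional

MICROS_PER_SECOND = 1_000_000


def sstables_older_than(max_timestamps_us: dict[str, int],
                        window_seconds: int,
                        now_us: Optional[int] = None) -> list[str]:
    """Return sstables whose newest write is older than the retention window.

    max_timestamps_us maps an sstable file name to its maximum write
    timestamp in microseconds since the epoch.
    """
    if now_us is None:
        now_us = int(time.time() * MICROS_PER_SECOND)
    cutoff = now_us - window_seconds * MICROS_PER_SECOND
    # Every cell in a selected file was written before the cutoff.
    return sorted(name for name, ts in max_timestamps_us.items() if ts < cutoff)


DAY = 24 * 60 * 60
now = 1_463_500_000 * MICROS_PER_SECOND  # fixed "now" for a reproducible example
candidates = sstables_older_than(
    {
        "ma-101-big-Data.db": now - 3 * DAY * MICROS_PER_SECOND,  # hypothetical names
        "ma-102-big-Data.db": now - 1 * DAY * MICROS_PER_SECOND,
    },
    window_seconds=2 * DAY,
    now_us=now,
)
print(candidates)
```

Note that rows written before the TTL change still carry their original 8-day TTL, so dropping these files is a deliberate purge of data older than the new 2-day window rather than a cleanup of already-expired cells.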

restore cassandra snapshots on a smaller cluster

2016-05-17 Thread Luigi Tagliamonte
Hi everyone, I'm wondering if it is possible to restore all the snapshots of a cluster (10 nodes) on a smaller cluster (3 nodes). If yes, how would I do it? -- Luigi --- “The only way to get smarter is by playing a smarter opponent.”

Re: restore cassandra snapshots on a smaller cluster

2016-05-17 Thread Jeff Jirsa
http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated From: Luigi Tagliamonte Reply-To: "user@cassandra.apache.org" Date: Tuesday, May 17, 2016 at 5:35 PM To: "user@cassandra.apache.org" Subject: restore cassandra snapshots on a smaller cluster Hi everyone, i'm

Re: restore cassandra snapshots on a smaller cluster

2016-05-17 Thread Ben Slater
It should definitely work if you use sstableloader to load all the files. I imagine a straight restore (copying sstables) is also possible if you assign the tokens from multiple source nodes to one target node using the initial_token parameter in cassandra.yaml. Cheers Ben On Wed, 18 May 2016
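Ben's initial_token idea can be sketched as a token-merge plan in Python. Assumptions, all hypothetical: you have each source node's tokens (for example collected with nodetool ring), whole source nodes are assigned round-robin to target nodes, and the tiny two-token lists stand in for real vnode token sets. Each target's num_tokens in cassandra.yaml would then need to match the number of tokens in its initial_token list. Copied sstables from different source nodes may need renaming so their generation numbers do not collide, and since replica placement changes on the new ring, a repair afterwards is probably wise.

```python
def merge_token_plan(source_tokens: dict[str, list[int]], target_count: int) -> list[dict]:
    """Assign whole source nodes to target nodes round-robin and merge their tokens."""
    groups: list[list[str]] = [[] for _ in range(target_count)]
    for i, node in enumerate(source_tokens):  # dict order = order the nodes were listed
        groups[i % target_count].append(node)

    plan = []
    for sources in groups:
        tokens = sorted(t for node in sources for t in source_tokens[node])
        plan.append({
            "sources": sources,  # copy these nodes' sstables to this target
            "num_tokens": len(tokens),
            "initial_token": ",".join(str(t) for t in tokens),
        })
    return plan


# Hypothetical 10-node source cluster with two tokens per node.
source = {f"n{i}": [i * 100, i * 100 + 50] for i in range(10)}
for target in merge_token_plan(source, 3):
    print(target["sources"], target["num_tokens"], target["initial_token"])
```

Assigning whole source nodes (rather than individual tokens) keeps the mapping simple: each target's data directory is just the union of its source nodes' snapshots.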