Re: Recommendation for hosting multi tenant clusters

2013-08-13 Thread Jon Haddad
I strongly recommend against EBS, even with optimized ebs provisioned. The throughput you'll get from local drives is significantly better than what you'll get with EBS (even 4K iops provisioned) On Aug 13, 2013, at 2:10 PM, Rahul Gupta rgu...@dekaresearch.com wrote: I am working on

Re: Custom commands in cassandra

2013-08-14 Thread Jon Haddad
Aside from the problems mentioned below, it's a rare case that tightly coupling your application code directly into your database makes it easier to maintain your codebase, especially as you scale. If you roll out your custom Cassandra application, then decide you need search, will you also

Re: Configuring ephemeral only column family

2013-08-16 Thread Jon Haddad
+1 for redis for this use case. On Aug 16, 2013, at 10:54 AM, Robert Coli rc...@eventbrite.com wrote: On Fri, Aug 16, 2013 at 10:43 AM, Todd Nine tn...@apigee.com wrote: We're using expiring columns as a means for locking. Perhaps a log structured data store with immutable data files is

Re: Failed decommission

2013-08-25 Thread Jon Haddad
We ran into a similar issue as well. I believe we removed the node via cqlsh from the system keyspace, restarted the cluster, then ran a repair. I'm not sure how safe this really is though. On Aug 25, 2013, at 8:47 AM, Mike Heffner m...@librato.com wrote: Janne, We ran into this too.

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Does your previous snapshot include the system keyspace? I haven't tried upgrading from 1.0.x then rolling back, but it's possible there are some backwards-incompatible changes. Other than that, did you also roll back your config files? On Aug 30, 2013, at 8:57 AM, Mike Neir

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad
Sorry, I didn't see the test procedure, it's still early. On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote: Greetings folks, I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: Hi, If i a create a table with CQL3 as create table user(user_id text PRIMARY KEY, first_name text,

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
is much more powerful in that respect. Not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've built our entire platform using CQL3 and we

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're

Re: CQL Thrift

2013-08-30 Thread Jon Haddad
the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek

Re: Cassandra cluster migration in Amazon EC2

2013-09-02 Thread Jon Haddad
If you launch the new servers, have them join the cluster, then decommission the old ones, you'll be able to do it without downtime. It'll also have the effect of randomizing the tokens, I believe. On Sep 2, 2013, at 4:21 PM, Renat Gilfanov gren...@mail.ru wrote: Hello, Currently we have

Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

2013-09-13 Thread Jon Haddad
It sounds like something that's only useful in a really limited use case. In an 11 node cluster, quorum reads / writes would need to come from 6 nodes. It would probably be much slower for both reads and writes. It sounds like what you want is a database with replication, not
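The "6 of 11" figure follows from the usual majority rule for quorum. A minimal sketch of that arithmetic, assuming the table is replicated to every node so that the replication factor equals the node count (the function name is only illustrative):

```python
def quorum(replicas: int) -> int:
    """Smallest majority of `replicas`: floor(n / 2) + 1."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


# With a copy on each of 11 nodes, a quorum read or write needs 6 replicas.
for rf in (3, 5, 11):
    print(f"replicas={rf} quorum={quorum(rf)}")
```

The higher the replica count, the more nodes every quorum operation has to wait on, which is why a table copied to all nodes would likely be slow for both reads and writes.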

Re: DELETE does not delete :)

2013-10-07 Thread Jon Haddad
I haven't used VMWare but it seems odd that it would lock up the ntp port. Try ps aux | grep ntp to see if ntpd is already running. On Oct 7, 2013, at 12:23 AM, Alexander Shutyaev shuty...@gmail.com wrote: Hi Michał, I didn't notice your message at first.. Well this seems like a real

Re: one big cluster vs multiple smaller clusters

2013-10-13 Thread Jon Haddad
This is a pretty vague question. What are you trying to achieve? On Oct 12, 2013, at 9:05 PM, Wei Zhu wz1...@yahoo.com wrote: Hi, As we bring more use cases to Cassandra, we have been thinking about the best way to host it. Let's say we will have 15 physical machines available, we can use

Re: Output of nodetool ring with virtual nodes

2013-10-15 Thread Jon Haddad
It's expected. I think nodetool status is meant to replace nodetool ring. On Oct 15, 2013, at 11:45 AM, Paulo Motta pauloricard...@gmail.com wrote: Hello, I recently did the Enabling virtual nodes on an existing production cluster procedure

Re: mixed linux/windows cluster in Cassandra-1.2

2013-10-21 Thread Jon Haddad
I can't imagine any situation where this would be practical. What would be the reason to even consider this? On Oct 21, 2013, at 11:06 AM, Robert Coli rc...@eventbrite.com wrote: On Mon, Oct 21, 2013 at 12:55 AM, Илья Шипицин chipits...@gmail.com wrote: is mixed linux/windows cluster

Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Jon Haddad
If you're working with CQL, you don't need to worry about the column names, it's handled for you. If you specify multiple keys as part of the primary key, they become clustering keys and are mapped to the column names. So if you have a sensor_id / time_stamp, all your sensor readings will be

Re: Efficient IP address location lookup

2013-11-15 Thread Jon Haddad
Instead of determining your table first, you should figure out what you want to ask Cassandra. What do you want to look up your data by? For each query you may need to store the data multiple times, which is perfectly reasonable and is recommended. On Nov 15, 2013, at 4:36 PM, Jacob Rhoden

Re: Struggling to understand CFS and its use.

2013-11-17 Thread Jon Haddad
Having used (and moved off of) Titan I do not recommend it as a primary database. Until it overcomes its extremely unoptimized graph traversals, it will increase the load on your database by several orders of magnitude. As a secondary analytics database, it might do fine. Just don’t rely

Re: Securing Cassandra database

2014-04-05 Thread Jon Haddad
This isn’t Cassandra specific, but this is why I hate including db configuration with the main codebase instead of making it the responsibility of ops. This case you described shouldn’t even be possible. The production db configs should be provided by the team maintaining the production

Re: Recommended Approach for Config Changes

2014-04-25 Thread Jon Haddad
You might want to take a peek at what’s happening in the process via strace -p or tcpdump. I can’t remember ever waiting an hour for a node to rejoin. On Apr 25, 2014, at 8:59 AM, Tyler Hobbs ty...@datastax.com wrote: On Fri, Apr 25, 2014 at 10:43 AM, Phil Burress philburress...@gmail.com

Re: Cassandra data retention policy

2014-04-28 Thread Jon Haddad
He said below that he’d like to keep the old data, so that might rule out TTLs in any case. You’ve got a few options that I can think of off the top of my head. The easiest from a management perspective is to use one table per month. WhateverData042014 would be this month's. It’s easy enough
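A minimal sketch of the one-table-per-month naming, assuming the MMYYYY suffix seen in "WhateverData042014"; the helper name and the idea of deriving the name in application code are assumptions, not something the message spells out:

```python
from datetime import date


def monthly_table_name(prefix: str, day: date) -> str:
    """Build a per-month table name such as WhateverData042014 (prefix + MMYYYY)."""
    return f"{prefix}{day:%m%Y}"


# The message was sent in April 2014, so "this month's" table would be:
print(monthly_table_name("WhateverData", date(2014, 4, 28)))
```

The application would compute the name for the current month on write and for each month in the requested range on read.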

Re: Cassandra vs Elasticsearch.

2014-05-03 Thread Jon Haddad
Agreed w/ ES not being the durable data store. I would recommend treating it as ephemeral, and using Cassandra as your source of truth. Keep in mind if you change your ES index mapping, you’ll require a full reindex in order to search the data properly. It’s not like adding a secondary index

Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Jon Haddad
Or use spotify’s reaper and forget about it https://github.com/spotify/cassandra-reaper On Apr 13, 2015, at 3:45 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland j...@tubularlabs.com

Re: timestamp as clustering key doesn't work as expected

2015-10-23 Thread Jon Haddad
What version of Cassandra? I can’t think of a reason why you’d see this output. If you can reliably reproduce, this should be filed as a JIRA. https://issues.apache.org/jira > On Oct 23, 2015, at 8:55 AM, Kai Wang wrote: > > Hi, > > I use a timestamp column as the last

Re: Oracle TIMESTAMP(9) equivalent in Cassandra

2015-10-29 Thread Jon Haddad
Keep in mind that in a distributed environment you probably have so much variance that nanosecond precision is pointless. Even google notes that in the paper, Dapper, a Large-Scale Distributed Systems Tracing Infrastructure [http://research.google.com/pubs/pub36356.html

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jon Haddad
than the one got deleted (based on last modified date field). We > are definitely not talking about few millis here. > > Praveen > > From: Jon Haddad <jonathan.had...@gmail.com > <mailto:jonathan.had...@gmail.com>> > Reply-To: "user@cassandra.apache.org <

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jon Haddad
ing on AWS servers and no clocks are not 20 minutes off. > > > From: Jon Haddad <jonathan.had...@gmail.com > <mailto:jonathan.had...@gmail.com>> > Reply-To: "user@cassandra.apache.org <mailto:user@cassandra.apache.org>" > <user@cassandra.apache.org <mai

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jon Haddad
Any chance your clocks are off? > On Nov 13, 2015, at 1:09 PM, Peddi, Praveen wrote: > > Hi, > We are using Cassandra 2.0.8, with replication factor of 3. > > We are seeing a scenario where some of the rows in the table reappears even > after they are deleted. We have seen

Re: Overriding timestamp with light weight transactions

2015-11-16 Thread Jon Haddad
Perhaps you should fix your clock drift issues instead of trying to use a workaround? > On Nov 16, 2015, at 11:39 AM, Peddi, Praveen wrote: > > Hi, > We are using Cassandra 2.0.9 and we currently have “using timestamp” clause > in all our update queries. We did this to fix

Re: Overriding timestamp with light weight transactions

2015-11-16 Thread Jon Haddad
Europe has longer drifts. We override > the timestamp only if we see current timestamp on the row is in future. Why > do you think overriding timestamp is a work around? It seems like a valid > reason to override timestamps. > > Thanks > Praveen > > > From: Jon Haddad

Re: scylladb

2015-11-05 Thread Jon Haddad
Nope, no one I know. Let me know if you try it I'd love to hear your feedback. > On Nov 5, 2015, at 9:22 AM, tommaso barbugli wrote: > > Hi guys, > > did anyone already try Scylladb (yet another fastest NoSQL database in town) > and has some thoughts/hands-on experience

Re: compression cpu overhead

2015-11-03 Thread Jon Haddad
You won't see any overhead on writes because you don't actually write to sstables when performing a write. Just the commit log & memtable. Memtables are flushed asynchronously. > On Nov 4, 2015, at 1:57 AM, Tushar Agrawal wrote: > > For writes it's negligible. For

Re: Read query taking a long time

2015-10-19 Thread Jon Haddad
I wrote a blog post a while back you may find helpful on diagnosing problems in production. There's a lot of potential things that could be wrong with your cluster and going back and forth on the ML to pin down the right one will take forever.

Re: LOCAL_SERIAL

2015-10-15 Thread Jon Haddad
ZK seems a little overkill for just 1 feature though. LOCAL_SERIAL is fine if all you want to do is keep a handful of keys up to date. There’s a massive cost in adding something new to your infrastructure, and imo, very little gain in this case. > On Oct 15, 2015, at 8:29 AM, Eric Stevens

Re: Data visualization tools for Cassandra

2015-10-20 Thread Jon Haddad
PySpark (dataframes) + Pandas + Seaborn/Matplotlib > On Oct 20, 2015, at 11:22 AM, Charles Rich wrote: > > Take a look at jKool, a DataStax partner at jKoolCloud.com > . It provides visualization for data in DSE. > > Regards, > > Charley > >

Re: G1 GC settings

2015-10-13 Thread Jon Haddad
You may want to read Al Tobey’s Cassandra tuning guide. It’s got a section on G1. It’s being widely used, successfully, at massive scale. https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html > On Oct 13, 2015,

Re: SELECT some_column vs SELECT *

2015-11-24 Thread Jon Haddad
If it's sparsely populated you'll get the same benefit from the schema definition. You don't pay for fields you don't use. > On Nov 24, 2015, at 12:17 PM, Jack Krupansky wrote: > > Are all or most of the 1000+ columns populated for a given row? If they are > sparse

Re: Triggering Deletion/Updation

2015-11-22 Thread Jon Haddad
There's no built in way of doing cascading deletes in Cassandra, I really wouldn't recommend using triggers for this either. My advice is to manage it in your app code. > On Nov 22, 2015, at 9:59 AM, Prem Yadav wrote: > > if it is cassandra 2.0+, > you can implement

Re: Cassandra version numbering

2017-02-23 Thread Jon Haddad
No > On Feb 23, 2017, at 1:59 PM, Rakesh Kumar wrote: > > Is ver 3.0.10 same as 3.10. > > Cassandra website mentions this: Cassandra 3.10 Changelog > > But in other places 3.0.10 is mentioned.

Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Jon Haddad
LWT != Last Write Wins. They are totally different. LWTs give you (assuming you also read at SERIAL) “atomic consistency”, meaning you are able to perform operations atomically and in isolation. That’s the safety blanket everyone wants but is extremely expensive, especially in Cassandra.

Re: Priority for cassandra nodes in cluster

2016-11-12 Thread Jon Haddad
Agreed w/ Benjamin. Trying to diagnose issues in prod will be a nightmare. Keep your DB servers homogeneous. > On Nov 12, 2016, at 1:52 PM, Benjamin Roth wrote: > > 1. From a 15 year experience of running distributed Services: dont Mix > Services on machines if

Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad
You’ve asked a lot of questions on this mailing list, and you’ve gotten help on a ton of beginner issues. Making fun of someone for asking similar beginner questions is not cool at all. Cut it out. > On Nov 14, 2016, at 10:13 AM, Ali Akhtar wrote: > > Another

Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad
> On Nov 14, 2016, at 10:25 AM, Ali Akhtar <ali.rac...@gmail.com> wrote: > > Excuse me? I did not make fun of anyone. I gave valid suggestions that are > all theoretically possible. > > If it came off in a condescending way, i am genuinely sorry. > > > On

Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad
While Cassandra *can* be used this way, I don’t recommend it. It’s going to be far cheaper and easier to maintain to store data in an Object store like S3, like Oskar recommended. > On Nov 14, 2016, at 10:16 AM, l...@airstreamcomm.net wrote: > > We store videos and files in Cassandra by

Re: Query on Data Modelling of a specific usecase

2017-04-19 Thread Jon Haddad
How much data do you plan to store in each table? I’ll be honest, this doesn’t sound like a Cassandra use case at first glance. 1 table per report x 1000 is going to be a bad time. Odds are with different queries, you’ll need multiple views, so let’s call that a handful of tables per report.

Re: Slow writes and Frequent timeouts

2017-04-17 Thread Jon Haddad
What are your hardware specs? Where are you running the cluster? Is every node in the same physical datacenter? What command are you using to run stress? > On Apr 17, 2017, at 9:57 AM, Akshay Suresh > wrote: > > Hi > > I have not done much. Just created a

Re: Downside to running multiple nodetool repairs at the same time?

2017-04-21 Thread Jon Haddad
We (The Last Pickle) forked reaper a while ago and added support for 3.0. https://github.com/thelastpickle/cassandra-reaper We set up a mailing list here for Reaper specific questions:

Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

2017-04-07 Thread Jon Haddad
Alex Dejanovski wrote a good post on how the LIMIT clause works and why it doesn’t (until 3.4) work the way you think it would. http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html > On Apr 7, 2017, at 7:23 AM, Jerry Lam

Re: Cassandra Data migration from 2.2.3 to 3.7

2017-08-01 Thread Jon Haddad
Just curious, why go to 3.7? 3.11 has hundreds of bug fixes that 3.7 doesn’t and will continue to receive fixes. > On Aug 1, 2017, at 3:44 PM, Harika Vangapelli -T (hvangape - AKRAYA INC at > Cisco) wrote: > > Jeff, I tried the below steps for just 3 rows of data, It

Re: cqlsh -e output - How to change the default delimiter '|' in the output

2017-08-15 Thread Jon Haddad
Using COPY .. TO you can export using the DELIMITER option, does that help? > On Aug 15, 2017, at 9:01 PM, Harikrishnan A wrote: > > Thank you all > > Regards, > Hari > > > On Tuesday, August 15, 2017 12:55 AM, Erick Ramirez > wrote: > >

Re: Migrate from DSE (Datastax) to Apache Cassandra

2017-08-15 Thread Jon Haddad
I agree with Jeff, it’s not necessary to launch a new cluster for this operation. > On Aug 15, 2017, at 7:39 PM, Jeff Jirsa wrote: > > Or just alter the key space replication strategy and remove the DSE specific > strategies in favor of network topology strategy > > > --

Re: Multi datacenter node loss

2017-07-21 Thread Jon Haddad
SimpleStrategy doesn’t take DC or rack into account at all. It simply places replicas on subsequent tokens. You could end up with 3 copies in 1 DC and zero in another. /** * This class returns the nodes responsible for a given * key but does not respect rack awareness. Basically *
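To illustrate the "subsequent tokens" behaviour, here is a simplified model, not Cassandra's actual implementation: it assumes one token per node and a made-up six-node ring split across two datacenters, and all names in it are hypothetical.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node, datacenter), sorted by token.
RING = [
    (0, "a1", "dc1"), (10, "a2", "dc1"), (20, "a3", "dc1"),
    (30, "b1", "dc2"), (40, "b2", "dc2"), (50, "b3", "dc2"),
]


def simple_replicas(ring, key_token, rf):
    """Walk the ring clockwise from the key's token and take the next rf nodes,
    ignoring datacenter and rack entirely."""
    tokens = [token for token, _, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1:] for i in range(count)]


# A key whose token wraps around to the start of the ring lands entirely in dc1.
print(simple_replicas(RING, key_token=55, rf=3))
```

In this toy ring the key gets three copies in dc1 and none in dc2, which is the failure mode the message describes.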

Re: Data Loss irreparabley so

2017-07-27 Thread Jon Haddad
We (The Last Pickle) maintain an open source tool to help manage repairs across your clusters called Reaper. It’s a lot easier to set up and manage than trying to manage it through cron. http://thelastpickle.com/reaper.html > On Jul 27, 2017, at 12:38 AM,

Re: Upgrade requirements for upgrading from cassandra 2.1.x to 2.2.x

2017-08-22 Thread Jon Haddad
NEWS.txt is the goto spot for upgrade instructions, caveats, etc. Jon > On Aug 22, 2017, at 2:46 PM, Chuck Reynolds wrote: > > Anyone? > > From: "Chuck (me) Reynolds" > Reply-To: "user@cassandra.apache.org" > Date:

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad
>> num_tokens: recommended value: 256 >> -seeds: internal IP address of each seed node > > I saw also hostnames mentioned few times, but it just makes it even more > confusing. > > — > Roman > >> On May 1, 2017, at 3:50 PM, Jon Haddad <jonathan.had...@gma

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad
— > Roman > >> On May 1, 2017, at 4:14 PM, Jon Haddad <jonathan.had...@gmail.com >> <mailto:jonathan.had...@gmail.com>> wrote: >> >> The in-tree docs do not mention this anywhere, and even have some of the >> answers you’re asking: >> >

Re: Smart Table creation for 2D range query

2017-05-08 Thread Jon Haddad
It gets a little tricky when you try to add in the coordinates to the clustering key if you want to do operations that are more complex. For instance, finding all the elements within a radius of point (x,y) isn’t particularly fun with Cassandra. I recommend moving that logic into the

Re: Smart Table creation for 2D range query

2017-05-09 Thread Jon Haddad
could > be efficiently queried. > > Jim > > On Tue, May 9, 2017 at 11:19 AM, Jon Haddad <jonathan.had...@gmail.com> > wrote: > >> The problem with using geohashes is that you can’t efficiently do ranges >> with random token distribution. So even if your scalar val

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

2017-05-09 Thread Jon Haddad
I don’t see any way it wouldn’t. Have you tried tracing it? > On May 9, 2017, at 8:32 AM, Kant Kodali wrote: > > Hi All, > > It looks like Cassandra 3.10 has partial partition key search but does it > result in a table scan? for example I can have the following > > create

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

2017-05-09 Thread Jon Haddad
Output from both queries, demonstrating full cluster scans: https://gist.github.com/rustyrazorblade/c4947fc37da85bca50e08aa1ef3c7a06 Jon > On May 9, 2017, at 9:24 AM, Jon Haddad <jonathan.had...@gmai

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

2017-05-09 Thread Jon Haddad
; partial partition key and get the max b ? > > > ​ > > > On Tue, May 9, 2017 at 6:33 AM, Jon Haddad <jonathan.had...@gmail.com > <mailto:jonathan.had...@gmail.com>> wrote: > I don’t see any way it wouldn’t. Have you tried tracing it? > > > On May 9, 2017, at 8:32 AM,

Re: Smart Table creation for 2D range query

2017-05-09 Thread Jon Haddad
S_jwNVHRPZTTDzXXn6Q/view#slide=id.i0>. > As Jon mentions, this puts more work on the client, but might give you a lot > of querying flexibility when using Cassandra. > > Jim > > On Mon, May 8, 2017 at 11:13 PM, Jon Haddad <jonathan.had...@gmail.com > <mailto:jonathan.had..

Re: Service discovery in the Cassandra cluster

2017-05-02 Thread Jon Haddad
> wrote: > Lol yeah, why > I guess I run some ec2 instances, drop some cassandra deb packages on 'em - > the thing will figure out how to run... > > Also, how would you get "initial state of the cluster" if the cluster... is > being initialized? > Or that

Re: Smart Table creation for 2D range query

2017-05-05 Thread Jon Haddad
I think you’ll want to model your table similar to how an R-Tree [1] / Quad tree [2] works. Let’s suppose you had a 10x10 meter land area and you wanted to put stuff in there. In order to find “all the things in point x,y”, you could break your land area into a grid. A partition would
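A minimal sketch of the grid idea, assuming 1 meter cells and that each grid cell maps to one partition (the message is cut off before it says how partitions are defined, so both are assumptions, as are the function names):

```python
import math


def grid_cell(x: float, y: float, cell_size: float = 1.0) -> tuple[int, int]:
    """Map a point to the (column, row) of the grid cell containing it."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def cells_in_box(x_min, y_min, x_max, y_max, cell_size=1.0):
    """List every cell overlapping a bounding box; one partition read per cell."""
    col_min, row_min = grid_cell(x_min, y_min, cell_size)
    col_max, row_max = grid_cell(x_max, y_max, cell_size)
    return [(col, row)
            for col in range(col_min, col_max + 1)
            for row in range(row_min, row_max + 1)]


print(grid_cell(3.7, 9.2))               # cell holding the point (3.7, 9.2)
print(cells_in_box(1.5, 1.5, 2.5, 3.5))  # cells touched by a small range query
```

The cell coordinates would form the partition key, so a range query becomes a handful of single-partition reads chosen by the client.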

Re: manual deletes with TWCS

2017-05-05 Thread Jon Haddad
You cannot. From Alex’s TLP post: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html TWCS is no fit for workload that perform deletes on non TTLed data. Consider that SSTables from different time windows will never be compacted

Re: DTCS to TWCS

2017-05-04 Thread Jon Haddad
We (The Last Pickle) wrote a blog post on using TWCS pre-3.0: http://thelastpickle.com/blog/2017/01/10/twcs-part2.html Alex Dejanovski wrote a very comprehensive guide to TWCS I recommend reading before putting it in prod:

Re: Totally unbalanced cluster

2017-05-04 Thread Jon Haddad
Adding nodes with NTS is easier, in my opinion. You don’t need to worry about replica placement, if you do it right. > On May 4, 2017, at 7:43 AM, Cogumelos Maravilha > wrote: > > Hi Alain thanks for your kick reply. > > > Regarding SimpleStrategy perhaps you

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad
Sure, you could use DNS. Where does it say IP addresses are a requirement? > On May 1, 2017, at 1:36 PM, Roman Naumenko wrote: > > If I understand how Cassandra nodes work, they must contain a list of seeds’ > IP addresses in config file. > > This requirement makes

Re: detail of compactionstats, pending tasks

2017-09-21 Thread Jon Haddad
Pending tasks are not a queue, they are an estimation of the amount of work it would take to reach a perfect compaction point, but the compactions aren’t independent from one another. For instance, with LCS you may have a compaction from L0 -> L1, which triggers a L1 -> L2 compaction. You

Re: Massive deletes -> major compaction?

2017-09-21 Thread Jon Haddad
Have you considered the fantastic DeletingCompactionStrategy? https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy > On Sep 21, 2017, at 11:51 AM, Jeff Jirsa

Re: Node failure

2017-10-06 Thread Jon Haddad
I’ve had a few use cases for downgrading consistency over the years. If you’re showing a customer dashboard w/ some Ad summary data, it’s great to be right, but showing a number that’s close is better than not being up. > On Oct 6, 2017, at 1:32 PM, Jeff Jirsa wrote: > > I

Re: Increasing VNodes

2017-10-04 Thread Jon Haddad
The site (with the docs) is probably more helpful to learn about how reaper works: http://cassandra-reaper.io/ > On Oct 4, 2017, at 9:54 AM, Chris Lohfink wrote: > > Increasing number of tokens will make repairs worse not better. You can just

Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-04 Thread Jon Haddad
Seems pretty overengineered, imo, given you can just save the pagination state as Andy Tolbert pointed out. > On Oct 4, 2017, at 8:38 AM, Daniel Hölbling-Inzko > wrote: > > Thanks for pointing me to Elassandra. > Have you had any experience running this in

Re: Could not connect to localhost:9160 when installing Cassandra on AWS

2017-10-10 Thread Jon Haddad
How did you install Cassandra? Try passing the machine’s IP address to cqlsh, like “cqlsh 192.168.1.1" > On Oct 10, 2017, at 10:43 AM, Lutaya Shafiq Holmes > wrote: > > Hello Cassandra Gurus, > > After I installed Cassandra on AWS- This error comes up when I try to

Re: Cassandra 3.11.0 compaction attempting impossible to complete compactions

2017-10-13 Thread Jon Haddad
Can you paste the output of nodetool compactionstats? What you’re describing should not happen. There’s a check that drops sstables out of a compaction task if there isn’t enough available disk space, see https://issues.apache.org/jira/browse/CASSANDRA-12979

Re: Cassandra compatibility matrix

2017-09-07 Thread Jon Haddad
There aren’t any drivers maintained by the Cassandra project. Compatibility is up to each driver. Usually a section is included in the README. For instance, in the DataStax Java Driver: https://github.com/datastax/java-driver#compatibility

Re: C* 3 node issue -Urgent

2017-09-06 Thread Jon Haddad
I wouldn’t worry about being meticulous about keeping RF = N as the cluster grows. If you had 60 nodes and your auth data was only on 9 you’d be completely fine. > On Sep 6, 2017, at 11:36 AM, Cogumelos Maravilha > wrote: > > After insert a new node we should:

Re: new question ;-) // RE: understanding batch atomicity

2017-09-29 Thread Jon Haddad
The use of “atomic” for batches is misleading. Batches will eventually complete, that doesn’t make them atomic. “All or nothing” is also incorrect, as you can read them in the middle and get “some parts of it”, and without a rollback it’s just “eventually all”. > On Sep 29, 2017, at 10:59

Reaper 0.7 is released!

2017-09-27 Thread Jon Haddad
Hey folks, We (The Last Pickle) are proud to announce the release of Reaper 0.7! In this release we've added support to run Reaper across multiple data centers as well as supporting Reaper failover when using the Cassandra storage backend. You can grab DEB, RPM and tarballs off the downloads

Re: Reaper 0.7 is released!

2017-09-27 Thread Jon Haddad
> Wednesday, September 27, 2017 10:33 AM -07:00 from Aiman Parvaiz > <ai...@steelhouse.com>: > > Thanks!! Love Reaper :) > > Sent from my iPhone > > On Sep 27, 2017, at 10:01 AM, Jon Haddad <j...@jonhaddad.com > > wrote: > >> Hey folks,

Re: Limit on having number of nodes in C* cluster

2017-08-21 Thread Jon Haddad
As far as I know, those 75K nodes are not in a single cluster. If memory serves correctly (and this article seems to indicate that it does http://www.techrepublic.com/article/apples-secret-nosql-sauce-includes-a-hefty-dose-of-cassandra/

Re: Looking for advice and assistance upgrading from Cassandra 1.2.9

2017-10-17 Thread Jon Haddad
I recommend going all the way to 2.2. > On Oct 17, 2017, at 12:37 PM, Jeff Jirsa wrote: > > You’ll go from 1.2 to 2.0 to 2.1 - should be basic steps: > - make sure you have all 1.2 sstables by running upgradesstables > - one node at a time, swap the 1.2 binaries for latest in

Re: Inter Data Center Latency calculation of a Multi DC cluster running in AWS

2017-10-17 Thread Jon Haddad
I recommend figuring out the latency between your datacenters. Cassandra isn’t going to be any more than that barring JVM pauses on the remote coordinator. > On Oct 17, 2017, at 4:17 PM, Bill Walters wrote: > > Hi Everyone, > > I need some suggestions on finding the

Re: CQL Map vs clustering keys

2017-11-15 Thread Jon Haddad
In 3.0, clustering columns are not actually part of the column name anymore. Yay. Aaron Morton wrote a detailed analysis of the 3.x storage engine here: http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

Reaper 1.0

2017-11-14 Thread Jon Haddad
We’re excited to announce the release of the 1.0 version of Reaper for Apache Cassandra! We’ve made a lot of improvements to the flexibility of managing repairs and simplified the UI based on feedback we’ve received. We’ve written a blog post discussing the changes in detail here:

Re: can't reach cassandra outside my lan

2017-11-30 Thread Jon Haddad
Cassandra is listening on your localhost address, 127.0.0.1, not your laptop’s address on the network. Set rpc_address to the address on your network, or use rpc_interface and let Cassandra figure it out. > On Nov 30, 2017, at 10:38 AM, Andrea Giordano > wrote:

Re: Schema version mismatch with 3.0.8 and 3.0.14

2017-12-01 Thread Jon Haddad
Generally speaking, I would never advise someone to add nodes to a cluster using a different version than the rest of the cluster. > On Dec 1, 2017, at 11:58 AM, Jai Bheemsen Rao Dhanwada > wrote: > > Thanks Jeff, > > I did some more testing on this version upgrade

Re: Tablesnap with custom endpoint?

2017-12-14 Thread Jon Haddad
Tablesnap uses boto, you may be able to override the S3 endpoint. This Stack Overflow answer suggests it’s possible, but you might have to modify the tablesnap script a little: https://stackoverflow.com/questions/32618216/overwrite-s3-endpoint-using-boto3-configuration-file

Re: Upgrade using rebuild

2017-12-14 Thread Jon Haddad
no > On Dec 14, 2017, at 10:59 AM, Anshu Vajpayee wrote: > > Thanks! I am aware with these steps. > > I m just thinking , is it possible to do the upgrade using nodetool rebuild > like we rebuld new dc ? > > Has anyone tried - upgrade with nodetool rebuild ? > >

Re: Upgrade using rebuild

2017-12-14 Thread Jon Haddad
Heh, hit send accidentally. You generally can’t run rebuild to upgrade, because it’s a streaming operation. Streaming isn’t supported between versions, although on 3.x it might work. > On Dec 14, 2017, at 11:01 AM, Jon Haddad <j...@jonhaddad.com> wrote: > > no > >> O

Re: Reaper 1.0

2017-11-17 Thread Jon Haddad
ly > email and delete all copies of this message. > Please click here > <http://www.cisco.com/web/about/doing_business/legal/cri/index.html> for > Company Registration Information. > > From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad > Sent: Wedn

Re: best practice for repair

2017-11-13 Thread Jon Haddad
We (The Last Pickle) maintain Reaper, an open source repair tool, specifically to address all the complexity around repairs. http://cassandra-reaper.io/ Jon > On Nov 13, 2017, at 3:18 AM, Peng Xiao <2535...@qq.com> wrote: > > sub-range repair is much like primary

Re: Solr Search With Apache Cassandra

2017-11-20 Thread Jon Haddad
That’s long since been abandoned (last commit was 5 years ago) > On Nov 20, 2017, at 12:10 PM, Nageswara Rao wrote: > > There is a fork with name on this combo called solandra > > https://github.com/tjake/Solandra > > Please

Re: How quickly we can bootstrap

2017-11-19 Thread Jon Haddad
It sounds like you’re asking how to bootstrap without paying the cost of bootstrapping :) If you want to scale out, you’ll need to deal with the time it takes. You can’t add a node and have it up in 15 minutes, if you’re running 3 TB it’ll take a while. The exact amount of time depends

Re: Time series modeling in C* for range queries

2017-11-19 Thread Jon Haddad
Hi Junaid, I wrote a blog post a few months ago on massively scalable time series, going into a couple techniques on bucketing that you might find helpful. http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
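As a rough illustration of what bucketing can look like, here is a sketch that derives a per-day bucket from a reading's timestamp. The day granularity, the UTC convention, and the helper name are assumptions for illustration; the snippet above does not quote the blog post's exact technique.

```python
from datetime import datetime, timezone


def day_bucket(ts: datetime) -> str:
    """Return the UTC calendar day of a timezone-aware timestamp, e.g. 2017-11-19."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


reading_time = datetime(2017, 11, 19, 23, 30, tzinfo=timezone.utc)
# A partition key of (sensor_id, bucket) caps each partition at one day of data.
print(("sensor-42", day_bucket(reading_time)))
```

A range query then reads the buckets covering the requested interval, one partition at a time.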

Re: Reaper 1.0

2017-11-15 Thread Jon Haddad
losure by > others is strictly prohibited. If you are not the intended recipient (or > authorized to receive for the recipient), please contact the sender by reply > email and delete all copies of this message. > Please click here > <http://www.cisco.com/web/about/doing_business/le

Re: Stable Cassandra 3.x version for production

2017-11-07 Thread Jon Haddad
I regularly work with teams that have 3.11.{0,1} in prod, and would recommend it for new clusters. Avoid materialized views and SASI until you really understand how they work and their limitations. MVs solve about one use case correctly, SASI is good if you’re querying a single partition

Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread Jon Haddad
1. No, Apache Cassandra is pretty terrible for search on its own. Even with SASI. 2. Maybe, but it’s complicated, and doing it right takes a lot of experience. I’d use Elasticsearch instead. > On Dec 7, 2017, at 5:39 PM, @Nandan@ wrote: > > Hi Peoples, >
