from:"Jon Haddad"

Re: Recommendation for hosting multi tenant clusters

2013-08-13 Thread Jon Haddad

I strongly recommend against EBS, even with optimized ebs provisioned. The throughput you'll get from local drives is significantly better than what you'll get with EBS (even 4K iops provisioned) On Aug 13, 2013, at 2:10 PM, Rahul Gupta rgu...@dekaresearch.com wrote: I am working on

Re: Custom commands in cassandra

2013-08-14 Thread Jon Haddad

Aside from the problems mentioned below, it's a rare case that tightly coupling your application code directly into your database makes it easier to maintain your codebase, especially as you scale. If you roll out your custom Cassandra application, then decide you need search, will you also

Re: Configuring ephemeral only column family

2013-08-16 Thread Jon Haddad

+1 for redis for this use case. On Aug 16, 2013, at 10:54 AM, Robert Coli rc...@eventbrite.com wrote: On Fri, Aug 16, 2013 at 10:43 AM, Todd Nine tn...@apigee.com wrote: We're using expiring columns as a mean for locking. Perhaps a log structured data store with immutable data files is

Re: Failed decommission

2013-08-25 Thread Jon Haddad

We ran into a similar issue as well. I believe we removed the node via cqlsh from the system keyspace, restarted the cluster, then ran a repair. I'm not sure how safe this really is though. On Aug 25, 2013, at 8:47 AM, Mike Heffner m...@librato.com wrote: Janne, We ran into this too.

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad

Does your previous snapshot include the system keyspace? I haven't tried upgrading from 1.0.x then rolling back, but it's possible there are some backwards-incompatible changes. Other than that, make sure you also rolled back your config files. On Aug 30, 2013, at 8:57 AM, Mike Neir

Re: Upgrade from 1.0.9 to 1.2.8

2013-08-30 Thread Jon Haddad

Sorry, I didn't see the test procedure, it's still early. On Aug 30, 2013, at 8:57 AM, Mike Neir m...@liquidweb.com wrote: Greetings folks, I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems

Re: CQL Thrift

2013-08-30 Thread Jon Haddad

If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote: Hi, If i a create a table with CQL3 as create table user(user_id text PRIMARY KEY, first_name text,

Re: CQL Thrift

2013-08-30 Thread Jon Haddad

is much more powerful in that respect. not everyone needs to take advantage of the full power of dynamic columns. On Fri, Aug 30, 2013 at 1:58 PM, Jon Haddad j...@jonhaddad.com wrote: Just curious - what do you need to do that requires thrift? We've built our entire platform using CQL3 and we

Re: CQL Thrift

2013-08-30 Thread Jon Haddad

for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't

Re: CQL Thrift

2013-08-30 Thread Jon Haddad

the sweet spot is thrift for insert/update and CQL for select queries. CQL is too limiting and negates the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're

Re: CQL Thrift

2013-08-30 Thread Jon Haddad

the power of storing arbitrary data types in dynamic columns. On Fri, Aug 30, 2013 at 1:45 PM, Jon Haddad j...@jonhaddad.com wrote: If you're going to work with CQL, work with CQL. If you're going to work with Thrift, work with Thrift. Don't mix. On Aug 30, 2013, at 10:38 AM, Vivek

Re: Cassandra cluster migration in Amazon EC2

2013-09-02 Thread Jon Haddad

If you launch the new servers, have them join the cluster, then decommission the old ones, you'll be able to do it without downtime. It'll also have the effect of randomizing the tokens, I believe. On Sep 2, 2013, at 4:21 PM, Renat Gilfanov gren...@mail.ru wrote: Hello, Currently we have

Re: is there any type of table existing on all nodes(slow to up date, fast to read in map/reduce)?

2013-09-13 Thread Jon Haddad

It sounds like something that's only useful in a really limited use case. In an 11 node cluster, quorum reads / writes would need to come from 6 nodes. It would probably be much slower for both reads and writes. It sounds like what you want is a database with replication, not
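The figure of 6 nodes follows from the usual majority rule, assuming (as the 11 node example implies) that the table is replicated to every node, so the replication factor equals the node count. A minimal Python sketch; `quorum` is an illustrative helper, not a Cassandra API:

```python
def quorum(replication_factor: int) -> int:
    """Return the smallest number of replicas that forms a majority."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


# With a copy on each of 11 nodes, a quorum read or write needs 6 replicas.
print(quorum(11))
```

With a more typical replication factor of 3 the same rule gives 2, which is part of why a table copied to every node is comparatively expensive to read and write at quorum.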

Re: DELETE does not delete :)

2013-10-07 Thread Jon Haddad

I haven't used VMWare but it seems odd that it would lock up the ntp port. try ps aux | grep ntp to see if ntpd is already running. On Oct 7, 2013, at 12:23 AM, Alexander Shutyaev shuty...@gmail.com wrote: Hi Michał, I didn't notice your message at first.. Well this seems like a real

Re: one big cluster vs multiple smaller clusters

2013-10-13 Thread Jon Haddad

This is a pretty vague question. What are you trying to achieve? On Oct 12, 2013, at 9:05 PM, Wei Zhu wz1...@yahoo.com wrote: Hi, As we bring more use cases to Cassandra, we have been thinking about the best way to host it. Let's say we will have 15 physical machines available, we can use

Re: Output of nodetool ring with virtual nodes

2013-10-15 Thread Jon Haddad

It's expected. I think nodetool status is meant to replace nodetool ring. On Oct 15, 2013, at 11:45 AM, Paulo Motta pauloricard...@gmail.com wrote: Hello, I recently did the Enabling virtual nodes on an existing production cluster procedure

Re: mixed linux/windows cluster in Cassandra-1.2

2013-10-21 Thread Jon Haddad

I can't imagine any situation where this would be practical. What would be the reason to even consider this? On Oct 21, 2013, at 11:06 AM, Robert Coli rc...@eventbrite.com wrote: On Mon, Oct 21, 2013 at 12:55 AM, Илья Шипицин chipits...@gmail.com wrote: is mixed linux/windows cluster

Re: Wide rows/composite keys clarification needed

2013-10-21 Thread Jon Haddad

If you're working with CQL, you don't need to worry about the column names, it's handled for you. If you specify multiple keys as part of the primary key, they become clustering keys and are mapped to the column names. So if you have a sensor_id / time_stamp, all your sensor readings will be

Re: Efficient IP address location lookup

2013-11-15 Thread Jon Haddad

Instead of determining your table first, you should figure out what you want to ask Cassandra. What do you want to look up your data by? For each query you may need to store the data multiple times, which is perfectly reasonable and is recommended. On Nov 15, 2013, at 4:36 PM, Jacob Rhoden

Re: Struggling to understand CFS and its use.

2013-11-17 Thread Jon Haddad

Having used (and moved off of) Titan I do not recommend it as a primary database. Until it overcomes its extremely unoptimized graph traversals, it will increase the load on your database by several orders of magnitude. As a secondary analytics database, it might do fine. Just don’t rely

Re: Securing Cassandra database

2014-04-05 Thread Jon Haddad

This isn’t Cassandra specific, but this is why I hate including db configuration with the main codebase instead of making it the responsibility of ops. This case you described shouldn’t even be possible. The production db configs should be provided by the team maintaining the production

Re: Recommended Approach for Config Changes

2014-04-25 Thread Jon Haddad

You might want to take a peek at what’s happening in the process via strace -p or tcpdump. I can’t remember ever waiting an hour for a node to rejoin. On Apr 25, 2014, at 8:59 AM, Tyler Hobbs ty...@datastax.com wrote: On Fri, Apr 25, 2014 at 10:43 AM, Phil Burress philburress...@gmail.com

Re: Cassandra data retention policy

2014-04-28 Thread Jon Haddad

He said below that he’d like to keep the old data, so that might rule out TTLs in any case. You’ve got a few options that I can think of off the top of my head. The easiest from a management perspective is to use one table per month. WhateverData042014 would be this month’s. It’s easy enough
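The one-table-per-month naming can be sketched in Python as follows. The `WhateverData` prefix and the MMYYYY suffix come from the example in the message; the helper name `monthly_table_name` is hypothetical, and how the application then routes reads and writes to that table is not covered here.

```python
from datetime import date


def monthly_table_name(prefix: str, day: date) -> str:
    """Build a per-month table name such as WhateverData042014 (prefix + MMYYYY)."""
    return f"{prefix}{day:%m%Y}"


# The message is dated 2014-04-28, so "this month's" table is the April 2014 one.
print(monthly_table_name("WhateverData", date(2014, 4, 28)))
```

Retiring a month of data then amounts to archiving or dropping a whole table rather than deleting individual rows.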

Re: Cassandra vs Elasticsearch.

2014-05-03 Thread Jon Haddad

Agreed w/ ES not being the durable data store. I would recommend treating it as ephemeral, and using Cassandra as your source of truth. Keep in mind if you change your ES index mapping, you’ll require a full reindex in order to search the data properly. It’s not like adding a secondary index

Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Jon Haddad

Or use spotify’s reaper and forget about it https://github.com/spotify/cassandra-reaper On Apr 13, 2015, at 3:45 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland j...@tubularlabs.com

Re: timestamp as clustering key doesn't work as expected

2015-10-23 Thread Jon Haddad

What version of Cassandra? I can’t think of a reason why you’d see this output. If you can reliably reproduce, this should be filed as a JIRA. https://issues.apache.org/jira > On Oct 23, 2015, at 8:55 AM, Kai Wang wrote: > > Hi, > > I use a timestamp column as the last

Re: Oracle TIMESTAMP(9) equivalent in Cassandra

2015-10-29 Thread Jon Haddad

Keep in mind that in a distributed environment you probably have so much variance that nanosecond precision is pointless. Even google notes that in the paper, Dapper, a Large-Scale Distributed Systems Tracing Infrastructure [http://research.google.com/pubs/pub36356.html

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jon Haddad

than the one got deleted (based on last modified date field). We > are definitely not talking about few millis here. > > Praveen > > From: Jon Haddad <jonathan.had...@gmail.com > <mailto:jonathan.had...@gmail.com>> > Reply-To: "user@cassandra.apache.org <

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jon Haddad

ing on AWS servers and no clocks are not 20 minutes off. > > > From: Jon Haddad <jonathan.had...@gmail.com > <mailto:jonathan.had...@gmail.com>> > Reply-To: "user@cassandra.apache.org <mailto:user@cassandra.apache.org>" > <user@cassandra.apache.org <mai

Re: Deletes Reappeared even when nodes are not down

2015-11-13 Thread Jon Haddad

Any chance your clocks are off? > On Nov 13, 2015, at 1:09 PM, Peddi, Praveen wrote: > > Hi, > We are using Cassandra 2.0.8, with replication factor of 3. > > We are seeing a scenario where some of the rows in the table reappears even > after they are deleted. We have seen

Re: Overriding timestamp with light weight transactions

2015-11-16 Thread Jon Haddad

Perhaps you should fix your clock drift issues instead of trying to use a workaround? > On Nov 16, 2015, at 11:39 AM, Peddi, Praveen wrote: > > Hi, > We are using Cassandra 2.0.9 and we currently have “using timestamp” clause > in all our update queries. We did this to fix

Re: Overriding timestamp with light weight transactions

2015-11-16 Thread Jon Haddad

Europe has longer drifts. We override > the timestamp only if we see current timestamp on the row is in future. Why > do you think overriding timestamp is a work around? It seems like a valid > reason to override timestamps. > > Thanks > Praveen > > > From: Jon Haddad

Re: scylladb

2015-11-05 Thread Jon Haddad

Nope, no one I know. Let me know if you try it I'd love to hear your feedback. > On Nov 5, 2015, at 9:22 AM, tommaso barbugli wrote: > > Hi guys, > > did anyone already try Scylladb (yet another fastest NoSQL database in town) > and has some thoughts/hands-on experience

Re: compression cpu overhead

2015-11-03 Thread Jon Haddad

You won't see any overhead on writes because you don't actually write to sstables when performing a write. Just the commit log & memtable. Memtables are flushed asynchronously. > On Nov 4, 2015, at 1:57 AM, Tushar Agrawal wrote: > > For writes it's negligible. For

Re: Read query taking a long time

2015-10-19 Thread Jon Haddad

I wrote a blog post a while back you may find helpful on diagnosing problems in production. There's a lot of potential things that could be wrong with your cluster and going back and forth on the ML to pin down the right one will take forever.

Re: LOCAL_SERIAL

2015-10-15 Thread Jon Haddad

ZK seems a little overkill for just 1 feature though. LOCAL_SERIAL is fine if all you want to do is keep a handful of keys up to date. There’s a massive cost in adding something new to your infrastructure, and imo, very little gain in this case. > On Oct 15, 2015, at 8:29 AM, Eric Stevens

Re: Data visualization tools for Cassandra

2015-10-20 Thread Jon Haddad

PySpark (dataframes) + Pandas + Seaborn/Matplotlib > On Oct 20, 2015, at 11:22 AM, Charles Rich wrote: > > Take a look at jKool, a DataStax partner at jKoolCloud.com > . It provides visualization for data in DSE. > > Regards, > > Charley > >

Re: G1 GC settings

2015-10-13 Thread Jon Haddad

You may want to read Al Tobey’s Cassandra tuning guide. It’s got a section on G1. It’s being widely used, successfully, at massive scale. https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html > On Oct 13, 2015,

Re: SELECT some_column vs SELECT *

2015-11-24 Thread Jon Haddad

If it's sparsely populated you'll get the same benefit from the schema definition. You don't pay for fields you don't use. > On Nov 24, 2015, at 12:17 PM, Jack Krupansky wrote: > > Are all or ost of the 1000+ columns populated for a given row? If they are > sparse

Re: Triggering Deletion/Updation

2015-11-22 Thread Jon Haddad

There's no built in way of doing cascading deletes in Cassandra, I really wouldn't recommend using triggers for this either. My advice is to manage it in your app code. > On Nov 22, 2015, at 9:59 AM, Prem Yadav wrote: > > if it is cassandra 2.0+, > you can implement

Re: Cassandra version numbering

2017-02-23 Thread Jon Haddad

No > On Feb 23, 2017, at 1:59 PM, Rakesh Kumar wrote: > > Is ver 3.0.10 same as 3.10. > > Cassandra website mentions this: Cassandra 3.10 Changelog > > But in other places 3.0.10 is mentioned.

Re: How does cassandra achieve Linearizability?

2017-02-09 Thread Jon Haddad

LWT != Last Write Wins. They are totally different. LWTs give you (assuming you also read at SERIAL) “atomic consistency”, meaning you are able to perform operations atomically and in isolation. That’s the safety blanket everyone wants but is extremely expensive, especially in Cassandra.

Re: Priority for cassandra nodes in cluster

2016-11-12 Thread Jon Haddad

Agreed w/ Benjamin. Trying to diagnose issues in prod will be a nightmare. Keep your DB servers homogeneous. > On Nov 12, 2016, at 1:52 PM, Benjamin Roth wrote: > > 1. From a 15 year experience of running distributed Services: dont Mix > Services on machines if

Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad

You’ve asked a lot of questions on this mailing list, and you’ve gotten help on a ton of beginner issues. Making fun of someone for asking similar beginner questions is not cool at all. Cut it out. > On Nov 14, 2016, at 10:13 AM, Ali Akhtar wrote: > > Another

Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad

> On Nov 14, 2016, at 10:25 AM, Ali Akhtar <ali.rac...@gmail.com> wrote: > > Excuse me? I did not make fun of anyone. I gave valid suggestions that are > all theoretically possible. > > If it came off in a condescending way, i am genuinely sorry. > > > On

Re: Storing videos in cassandra

2016-11-14 Thread Jon Haddad

While Cassandra *can* be used this way, I don’t recommend it. It’s going to be far cheaper and easier to maintain to store data in an Object store like S3, like Oskar recommended. > On Nov 14, 2016, at 10:16 AM, l...@airstreamcomm.net wrote: > > We store videos and files in Cassandra by

Re: Query on Data Modelling of a specific usecase

2017-04-19 Thread Jon Haddad

How much data do you plan to store in each table? I’ll be honest, this doesn’t sound like a Cassandra use case at first glance. 1 table per report x 1000 is going to be a bad time. Odds are with different queries, you’ll need multiple views, so let’s call that a handful of tables per report.

Re: Slow writes and Frequent timeouts

2017-04-17 Thread Jon Haddad

What are your hardware specs? Where are you running the cluster? Is every node in the same physical datacenter? What command are you using to run stress? > On Apr 17, 2017, at 9:57 AM, Akshay Suresh > wrote: > > Hi > > I have not done much. Just created a

Re: Downside to running multiple nodetool repairs at the same time?

2017-04-21 Thread Jon Haddad

We (The Last Pickle) forked reaper a while ago and added support for 3.0. https://github.com/thelastpickle/cassandra-reaper We set up a mailing list here for Reaper specific questions:

Re: How does clustering key works with TimeWindowCompactionStrategy (TWCS)

2017-04-07 Thread Jon Haddad

Alex Dejanovski wrote a good post on how the LIMIT clause works and why it doesn’t (until 3.4) work the way you think it would. http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html > On Apr 7, 2017, at 7:23 AM, Jerry Lam

Re: Cassandra Data migration from 2.2.3 to 3.7

2017-08-01 Thread Jon Haddad

Just curious, why go to 3.7? 3.11 has hundreds of bug fixes that 3.7 doesn’t and will continue to receive fixes. > On Aug 1, 2017, at 3:44 PM, Harika Vangapelli -T (hvangape - AKRAYA INC at > Cisco) wrote: > > Jeff, I tried the below steps for just 3 rows of data, It

Re: cqlsh -e output - How to change the default delimiter '|' in the output

2017-08-15 Thread Jon Haddad

Using COPY .. TO you can export using the DELIMITER option, does that help? > On Aug 15, 2017, at 9:01 PM, Harikrishnan A wrote: > > Thank you all > > Regards, > Hari > > > On Tuesday, August 15, 2017 12:55 AM, Erick Ramirez > wrote: > >

Re: Migrate from DSE (Datastax) to Apache Cassandra

2017-08-15 Thread Jon Haddad

I agree with Jeff, it’s not necessary to launch a new cluster for this operation. > On Aug 15, 2017, at 7:39 PM, Jeff Jirsa wrote: > > Or just alter the key space replication strategy and remove the DSE specific > strategies in favor of network topology strategy > > > --

Re: Multi datacenter node loss

2017-07-21 Thread Jon Haddad

SimpleStrategy doesn’t take DC or rack into account at all. It simply places replicas on subsequent tokens. You could end up with 3 copies in 1 DC and zero in another. /** * This class returns the nodes responsible for a given * key but does not respect rack awareness. Basically *
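To illustrate how walking the ring in token order can leave a datacenter without a replica, here is a simplified Python model. It is a sketch, not the Java implementation quoted in the message: the ring layout, token values, and node names are invented for the example, and `simple_strategy_replicas` is a hypothetical helper.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, rf):
    """Pick rf distinct nodes by walking the ring clockwise from the key's token.

    ring is a list of (token, node) pairs sorted by token. Datacenter and rack
    are never consulted, which is the point of the example.
    """
    tokens = [token for token, _ in ring]
    # The first token >= key_token owns the key; wrap around past the last token.
    start = bisect_left(tokens, key_token) % len(ring)
    replicas = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


# Made-up ring: three nodes in dc1 followed by one node in dc2.
ring = [(0, "dc1-a"), (25, "dc1-b"), (50, "dc1-c"), (75, "dc2-a")]
print(simple_strategy_replicas(ring, key_token=80, rf=3))
```

In this toy ring a key with token 80 wraps around to the three dc1 nodes, so dc2 holds no copy of it at all.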

Re: Data Loss irreparabley so

2017-07-27 Thread Jon Haddad

We (The Last Pickle) maintain an open source tool to help manage repairs across your clusters called Reaper. It’s a lot easier to set up and manage than trying to manage it through cron. http://thelastpickle.com/reaper.html > On Jul 27, 2017, at 12:38 AM,

Re: Upgrade requirements for upgrading from cassandra 2.1.x to 2.2.x

2017-08-22 Thread Jon Haddad

NEWS.txt is the goto spot for upgrade instructions, caveats, etc. Jon > On Aug 22, 2017, at 2:46 PM, Chuck Reynolds wrote: > > Anyone? > > From: "Chuck (me) Reynolds" > Reply-To: "user@cassandra.apache.org" > Date:

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad

>> num_tokens: recommended value: 256 >> -seeds: internal IP address of each seed node > > I saw also hostnames mentioned few times, but it just makes it even more > confusing. > > — > Roman > >> On May 1, 2017, at 3:50 PM, Jon Haddad <jonathan.had...@gma

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad

— > Roman > >> On May 1, 2017, at 4:14 PM, Jon Haddad <jonathan.had...@gmail.com >> <mailto:jonathan.had...@gmail.com>> wrote: >> >> The in-tree docs do not mention this anywhere, and even have some of the >> answers you’re asking: >> >

Re: Smart Table creation for 2D range query

2017-05-08 Thread Jon Haddad

It gets a little tricky when you try to add in the coordinates to the clustering key if you want to do operations that are more complex. For instance, finding all the elements within a radius of point (x,y) isn’t particularly fun with Cassandra. I recommend moving that logic into the

Re: Smart Table creation for 2D range query

2017-05-09 Thread Jon Haddad

could > be efficiently queried. > > Jim > > On Tue, May 9, 2017 at 11:19 AM, Jon Haddad <jonathan.had...@gmail.com> > wrote: > >> The problem with using geohashes is that you can’t efficiently do ranges >> with random token distribution. So even if your scalar val

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

2017-05-09 Thread Jon Haddad

I don’t see any way it wouldn’t. Have you tried tracing it? > On May 9, 2017, at 8:32 AM, Kant Kodali wrote: > > Hi All, > > It looks like Cassandra 3.10 has partial partition key search but does it > result in a table scan? for example I can have the following > > create

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

2017-05-09 Thread Jon Haddad

Output from both queries, demonstrating full cluster scans: https://gist.github.com/rustyrazorblade/c4947fc37da85bca50e08aa1ef3c7a06 Jon > On May 9, 2017, at 9:24 AM, Jon Haddad <jonathan.had...@gmai

Re: Cassandra 3.10 has partial partition key search but does it result in a table scan?

2017-05-09 Thread Jon Haddad

; partial partition key and get the max b ? > > > > > > On Tue, May 9, 2017 at 6:33 AM, Jon Haddad <jonathan.had...@gmail.com > <mailto:jonathan.had...@gmail.com>> wrote: > I don’t see any way it wouldn’t. Have you tried tracing it? > > > On May 9, 2017, at 8:32 AM,

Re: Smart Table creation for 2D range query

2017-05-09 Thread Jon Haddad

S_jwNVHRPZTTDzXXn6Q/view#slide=id.i0>. > As Jon mentions, this puts more work on the client, but might give you a lot > of querying flexibility when using Cassandra. > > Jim > > On Mon, May 8, 2017 at 11:13 PM, Jon Haddad <jonathan.had...@gmail.com > <mailto:jonathan.had..

Re: Service discovery in the Cassandra cluster

2017-05-02 Thread Jon Haddad

gt; wrote: > Lol yeah, why > I guess I run some ec2 instances, drop some cassandra deb packages on 'em - > the thing will figure out how to run... > > Also, how would you get "initial state of the cluster" if the cluster... is > being initialized? > Or that

Re: Smart Table creation for 2D range query

2017-05-05 Thread Jon Haddad

I think you’ll want to model your table similar to how an R-Tree [1] / Quad tree [2] works. Let’s suppose you had a 10x10 meter land area and you wanted to put stuff in there. In order to find “all the things in point x,y”, you could break your land area into a grid. A partition would
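The grid idea can be sketched in Python as below. The snippet is cut off before it says how cells map to partitions, so treating each grid cell as one partition, the 1 meter cell size, and the helper names `grid_cell` and `cells_in_box` are all assumptions made for illustration.

```python
import math


def grid_cell(x: float, y: float, cell_size: float = 1.0) -> tuple:
    """Map a point to the (column, row) of the grid cell that contains it."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def cells_in_box(x_min, y_min, x_max, y_max, cell_size=1.0):
    """List every grid cell overlapped by an axis-aligned bounding box."""
    col0, row0 = grid_cell(x_min, y_min, cell_size)
    col1, row1 = grid_cell(x_max, y_max, cell_size)
    return [(col, row)
            for col in range(col0, col1 + 1)
            for row in range(row0, row1 + 1)]


# A point at (3.7, 9.2) in the 10x10 meter area falls in cell (3, 9).
print(grid_cell(3.7, 9.2))
# A small box straddling a cell corner touches four cells.
print(cells_in_box(0.5, 0.5, 1.5, 1.5))
```

Under these assumptions a 2D range query becomes one read per overlapped cell, with any finer filtering done by the client.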

Re: manual deletes with TWCS

2017-05-05 Thread Jon Haddad

You cannot. From Alex’s TLP post: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html TWCS is no fit for workload that perform deletes on non TTLed data. Consider that SSTables from different time windows will never be compacted

Re: DTCS to TWCS

2017-05-04 Thread Jon Haddad

We (The Last Pickle) wrote a blog post on using TWCS pre-3.0: http://thelastpickle.com/blog/2017/01/10/twcs-part2.html Alex Dejanovski wrote a very comprehensive guide to TWCS I recommend reading before putting it in prod:

Re: Totally unbalanced cluster

2017-05-04 Thread Jon Haddad

Adding nodes with NTS is easier, in my opinion. You don’t need to worry about replica placement, if you do it right. > On May 4, 2017, at 7:43 AM, Cogumelos Maravilha > wrote: > > Hi Alain thanks for your kick reply. > > > Regarding SimpleStrategy perhaps you

Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad

Sure, you could use DNS. Where does it say IP addresses are a requirement? > On May 1, 2017, at 1:36 PM, Roman Naumenko wrote: > > If I understand how Cassandra nodes work, they must contain a list of seed’s > IP addressed in config file. > > This requirement makes

Re: detail of compactionstats, pending tasks

2017-09-21 Thread Jon Haddad

Pending tasks are not a queue, they are an estimation of the amount of work it would take to reach a perfect compaction point, but the compactions aren’t independent from one another. For instance, with LCS you may have a compaction from L0 -> L1, which triggers a L1 -> L2 compaction. You

Re: Massive deletes -> major compaction?

2017-09-21 Thread Jon Haddad

Have you considered the fantastic DeletingCompactionStrategy? https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy > On Sep 21, 2017, at 11:51 AM, Jeff Jirsa

Re: Node failure

2017-10-06 Thread Jon Haddad

I’ve had a few use cases for downgrading consistency over the years. If you’re showing a customer dashboard w/ some Ad summary data, it’s great to be right, but showing a number that’s close is better than not being up. > On Oct 6, 2017, at 1:32 PM, Jeff Jirsa wrote: > > I

Re: Increasing VNodes

2017-10-04 Thread Jon Haddad

The site (with the docs) is probably more helpful to learn about how reaper works: http://cassandra-reaper.io/ > On Oct 4, 2017, at 9:54 AM, Chris Lohfink wrote: > > Increasing number of tokens will make repairs worse not better. You can just

Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-04 Thread Jon Haddad

Seems pretty overengineered, imo, given you can just save the pagination state as Andy Tolbert pointed out. > On Oct 4, 2017, at 8:38 AM, Daniel Hölbling-Inzko > wrote: > > Thanks for pointing me to Elassandra. > Have you had any experience running this in

Re: Could not connect to localhost:9160 when installing Cassandra on AWS

2017-10-10 Thread Jon Haddad

How did you install Cassandra? Try passing the machine’s IP address to cqlsh, like “cqlsh 192.168.1.1" > On Oct 10, 2017, at 10:43 AM, Lutaya Shafiq Holmes > wrote: > > Hello Cassandra Gurus, > > After I installed Cassandra on AWS- This error comes up when I try to

Re: Cassandra 3.11.0 compaction attempting impossible to complete compactions

2017-10-13 Thread Jon Haddad

Can you paste the output of nodetool compactionstats? What you’re describing should not happen. There’s a check that drops sstables out of a compaction task if there isn’t enough available disk space, see https://issues.apache.org/jira/browse/CASSANDRA-12979

Re: Cassandra compatibility matrix

2017-09-07 Thread Jon Haddad

There aren’t any drivers maintained by the Cassandra project. Compatibility is up to each driver. Usually a section is included in the README. For instance, in the DataStax Java Driver: https://github.com/datastax/java-driver#compatibility

Re: C* 3 node issue -Urgent

2017-09-06 Thread Jon Haddad

I wouldn’t worry about being meticulous about keeping RF = N as the cluster grows. If you had 60 nodes and your auth data was only on 9 you’d be completely fine. > On Sep 6, 2017, at 11:36 AM, Cogumelos Maravilha > wrote: > > After insert a new node we should:

Re: new question ;-) // RE: understanding batch atomicity

2017-09-29 Thread Jon Haddad

The use of “atomic” for batches is misleading. Batches will eventually complete, that doesn’t make them atomic. “All or nothing” is also incorrect, as you can read them in the middle and get “some parts of it”, and without a rollback it’s just “eventually all”. > On Sep 29, 2017, at 10:59

Reaper 0.7 is released!

2017-09-27 Thread Jon Haddad

Hey folks, We (The Last Pickle) are proud to announce the release of Reaper 0.7! In this release we've added support to run Reaper across multiple data centers as well as supporting Reaper failover when using the Cassandra storage backend. You can grab DEB, RPM and tarballs off the downloads

Re: Reaper 0.7 is released!

2017-09-27 Thread Jon Haddad

> Wednesday, September 27, 2017 10:33 AM -07:00 from Aiman Parvaiz > <ai...@steelhouse.com>: > > Thanks!! Love Reaper :) > > Sent from my iPhone > > On Sep 27, 2017, at 10:01 AM, Jon Haddad <j...@jonhaddad.com > > wrote: > >> Hey folks,

Re: Limit on having number of nodes in C* cluster

2017-08-21 Thread Jon Haddad

As far as I know, those 75K nodes are not in a single cluster. If memory serves correctly (and this article seems to indicate that it does http://www.techrepublic.com/article/apples-secret-nosql-sauce-includes-a-hefty-dose-of-cassandra/

Re: Looking for advice and assistance upgrading from Cassandra 1.2.9

2017-10-17 Thread Jon Haddad

I recommend going all the way to 2.2. > On Oct 17, 2017, at 12:37 PM, Jeff Jirsa wrote: > > You’ll go from 1.2 to 2.0 to 2.1 - should be basic steps: > - make sure you have all 1.2 sstables by running upgradesstable > - one node at a time, swap the 1.2 binaries for latest in

Re: Inter Data Center Latency calculation of a Multi DC cluster running in AWS

2017-10-17 Thread Jon Haddad

I recommend figuring out the latency between your datacenters. Cassandra isn’t going to be any more than that barring JVM pauses on the remote coordinator. > On Oct 17, 2017, at 4:17 PM, Bill Walters wrote: > > Hi Everyone, > > I need some suggestions on finding the

Re: CQL Map vs clustering keys

2017-11-15 Thread Jon Haddad

In 3.0, clustering columns are not actually part of the column name anymore. Yay. Aaron Morton wrote a detailed analysis of the 3.x storage engine here: http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

Reaper 1.0

2017-11-14 Thread Jon Haddad

We’re excited to announce the release of the 1.0 version of Reaper for Apache Cassandra! We’ve made a lot of improvements to the flexibility of managing repairs and simplified the UI based on feedback we’ve received. We’ve written a blog post discussing the changes in detail here:

Re: can't reach cassandra outside my lan

2017-11-30 Thread Jon Haddad

Cassandra is listening on your localhost address, 127.0.0.1, not your laptop’s address on the network. Set rpc_address to the address on your network, or use rpc_interface and let Cassandra figure it out. > On Nov 30, 2017, at 10:38 AM, Andrea Giordano > wrote:

Re: Schema version mismatch with 3.0.8 and 3.0.14

2017-12-01 Thread Jon Haddad

Generally speaking, I would never advise someone to add nodes to a cluster using a different version than the rest of the cluster. > On Dec 1, 2017, at 11:58 AM, Jai Bheemsen Rao Dhanwada > wrote: > > Thanks Jeff, > > I did some more testing on this version upgrade

Re: Tablesnap with custom endpoint?

2017-12-14 Thread Jon Haddad

Tablesnap uses boto, you may be able to override the S3 endpoint. This Stack Overflow answer suggests it’s possible, but you might have to modify the tablesnap script a little: https://stackoverflow.com/questions/32618216/overwrite-s3-endpoint-using-boto3-configuration-file

Re: Upgrade using rebuild

2017-12-14 Thread Jon Haddad

no > On Dec 14, 2017, at 10:59 AM, Anshu Vajpayee wrote: > > Thanks! I am aware with these steps. > > I m just thinking , is it possible to do the upgrade using nodetool rebuild > like we rebuld new dc ? > > Has anyone tried - upgrade with nodetool rebuild ? > >

Re: Upgrade using rebuild

2017-12-14 Thread Jon Haddad

Heh, hit send accidentally. You generally can’t run rebuild to upgrade, because it’s a streaming operation. Streaming isn’t supported between versions, although on 3.x it might work. > On Dec 14, 2017, at 11:01 AM, Jon Haddad <j...@jonhaddad.com> wrote: > > no > >> O

Re: Reaper 1.0

2017-11-17 Thread Jon Haddad

ly > email and delete all copies of this message. > Please click here > <http://www.cisco.com/web/about/doing_business/legal/cri/index.html> for > Company Registration Information. > > From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad > Sent: Wedn

Re: best practice for repair

2017-11-13 Thread Jon Haddad

We (The Last Pickle) maintain Reaper, an open source repair tool, specifically to address all the complexity around repairs. http://cassandra-reaper.io/ Jon > On Nov 13, 2017, at 3:18 AM, Peng Xiao <2535...@qq.com> wrote: > > sub-range repair is much like primary

Re: Solr Search With Apache Cassandra

2017-11-20 Thread Jon Haddad

That’s long since been abandoned (last commit was 5 years ago) > On Nov 20, 2017, at 12:10 PM, Nageswara Rao wrote: > > There is a fork with name on this combo called solandra > > https://github.com/tjake/Solandra > > Please

Re: How quickly we can bootstrap

2017-11-19 Thread Jon Haddad

It sounds like you’re asking how to bootstrap without paying the cost of bootstrapping :) If you want to scale out, you’ll need to deal with the time it takes. You can’t add a node and have it up in 15 minutes, if you’re running 3 TB it’ll take a while. The exact amount of time depends

Re: Time series modeling in C* for range queries

2017-11-19 Thread Jon Haddad

Hi Junaid, I wrote a blog post a few months ago on massively scalable time series, going into a couple techniques on bucketing that you might find helpful. http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
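One common form of bucketing is to add a time window to the partition key so that no partition grows without bound. The snippet does not say which schemes the blog post covers, so the Python sketch below is only an assumed illustration: the one-day window and the helper names `day_bucket` and `buckets_for_range` are hypothetical, and timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(ts: datetime) -> str:
    """Return the UTC day a timezone-aware timestamp falls in, e.g. '2017-11-19'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def buckets_for_range(start: datetime, end: datetime) -> list:
    """List every day bucket a range query from start to end has to visit."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    buckets = []
    while day <= last:
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets


start = datetime(2017, 11, 18, 23, 0, tzinfo=timezone.utc)
end = datetime(2017, 11, 20, 1, 0, tzinfo=timezone.utc)
print(buckets_for_range(start, end))
```

A range query then turns into one query per bucket, each restricted by the clustering timestamp, with the results merged on the client.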

Re: Reaper 1.0

2017-11-15 Thread Jon Haddad

losure by > others is strictly prohibited. If you are not the intended recipient (or > authorized to receive for the recipient), please contact the sender by reply > email and delete all copies of this message. > Please click here > <http://www.cisco.com/web/about/doing_business/le

Re: Stable Cassandra 3.x version for production

2017-11-07 Thread Jon Haddad

I regularly work with teams that have 3.11.{0,1} in prod, and would recommend it for new clusters. Avoid materialized views and SASI until you really understand how they work and their limitations. MVs solve about one use case correctly, SASI is good if you’re querying a single partition

Re: Reg:- Data modelling For E-Commerce Pattern data modelling for Search

2017-12-07 Thread Jon Haddad

1. No, Apache Cassandra is pretty terrible for search on its own. Even with SASI. 2. Maybe, but it’s complicated, and doing it right takes a lot of experience. I’d use Elastic Search instead. > On Dec 7, 2017, at 5:39 PM, @Nandan@ wrote: > > Hi Peoples, >
