Re: TTransportException (java.net.SocketException: Broken pipe)
Hi Bhaskar,

Can you check your limits using 'ulimit -a'? The default is 1024, which needs to be increased if you have not done so already. Here you will find a list of recommended production settings: http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html

Mark

On Tue, Jul 8, 2014 at 5:30 AM, Bhaskar Singhal bhaskarsing...@yahoo.com wrote:

Hi,

I am using Cassandra 2.0.7 (with default settings and a 16GB heap on a quad-core Ubuntu server with 32GB RAM) and trying to ingest 1MB values using cassandra-stress. It works fine for a while (1600 secs), but after ingesting around 120GB of data I start getting the following error:

Operation [70668] retried 10 times - error inserting key 0070668 ((TTransportException): java.net.SocketException: Broken pipe)

The Cassandra server is still running, but in system.log I see the errors mentioned below.

ERROR [COMMIT-LOG-ALLOCATOR] 2014-07-07 22:39:23,617 CassandraDaemon.java (line 198) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
java.lang.NoClassDefFoundError: org/apache/cassandra/db/commitlog/CommitLog$4
        at org.apache.cassandra.db.commitlog.CommitLog.handleCommitError(CommitLog.java:374)
        at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:116)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.db.commitlog.CommitLog$4
        at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 4 more
Caused by: java.io.FileNotFoundException: /path/2.0.7/cassandra/build/classes/main/org/apache/cassandra/db/commitlog/CommitLog$4.class (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at sun.misc.URLClassPath$FileLoader$1.getInputStream(URLClassPath.java:1086)
        at sun.misc.Resource.cachedInputStream(Resource.java:77)
        at sun.misc.Resource.getByteBuffer(Resource.java:160)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:436)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        ... 10 more

ERROR [FlushWriter:7] 2014-07-07 22:39:24,924 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:7,5,main]
FSWriteError in /cassandra/data4/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-593-Filter.db
        at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
        at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
        at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
        at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:417)
        at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.FileNotFoundException: /cassandra/data4/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-593-Filter.db (Too many open files)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
        at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:466)
        ... 9 more

There are around 9685 files open by the Cassandra server process (per lsof), 3938 commit log segments in /cassandra/commitlog, and around 572 commit log segments deleted during the course of the test. I am wondering what is causing Cassandra to open so many files: is the flushing slow, or something else? I tried increasing the flush writers, but that didn't help.

Regards,
Bhaskar

CREATE KEYSPACE Keyspace1 WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '1'
};

CREATE TABLE Standard1 (
  key blob,
  C0 blob,
  PRIMARY KEY (key)
) WITH COMPACT STORAGE AND
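Mark's 'ulimit -a' check can also be done programmatically. A minimal sketch (Python standard library only, Unix-specific; the 100000 threshold below is the commonly recommended production value, but treat any cutoff as deployment-specific):

```python
import resource

# Read the soft and hard limits on open file descriptors for the current
# process; on a stock Linux box the soft limit is often 1024, which is far
# too low for a Cassandra node under heavy write load.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files: soft=%d hard=%d" % (soft, hard))

# Warn if the soft limit is below a common production value (assumption:
# 100000, per the DataStax recommended-settings page linked above).
if soft != resource.RLIM_INFINITY and soft < 100000:
    print("soft limit %d is below typical production settings" % soft)
```

The persistent fix is made outside the process (e.g. in /etc/security/limits.conf for the user running Cassandra), after which the running daemon must be restarted to pick up the new limit.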
RE: Compaction causing listeners to stall
Robert,

New development today:

FSReadError in /mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-jb-710-Data.db
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:95)
        at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:41)
        at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:323)
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:444)
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:424)
        at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:348)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
        at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
        at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
        at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
        at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:185)
        at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer$1.runMayThrow(ParallelCompactionIterable.java:271)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.nio.channels.ClosedChannelException
        at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
        at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:250)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:101)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:87)
        ... 17 more

Followed by:

ERROR [Deserialize SSTableReader(path='/mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-jb-710-Data.db')] 2014-07-08 05:00:09,126 StorageService.java (line 364) Stopping gossiper
WARN [Deserialize SSTableReader(path='/mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-jb-710-Data.db')] 2014-07-08 05:00:09,126 StorageService.java (line 278) Stopping gossip by operator request
INFO [Deserialize SSTableReader(path='/mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-jb-710-Data.db')] 2014-07-08 05:00:09,126 Gossiper.java (line 1251) Announcing shutdown

After this the listeners are no longer available, but the DB does not officially die; it just hangs and needs a restart.

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Monday, July 7, 2014 6:55 PM
To: user@cassandra.apache.org
Subject: Re: Compaction causing listeners to stall

On Mon, Jul 7, 2014 at 5:20 AM, Bryon Spahn bsp...@kitedesk.com wrote:

I am experiencing a strange issue where we run a compaction job weekly and as a result, the listeners stall. This is a single node cluster running on an i2.2xl instance in AWS. We are getting the message: *[StorageServiceShutdownHook]*

There are almost no cases where it makes sense to run a single node of Cassandra, especially in production.

I bet you a donut that you're OOMing the JVM. Stop doing that, and your Cassandra node will stop crashing.

https://issues.apache.org/jira/browse/CASSANDRA-7507 is probably the case you have just hit. Basically, in some pathological circumstances, the JVM will send Cassandra a signal that it handles as if you were an operator attempting a clean shutdown. This probably usually does not succeed, but may be worth a shot.

=Rob
help on querying cassandra
Hi All,

When I query Cassandra, it creates a cartesian product of my input request (row keys x column keys). Is there any way to query Cassandra with a map-style input, as below?

Current scenario: 5 row keys and 5 column keys as input => result data with 25 entries (if data is available), even though my input is really these pairs:

rowkey1=100, column key1=1
rowkey2=101, column key2=10001
rowkey3=102, column key3=10002
rowkey4=103, column key4=10003
rowkey5=104, column key5=10004

But I am expecting the scenario below, where the same pairs:

rowkey1=100, column key1=1
rowkey2=101, column key2=10001
rowkey3=102, column key3=10002
rowkey4=103, column key4=10003
rowkey5=104, column key5=10004

give: 5 row keys, 5 column keys => result data with 5 entries.

Note: we could make a separate call to Cassandra for each entry in the set. Please suggest a better option.

Thanks
Srinivas
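In the absence of a query form that accepts (row key, column key) pairs directly, the usual workaround is the one the note hints at: issue one narrowly targeted statement per pair and run them concurrently. The sketch below only builds the statements; the keyspace, table, and column names are hypothetical placeholders, not from this thread:

```python
# Sketch: one targeted SELECT per (row key, column key) pair, so the result
# set has 5 entries instead of the 5 x 5 = 25 cartesian product.
# "ks.cf", "key", "column1", and "value" are hypothetical names.
pairs = [(100, 1), (101, 10001), (102, 10002), (103, 10003), (104, 10004)]

statements = [
    "SELECT value FROM ks.cf WHERE key = {} AND column1 = {}".format(rk, ck)
    for rk, ck in pairs
]

print(len(statements))   # one statement per pair: 5, not 25
print(statements[0])
```

With a real driver these statements can be submitted asynchronously so the round trips largely overlap, which keeps the per-pair approach from costing five sequential network hops.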
Easy diff of schema from dev-production
Are there any easy/elegant ways to compare a dev schema to a production schema? I want to find out if there are any rows/columns we need to add. I could try to format the output and just use 'diff'... but with the table options that isn't super clean either.

-- 
Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
... or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
http://spinn3r.com
Re: Easy diff of schema from dev-production
I'd suggest looking at the system keyspace, like schema_columns.

On Jul 8, 2014 9:39 AM, Kevin Burton bur...@spinn3r.com wrote:

Are there any easy/elegant ways to compare dev schema to production schema? I want to find if there are any rows/columns we need to add. I could try to format the output and just use 'diff'... but with the table options that isn't super clean either.
Re: Easy diff of schema from dev-production
If you are using CQL3, the metadata is stored in system tables.

On Tue, Jul 8, 2014 at 5:38 PM, Kevin Burton bur...@spinn3r.com wrote:

Are there any easy/elegant ways to compare dev schema to production schema? I want to find if there are any rows/columns we need to add. I could try to format the output and just use 'diff'... but with the table options that isn't super clean either.
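One lightweight way to act on these suggestions is to dump each cluster's schema to text (e.g. cqlsh -e "DESCRIBE KEYSPACE my_ks" against each host) and diff the dumps after normalizing whitespace. The sketch below demonstrates only the diffing step; the two schema snippets and the keyspace name are made up for illustration:

```python
import difflib

# Hypothetical DESCRIBE output captured from the two clusters.
dev_schema = """\
CREATE TABLE my_ks.users (
    id int PRIMARY KEY,
    name text
);
"""

prod_schema = """\
CREATE TABLE my_ks.users (
    id int PRIMARY KEY,
    name text,
    email text
);
"""

def schema_diff(a, b):
    # Strip trailing whitespace so purely cosmetic differences don't show up.
    a_lines = [line.rstrip() for line in a.splitlines()]
    b_lines = [line.rstrip() for line in b.splitlines()]
    return list(difflib.unified_diff(a_lines, b_lines, "dev", "prod", lineterm=""))

for line in schema_diff(dev_schema, prod_schema):
    print(line)
```

This sidesteps the table-options noise somewhat, since options that are identical on both sides simply don't appear in the diff; options that genuinely differ will still show up, which may be what you want anyway.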
Re: TTransportException (java.net.SocketException: Broken pipe)
Thanks Mark. Yes, 1024 is the limit; I haven't changed it to the recommended production settings yet. But I am wondering: why does Cassandra need to keep 3000+ commit log segment files open?

Regards,
Bhaskar

On Tuesday, 8 July 2014 1:50 PM, Mark Reddy mark.re...@boxever.com wrote:

Hi Bhaskar, Can you check your limits using 'ulimit -a'? The default is 1024, which needs to be increased if you have not done so already. Here you will find a list of recommended production settings: http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html

Mark

[quoted original message, stack traces, and test details snipped; see the full text upthread]
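Bhaskar's own numbers suggest an answer to his question: assuming the Cassandra 2.0 default commitlog_segment_size_in_mb of 32 (an assumption; the post says "default settings" but does not quote the value), the 3938 segments on disk account for roughly all of the ~120 GB ingested, i.e. segments were being created as fast as data arrived and almost none were recycled, pointing at flushing falling behind rather than a descriptor leak. A quick back-of-the-envelope check:

```python
# Assumption: default commit log segment size of 32 MB in Cassandra 2.0.
segments = 3938      # commit log segments observed in /cassandra/commitlog
segment_mb = 32

total_gb = segments * segment_mb / 1024.0
print("commit log on disk: ~%.0f GB" % total_gb)

# This is close to the ~120 GB ingested before the failure, i.e. nearly
# every segment written during the test was still live (and still open).
```

Under that reading, raising the file-descriptor limit treats the symptom, while the underlying pressure is memtable flush throughput versus the 1MB-value write rate.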
New application - separate column family or separate cluster?
Do you prefer purpose-specific Cassandra clusters that support a single application's data set, or a single Cassandra cluster that contains column families for many applications? I realize there is no ideal answer for every situation, but what have your experiences been in this area for cluster planning?

My reason for asking is that we have one application with high data volume (multiple TB, thousands of writes/sec) that caused us to adopt Cassandra in the first place. Now we have the tools and cluster management infrastructure built up to the point where it is not a major investment to store smaller sets of data for other applications in C* also, and I am debating whether to:

1) Store everything in one large cluster (no isolation, low cost)
2) Use one cluster for the high-volume data, and one for everything else (good isolation, medium cost)
3) Give every major service its own cluster, even if they have small amounts of data (best isolation, highest cost)

I suspect #2 is the way to go as far as balancing hosting costs and application performance isolation. Any pros or cons I am missing?

-j
Re: Easy diff of schema from dev-production
Ah.. I think that's what I was hoping for!

On Tue, Jul 8, 2014 at 9:05 AM, DuyHai Doan doanduy...@gmail.com wrote:

If you are using CQL3, the meta data are stored in system tables

[quoted earlier messages and signatures snipped]
Re: Compaction causing listeners to stall
On Tue, Jul 8, 2014 at 5:31 AM, Bryon Spahn bsp...@kitedesk.com wrote:

FSReadError in /mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-jb-710-Data.db

FSReadError seems relatively likely to be a problem on the underlying filesystem, though I don't have a sec right now to grep and check.

08 05:00:09,126 StorageService.java (line 364) Stopping gossiper
08 05:00:09,126 StorageService.java (line 278) Stopping gossip by operator request

This sequence almost certainly means that the gossiper was in fact shut down by the JBOD functionality.

After this the listeners are no longer available but the DB does not officially die just hangs and needs a restart.

Other than during clean shutdown, there is no circumstance under which Cassandra the application terminates itself within the JVM.

tl;dr - fix your broken disk. :)

=Rob
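The "Stopping gossip by operator request" behaviour Rob attributes to the JBOD functionality is governed by the disk failure policy in cassandra.yaml. The fragment below shows the setting with what I believe is the stock 2.0 default; verify the value and alternatives against your own config before relying on it:

```yaml
# cassandra.yaml -- policy applied when Cassandra hits an FSReadError /
# FSWriteError. "stop" shuts down gossip and the client transports
# (matching the "Stopping gossiper" log lines above) but leaves the JVM
# running, which is why the node appears hung rather than dead.
disk_failure_policy: stop

# Alternatives: best_effort (blacklist the failed data directory and keep
# serving from the remaining ones) or ignore (pre-1.2 behaviour).
```

Switching to best_effort can keep a JBOD node limping along on its healthy disks, but it does not change the tl;dr: the failed disk still needs replacing.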
Re: Cassandra use cases/Strengths/Weakness
On Fri, Jul 4, 2014 at 2:10 PM, DuyHai Doan doanduy...@gmail.com wrote:

c. operational simplicity due to master-less architecture. This feature, although quite transparent for developers, is a key selling point. Having suffered when manually installing a Hadoop cluster, I happen to love the deployment simplicity of C*: only one process per node, no moving parts.

Asserting that Cassandra, as a fully functioning production system, is currently easier to operate than an RDBMS is just false. It is still false even if we ignore the availability of experienced RDBMS operators and decades of RDBMS operational best practice. The quality of software engineering practice in RDBMS land also most assuredly results in a more easily operable system in many, many use cases.

Yes, Cassandra is more tolerant of individual node failures. This turns out not to matter as much in terms of operability as non-operators appear to think it does. Very trivial operational activities (creating a new columnfamily or replacing a failed node) are subject to failure-mode edge cases which often are not resolvable without brute-force methods. I am unable to get my head around the oft-heard marketing assertion that a data store in which such common activities are not bulletproof is capable of being better to operate than the RDBMS status quo. The production operators I know also do not agree that Cassandra is simple to operate.

All the above aside, I continue to maintain that Cassandra is the best at being the type of thing that it is. If you have a need to horizontally scale a use case that is well suited for its strengths and poorly suited for an RDBMS, you should use it. Far fewer people actually have this sort of case than think they do.

=Rob
Re: Cassandra use cases/Strengths/Weakness
I've used various databases in production for over 10 years. Each has strengths and weaknesses. I ran Cassandra for just shy of 2 years in production as part of both development teams and operations, and I only hit 1 serious problem of the kind Rob mentioned. Ideally C* would have guarded against it, but it did not. I did not have any downtime as a result, however. For those curious, I tried to add 1.2 nodes to a 1.1 cluster.

Aside from that, I actually did find Cassandra simple to operate and manage. I used Cassandra as more of a general-purpose database. I was willing to give up some query flexibility in favor of high availability and multi-DC support. There were times we needed to add more servers to deal with additional load, and it handled it perfectly. For me it wasn't such a big problem; there are always optimizations that need to be made no matter what DB you use.

Disclaimer: I now work for DataStax.

On Tue, Jul 8, 2014 at 5:51 PM, Robert Coli rc...@eventbrite.com wrote:

[quoted message snipped; see Rob's reply upthread]

-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: New application - separate column family or separate cluster?
I've seen a lot of deployments, and I think you captured the scenarios and reasoning quite well. You can apply other nuances and details to #2 (e.g. segment based on SLA or topology), but I agree with all of your reasoning.

-Tupshin
-Global Field Strategy
-Datastax

On Jul 8, 2014 10:54 AM, Jeremy Jongsma jer...@barchart.com wrote:

[quoted question snipped; see the original message upthread]
Re: Cassandra use cases/Strengths/Weakness
“Is cassandra only for use cases with data load 100TB and massive user counts?”

I wouldn’t make that extreme a statement! There are plenty of more moderate use cases for Cassandra. For example, a dozen nodes with 300 GB per node for just a few million users and their interactions and transactions.

I would say, as a rough rule of thumb, that a traditional RDBMS is great for up to low millions of rows, and Cassandra is clearly needed when you have more than a few hundred million rows. In between, it becomes a more subjective choice. Tens of millions of rows can probably be dealt with effectively by an RDBMS, but... you’re starting to have to be careful and configure high-end systems and manage them carefully. 100 million rows? Sure, you could still do that on an RDBMS if you are motivated and put in the effort. For example, some relational databases may require manual partitioning when you have more than 25 million rows or so. And then you have to pay attention to query latency as well.

First big question: it may be 100 million rows today, but what growth rate do you anticipate?

-- Jack Krupansky

From: Matthias Hübner
Sent: Saturday, July 5, 2014 5:49 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra use cases/Strengths/Weakness

Hi,

I am a bit confused about whether Cassandra is a choice for my use case, especially after reading this thread. Is Cassandra only for use cases with data load 100TB and massive user counts? What about all the other features of Cassandra? Are they not usable to avoid limitations of relational databases, even for smaller use cases?

What do you think about my use case: I need to manage data for around 1000 retail stores to produce, each day, a delivery plan (including predictions several weeks into the future) to refill the stores. For each store I have to collect data about every single store item. A store has some 10 thousand items. This makes around 100 million items to manage.
Each day I have to store some updates for every single store item. Also, I receive sale predictions for all items, day by day. Every day I have to produce one or more delivery plans. Most data will replace old data, so it's not increasing that much.

I thought I could handle the data load more easily with Cassandra than with MariaDB. I don't have to care about locking; I could write all incoming data and merge it into my tables. And I could use aggregations, so I would be able to add together all the store-item-related data I need to compute my delivery plans. Finally, I would be able to use commodity hardware and scale more easily.

Have a nice weekend,
Matthias

2014-07-05 0:37 GMT+02:00 Jack Krupansky j...@basetechnology.com:

Elasticsearch and Solr are “search platforms”, not “databases”. The best description for Cassandra, especially for a CTO, is its home page: http://cassandra.apache.org/ Even if you have seen it before, please read it again. There is a lot packed into a few words.

DataStax Enterprise (DSE) combines Cassandra, Hadoop and Spark for analytics, and tightly integrated Solr for rich search of the Cassandra data. The main, biggest benefit of Cassandra is that it is a master-free distributed real-time database designed for scale, including support for multiple data centers, so that it is ready for managing mission-critical operational data, for applications that need low latency and high availability for real-time data access. And OpsCenter is great for managing a Cassandra or DSE cluster. I’m sure a CTO would appreciate it: http://www.datastax.com/what-we-offer/products-services/datastax-opscenter

Here’s a feature comparison of some NoSQL databases: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

-- Jack Krupansky

From: Prem Yadav
Sent: Friday, July 4, 2014 10:37 AM
To: user@cassandra.apache.org
Subject: Cassandra use cases/Strengths/Weakness

Hi,

I have seen this in a lot of replies: that Cassandra is not designed for this and that.
I don't want to sound rude, I just need some info about this so that I can compare it to technologies like HBase, Mongo, Elasticsearch, Solr, etc.

1) What is Cassandra designed for? Heavy writes, yes; but so is HBase, or Elasticsearch. What are the use case(s) that suit Cassandra?

2) What kind of queries are best suited for Cassandra? I ask this because I have seen people asking about queries and getting replies that they are not suited for Cassandra: for example, queries where a large number of rows are requested and a timeout happens, or range queries, or aggregate queries.

3) Where does Cassandra excel compared to other technologies?

I have been working on Cassandra for some time. I know how it works and I like it very much. We are moving towards building a big cluster, but at this point I am not sure if it's the right decision. A lot of people including me like Cassandra in my company. But it has more to do with the CQL and not the internals or the