Re: TTransportException (java.net.SocketException: Broken pipe)

2014-07-08 Thread Mark Reddy
Hi Bhaskar,

Can you check your limits using 'ulimit -a'? The default is 1024, which
needs to be increased if you have not done so already.

Here you will find a list of recommended production settings:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html
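
To check and raise the limit (a sketch for a typical Linux install; the values shown are illustrative, take the exact ones from the recommended-settings page above):

```shell
# Check the current open-file limit for this shell; Cassandra inherits
# the limit of the user that starts it.
ulimit -n

# To raise it persistently, add lines like these to
# /etc/security/limits.conf and log in again (values illustrative):
#   cassandra  -  memlock  unlimited
#   cassandra  -  nofile   100000
```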


Mark

On Tue, Jul 8, 2014 at 5:30 AM, Bhaskar Singhal bhaskarsing...@yahoo.com
wrote:

 Hi,

 I am using Cassandra 2.0.7 (with default settings and a 16 GB heap on a
 quad-core Ubuntu server with 32 GB RAM) and am trying to ingest 1 MB values
 using cassandra-stress. It works fine for a while (~1600 secs), but after
 ingesting around 120 GB of data, I start getting the following error:
 Operation [70668] retried 10 times - error inserting key 0070668
 ((TTransportException): java.net.SocketException: Broken pipe)

 The Cassandra server is still running, but in system.log I see the
 errors below.

 ERROR [COMMIT-LOG-ALLOCATOR] 2014-07-07 22:39:23,617 CassandraDaemon.java (line 198) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.lang.NoClassDefFoundError: org/apache/cassandra/db/commitlog/CommitLog$4
     at org.apache.cassandra.db.commitlog.CommitLog.handleCommitError(CommitLog.java:374)
     at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:116)
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
     at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.db.commitlog.CommitLog$4
     at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
     ... 4 more
 Caused by: java.io.FileNotFoundException: /path/2.0.7/cassandra/build/classes/main/org/apache/cassandra/db/commitlog/CommitLog$4.class (Too many open files)
     at java.io.FileInputStream.open(Native Method)
     at java.io.FileInputStream.<init>(FileInputStream.java:146)
     at sun.misc.URLClassPath$FileLoader$1.getInputStream(URLClassPath.java:1086)
     at sun.misc.Resource.cachedInputStream(Resource.java:77)
     at sun.misc.Resource.getByteBuffer(Resource.java:160)
     at java.net.URLClassLoader.defineClass(URLClassLoader.java:436)
     at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
     ... 10 more
 ERROR [FlushWriter:7] 2014-07-07 22:39:24,924 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:7,5,main]
 FSWriteError in /cassandra/data4/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-593-Filter.db
     at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
     at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
     at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
     at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:417)
     at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.FileNotFoundException: /cassandra/data4/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-593-Filter.db (Too many open files)
     at java.io.FileOutputStream.open(Native Method)
     at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
     at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
     at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:466)
     ... 9 more

 The Cassandra server process has around 9,685 open files (per lsof), there
 are 3,938 commit log segments in /cassandra/commitlog, and around 572
 commit log segments were deleted during the course of the test.

 I am wondering what is causing Cassandra to open so many files. Is the
 flushing slow, or is it something else?

 I tried increasing the flush writers, but that didn't help.


 Regards,
 Bhaskar


 CREATE KEYSPACE Keyspace1 WITH replication = {
   'class': 'SimpleStrategy',
   'replication_factor': '1'
 };

 CREATE TABLE Standard1 (
   key blob,
   C0 blob,
   PRIMARY KEY (key)
 ) WITH COMPACT STORAGE AND
   

RE: Compaction causing listeners to stall

2014-07-08 Thread Bryon Spahn
Robert,



New development today:



FSReadError in
/mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-jb-710-Data.db
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:95)
    at org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:41)
    at org.apache.cassandra.io.util.RandomAccessReader.read(RandomAccessReader.java:323)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:444)
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:424)
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:348)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:185)
    at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Deserializer$1.runMayThrow(ParallelCompactionIterable.java:271)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.nio.channels.ClosedChannelException
    at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
    at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:250)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:101)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:87)
    ... 17 more



Followed by:

ERROR [Deserialize SSTableReader(path='/mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-jb-710-Data.db')] 2014-07-08 05:00:09,126 StorageService.java (line 364) Stopping gossiper
WARN [Deserialize SSTableReader(path='/mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-jb-710-Data.db')] 2014-07-08 05:00:09,126 StorageService.java (line 278) Stopping gossip by operator request
INFO [Deserialize SSTableReader(path='/mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-jb-710-Data.db')] 2014-07-08 05:00:09,126 Gossiper.java (line 1251) Announcing shutdown



After this the listeners are no longer available, but the DB does not
officially die; it just hangs and needs a restart.



*From:* Robert Coli [mailto:rc...@eventbrite.com]
*Sent:* Monday, July 7, 2014 6:55 PM
*To:* user@cassandra.apache.org
*Subject:* Re: Compaction causing listeners to stall



On Mon, Jul 7, 2014 at 5:20 AM, Bryon Spahn bsp...@kitedesk.com wrote:

I am experiencing a strange issue where we run a compaction job weekly and
as a result, the listeners stall. This is a single node cluster running on
an i2.2xl instance in AWS. We are getting the message:



There are almost no cases where it makes sense to run a single node of
Cassandra, especially in production.



*[StorageServiceShutdownHook]*



I bet you a donut that you're OOMing the JVM. Stop doing that, and your
Cassandra node will stop crashing.



https://issues.apache.org/jira/browse/CASSANDRA-7507



Is probably the case you have just hit.



Basically, in some pathological circumstances, the JVM will send Cassandra
a signal that it handles as if you were an operator attempting a clean
shutdown. This usually does not succeed, but it may be worth a shot.



=Rob


help on querying cassandra

2014-07-08 Thread srinivas rao
Hi All,

When I query Cassandra, it creates a Cartesian product of my input
request (row keys x column keys). Is there any way to query Cassandra with
a map-like input, as below?


Current scenario:

5 row keys, 5 column keys as below = result data with 25 entries (if data
is available).
Example:

rowkey1=100, column key1=1
rowkey2=101, column key2=10001
rowkey3=102, column key3=10002
rowkey4=103, column key4=10003
rowkey5=104, column key5=10004


But I am expecting the scenario below:

rowkey1=100, column key1=1
rowkey2=101, column key2=10001
rowkey3=102, column key3=10002
rowkey4=103, column key4=10003
rowkey5=104, column key5=10004

5 row keys, 5 column keys = result data with 5 entries



Note: we can make multiple calls to Cassandra, one for each entry pair.
Please suggest any better option.
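
The two behaviours can be sketched with a plain Python dict standing in for the table (all names and values here are illustrative; against a real cluster the per-pair version would be one targeted query, possibly issued asynchronously, per (row, column) pair):

```python
# A dict stands in for a Cassandra table keyed by (row key, column key).
data = {
    (row, col): row * 100000 + col
    for row in (100, 101, 102, 103, 104)
    for col in (10000, 10001, 10002, 10003, 10004)
}

def cross_product_query(rows, cols):
    # What a multiget-style call does: every row x every column.
    return {(r, c): data[(r, c)] for r in rows for c in cols}

def per_pair_query(pairs):
    # The desired "map style" query: only the listed (row, column) pairs.
    return {(r, c): data[(r, c)] for r, c in pairs}

pairs = [(100, 10000), (101, 10001), (102, 10002), (103, 10003), (104, 10004)]
print(len(cross_product_query([p[0] for p in pairs], [p[1] for p in pairs])))  # 25
print(len(per_pair_query(pairs)))  # 5
```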


Thanks
Srinivas


Easy diff of schema from dev-production

2014-07-08 Thread Kevin Burton
Are there any easy/elegant ways to compare a dev schema to a production
schema? I want to find whether there are any rows/columns we need to add.

I could try to format the output and just use 'diff' … but with the table
options that isn't super clean either.
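
One way to make the 'diff' approach workable is to normalize each schema to a set of (table, column) pairs before comparing, ignoring table options. A minimal sketch, assuming you have exported one "table.column" entry per line from each environment (for instance from system.schema_columns); the input format and names are illustrative:

```python
# Compare two schemas as sets of (table, column) pairs, ignoring
# table options and ordering.
def parse_schema(text):
    # One "table.column" entry per line; blank lines are skipped.
    return {tuple(line.split(".", 1)) for line in text.splitlines() if line.strip()}

def schema_diff(dev_text, prod_text):
    dev, prod = parse_schema(dev_text), parse_schema(prod_text)
    return {
        "missing_in_prod": sorted(dev - prod),
        "missing_in_dev": sorted(prod - dev),
    }

dev = "users.id\nusers.email\nevents.id\n"
prod = "users.id\nevents.id\n"
print(schema_diff(dev, prod)["missing_in_prod"])  # [('users', 'email')]
```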



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Easy diff of schema from dev-production

2014-07-08 Thread Shane Hansen
I'd suggest looking at the system keyspace. Like schema_columns
On Jul 8, 2014 9:39 AM, Kevin Burton bur...@spinn3r.com wrote:

 Are there any easy/elegant ways to compare dev schema to production
 schema.  I want to find if there are any rows/columns we need to add.

 I could try to format the output and just use 'diff' … but with the table
 options that isn't super clean either.



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




Re: Easy diff of schema from dev-production

2014-07-08 Thread DuyHai Doan
If you are using CQL3, the metadata is stored in system tables.


On Tue, Jul 8, 2014 at 5:38 PM, Kevin Burton bur...@spinn3r.com wrote:

 Are there any easy/elegant ways to compare dev schema to production
 schema.  I want to find if there are any rows/columns we need to add.

 I could try to format the output and just use 'diff' … but with the table
 options that isn't super clean either.



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




Re: TTransportException (java.net.SocketException: Broken pipe)

2014-07-08 Thread Bhaskar Singhal
Thanks Mark. Yes, 1024 is the limit. I haven't yet changed it to the 
recommended production settings.

But I am wondering: why does Cassandra need to keep 3,000+ commit log segment 
files open?
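
For context, a back-of-the-envelope check (a sketch, assuming the Cassandra 2.0 default commitlog_segment_size_in_mb of 32, consistent with the default settings mentioned earlier in the thread) shows the observed segment count is roughly what ~120 GB of writes would leave behind if segments were written but rarely recycled:

```python
# How many 32 MB commit log segments does ~120 GB of ingested data
# map to, if none are recycled?
ingested_gb = 120
segment_size_mb = 32  # Cassandra 2.0 default commitlog_segment_size_in_mb

segments = ingested_gb * 1024 // segment_size_mb
print(segments)  # 3840
```

That estimate of 3,840 is close to the 3,938 segments observed on disk, which suggests the segments are accumulating rather than being recycled after flush.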

Regards,
Bhaskar




New application - separate column family or separate cluster?

2014-07-08 Thread Jeremy Jongsma
Do you prefer purpose-specific Cassandra clusters that support a single
application's data set, or a single Cassandra cluster that contains column
families for many applications? I realize there is no ideal answer for
every situation, but what have your experiences been in this area for
cluster planning?

My reason for asking is that we have one application with high data volume
(multiple TB, thousands of writes/sec) that caused us to adopt Cassandra in
the first place. Now we have the tools and cluster management
infrastructure built up to the point where it is not a major investment to
store smaller sets of data for other applications in C* also, and I am
debating whether to:

1) Store everything in one large cluster (no isolation, low cost)
2) Use one cluster for the high-volume data, and one for everything else
(good isolation, medium cost)
3) Give every major service its own cluster, even if they have small
amounts of data (best isolation, highest cost)

I suspect #2 is the way to go as far as balancing hosting costs and
application performance isolation. Any pros or cons am I missing?

-j


Re: Easy diff of schema from dev-production

2014-07-08 Thread Kevin Burton
Ah.. I think that's what I was hoping for!


On Tue, Jul 8, 2014 at 9:05 AM, DuyHai Doan doanduy...@gmail.com wrote:

 If you are using CQL3, the meta data are stored in system tables


 On Tue, Jul 8, 2014 at 5:38 PM, Kevin Burton bur...@spinn3r.com wrote:

 Are there any easy/elegant ways to compare dev schema to production
 schema.  I want to find if there are any rows/columns we need to add.

 I could try to format the output and just use 'diff' … but with the table
 options that isn't super clean either.



 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com





-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: Compaction causing listeners to stall

2014-07-08 Thread Robert Coli
On Tue, Jul 8, 2014 at 5:31 AM, Bryon Spahn bsp...@kitedesk.com wrote:

 FSReadError in /mnt/data/cassandra/COMPANY/crmFieldInfo/COMPANYFieldInfo-
 jb-710-Data.db


FSReadError seems relatively likely to be a problem on the underlying
filesystem, though I don't have a sec right now to grep and check.


 08 05:00:09,126 StorageService.java (line 364) Stopping gossiper

 08 05:00:09,126 StorageService.java (line 278) Stopping gossip by operator
 request


This sequence almost certainly means that the gossiper was in fact shut
down by the JBOD functionality.


 After this the listeners are no longer available but the DB does not
 officially die just hangs and needs a restart.


Other than during clean shutdown, there is no circumstance under which
Cassandra the application terminates itself within the JVM.

 tl;dr - fix your broken disk. :)

=Rob


Re: Cassandra use cases/Strengths/Weakness

2014-07-08 Thread Robert Coli
On Fri, Jul 4, 2014 at 2:10 PM, DuyHai Doan doanduy...@gmail.com wrote:

  c. operational simplicity due to master-less architecture. This feature
 is, although quite transparent for developers, is a key selling point.
 Having suffered when installing manually a Hadoop cluster, I happen to love
 the deployment simplicity of C*, only one process per node, no moving parts.


Asserting that Cassandra, as a fully functioning production system, is
currently easier to operate than RDBMS is just false. It is still false
even if we ignore the availability of experienced RDBMS operators and
decades of RDBMS operational best practice.

The quality of software engineering practice in RDBMS land also most
assuredly results in a more easily operable system in many, many use cases.
Yes, Cassandra is more tolerant to individual node failures. This turns out
to not matter as much in terms of operability as non-operators appear to
think it does. Very trivial operational activities (create a new
columnfamily or replace a failed node) are subject to failure mode edge
cases which often are not resolvable without brute force methods.

I am unable to get my head around the oft-heard marketing assertion that a
data-store in which such common activities are not bulletproof is capable
of being better to operate than the RDBMS status quo. The production
operators I know also do not agree that Cassandra is simple to operate.

All the above aside, I continue to maintain that Cassandra is the best at
being the type of thing that it is. If you have a need to horizontally
scale a use case that is well suited for its strength and poorly suited for
RDBMS, you should use it. Far fewer people actually have this sort of case
than think they do.

=Rob


Re: Cassandra use cases/Strengths/Weakness

2014-07-08 Thread Jonathan Haddad
I've used various databases in production for over 10 years.  Each has
strengths and weaknesses.

I ran Cassandra for just shy of 2 years in production as part of both
development teams and operations, and I only hit 1 serious problem
that Rob mentioned.  Ideally C* would have guarded against it, but it
did not.  I did not have any downtime as a result, however.  For those
curious, I tried to add 1.2 nodes to a 1.1 cluster.  Aside from that,
I actually did find Cassandra simple to operate & manage.

I used Cassandra as more of a general purpose database.  I was willing
to give up some query flexibility in favor of high availability and
multi dc support.  There were times we needed to add more servers to
deal with additional load, and it handled it perfectly.

For me it wasn't such a big problem; there are always optimizations that
need to be made no matter what DB you use.

Disclaimer: I now work for Datastax.


On Tue, Jul 8, 2014 at 5:51 PM, Robert Coli rc...@eventbrite.com wrote:
 On Fri, Jul 4, 2014 at 2:10 PM, DuyHai Doan doanduy...@gmail.com wrote:

  c. operational simplicity due to master-less architecture. This feature
 is, although quite transparent for developers, is a key selling point.
 Having suffered when installing manually a Hadoop cluster, I happen to love
 the deployment simplicity of C*, only one process per node, no moving parts.


 Asserting that Cassandra, as a fully functioning production system, is
 currently easier to operate than RDBMS is just false. It is still false even
 if we ignore the availability of experienced RDBMS operators and decades of
 RDBMS operational best practice.

 The quality of software engineering practice in RDBMS land also most
 assuredly results in a more easily operable system in many, many use cases.
 Yes, Cassandra is more tolerant to individual node failures. This turns out
 to not matter as much in terms of operability as non-operators appear to
 think it does. Very trivial operational activities (create a new
 columnfamily or replace a failed node) are subject to failure mode edge
 cases which often are not resolvable without brute force methods.

 I am unable to get my head around the oft-heard marketing assertion that a
 data-store in which such common activities are not bulletproof is capable of
 being better to operate than the RDBMS status quo. The production
 operators I know also do not agree that Cassandra is simple to operate.

 All the above aside, I continue to maintain that Cassandra is the best at
 being the type of thing that it is. If you have a need to horizontally scale
 a use case that is well suited for its strength and poorly suited for RDBMS,
 you should use it. Far fewer people actually have this sort of case than
 think they do.

 =Rob



-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: New application - separate column family or separate cluster?

2014-07-08 Thread Tupshin Harper
I've seen a lot of deployments, and I think you captured the scenarios and
reasoning quite well. You can apply other nuances and details to #2 (e.g.
segment based on SLA or topology), but I agree with all of your reasoning.

-Tupshin
-Global Field Strategy
-Datastax
On Jul 8, 2014 10:54 AM, Jeremy Jongsma jer...@barchart.com wrote:

 Do you prefer purpose-specific Cassandra clusters that support a single
 application's data set, or a single Cassandra cluster that contains column
 families for many applications? I realize there is no ideal answer for
 every situation, but what have your experiences been in this area for
 cluster planning?

 My reason for asking is that we have one application with high data volume
 (multiple TB, thousands of writes/sec) that caused us to adopt Cassandra in
 the first place. Now we have the tools and cluster management
 infrastructure built up to the point where it is not a major investment to
 store smaller sets of data for other applications in C* also, and I am
 debating whether to:

 1) Store everything in one large cluster (no isolation, low cost)
 2) Use one cluster for the high-volume data, and one for everything else
 (good isolation, medium cost)
 3) Give every major service its own cluster, even if they have small
 amounts of data (best isolation, highest cost)

 I suspect #2 is the way to go as far as balancing hosting costs and
 application performance isolation. Any pros or cons am I missing?

 -j



Re: Cassandra use cases/Strengths/Weakness

2014-07-08 Thread Jack Krupansky
“Is Cassandra only for use cases with data load > 100TB and massive user 
counts?”

I wouldn’t make that extreme a statement! There are plenty of more moderate use 
cases for Cassandra. For example, a dozen nodes with 300 GB per node for just a 
few million users and their interactions and transactions.

I would say, as a rough rule of thumb, that a traditional RDBMS is great for 
up to low millions of rows, and Cassandra is clearly needed when you have more 
than a few hundred million rows. In between, it becomes a more subjective 
choice.

Tens of millions of rows can probably be dealt with effectively by an RDBMS, 
but... you’re starting to have to be careful and configure high-end systems and 
manage them carefully. 100 million rows? Sure, you could still do that on an 
RDBMS if you are motivated and put in the effort. For example, some relational 
databases may require manual partitioning when you have more than 25 million 
rows or so. And then you have to pay attention to query latency as well.

First big question: It may be 100 million rows today, but what growth rate do 
you anticipate?

-- Jack Krupansky

From: Matthias Hübner 
Sent: Saturday, July 5, 2014 5:49 AM
To: user@cassandra.apache.org 
Subject: Re: Cassandra use cases/Strengths/Weakness

Hi,

I am a bit confused about whether Cassandra is a good choice for my use case, 
especially after reading this thread.


Is Cassandra only for use cases with data load > 100TB and massive user counts?


What about all the other features of Cassandra? Are they not usable to avoid 
limitations of relational databases, even for smaller use cases?


What do you think for my use case:


I need to manage data for around 1,000 retail stores to produce each day a 
delivery plan (including predictions several weeks into the future) to refill the 
stores. For each store I have to collect data about every single store item. A 
store has some 10 thousand items. This makes around 100 million items to 
manage. Each day I have to store some updates for every single store item. I also 
receive sale predictions for all items, day by day. Every day I have to produce 
one or more delivery plans. Most data will replace old data, so it's not 
increasing that much. 

I thought I could handle the data load more easily with Cassandra than with 
MariaDB. I don't have to care about locking; I could write all incoming data and 
merge it into my tables. And I could use aggregations, so I would be able to bring 
all store-item-related data together that I need to compute my delivery plans. 
Finally, I would be able to use commodity hardware and scale more easily.




Have a nice weekend,

Matthias








2014-07-05 0:37 GMT+02:00 Jack Krupansky j...@basetechnology.com:

  Elasticsearch and Solr are “search platforms”, not “databases”. The best 
description for Cassandra, especially for a CTO, is its home page:
  http://cassandra.apache.org/
  Even if you have seen it before, please read it again. There is a lot packed 
into a few words.

  DataStax Enterprise (DSE) combines Cassandra, Hadoop and Spark for analytics, 
and tightly integrated Solr for rich search of the Cassandra data.

  The main benefit of Cassandra is that it is a master-free 
distributed real-time database designed for scale, including support for 
multiple data centers, so that it is ready for managing mission-critical 
operational data, for applications that need low latency and high availability 
for real-time data access.

  And OpsCenter is great for managing a Cassandra or DSE cluster. I’m sure a 
CTO would appreciate it:
  http://www.datastax.com/what-we-offer/products-services/datastax-opscenter

  Here’s a feature comparison of some NoSQL databases:
  http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

  -- Jack Krupansky

  From: Prem Yadav 
  Sent: Friday, July 4, 2014 10:37 AM
  To: user@cassandra.apache.org 
  Subject: Cassandra use cases/Strengths/Weakness

  Hi,
  I have seen in a lot of replies that Cassandra is not designed for this 
and that. I don't want to sound rude, I just need some info about this so that 
I can compare it to technologies like HBase, Mongo, Elasticsearch, Solr, etc. 

  1) What is Cassandra designed for? Heavy writes, yes. So is HBase, or 
Elasticsearch.
  What are the use cases that suit Cassandra?

  2) What kinds of queries are best suited for Cassandra?
  I ask this because I have seen people asking about queries and getting 
replies that they are not suited for Cassandra. For example: queries where a 
large number of rows is requested and a timeout happens, or range queries or 
aggregate queries.



  3) Where does Cassandra excel compared to other technologies?

  I have been working on Cassandra for some time. I know how it works and I like 
it very much. 
  We are moving towards building a big cluster. But at this point, I am not 
sure if it's the right decision. 

  A lot of people including me like Cassandra in my company. But it has more to 
do with the CQL and not the internals or the