sstablemetadata and sstablerepairedset not working with DSC on Debian

2014-12-18 Thread Jan Kesten
Hi, being curious about the new incremental repairs, I updated our cluster to C* version 2.1.2 via the Debian apt-repository. Everything went quite well, but trying to start the tools sstablemetadata and sstablerepairedset led to the following error: root@a01:/home/ifjke#

RE: Cassandra metrics Graphite

2014-12-18 Thread Nigel LEACH
Many thanks for the information, Dennis and Karl. I don’t think I can test until Monday, but I will let you know what (hopefully) works. Regards Nigel From: d...@aegisco.com [mailto:d...@aegisco.com] Sent: 17 December 2014 22:31 To: user@cassandra.apache.org Subject: Re: Cassandra metrics Graphite

Re: simple data movement ?

2014-12-18 Thread Ryan Svihla
I'm not sure that'll work with that many version moves in the middle. Upgrades are, to my knowledge, only tested between specific steps, namely from 1.2.9 to the latest 2.0.x http://www.datastax.com/documentation/upgrade/doc/upgrade/cassandra/upgradeC_c.html Specifically: Cassandra 2.0.x

Re: bootstrapping manually when auto_bootstrap=false ?

2014-12-18 Thread Ryan Svihla
Why auto_bootstrap=false? The documentation even suggests the opposite. If you don't auto_bootstrap, the node will take queries before it has copies of all the data, and you'll get the wrong answer (it'd not be unlike using CL ONE when you've got a bunch of dropped mutations on a single node in the

Re: Cassandra for Analytics?

2014-12-18 Thread Ryan Svihla
I'd argue with the claim of higher read latency than HBase. I'm not sure what experience you have with both, and that may have been true at one point, but with Leveled Compaction Strategy and proper JVM tuning I'm not sure how this is true; it would at least be comparable. I've worked with buffer

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
That depends on what you mean by real-time analytics. For things like continuous data streams, neither is an appropriate platform for doing analytics. They're good for storing the results (aka output) of the streaming analytics. I would suggest that before you decide Cassandra vs HBase, you first figure

Re: Cassandra for Analytics?

2014-12-18 Thread Ryan Svihla
Since Ajay is already using Spark, the Spark Cassandra Connector really gets them where they want to be pretty easily https://github.com/datastax/spark-cassandra-connector (joins, etc). As far as Spark Streaming having basic support, I'd challenge that assertion (namely Storm has a number of

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
Some of the most common use cases in stream processing are sliding windows based on time or count. Based on my understanding of Spark architecture and Spark Streaming, it does not provide the same functionality. One can fake it by setting Spark Streaming to really small micro-batches, but
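
For readers unfamiliar with the term, a count-based sliding window can be sketched as below. This is only an illustration of the concept. It is not tied to Spark, Storm, or any other product named in this thread, and the function name `sliding_window_averages` is hypothetical.

```python
from collections import deque


def sliding_window_averages(values, size):
    """Average each run of `size` consecutive values, sliding one item at a time."""
    window = deque(maxlen=size)  # the oldest item is dropped automatically
    averages = []
    for value in values:
        window.append(value)
        # Emit a result only once the window holds `size` items.
        if len(window) == size:
            averages.append(sum(window) / size)
    return averages
```

A time-based window follows the same pattern, except that items are evicted by timestamp rather than by count.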

Re: Cassandra for Analytics?

2014-12-18 Thread Ryan Svihla
I'll decline to continue the commentary on Spark, as again this probably belongs on another list, other than to say that microbatching is an intentional design tradeoff that has notable benefits for the same use cases you're referring to, and that while you may disagree with those tradeoffs, it's a

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
For the record, I think Spark is good and I'm glad we have options. My point wasn't to bad-mouth Spark. I'm not comparing Spark to Storm at all, so I think there's some confusion here. I'm thinking of Esper, StreamBase, and other stream processing products. My point is to think about the problems

Understanding tombstone WARN log output

2014-12-18 Thread Jens Rantil
Hi, I am occasionally seeing: WARN [ReadStage:9576] 2014-12-18 11:16:19,042 SliceQueryFilter.java (line 225) Read 756 live and 17027 tombstoned cells in mykeyspace.mytable (see tombstone_warn_threshold). 5001 columns was requested, slices=[73c31274-f45c-4ba5-884a-6d08d20597e7:myfield-],
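
To track how often and how badly this warning fires, the counts can be pulled out of the log line. The following is a minimal sketch that assumes the exact wording of the WARN line quoted above; other Cassandra versions may phrase it differently. The function name `parse_tombstone_warning` and the returned field names are hypothetical.

```python
import re

# Pattern derived from the WARN line quoted above (an assumption about the format).
TOMBSTONE_WARN = re.compile(
    r"Read (?P<live>\d+) live and (?P<tombstoned>\d+) tombstoned cells in "
    r"(?P<keyspace>\w+)\.(?P<table>\w+)"
)


def parse_tombstone_warning(line):
    """Return the counts and table from a tombstone WARN line, or None if no match."""
    match = TOMBSTONE_WARN.search(line)
    if match is None:
        return None
    live = int(match.group("live"))
    tombstoned = int(match.group("tombstoned"))
    total = live + tombstoned
    return {
        "keyspace": match.group("keyspace"),
        "table": match.group("table"),
        "live": live,
        "tombstoned": tombstoned,
        # Share of the cells read that were tombstones.
        "tombstone_ratio": tombstoned / total if total else 0.0,
    }
```

Applied to the line above, this gives 756 live and 17027 tombstoned cells for mykeyspace.mytable, so most of the cells scanned by that read were tombstones.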

Re: Cassandra for Analytics?

2014-12-18 Thread Ryan Svihla
My mistake on Storm, and I'm certain there are a number of use cases where you're right that Spark isn't the right answer, but I'd argue you're treating it like 0.5 Spark feature-set-wise instead of 1.1 Spark. As for filtering before persistence, this is the common use case for Spark Streaming and I've

Re: Cassandra for Analytics?

2014-12-18 Thread Ajay
Thanks Ryan and Peter for the suggestions. Our requirement (as an ecommerce company) at a higher level is to build a data warehouse as a platform or service (for different product teams to consume) as below: Datawarehouse as a platform/service | Spark SQL

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
In the interest of knowledge sharing on the general topic of stream processing: the domain is quite old and there's a lot of existing literature. Within this space there are several important factors which many products don't address: temporal windows (sliding windows, discrete windows, dynamic

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
By data warehouse, what kind do you mean? Is it the traditional warehouse where people create multi-dimensional cubes? Or is it the newer class of UI tools that make it easier for users to explore data, where the warehouse is mostly a denormalized (i.e. flattened) format of the OLTP? Or is it a

Re: Cassandra for Analytics?

2014-12-18 Thread Ajay
Hi Peter, You are right. The idea is to directly query the data from NoSQL, in our case via Spark SQL on Spark (since Spark largely supports Mongo/Cassandra/HBase/Hadoop). As you said, the business users still need to query using Spark SQL. We are already using NoSQL BI tools like Pentaho (which

Re: Cassandra for Analytics?

2014-12-18 Thread Colin
Almost every stream processing system I know of offers joins out of the box and has done so for years. Even open source offerings like Esper have offered joins for years. What hasn't are systems like Storm, Spark, etc., which I don't really classify as stream processors anyway. -- Colin

Re: Cassandra for Analytics?

2014-12-18 Thread Peter Lin
@Colin - I bounce back and forth on classifying storm and spark as stream processing frameworks. Clearly they are marketed as stream processing frameworks and they can process data streams. Even with the commercial stream processing products, expressing joins with some of the products is a bit

Replacing nodes disks

2014-12-18 Thread Or Sher
Hi all, We have a situation where some of our nodes have smaller disks and we would like to align all nodes by replacing the smaller disks with bigger ones without replacing nodes. We don't have enough space to put data on / disk and copy it back to the bigger disks so we would like to rebuild the

Re: Replacing nodes disks

2014-12-18 Thread Jens Rantil
Hi Or, You don't have another machine on the network that would temporarily be able to host your /var/lib/cassandra content? That way you would simply be scp:ing the files temporarily to another machine and copy them back when done. You obviously want to do a repair afterwards just in case, but

Re: bootstrapping manually when auto_bootstrap=false ?

2014-12-18 Thread Jonathan Haddad
I'd consider solving your root problem, that people are starting and stopping servers in prod accidentally, instead of making Cassandra more difficult to manage operationally. On Thu Dec 18 2014 at 4:04:34 AM Ryan Svihla rsvi...@datastax.com wrote: why auto_bootstrap=false? The documentation even

Re: Problem with very many small SSTables

2014-12-18 Thread Robert Coli
On Mon, Dec 15, 2014 at 12:41 AM, Mathijs Vogelzang math...@apptornado.com wrote: Would it be possible to trigger a manual partial compaction, to first compact 4x 256 tables? Could this be added to nodetool if it doesn't exist already? JMX call forceUserDefinedCompaction. =Rob

Re: bootstrapping manually when auto_bootstrap=false ?

2014-12-18 Thread Robert Coli
On Wed, Dec 17, 2014 at 7:04 PM, Kevin Burton bur...@spinn3r.com wrote: I’m trying to figure out the best way to bootstrap our nodes. I *think* I want our nodes to be manually bootstrapped. This way an admin has to explicitly bring up the node in the cluster and I don’t have to worry about

Re: In place vnode conversion possible?

2014-12-18 Thread Robert Coli
On Tue, Dec 16, 2014 at 12:38 AM, Jonas Borgström jo...@borgstrom.se wrote: That said, I've done some testing and it appears to be possible to perform an in place conversion as long as all nodes contain all data (3 nodes and replication factor 3 for example) like this: I would expect this to

Re: full gc too often

2014-12-18 Thread Y.Wong
V On Dec 4, 2014 11:14 PM, Philo Yang ud1...@gmail.com wrote: Hi all, I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have trouble with full GC: sometimes one or two nodes run a full GC more than once per minute, taking over 10 seconds each time, and then the node will be unreachable

Re: full gc too often

2014-12-18 Thread Jonathan Haddad
This topic comes up quite a bit. Enough, in fact, that I've done a 1 hour webinar on the topic. I cover how the JVM GC works and things you need to consider when tuning it for Cassandra. https://www.youtube.com/watch?v=7B_w6YDYSwA With your specific problem - full GC not reducing the old gen -

Practical use of counters in the industry

2014-12-18 Thread Rajath Subramanyam
Hi Folks, Have any of you come across blogs that describe how companies in the industry are using Cassandra counters in practice? Thanks in advance. Regards, Rajath Rajath Subramanyam

Re: Practical use of counters in the industry

2014-12-18 Thread Ken Hancock
Here's one from Twitter... http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011 On Thu, Dec 18, 2014 at 6:08 PM, Rajath Subramanyam rajat...@gmail.com wrote: Hi Folks, Have any of you come across blogs that describe how companies in the industry are using

Re: Practical use of counters in the industry

2014-12-18 Thread Rajath Subramanyam
Thanks Ken. Are there any other use cases where counters are used, apart from Rainbird? Rajath Subramanyam On Thu, Dec 18, 2014 at 5:12 PM, Ken Hancock ken.hanc...@schange.com wrote: Here's one from Twitter...

Re: Replacing nodes disks

2014-12-18 Thread Kai Wang
Do you have to replace those disks? Can you simply add new disks to those nodes and configure C* to use JBOD? On Dec 18, 2014 10:18 AM, Or Sher or.sh...@gmail.com wrote: Hi all, We have a situation where some of our nodes have smaller disks and we would like to align all nodes by replacing

Cassandra 2.1.0 Crashes the JVM with OOM with heaps of memory free

2014-12-18 Thread Leon Oosterwijk
All, We have a Cassandra cluster which seems to be struggling a bit. I have one node which crashes continually, and others which crash sporadically. When they crash it's with a JVM couldn't allocate memory, even though there's heaps available. I suspect it's because one table which is very

2014 nosql benchmark

2014-12-18 Thread diwayou
I have just read this benchmark PDF. Does anyone have an opinion on it? I don't think it is fair to Cassandra. URL: http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf http://msrg.utoronto.ca/papers/NoSQLBenchmark

Re: Replacing nodes disks

2014-12-18 Thread Jan Kesten
Hi Or, I did something like this a while ago. If your machines do have a free disk slot - just put another disk there and use it as another data_file_directory. If not - as in my case: - grab a USB dock for disks - put the new one in there, plug in, format, mount to /mnt etc. - I did an

Re: 2014 nosql benchmark

2014-12-18 Thread Wilm Schumacher
Hi, I'm always interested in such benchmark experiments, because the databases evolve so fast that the race is always open and there is a lot of motion in there. And of course I asked myself the same question. And I think that this publication is unreliable. For 4 reasons (from reading very fast,