Re: Presentations from NYC?

2011-12-27 Thread Brian O'Neill
Yep. They put them up here: http://www.datastax.com/events/cassandranyc2011/presentations -brian On Dec 27, 2011, at 4:52 AM, Alain RODRIGUEZ wrote: Anything new about this? I'm specifically interested in the Joe Stein (Medialets) talk about how to manage real-time multidimensional

Re: better anti OOM

2011-12-27 Thread Radim Kolar
I don't know what you are basing that on. It seems unlikely to me that the working set of a compaction is 600 MB. However, it may very well be that the allocation rate is such that it contributes to an additional 600 MB average heap usage after a CMS phase has completed. I will investigate

Previously deleted rows resurrected by repair?

2011-12-27 Thread Jonas Borgström
Hi, I have a 3-node cluster running Cassandra 1.0.3 with replication factor=3. Recently I've noticed that some previously deleted rows have started to reappear for some reason. And now I wonder if this is a known issue with 1.0.3? Repairs have been running every weekend (gc_grace is

Re: Newbie question about writer/reader consistency

2011-12-27 Thread R. Verlangen
You might consider a hybrid solution with a transactional DB for all data that should be ACID compliant and Cassandra for the huge amounts of data you want to store. 2011/12/27 Radim Kolar h...@sendmail.cz makes me feel disappointed about consistency in Cassandra, but I wonder if there is a

Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Brian O'Neill
Kevin, I just pulled the code and read through the design. Great stuff. Any thought to potentially using this for real-time processing as well? Right now, we have a set of Hadoop M/R jobs that operate against Cassandra for ETL. We were looking at using Storm for the real-time processing

Re: Presentations from NYC?

2011-12-27 Thread Alain RODRIGUEZ
Anything new about this? I'm specifically interested in the Joe Stein (Medialets) talk about how to manage real-time multidimensional metrics. 2011/12/10 Jonathan Ellis jbel...@gmail.com Not yet -- we're working on it. On Fri, Dec 9, 2011 at 1:48 PM, Brian O'Neill b...@alumni.brown.edu

index sampling

2011-12-27 Thread Radim Kolar
That is a good reason for both to be configurable IMO. Index sampling is currently configurable only per node; it would be better to have it per keyspace because we are using OLTP-like and OLAP keyspaces in the same cluster. OLAP keyspaces have about 1000x more rows. But it's difficult to estimate

Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Kevin Burton
A key innovation here is a partitioning layout algorithm that can support fast many-to-many recovery similar to HDFS but still support partitioned operation with deterministic key placement. Thanks for your contribution. Is there more detailed info on this point? yes... our design

Re: Peregrine: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Zhu Han
On Tue, Dec 27, 2011 at 2:31 PM, Kevin Burton burtona...@gmail.com wrote: I'm pleased to announce Peregrine 0.5.0 - a new map reduce framework optimized for iterative and pipelined map reduce jobs. http://peregrine_mapreduce.bitbucket.org/ This originally started off with some internal

Re: Newbie question about writer/reader consistency

2011-12-27 Thread Radim Kolar
makes me feel disappointed about consistency in Cassandra, but I wonder if there is a way to work around it. Cassandra is not suitable for this kind of program. CouchDB is slightly better; it has transactions but no locking, and I am not sure if transaction isolation is supported now. MongoDB

Re: better anti OOM

2011-12-27 Thread Peter Schuller
I will investigate situation more closely using gc via jconsole, but isn't bloom filter for new sstable entirely in memory? On disk there are only 2 files Index and Data. -rw-r--r--  1 root  wheel   1388969984 Dec 27 09:25 sipdb-tmp-hc-4634-Index.db -rw-r--r--  1 root  wheel  10965221376 Dec

Re: Newbie question about writer/reader consistency

2011-12-27 Thread Radim Kolar
But is there any way of implementing a minimum required ACID subset on top of Cassandra? Try this; it's NoSQL and ACID compliant. I haven't tested this; it will most likely have pretty slow writes and a lot of bugs, like any other Oracle application.

Re: index sampling

2011-12-27 Thread Peter Schuller
On a node with 300M rows (a small node), there will be 585937 index sample entries with a sampling interval of 512. Let's say 100 bytes per entry; this will be 585 MB, and bloom filters are 884 MB. With the default sampling interval of 128, sampled entries will use the majority of node memory. Index sampling should be reworked like bloom
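
The estimate in this snippet can be reproduced with a few lines of arithmetic. This is a rough sketch, not Cassandra code: the function name is hypothetical, and the 100 bytes per entry is the poster's assumption. Under that assumption the product at a sampling interval of 512 comes to roughly 58.6 MB; a figure near 585 MB corresponds to about 1,000 bytes per entry.

```python
def index_sample_estimate(rows, sampling_interval, bytes_per_entry):
    """Estimate index sample entry count and memory use (hypothetical helper).

    One sample entry is assumed per `sampling_interval` rows; the per-entry
    size is an assumption supplied by the caller, not a measured value.
    """
    entries = rows // sampling_interval
    return entries, entries * bytes_per_entry


rows = 300_000_000

# Sampling interval 512, as in the post.
entries_512, bytes_512 = index_sample_estimate(rows, 512, 100)

# Default sampling interval 128.
entries_128, bytes_128 = index_sample_estimate(rows, 128, 100)

for interval, entries, size in [(512, entries_512, bytes_512),
                                (128, entries_128, bytes_128)]:
    print(f"interval={interval}: {entries} entries, {size / 1e6:.1f} MB")
```

Changing `bytes_per_entry` shows how sensitive the estimate is to the assumed per-entry overhead.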

Re: will compaction delete empty rows after all columns expired?

2011-12-27 Thread Peter Schuller
Compaction should delete empty rows once gc_grace_seconds is passed, right? Yes. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: will compaction delete empty rows after all columns expired?

2011-12-27 Thread Peter Schuller
Compaction should delete empty rows once gc_grace_seconds is passed, right? Yes. But just to be extra clear: Data will not actually be removed until the row in question participates in compaction. Compactions will not be actively triggered by Cassandra for tombstone processing reasons. -- /

Re: will compaction delete empty rows after all columns expired?

2011-12-27 Thread Radim Kolar
But just to be extra clear: Data will not actually be removed until the row in question participates in compaction. Compactions will not be actively triggered by Cassandra for tombstone processing reasons. Leveled compaction is really good for this because it compacts often
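
The timing discussed in this thread can be sketched as follows. This is a simplified model, not Cassandra's implementation: it assumes a deleted row becomes eligible for removal once gc_grace_seconds has elapsed, and is physically dropped only by a later compaction that includes it. The function name and timestamps are hypothetical, and real compaction applies further conditions that are ignored here.

```python
from typing import Iterable, Optional


def first_purging_compaction(deleted_at: int,
                             gc_grace_seconds: int,
                             compaction_times: Iterable[int]) -> Optional[int]:
    """Return the time of the first compaction that could drop the row.

    Simplified model: a compaction purges the row only if it runs at or
    after deleted_at + gc_grace_seconds. Returns None if no such
    compaction occurs, in which case the data stays on disk.
    """
    eligible_from = deleted_at + gc_grace_seconds
    for t in sorted(compaction_times):
        if t >= eligible_from:
            return t
    return None


GC_GRACE = 864_000  # 10 days in seconds

# Compactions touching the row at ~1.2 days, ~10.4 days and ~23 days.
purged_at = first_purging_compaction(0, GC_GRACE, [100_000, 900_000, 2_000_000])

# Only one early compaction ever touches the row: it is never purged.
never_purged = first_purging_compaction(0, GC_GRACE, [100_000])

print(purged_at, never_purged)
```

In this model, a strategy that compacts more often shortens the gap between eligibility and actual removal.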

improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread Igor Lino
Hi! I was trying to get an understanding of the real strengths of Cassandra against other competitors. It's actually not that simple and depends a lot on the details of the actual requirements. Reading the following comparison: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis It felt

Re: improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread Edward Capriolo
This is not really a comparison of anything because each NoSQL has its own bullet points like: Boats: great for traveling on water. Cars: great for traveling on land. So the conclusion I should gather is? Also as for the Cassandra bullet points, they are really thin (and wrong). Such as:

Re: improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread Peter Schuller
Also when comparing these technologies, very subtle differences in design have profound effects on operation and performance. Thus someone trying to paper over 6 technologies and compare them with a few bullet points is really doing the world an injustice. +1. Same goes for 99% of all

Restart for change of endpoint_snitch ?

2011-12-27 Thread A J
If I change endpoint_snitch from SimpleSnitch to PropertyFileSnitch, does it require restart of cassandra on that node ? Thanks.

new configurable bloom filters - coming soon

2011-12-27 Thread Radim Kolar
demo, it will be in cassandra 1.0.7 standard cassa bloom filter -rw-r--r-- 1 root wheel 19307376721 Dec 27 20:06 sipdb-hc-4634-Data.db -rw-r--r-- 1 root wheel 63 Dec 27 20:06 sipdb-hc-4634-Digest.sha1 -rw-r--r-- 1 root wheel 770714896 Dec 27 20:06 sipdb-hc-4634-Filter.db

Re: improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread Igor Lino
You are totally right. I'm far from being an expert on the subject, but the comparison felt inconsistent and incomplete. (I could not express that in my 1st email, not to bias the opinion) Do you know of any similar comparison, which is not biased towards some particular technology or

Re: improving cassandra-vs-mongodb-vs-couchdb-vs-redis

2011-12-27 Thread CharSyam
Don't trust NoSQL benchmarks. They are not lies, but NoSQL systems perform differently in different environments. Benchmark in your real environment, then choose. Thank you. 2011/12/28 Igor Lino icampi...@gmx.de You are totally right. I'm far from being an expert on the subject, but

Re: Restart for change of endpoint_snitch ?

2011-12-27 Thread Peter Schuller
If I change endpoint_snitch from SimpleSnitch to PropertyFileSnitch, does it require restart of cassandra on that node ? Yes. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: better anti OOM

2011-12-27 Thread Edward Capriolo
I do major compactions and I have run into bloom filters causing OOM. One trick I used was lowering the size of the row/key caches with nodetool before triggering the compaction and raising them again after the compaction finished. As suggested, running with spare heap is a very good idea; it lowers the chance of a

cassandra site wsod's /mysql site functions

2011-12-27 Thread Tim Dunphy
Hello, I am new to the world of non-relational databases. Cassandra is refreshingly easy to set up and has a great command line environment. I genuinely like the command line tools and look forward to learning more. However I have been asked to set up a PHP/Cassandra site that also has some MySQL