How to deal with SSTable FileNotFoundException?

2014-06-30 Thread Philo Yang
Hi all, I have a Cassandra 2.0.6 cluster with 12 nodes. I find that in some nodes' system.log there are many RuntimeExceptions such as: java.lang.RuntimeException: java.io.FileNotFoundException: /disk4/cassandra/data/{keyspace}/{cfname}/{keyspace}-{cfname}-jb-87-Data.db (No such file or directory)

Any better solution to avoid TombstoneOverwhelmingException?

2014-06-30 Thread Jason Tang
Our application uses Cassandra to persist asynchronous tasks, so in one time period lots of records are created in Cassandra (more than 10M). Later they are executed. Due to disk space limitations, the executed records are deleted. After gc_grace_seconds, it is expected to be

Re: Does 'tracing on' always write to the system tracing table?

2014-06-30 Thread DuyHai Doan
If I enable per-request tracing, does this always make it into the system traces table -- Yes, but asynchronously, and the data is stored with a TTL (I don't remember how long). It means that even though your request returns successfully, the traces might not yet be completely written to the
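
For reference, a minimal cqlsh sketch of where that asynchronously written data ends up (the traced table is hypothetical; system_traces.sessions and system_traces.events are the standard trace tables in Cassandra 2.0):

    TRACING ON;
    SELECT * FROM my_keyspace.my_table WHERE id = 42;   -- any traced request
    TRACING OFF;

    -- Session summaries and the detailed events, both written with a TTL:
    SELECT session_id, coordinator, duration, started_at
      FROM system_traces.sessions LIMIT 10;
    SELECT activity, source, source_elapsed
      FROM system_traces.events LIMIT 20;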

Re: Any better solution to avoid TombstoneOverwhelmingException?

2014-06-30 Thread DuyHai Doan
Why don't you store all current data in one partition and, for the next round of execution, switch to a new partition? This way you don't even need to remove data (if you insert with a given TTL). On Mon, Jun 30, 2014 at 8:43 AM, Jason Tang ares.t...@gmail.com wrote: Our application will use
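
A minimal sketch of that partition-per-round idea (table and column names are hypothetical): all writes for the current execution round go into one partition with a TTL, and the next round simply uses a new round_id, so old data is never read again and never needs an explicit DELETE.

    CREATE TABLE task_queue (
        round_id  timeuuid,   -- one partition per execution round
        task_id   timeuuid,
        payload   text,
        PRIMARY KEY (round_id, task_id)
    );

    -- Every writer in the current round uses the same round_id value
    INSERT INTO task_queue (round_id, task_id, payload)
    VALUES (a4a70900-24e1-11df-8924-001ff3591711, now(), 'task data')
    USING TTL 86400;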

Best way to delete by day?

2014-06-30 Thread Wim Deblauwe
Hi, I am getting started with Cassandra (coming from MySQL). I have made a table with time-series data (inspired by http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/ ). The table looks like this: CREATE TABLE event_message ( message_id uuid, message_source_id

Re: Best way to delete by day?

2014-06-30 Thread DuyHai Doan
Hello Wim, TTL is a good fit for your requirement if you want Cassandra to handle the deletion task for you. Now, clearly there are 2 strategies: 1) store data in the same partition (physical row) and set a TTL to expire data automatically; 2) store data in several partitions, one for each day, for
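
A rough sketch of the per-day-partition strategy (hypothetical names, loosely following Wim's event_message table): the day becomes the partition key, so a whole day can either expire on its own via a TTL or be removed with a single partition-level delete.

    CREATE TABLE event_message_by_day (
        day        text,        -- e.g. '2014-06-30', the partition key
        message_id timeuuid,
        body       text,
        PRIMARY KEY (day, message_id)
    );

    -- Option A: let the data expire automatically
    INSERT INTO event_message_by_day (day, message_id, body)
    VALUES ('2014-06-30', now(), '...') USING TTL 604800;

    -- Option B: drop an entire day with one partition delete
    DELETE FROM event_message_by_day WHERE day = '2014-06-29';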

Re: Best way to delete by day?

2014-06-30 Thread Wim Deblauwe
Hi, Thanks for the answers. Are you saying that I could store big binary files in Cassandra? I have read somewhere that if the file is more than 10 MB, it is probably not such a good idea? The binary files can be up to 50 or 100 MB, no more in my case. So the way I understand it, if I store

Re: Best way to delete by day?

2014-06-30 Thread DuyHai Doan
It is recommended to split your big binary data into small chunks of 5 MB/10 MB if you want to store it in C*. The Astyanax framework can help with this. If you store binary files outside C*, yes, you'll need to manage the deletion manually, at the same time you delete the partition for a day of
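
A minimal sketch of the chunked layout (hypothetical table; Astyanax's chunked object store recipe manages something similar for you): each file is split into chunks of a few MB stored as clustering rows under the file's partition, and the client reassembles them in order on read.

    CREATE TABLE file_chunks (
        file_id     uuid,
        chunk_index int,
        data        blob,        -- keep each chunk in the ~5-10 MB range
        PRIMARY KEY (file_id, chunk_index)
    );

    -- Fetch the chunks in order and concatenate them client-side
    SELECT chunk_index, data
      FROM file_chunks
     WHERE file_id = 550e8400-e29b-41d4-a716-446655440000;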

[RELEASE] Apache Cassandra 1.2.17 released

2014-06-30 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra version 1.2.17. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here:

[RELEASE] Apache Cassandra 2.0.9 released

2014-06-30 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra version 2.0.9. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here:

Windows uname -o not supported.

2014-06-30 Thread Lars Schouw
How do I start Cassandra on Windows? And what does my environment have to look like? I am getting an error when starting Cassandra on Windows... uname -o not supported. I am using uname (GNU sh-utils) 2.0. I am not running in Cygwin but just a pure PowerShell window. My Cassandra version

Compacting large row … incrementally … with HUGE values.

2014-06-30 Thread Kevin Burton
I'm running a full compaction now and noticed this: Compacting large row … incrementally … and the values were in the 300-500MB range. I'm storing NOTHING anywhere near that large. Max is about 200k... However, my schema is designed so that I can do efficient time/range scans of the

Re: Compacting large row … incrementally … with HUGE values.

2014-06-30 Thread DuyHai Doan
Hello Kevin. With CQL3 there are some important terms to define: a. Row: a logical row in CQL3 semantics; a logical row is what is displayed as a row in the cqlsh client. b. Partition: a physical row on disk in CQL3 semantics. Even if you have tiny logical rows, if you store a
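
A hypothetical CQL3 table to illustrate the distinction: each clustering key value is its own logical row in cqlsh, but all of them for a given partition key are stored in one physical row on disk, and that physical row is what the "Compacting large row" message is measuring.

    CREATE TABLE readings (
        sensor_id   text,        -- partition key: one physical row (partition) per sensor
        ts          timestamp,   -- clustering column: one logical row per reading
        temperature double,
        PRIMARY KEY (sensor_id, ts)
    );

    -- Millions of tiny logical rows for a single sensor_id still add up to
    -- one large partition, even though no individual value is big.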

Why aren't my nodes showing the same percentage of ownership?

2014-06-30 Thread S C
When I run nodetool ring I see the ownership with different percentages. However, the Load column does not show a huge deviation. Why is that? I am using DataStax 3.0. http://pastebin.com/EcWbZn26

Re: Compacting large row … incrementally … with HUGE values.

2014-06-30 Thread Kevin Burton
Yup… that's what I was thinking… but good point on the physical vs logical row… Cassandra should be more rigorous about this term… it just says large row, not large physical row… Any idea how much this is going to slow me down? On Mon, Jun 30, 2014 at 10:10 AM, DuyHai Doan

Re: SSTable compression ratio… percentage or 0.0 - 1.0???

2014-06-30 Thread Robert Coli
I've had a to-do list item to mention this to docs AT datastax DOT com for a while. I have bcc:ed them here! I'm pretty sure there are other places throughout the docs codebase where ratios are incorrectly called percentages and vice versa.. :) =Rob On Sun, Jun 29, 2014 at 6:57 AM, Jack

Read 75k live rows in a query that should only return 500 (in queue-like table).

2014-06-30 Thread Kevin Burton
I have a queue-like table where a query reads 75k live rows… and then only returns 500. … I'm trying to figure out why this could be. Following this: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets I BELIEVE that I'm doing everything right. Essentially
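
For context, a hypothetical sketch of the queue-like shape that post warns about: consumers read the head of a partition with a LIMIT and delete what they process, so later reads have to step over everything already consumed before reaching the next 500 items.

    CREATE TABLE work_queue (
        shard   int,
        item_id timeuuid,
        payload text,
        PRIMARY KEY (shard, item_id)
    );

    -- Grab the next batch in order...
    SELECT item_id, payload FROM work_queue WHERE shard = 0 LIMIT 500;

    -- ...then delete processed items (prepared-statement style), leaving
    -- tombstones at the head of the partition that later reads must skip.
    DELETE FROM work_queue WHERE shard = 0 AND item_id = ?;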

Re: Why aren't my nodes showing the same percentage of ownership?

2014-06-30 Thread Mark Reddy
You should run nodetool ring and specify a keyspace, otherwise the ownership information will be nonsense. https://issues.apache.org/jira/browse/CASSANDRA-7173 Mark On Mon, Jun 30, 2014 at 6:13 PM, S C as...@outlook.com wrote: When I run nodetool ring I see the ownership with different

Re: Batch of prepared statements exceeding specified threshold

2014-06-30 Thread Marcelo Elias Del Valle
Hi, I think it's a bit late for this reply, but anyway... We hired support from http://thelastpickle.com/ and, thanks to them, we were able to solve our issue. What was causing this behavior was a large query being executed by mistake in our code. It was needed to open

nodetool repair -snapshot option?

2014-06-30 Thread Phil Burress
We are running into an issue with nodetool repair. One or more of our nodes will die with OOM errors when running nodetool repair on a single node. I was reading http://www.datastax.com/dev/blog/advanced-repair-techniques and it mentioned using the -snapshot option; however, that doesn't appear

Re: nodetool repair -snapshot option?

2014-06-30 Thread Yuki Morishita
Repair has used the snapshot option by default since 2.0.2 (see NEWS.txt), so you don't have to specify it in your version. Do you have a stack trace from when it OOMed? On Mon, Jun 30, 2014 at 4:54 PM, Phil Burress philburress...@gmail.com wrote: We are running into an issue with nodetool repair. One or more of our

Re: nodetool repair -snapshot option?

2014-06-30 Thread Kevin Burton
The stack won't help a ton since the memory leak will occur elsewhere… the stack will just have the point where the memory allocation failed :-( On Mon, Jun 30, 2014 at 3:08 PM, Yuki Morishita mor.y...@gmail.com wrote: Repair uses snapshot option by default since 2.0.2 (see NEWS.txt). So you

Re: nodetool repair -snapshot option?

2014-06-30 Thread Robert Coli
On Mon, Jun 30, 2014 at 3:08 PM, Yuki Morishita mor.y...@gmail.com wrote: Repair uses snapshot option by default since 2.0.2 (see NEWS.txt). As a general meta comment, the process by which operationally important defaults change in Cassandra seems ad hoc and sub-optimal. For the record, my

Re: nodetool repair -snapshot option?

2014-06-30 Thread Phil Burress
We are running repair -pr. We've tried subrange repair manually and that seems to work OK. I guess we'll go with that going forward. Thanks for all the info! On Mon, Jun 30, 2014 at 6:52 PM, Jaydeep Chovatia chovatia.jayd...@gmail.com wrote: Are you running full repair or on subset? If you are

Re: Any better solution to avoid TombstoneOverwhelmingException?

2014-06-30 Thread Jason Tang
The traffic is continuous, which means that while new records are inserted, old records are executed (deleted) at the same time. And the execution is based on a time condition, so some stored records will be executed (deleted) and some will be executed in the next round. For a given TTL, it is the same as delete,

Re: nodetool repair -snapshot option?

2014-06-30 Thread Phil Burress
One last question. Any tips on scripting a subrange repair? On Mon, Jun 30, 2014 at 7:12 PM, Phil Burress philburress...@gmail.com wrote: We are running repair -pr. We've tried subrange manually and that seems to work ok. I guess we'll go with that going forward. Thanks for all the info!

Re: nodetool repair -snapshot option?

2014-06-30 Thread Paulo Ricardo Motta Gomes
If you find it useful, I created a tool where you input the node IP, keyspace, column family, and optionally the number of partitions (default: 32K), and it outputs the list of subranges for that node, CF, partition size: https://github.com/pauloricardomg/cassandra-list-subranges So you can

Re: nodetool repair -snapshot option?

2014-06-30 Thread Phil Burress
@Paulo, this is very cool! Thanks very much for the link! On Mon, Jun 30, 2014 at 9:37 PM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote: If you find it useful, I created a tool where you input the node IP, keyspace, column family, and optionally the number of partitions

Re: Connection reset by peer error

2014-06-30 Thread cass savy
The app and Cassandra are connected via a firewall. For some reason, connections still remain on the Cassandra side even after stopping services on the app server. On Mon, Jun 30, 2014 at 3:29 PM, Jacob Rhoden jacob.rho...@me.com wrote: How are the two machines connected? Direct cable? Via a hub,