Java Driver paging slower than manual/token paging?

2015-07-17 Thread Keith Freeman
We've recently started upgrading from 1.2.12 to 2.1.7. In 1.2.12 we wrote code that used the well-known pagination pattern (tokens) to process all rows in one of our tables. For 2.1.7 we tried replacing that code with the new built-in pagination code: List<Row> queryRows = new

Re: Cassandra use cases/Strengths/Weakness

2014-07-14 Thread Keith Freeman
We've struggled getting consistent write latency and linear write scalability with a pretty heavy insert load (1000's of records/second), and our records are about 1k-2k of data (mix of integer/string columns and a blob). Wondering if you have any rough numbers for your small to medium write

any way to REALLY turn off commitlog?

2014-03-27 Thread Keith Freeman
We're running an insert-heavy use-case and have set durable_writes = false for all of our keyspaces. While inserts are coming in (about 2000 1k-records per second), we are still seeing 50Mb written to files in the commitlog directory every 6-10 seconds (using iostat). Anybody know why so

Re: cleanup failure; FileNotFoundException deleting (wrong?) db file

2013-11-06 Thread Keith Freeman
Is it possible that the keyspace was dropped then re-created (https://issues.apache.org/jira/browse/CASSANDRA-4857)? I've seen similar stack traces in that case. On 11/05/2013 10:47 PM, Elias Ross wrote: I'm seeing the following: Caused by: java.lang.RuntimeException:

Re: CQL selecting individual items from a map

2013-10-29 Thread Keith Freeman
There's some rationale here: http://mail-archives.apache.org/mod_mbox/cassandra-user/201305.mbox/%3CCAENxBwx6pcSA=cWn=dkw_52k5odw5f3xigj-zn_4bwfth+4...@mail.gmail.com%3E And I'm sure part of the reason is the 64k size limit: maps (and sets and lists) are limited to 64k total size

Re: Disappearing index data.

2013-10-07 Thread Keith Freeman
We use Jmxterm: http://wiki.cyclopsgroup.org/jmxterm On 10/07/2013 07:53 AM, Tom van den Berge wrote: Thanks, I'll give that a try. Is there a way to do this without JMX? I wouldn't know how to run a JMX console on my production servers without a graphical interface. On Mon, Oct 7, 2013 at

commitlog partition

2013-09-16 Thread Keith Freeman
I'm spec'ing out some hardware for a small cassandra cluster. I know the recommendation (v1.2+) on spinning media is to have the commitlog on a separate physical disk from the data, but is it considered ok for performance to put the commitlog on a partition of the OS's disk?

Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-13 Thread Keith Freeman
Paul- Sorry to go off-list but I'm diving pretty far into details here. Ignore if you wish. Thanks a lot for the example, definitely very helpful. I'm surprised that the Cassandra experts aren't more interested-in/alarmed-by our results, it seems like we've proved that insert performance

Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-12 Thread Keith Freeman
as the batch_mutate when it came to writing a full wide-row at once. I think Cassandra 2.0 makes CQL work better for these cases (CASSANDRA-4693), but I haven't tested it yet. -Paul -Original Message- From: Keith Freeman [mailto:8fo...@gmail.com] Sent: Wednesday, September 11, 2013 1:06 PM

Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-11 Thread Keith Freeman
I have RF=2 On 09/10/2013 11:18 AM, Robert Coli wrote: On Tue, Sep 10, 2013 at 10:17 AM, Robert Coli rc...@eventbrite.com wrote: On Tue, Sep 10, 2013 at 7:55 AM, Keith Freeman 8fo...@gmail.com wrote: On my 3-node cluster

Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-11 Thread Keith Freeman
On 09/10/2013 11:42 AM, Nate McCall wrote: With SSDs, you can turn up memtable_flush_writers - try 3 initially (1 by default) and see what happens. However, given that there are no entries in 'All time blocked' for such, they may be something else. Tried that, it seems to have reduced the

Re: FileNotFoundException while inserting (1.2.8)

2013-09-11 Thread Keith Freeman
, this is the first time I've been able to follow up and report it to the mailing list. On 09/11/2013 10:55 AM, Robert Coli wrote: On Wed, Sep 11, 2013 at 6:49 AM, Keith Freeman 8fo...@gmail.com wrote: Yes, I started with a fresh keyspace (dropped and re-created

Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-11 Thread Keith Freeman
/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque . I was able to solve my issue by switching to using the thrift batch_mutate to write a full wide-row at once instead of using many CQL INSERT statements. -Paul -Original Message- From: Keith Freeman [mailto

Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-11 Thread Keith Freeman
. However, given that there are no entries in 'All time blocked' for such, they may be something else. How are you inserting the data? On Tue, Sep 10, 2013 at 12:40 PM, Keith Freeman 8fo...@gmail.com wrote: On 09/10/2013 11:17 AM, Robert Coli wrote

FileNotFoundException while inserting (1.2.8)

2013-09-10 Thread Keith Freeman
While running a heavy insert load, one of my nodes started throwing this exception when trying a compaction: INFO [CompactionExecutor:23] 2013-09-09 16:08:07,528 CompactionTask.java (line 105) Compacting [SSTableReader(p

Re: heavy insert load overloads CPUs, with MutationStage pending

2013-09-10 Thread Keith Freeman
On 09/10/2013 11:17 AM, Robert Coli wrote: On Tue, Sep 10, 2013 at 7:55 AM, Keith Freeman 8fo...@gmail.com wrote: On my 3-node cluster (v1.2.8) with 4-cores each and SSDs for commitlog and data On SSD, you don't need to separate commitlog and data. You only

Re: insert performance (1.2.8)

2013-08-26 Thread Keith Freeman
across rows for wide rows gave us normal insert rates. When you mutate an entire wide row at once it hits a bottleneck. On Mon, Aug 26, 2013 at 4:49 PM, Keith Freeman 8fo...@gmail.com wrote: I can believe that I'm IO bound with the current disk configuration

Re: insert performance (1.2.8)

2013-08-21 Thread Keith Freeman
will be doing a lot more in the same payload message. Otherwise CQL is more efficient. If you do build those giant strings, yes you should see a performance improvement. On Tue, Aug 20, 2013 at 8:03 PM, Keith Freeman 8fo...@gmail.com wrote: Thanks. Can you tell me

Re: insert performance (1.2.8)

2013-08-20 Thread Keith Freeman
Futures API to get better throughput on the client side. On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman 8fo...@gmail.com wrote: Sure, I've tried different numbers for batches and threads, but generally I'm running 10-30 threads at a time on the client, each

Re: insert performance (1.2.8)

2013-08-20 Thread Keith Freeman
? On Tue, Aug 20, 2013 at 8:56 AM, Keith Freeman 8fo...@gmail.com wrote: Ok, I'll try prepared statements. But while sending my statements async might speed up my client, it wouldn't improve throughput on the cassandra nodes, would it? They're running at pretty

Re: insert performance (1.2.8)

2013-08-20 Thread Keith Freeman
in your case). Again, apologies, I would not have recommended that route if I knew it was only in 2.0. I would be willing to bet you could hit those insert numbers pretty easily with thrift given the shape of your mutation. On Tue, Aug 20, 2013 at 5:00 PM, Keith Freeman 8fo...@gmail.com mailto

insert performance (1.2.8)

2013-08-19 Thread Keith Freeman
I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5GHz machines not shared with any other VMs). I'm inserting time-series data into a single column-family using wide rows (timeuuids) and have a 3-part partition key so my primary key is something like ((a, b, day),

Re: insert performance (1.2.8)

2013-08-19 Thread Keith Freeman
without seeing some example code (on pastebin, gist or similar, ideally). On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman 8fo...@gmail.com wrote: I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5GHz machines not shared with any other VMs). I'm

Re: token(), limit and wide rows

2013-08-16 Thread Keith Freeman
I've run into the same problem, and I'm surprised nobody's responded to you. Any time someone asks "how do I page through all the rows of a table in CQL3?", the standard answer is token() and limit. But as you point out, this method will often miss some data from wide rows. Maybe a Cassandra expert
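
The wide-row problem mentioned in this snippet can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: no Cassandra is involved, `token` is a hypothetical stand-in hash (not the real partitioner), and `select_page` emulates a query of the form "WHERE token(k) > token(last_k) LIMIT n", where LIMIT counts rows rather than partitions.

```python
import hashlib

def token(key):
    """Hypothetical stand-in for the partitioner: key -> signed 64-bit token."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def select_page(rows, last_key, limit):
    """Emulate: SELECT k, t FROM tbl WHERE token(k) > token(last_key) LIMIT limit."""
    # Rows come back in partition-token order, then clustering order.
    ordered = sorted(rows, key=lambda r: (token(r[0]), r[1]))
    if last_key is not None:
        ordered = [r for r in ordered if token(r[0]) > token(last_key)]
    return ordered[:limit]

def page_all(rows, limit):
    """Page with token() and LIMIT only, resuming after the last key seen."""
    seen, last_key = [], None
    while True:
        page = select_page(rows, last_key, limit)
        if not page:
            return seen
        seen.extend(page)
        last_key = page[-1][0]

# Three wide partitions with four rows each.
rows = [(k, t) for k in ("a", "b", "c") for t in range(4)]

seen = page_all(rows, limit=5)
missed = sorted(set(rows) - set(seen))
# The page boundary falls inside the second partition in token order, so the
# next query skips that partition's remaining rows.
print(len(seen), "rows seen;", len(missed), "rows missed:", missed)
```

When the limit happens to be a multiple of the partition width (for example `limit=4` here), no page ends inside a partition and nothing is skipped, which may be why the problem is easy to overlook in small tests.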

write load while idle?

2013-08-16 Thread Keith Freeman
I have a 3-node cluster running 1.2.8, and with no clients connected (for about an hour) opscenter is showing a heartbeat-like pattern for total writes in the Cluster Reads Writes panel on the dashboard ranging from about 10/sec to 26/sec. Total reads on the other hand are showing a straight

Re: Any good GUI based tool to manage data in Casandra?

2013-08-09 Thread Keith Freeman
Sounds like a good tool, but isn't it odd to only have Windows and Mac versions? My impression has been that most users run Cassandra on Linux. Is a Linux version coming (please!)? On 08/09/2013 01:27 PM, Alex Popescu wrote: On Fri, Aug 9, 2013 at 10:12 AM, David McNelis dmcne...@gmail.com

clarification of token() in CQL3

2013-08-06 Thread Keith Freeman
I've seen in several places the advice to use queries like this to page through lots of rows: select id from mytable where token(id) > token(last_id) But it's hard to find detailed information about how this works (at least that I can understand -- the description in the Cassandra manual is
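
As a rough illustration of why this paging pattern visits every partition key once, here is a self-contained simulation. It assumes the comparison in the query is token(id) > token(last_id), uses a hypothetical `token` function built on MD5 in place of the real partitioner, and keeps `mytable` as a plain Python list, so it is a sketch of the idea rather than driver code.

```python
import hashlib

def token(key):
    """Hypothetical stand-in for the partitioner: key -> signed 64-bit token."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def select_page(table, last_id, limit):
    """Emulate: SELECT id FROM mytable WHERE token(id) > token(last_id) LIMIT limit."""
    # A scan without a key restriction returns keys in token order, not key order.
    ordered = sorted(table, key=token)
    if last_id is not None:
        ordered = [k for k in ordered if token(k) > token(last_id)]
    return ordered[:limit]

def page_through(table, limit=3):
    """Fetch page after page, resuming from the last id of the previous page."""
    seen, last_id = [], None
    while True:
        page = select_page(table, last_id, limit)
        if not page:
            return seen
        seen.extend(page)
        last_id = page[-1]

mytable = [f"user{i}" for i in range(10)]
all_ids = page_through(mytable)
print(all_ids)
```

The ids come back in token order, which looks random relative to the ids themselves; paging works because each query resumes strictly after the token of the last id already processed.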

Re: clarification of token() in CQL3

2013-08-06 Thread Keith Freeman
fall sequentially after token(last_processed_row)) On 08/06/2013 08:18 AM, Richard Low wrote: On 6 August 2013 15:12, Keith Freeman 8fo...@gmail.com wrote: I've seen in several places the advice to use queries like this to page through lots of rows: select id

CQL3 select between is broken?

2013-08-06 Thread Keith Freeman
I've been looking at examples about modeling series data in Cassandra, and in one experiment created a table like this: create table vvv (k text, t bigint, value text, primary key (k, t)); After inserting some data with identical k values and differing t values, I tried this query (which is

Re: CQL and undefined columns

2013-08-05 Thread Keith Freeman
From the Cassandra 1.2 Manual: Using the compact storage directive prevents you from adding more than one column that is not part of the PRIMARY KEY. At this time, updates to data in a table created with compact storage are not allowed. The table with compact storage that uses a compound