Re: idempotent counters

2014-05-19 Thread Aaron Morton
Does anybody else use another technique for achieving this idempotency with counters? The idempotency problem with counters has to do with what will happen when you get a timeout. If you replay the write, there is a chance of the increment being applied twice. This is inherent in the current
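The replay problem can be illustrated with a toy Python model. This is not Cassandra's counter implementation. The hypothetical `FlakyReplica` applies each write and then reports a timeout on the first attempt, the way a coordinator can time out after the replicas have already applied a mutation. A client that replays the increment counts it twice. Replaying an overwrite keyed by a client-chosen event ID (the hypothetical `put_event`) is harmless.

```python
class WriteTimeout(Exception):
    """Coordinator gave up waiting; the write may still have been applied."""


class FlakyReplica:
    """Toy store (hypothetical): applies every write, but times out on the first call."""

    def __init__(self):
        self.counter = 0
        self.events = {}  # event_id -> value: an idempotent alternative to a counter
        self._calls = 0

    def _maybe_timeout(self):
        self._calls += 1
        if self._calls == 1:
            raise WriteTimeout()

    def increment(self, delta):
        self.counter += delta  # applied *before* the timeout is reported
        self._maybe_timeout()

    def put_event(self, event_id, value):
        self.events[event_id] = value  # writing the same key again just overwrites
        self._maybe_timeout()


def with_retry(op, *args):
    """Replay the operation once on timeout, as a naive client would."""
    try:
        op(*args)
    except WriteTimeout:
        op(*args)


counter_store = FlakyReplica()
with_retry(counter_store.increment, 1)
print(counter_store.counter)  # 2: one logical increment, applied twice

event_store = FlakyReplica()
with_retry(event_store.put_event, "evt-42", 1)
print(sum(event_store.events.values()))  # 1: the replayed keyed write is harmless
```

The keyed-write variant pushes the summing to read time, which is the usual price of avoiding counters.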

Re: Effect of number of keyspaces on write-throughput....

2014-05-19 Thread Aaron Morton
Each client is writing to a separate keyspace simultaneously. Hence, is there a lot of switching of keyspaces? I would think not. If the client app is using one keyspace per connection there should be no reason for the driver to change keyspaces. But, I observed that when using a

Re: Schema errors when bootstrapping / restarting node

2014-05-19 Thread Aaron Morton
I am able to fix this error by clearing out the schema_columns system table on disk. After that, a node can boot successfully. Does anyone have a clue what's going on here? Something has become corrupted in the system tables, as you say. A less aggressive way to reset the local schema is to

Re: Query returns incomplete result

2014-05-19 Thread Aaron Morton
Calling execute the second time runs the query a second time, and it looks like the query mutates instance state during the pagination. What happens if you only call execute() once? Cheers Aaron - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache
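Why a second execute() can come back short is easy to reproduce with a hypothetical query object that keeps its paging position as instance state. The class below is invented for illustration and is not the original poster's code or any driver's API.

```python
class PagedQuery:
    """Hypothetical query object whose paging cursor lives on the instance."""

    def __init__(self, rows, page_size):
        self._rows = rows
        self._page_size = page_size
        self._offset = 0  # advanced while paging and never reset

    def execute(self):
        results = []
        while self._offset < len(self._rows):
            page = self._rows[self._offset:self._offset + self._page_size]
            self._offset += len(page)  # mutates state shared by every call
            results.extend(page)
        return results


q = PagedQuery(list(range(10)), page_size=3)
first = q.execute()
second = q.execute()
print(len(first), len(second))  # 10 0: the second call resumes where paging stopped
```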

Re: Datacenter understanding question

2014-05-19 Thread Aaron Morton
It depends on how you have set up the replication. If you are using SimpleStrategy with RF 1, then there will be a single copy of each row in the cluster. If you are using the NetworkTopologyStrategy with RF 1 in each DC, then there will be two copies of each row in the cluster, one in each DC.
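The copy-counting arithmetic can be sketched in Python. The data-center names DC1 and DC2 are hypothetical. The dictionary only mirrors the shape of a keyspace's replication settings and does not talk to a cluster.

```python
def total_copies(strategy, replication):
    """Copies of each row across the whole cluster.

    strategy: "SimpleStrategy" or "NetworkTopologyStrategy"
    replication: an int RF for SimpleStrategy, or a dict of DC name -> RF
    """
    if strategy == "SimpleStrategy":
        return replication  # a single RF for the cluster, regardless of DCs
    if strategy == "NetworkTopologyStrategy":
        return sum(replication.values())  # RF is set independently per DC
    raise ValueError(f"unknown strategy: {strategy}")


print(total_copies("SimpleStrategy", 1))                              # 1
print(total_copies("NetworkTopologyStrategy", {"DC1": 1, "DC2": 1}))  # 2
```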

Re: Cassandra counter column family performance

2014-05-19 Thread Aaron Morton
I get a lot of TExceptions. What are the exceptions? In general counters are slower than regular writes, but that does not lead them to fail like that. Check the logs for errors and/or messages from the GCInspector saying that garbage collection is going on. Cheers A

Re: Question about READS in a multi DC environment.

2014-05-19 Thread Aaron Morton
In this case I was not thinking about what was happening synchronously with the client request, only that the request was hitting all nodes. You are right: when reading at LOCAL_ONE the coordinator will only block for one response (the data response). Cheers Aaron
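The number of replica responses a coordinator blocks for can be sketched with the standard consistency-level arithmetic. This covers only how many responses it waits for, not how many replicas it may contact (for example for read repair). The DC names and replication factors are hypothetical.

```python
def block_for(consistency, rf_by_dc, local_dc):
    """Responses the coordinator waits for before answering the client."""
    total_rf = sum(rf_by_dc.values())
    local_rf = rf_by_dc[local_dc]
    if consistency in ("ONE", "LOCAL_ONE"):
        return 1  # LOCAL_ONE additionally requires that replica to be in the local DC
    if consistency == "QUORUM":
        return total_rf // 2 + 1  # majority of all replicas, across DCs
    if consistency == "LOCAL_QUORUM":
        return local_rf // 2 + 1  # majority of the local DC's replicas only
    if consistency == "ALL":
        return total_rf
    raise ValueError(f"unsupported consistency level: {consistency}")


rf = {"DC1": 3, "DC2": 3}
print(block_for("LOCAL_ONE", rf, "DC1"))     # 1
print(block_for("LOCAL_QUORUM", rf, "DC1"))  # 2
print(block_for("QUORUM", rf, "DC1"))        # 4
print(block_for("ALL", rf, "DC1"))           # 6
```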

Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

2014-05-19 Thread Aaron Morton
The limit is just ignored and the entire column family is scanned. Which limit? 1. Am I right that there is no way to get some data limited by token range with ColumnFamilyInputFormat? From what I understand, setting the input range is used when calculating the splits. The token ranges in

Re: Effect of number of keyspaces on write-throughput....

2014-05-19 Thread Krishna Chaitanya
Thank you for making these issues clear. Currently, in my data model, I have the current second (seconds since epoch) as the row key and the microsecond with the client number as the column key. Hence, all the packets received during a particular second on all the clients are stored in
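The key layout described above can be sketched in Python. This assumes the column key is the (microsecond-within-second, client number) pair; the timestamp and client numbers are made up. It shows that every client's packets in a given second map to a single partition, and so to the same set of replicas.

```python
def packet_key(timestamp_us, client_id):
    """Return (row_key, column_key) for one packet.

    timestamp_us: microseconds since the Unix epoch (assumed input format)
    """
    second = timestamp_us // 1_000_000        # row (partition) key
    micros = timestamp_us % 1_000_000         # microsecond within that second
    return second, (micros, client_id)        # column key: microsecond + client number


keys = [packet_key(1_400_457_600_000_123, client) for client in (1, 2, 3)]
partitions = {row_key for row_key, _ in keys}
print(keys[0])          # (1400457600, (123, 1))
print(len(partitions))  # 1: all three clients write into the same row this second
```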

Re: idempotent counters

2014-05-19 Thread Jabbar Azam
Thanks Aaron. I've mitigated this by removing the dependency on idempotent counters. But it's good to know the limitations of counters. Thanks Jabbar Azam On 19 May 2014 08:36, Aaron Morton aa...@thelastpickle.com wrote: Does anybody else use another technique for achieving this idempotency

Changing default_time_to_live

2014-05-19 Thread Keith Wright
Hi all, we are using C* 2.0.6 and have set the default_time_to_live parameter on a number of our LCS column families. I was wondering what would happen if we were to decrease this value via ALTER TABLE. Would subsequent compactions of data written before that alter honor the new value and

Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-19 Thread Phil Luckhurst
We have a table defined using SizeTieredCompactionStrategy that is used to store time series data. Over a period of a few days we wrote approximately 200,000 unique time based entries for each of 700 identifiers, i.e. 700 wide rows with 200,000 entries in each. The table was empty when we started

Re: Index with same Name but different keyspace

2014-05-19 Thread mahesh rajamani
Sorry, I just realized the table names in the 2 schemas are slightly different, but I am still not sure why I should not use the same index name across different schemas. Below are the instructions to reproduce. Created 2 keyspaces using cassandra-cli [default@unknown] create keyspace keyspace1 with

Re: Cyclop - CQL web based editor has been released!

2014-05-19 Thread Maciej Miklas
Thanks, I've fixed it. Regards, Maciej On Mon, May 12, 2014 at 2:50 AM, graham sanderson gra...@vast.com wrote: Looks cool - giving it a try now (note FYI when building, TestDataConverter.java line 46 assumes a specific time zone) On May 11, 2014, at 12:41 AM, Maciej Miklas

CQL 3 and wide rows

2014-05-19 Thread Maciej Miklas
Hi *, I’ve checked the DataStax driver code for CQL 3, and it looks like the column names for a particular table are fully loaded into memory; is this true? Cassandra should support wide rows, meaning tables with millions of columns. Knowing that, I would expect some kind of iterator for column names. Am I

RE: CQL 3 and wide rows

2014-05-19 Thread James Campbell
Maciej, In CQL3 wide rows are expected to be created using clustering columns. So while the schema will have a relatively small number of named columns, the effect is a wide row. For example: CREATE TABLE keyspace.widerow ( row_key text, wide_row_column text, data_column text,
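James's point can be illustrated with a toy Python model. It assumes the truncated table definition ends with PRIMARY KEY (row_key, wide_row_column), and it ignores all storage details. The schema names only three columns, but every distinct wide_row_column value adds another CQL row to the same partition. The sensor data is made up.

```python
class WideRowTable:
    """Toy model of a CQL3 table with PRIMARY KEY (row_key, wide_row_column) (assumed).

    row_key is the partition key; wide_row_column is a clustering column.
    """

    def __init__(self):
        self._partitions = {}  # row_key -> {wide_row_column: data_column}

    def insert(self, row_key, wide_row_column, data_column):
        # Writing an existing primary key overwrites it, like a CQL INSERT (upsert)
        self._partitions.setdefault(row_key, {})[wide_row_column] = data_column

    def select(self, row_key, limit=None):
        # Rows come back in clustering order (ascending here), cut off by LIMIT
        rows = sorted(self._partitions.get(row_key, {}).items())
        return rows if limit is None else rows[:limit]


table = WideRowTable()
for i in range(1000):
    table.insert("sensor-1", f"reading-{i:04d}", f"value-{i}")

print(len(table.select("sensor-1")))      # 1000 CQL rows in one partition
print(table.select("sensor-1", limit=2))  # [('reading-0000', 'value-0'), ('reading-0001', 'value-1')]
```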

Re: CQL 3 and wide rows

2014-05-19 Thread Jack Krupansky
You might want to review this blog post on supporting dynamic columns in CQL3, which points out that “the way to model dynamic cells in CQL is with a compound primary key.” See: http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows -- Jack Krupansky From: Maciej Miklas

Filtering on Collections

2014-05-19 Thread Raj Janakarajan
Hello all, I am using Cassandra version 2.0.7. I am wondering if collections are efficient for filtering. We are thinking of using collections to maintain a list for a customer row, but we have to be able to filter on the collection values. Select UUID from customer where eligibility_state IN

Re: Filtering on Collections

2014-05-19 Thread Eric Plowe
Collection types cannot be used for filtering (as part of the WHERE clause). They cannot be used as a primary key or part of a primary key. Secondary indexes on them are not supported either. On Mon, May 19, 2014 at 12:50 PM, Raj Janakarajan r...@zephyrhealthinc.com wrote: Hello all, I am using

Re: Filtering on Collections

2014-05-19 Thread Patricia Gorla
Raj, Secondary indexes on CQL3 collections were introduced in 2.1 beta1, so they will be available in future versions. See https://issues.apache.org/jira/browse/CASSANDRA-4511 If your main concern is performance then you should find another way to model the data: each collection is read
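One common alternative model, shown here as an assumption rather than what Patricia goes on to propose, is a denormalized lookup table keyed by the value you filter on. Every write updates both the customer's collection and the lookup table. The Python below uses in-memory dicts to stand in for the two tables. `customers_by_state` and the state codes are hypothetical.

```python
import uuid
from collections import defaultdict

customers = {}                         # customer_id -> set of eligibility states (the collection)
customers_by_state = defaultdict(set)  # hypothetical lookup table: state -> customer ids


def set_eligibility(customer_id, states):
    """Write the collection and keep the lookup table in sync."""
    for old in customers.get(customer_id, set()) - set(states):
        customers_by_state[old].discard(customer_id)  # drop stale lookup entries
    customers[customer_id] = set(states)
    for state in states:
        customers_by_state[state].add(customer_id)


def customers_eligible_in(*states):
    """Answer 'WHERE eligibility_state IN (...)' with one key lookup per state."""
    result = set()
    for state in states:
        result |= customers_by_state.get(state, set())
    return result


a, b = uuid.uuid4(), uuid.uuid4()
set_eligibility(a, ["CA", "NY"])
set_eligibility(b, ["TX"])
print(customers_eligible_in("CA", "TX") == {a, b})  # True
```

The cost is extra writes and the need to remove stale entries when a collection changes, which the sketch does explicitly.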

Re: Filtering on Collections

2014-05-19 Thread Raj Janakarajan
Thank you Patricia. This is helpful. Raj On Mon, May 19, 2014 at 10:54 AM, Patricia Gorla patri...@thelastpickle.com wrote: Raj, Secondary indexes across CQL3 collections were introduced into 2.1 beta1, so will be available in future versions. See

Re: Filtering on Collections

2014-05-19 Thread Raj Janakarajan
Thanks Eric for the information. It looks like it will be supported in future versions. Raj On Mon, May 19, 2014 at 10:03 AM, Eric Plowe eric.pl...@gmail.com wrote: Collection types cannot be used for filtering (as part of the where statement). They cannot be used as a primary key or part

Ec2 Network I/O

2014-05-19 Thread Phil Burress
Has anyone experienced network I/O issues with EC2? We are seeing a lot of these in our logs: HintedHandOffManager.java (line 477) Timed out replaying hints to /10.0.x.xxx; aborting (15 delivered) and these... Cannot handshake version with /10.0.x.xxx and these... java.io.IOException: Cannot

Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-19 Thread Bryan Talbot
I think there are several issues in your schema and queries. First, the schema can't efficiently return the single newest post for every author. It can efficiently return the newest N posts for a particular author. On Fri, May 16, 2014 at 11:53 PM, 後藤 泰陽 matope@gmail.com wrote: But I
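Bryan's distinction can be sketched with a toy Python model. The real schema from the thread is not shown, so this assumes a hypothetical PRIMARY KEY (author, created_at) with descending clustering order. "Newest N for one author" is a slice of one partition. "Newest post for every author" has no single slice that answers it, so every author's partition must be read.

```python
class PostsByAuthor:
    """Toy model of PRIMARY KEY (author, created_at), CLUSTERING ORDER BY (created_at DESC) (assumed)."""

    def __init__(self):
        self._partitions = {}  # author -> {created_at: body}
        self.partitions_read = 0

    def insert(self, author, created_at, body):
        self._partitions.setdefault(author, {})[created_at] = body

    def _read_partition(self, author):
        self.partitions_read += 1
        # Sorting here stands in for the on-disk DESC clustering order
        return sorted(self._partitions.get(author, {}).items(), reverse=True)

    def newest(self, author, n):
        # Efficient case: one partition, contiguous slice of the first n rows
        return self._read_partition(author)[:n]

    def newest_per_author(self):
        # Inefficient case: touches every partition in the table
        return {a: self._read_partition(a)[0] for a in self._partitions}


posts = PostsByAuthor()
for t in (1, 2, 3):
    posts.insert("alice", t, f"a{t}")
posts.insert("bob", 5, "b5")

print(posts.newest("alice", 2), posts.partitions_read)  # [(3, 'a3'), (2, 'a2')] 1
print(posts.newest_per_author(), posts.partitions_read)  # {'alice': (3, 'a3'), 'bob': (5, 'b5')} 3
```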

Re: Best partition type for Cassandra with JBOD

2014-05-19 Thread Bryan Talbot
For XFS, using noatime and nodiratime isn't really useful either. http://xfs.org/index.php/XFS_FAQ#Q:_Is_using_noatime_or.2Fand_nodiratime_at_mount_time_giving_any_performance_benefits_in_xfs_.28or_not_using_them_performance_decrease.29.3F On Sat, May 17, 2014 at 7:52 AM, James Campbell

Re: Index with same Name but different keyspace

2014-05-19 Thread Bryan Talbot
On Mon, May 19, 2014 at 6:39 AM, mahesh rajamani rajamani.mah...@gmail.com wrote: Sorry I just realized the table name in 2 schema are slightly different, but still i am not sure why i should not use same index name across different schema. Below is the instruction to reproduce. Created 2

Re: Filtering on Collections

2014-05-19 Thread Eric Plowe
Ah, that is interesting, Patricia. Since they can be used in a secondary index, it's not too far off for them to be part of a primary key, no? On Mon, May 19, 2014 at 1:54 PM, Patricia Gorla patri...@thelastpickle.com wrote: Raj, Secondary indexes across CQL3 collections were introduced into 2.1

Re: Filtering on Collections

2014-05-19 Thread Patricia Gorla
I'm not sure about that — allowing collections as a primary key would be a much different implementation than setting up a secondary index. The primary key in CQL3 is actually the partition key which determines which token the row is assigned, so you would still need to have one partition key.

Re: Ec2 Network I/O

2014-05-19 Thread Nate McCall
It's a good idea to increase phi_convict_threshold to at least 12 on EC2. Using placement groups and single-tenant systems will certainly help. Another optimization would be dedicating an Elastic Network Interface ( http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) specifically
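To see why raising phi_convict_threshold makes the failure detector less quick to convict, here is a sketch of the phi calculation. It assumes exponentially distributed heartbeat gaps, the approximation Cassandra's failure detector of this era uses (to our understanding), which gives phi = t / (mean · ln 10). The one-second mean interval and the 20-second pause are made-up numbers; 8 is the cassandra.yaml default threshold.

```python
import math


def phi(silence_ms, mean_interval_ms):
    """Suspicion level after silence_ms without a heartbeat, assuming exponential gaps:
    phi = -log10(P(gap > silence)) = silence / (mean * ln 10)."""
    return silence_ms / (mean_interval_ms * math.log(10))


def convicted(silence_ms, mean_interval_ms, threshold):
    return phi(silence_ms, mean_interval_ms) > threshold


MEAN_MS = 1000  # assumed: roughly one gossip heartbeat per second
for threshold in (8, 12):
    limit_s = threshold * MEAN_MS * math.log(10) / 1000  # silence needed to convict
    print(threshold, round(limit_s, 1), convicted(20_000, MEAN_MS, threshold))
# 8 18.4 True    -> a 20 s pause (e.g. a network hiccup) marks the node down
# 12 27.6 False  -> the same pause is tolerated
```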

Re: CQL 3 and wide rows

2014-05-19 Thread Maciej Miklas
Hello Jack, You have given a perfect example of a wide row. Each reading from a sensor creates a new column within a row. It was also possible with Hector/CLI to have millions of columns within a single row. According to this page http://wiki.apache.org/cassandra/CassandraLimitations single row

Re: CQL 3 and wide rows

2014-05-19 Thread Maciej Miklas
Hi James, Clustering is based on rows. I think that you meant not clustering columns, but compound columns. Still, all columns belong to a single table and are stored within a single folder on one computer. And it looks to me (but I’m not sure) that the CQL 3 driver loads all column names into memory

Re: Multi-dc cassandra keyspace

2014-05-19 Thread Nate McCall
We did something similar with a split cloud/physical hardware deployment. There was a weird requirement that app authentication data (fortunately in its own keyspace already) could not live on the cloud (shrug). This ended up being a simple configuration change in the schema just like your

RE: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

2014-05-19 Thread Anton Brazhnyk
Hi Aaron, I've seen the code which you describe (working with splits and intersections), but that range is derived from keys and works only for ordered partitioners (in 1.2.15). I've already got one confirmation that in the C* version I use (1.2.15) setting limits with setInputRange(startToken,