Re: Adding disk capacity to a running node

2016-10-17 Thread Laing, Michael
You could just expand the size of your EBS volume and extend the file system. No data is lost - assuming you are running Linux. On Monday, October 17, 2016, Seth Edwards wrote: > We're running 2.0.16. We're migrating to a new data model but we've had an > unexpected increase in

Re: Cassandra event notification on INSERT/DELETE of records

2016-05-25 Thread Laing, Michael
You could also follow this related issue: https://issues.apache.org/jira/browse/CASSANDRA-8844 On Wed, May 25, 2016 at 12:04 PM, Aaditya Vadnere wrote: > Thanks Eric and Mark, we were thinking along similar lines. But we already > need Cassandra for regular database purpose,

Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
; SELECT id, workflow FROM sam WHERE dept='blah'; > > And in Spark with Python: > SELECT distinct id, dept, workflow FROM samd WHERE dept='blah'; > > > Best, > Rajesh R > > > -- > *From:* Laing, Michael [michael.la...@nytimes.com] > *Se

Re: UUID coming as int while using SPARK SQL

2016-05-24 Thread Laing, Michael
Try converting that int from decimal to hex and inserting dashes in the appropriate spots - or go the other way. Also, you are looking at different rows, based upon your selection criteria... ml On Tue, May 24, 2016 at 6:23 AM, Rajesh Radhakrishnan < rajesh.radhakrish...@phe.gov.uk> wrote: >
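A minimal Python sketch of that conversion, assuming the integer Spark shows is the full 128-bit value of the UUID (the function names are illustrative). The standard library's uuid.UUID(int=...) does the hex conversion and places the dashes; the .int attribute goes the other way:

```python
import uuid

def int_to_uuid(value: int) -> uuid.UUID:
    """Render a 128-bit integer as a dashed UUID (8-4-4-4-12 hex digits)."""
    # uuid.UUID(int=...) zero-pads to 32 hex digits and inserts the dashes;
    # it raises ValueError if value is negative or does not fit in 128 bits.
    return uuid.UUID(int=value)

def uuid_to_int(text: str) -> int:
    """The other direction: parse a dashed UUID string into its integer value."""
    return uuid.UUID(text).int

sample = "98f50f00-b6d5-11e5-afec-6003089bf572"  # a UUID from the contact-list thread
as_int = uuid_to_int(sample)
print(as_int)
print(int_to_uuid(as_int))  # the original dashed form again
```

If the round trip does not reproduce a UUID stored in Cassandra, the integer is probably not the full 128-bit value and this sketch does not apply.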

Re: Publishing from cassandra

2016-04-24 Thread Laing, Michael
You could take a look at, or follow: https://issues.apache.org/jira/browse/CASSANDRA-8844 On Sun, Apr 24, 2016 at 10:51 AM, Alexander Orr wrote: > Hi, > > I'm wondering if someone could help me, I'd like to use cassandra to store > data and publish this on downstream to

Re: Migration from 2.0.10 to 2.1.12

2016-03-30 Thread Laing, Michael
fyi the list of reserved keywords is at: https://cassandra.apache.org/doc/cql3/CQL.html#appendixA ml On Wed, Mar 30, 2016 at 9:41 AM, Jean Carlo wrote: > Yes we did some reads and writes, the problem is that adding double quotes > force us to modify our code to

Re: Modeling contact list, plain table or List

2016-01-09 Thread Laing, Michael
Note that in C* 3.02 the second query is invalid: cqlsh> Select * from communication.user_contact_list where user_id = 98f50f00-b6d5-11e5-afec-6003089bf572 and is_favorite = true order by contact_name asc; *InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column "is_favorite"

Re: Slow write speeds

2015-12-31 Thread Laing, Michael
To add to what Jonathan and Jack have said... To get high levels of performance with the python driver you should: - prepare your statements once (recent drivers default to Token Aware - and will correctly apply it if the statement is prepared). - execute asynchronously (up to ~150
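The "execute asynchronously, but bounded" idea can be sketched without a cluster. This is a minimal asyncio illustration of capping in-flight requests, not the python driver's API: fake_write is a hypothetical stand-in for one prepared INSERT, preparing the statement once is not modeled, and the cap of 150 is taken from the post. With the real driver you would use session.execute_async (or the helpers in cassandra.concurrent) instead.

```python
import asyncio
import random

MAX_IN_FLIGHT = 150  # from the "~150" in the post; tune for your cluster

async def fake_write(row_id: int) -> int:
    """Hypothetical stand-in for one prepared INSERT round trip."""
    await asyncio.sleep(random.uniform(0.001, 0.005))
    return row_id

async def write_all(row_ids, max_in_flight: int = MAX_IN_FLIGHT):
    sem = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def bounded_write(row_id: int) -> int:
        nonlocal in_flight, peak
        async with sem:  # at most max_in_flight writes outstanding at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_write(row_id)
            finally:
                in_flight -= 1

    # gather returns results in submission order, whatever order they finish in
    results = await asyncio.gather(*(bounded_write(r) for r in row_ids))
    return results, peak

results, peak = asyncio.run(write_all(range(1000)))
print(len(results), peak)
```

The semaphore keeps the client from flooding the coordinators while still overlapping many round trips.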

Re: Materialized View: can the view's partition key change due to changes to the underlying table?

2015-12-15 Thread Laing, Michael
why don't you just try it? On Tue, Dec 15, 2015 at 6:30 PM, Will Zhang wrote: > Hi all, > > I originally raised this on SO, but not really getting any answer there, > thought I give it a try here. > > > Just thinking about this so please correct my understanding if any

Re: list data value multiplied x2 in multi-datacenter environment

2015-11-25 Thread Laing, Michael
Is there anywhere in your application a statement such as: UPDATE data SET field5 = field5 + [ 1,2,3 ] WHERE field1=...; If so, note that list appends are not idempotent: a retried or replayed append adds the elements again. Just a quick idempotency check :) On Wed, Nov 25, 2015 at 9:16 AM, Jack Krupansky wrote: > Is the data corrupted exactly the same way on all
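A toy model of why that matters, in Python. It does not reflect how Cassandra stores list cells internally; it only shows that applying an append twice (for example after a client retry) doubles the data, while applying an overwrite twice does not. The function names are hypothetical.

```python
def list_append(current: list, items: list) -> list:
    # Models: UPDATE data SET field5 = field5 + [1,2,3] WHERE ...
    return current + items

def list_overwrite(current: list, items: list) -> list:
    # Models: UPDATE data SET field5 = [1,2,3] WHERE ...
    return list(items)

def replay(apply, current: list, items: list, times: int) -> list:
    """Apply the same mutation `times` times, as a retry or replay would."""
    for _ in range(times):
        current = apply(current, items)
    return current

print(replay(list_append, [], [1, 2, 3], times=2))     # [1, 2, 3, 1, 2, 3]
print(replay(list_overwrite, [], [1, 2, 3], times=2))  # [1, 2, 3]
```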

Re: Getting code=2200 [Invalid query] message=Invalid column name ... while executing ALTER statement

2015-11-21 Thread Laing, Michael
> > All these pain we need to take because the column names have special >> character like " ' _- ( ) '' ¬ " etc. >> > Hmm. I tried: cqlsh:test> create table quoted_col_name ( pk int primary key, "'_-()""¬" int); cqlsh:test> select * from quoted_col_name; *pk* | *'_-()"¬* +- (0
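A small Python helper for generating such statements, assuming the rule the example above relies on: a CQL quoted identifier is wrapped in double quotes, and a literal double quote inside it is written twice. The helper name is hypothetical.

```python
def quote_cql_identifier(name: str) -> str:
    """Return name as a CQL quoted identifier.

    Wrapping in double quotes preserves case and special characters; a double
    quote inside the name is escaped by doubling it.
    """
    return '"' + name.replace('"', '""') + '"'

column = "'_-()\"¬"  # the column name from the example above
print(quote_cql_identifier(column))  # "'_-()""¬"
print(f"SELECT {quote_cql_identifier(column)} FROM quoted_col_name;")
```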

Re: Getting code=2200 [Invalid query] message=Invalid column name ... while executing ALTER statement

2015-11-21 Thread Laing, Michael
v 21, 2015 at 8:52 AM, Laing, Michael <michael.la...@nytimes.com> wrote: > All these pain we need to take because the column names have special >>> character like " ' _- ( ) '' ¬ " etc. >>> >> > Hmm. I tried: > > cqlsh:test> create table quoted

Re: Convert timeuuid in timestamp programmatically

2015-11-16 Thread Laing, Michael
http://www.tutorialspoint.com/java/util/uuid_timestamp.htm On Mon, Nov 16, 2015 at 7:38 AM, Marlon Patrick wrote: > Hi Donfeng, > > I'm interested in convert a timeuuid already generated in a timestamp, > similar to dateOf function of the Cassandra, but in Java code.
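The same extraction in Python, as a sketch (the linked page covers Java's UUID.timestamp()). A version 1 UUID stores a 60-bit count of 100-nanosecond intervals since 1582-10-15; subtracting the offset to 1970-01-01 gives Unix time. The function names are illustrative.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01)
UUID_TO_UNIX_EPOCH_100NS = 0x01B21DD213814000

def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Extract the embedded timestamp of a version 1 (time-based) UUID, in UTC."""
    if u.version != 1:
        raise ValueError("expected a version 1 (time-based) UUID")
    # u.time is the 60-bit count of 100-ns intervals since 1582-10-15
    micros = (u.time - UUID_TO_UNIX_EPOCH_100NS) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros)

def timeuuid_to_unix_millis(u: uuid.UUID) -> int:
    """Milliseconds since the Unix epoch, the unit a CQL timestamp uses."""
    if u.version != 1:
        raise ValueError("expected a version 1 (time-based) UUID")
    return (u.time - UUID_TO_UNIX_EPOCH_100NS) // 10_000

print(timeuuid_to_datetime(uuid.uuid1()))  # roughly "now", in UTC
```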

Re: Overriding timestamp with light weight transactions

2015-11-16 Thread Laing, Michael
So you are reading the row before writing as you say you have the timestamp. If you really need CAS for the write *and* the timestamp you read is in the future (by local reckoning), why not delay that write until the future arrives and forget about explicitly setting the timestamp? Backtracking
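A sketch of the delay-then-write idea, assuming you already hold the row's write timestamp in microseconds since the epoch (for example from WRITETIME(col)). The function name is hypothetical, and the clock and sleep are injectable so the logic can be tested. Once it returns, the conditional write can be issued without USING TIMESTAMP.

```python
import time

def wait_for_timestamp(write_ts_us: int, now=time.time, sleep=time.sleep) -> float:
    """Block until local time has passed write_ts_us; return the seconds waited.

    write_ts_us is in microseconds since the Unix epoch, the unit of
    Cassandra write timestamps.
    """
    delay = write_ts_us / 1_000_000 - now()
    if delay <= 0:
        return 0.0  # already in the past by local reckoning: write immediately
    sleep(delay)
    return delay
```

A small safety margin on top of the computed delay may be prudent if the clocks involved are only loosely synchronized.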

Re: Getting code=2200 [Invalid query] message=Invalid column name ... while executing ALTER statement

2015-11-13 Thread Laing, Michael
Dynamic schema changes are generally a bad idea, especially if they are rapid. You should rethink your approach. On Fri, Nov 13, 2015 at 7:20 AM, Rajesh Radhakrishnan < rajesh.radhakrish...@phe.gov.uk> wrote: > > Thank you Carlos for looking. > But when I ran the nodetool describecluster. > It

Re: Read query taking a long time

2015-10-21 Thread Laing, Michael
Are the clocks synchronized across the cluster - probably, but I thought I would ask :) On Wed, Oct 21, 2015 at 3:35 AM, Brice Figureau < brice+cassan...@daysofwonder.com> wrote: > Hi, > > On 20/10/2015 19:48, Carlos Alonso wrote: > > I think also having the output of cfhistograms could help.

Re: Removed node is not completely removed

2015-10-14 Thread Laing, Michael
Remember that the system keyspace uses LocalStrategy: each node has its own set of system tables. -ml On Wed, Oct 14, 2015 at 9:17 AM, Tom van den Berge < tom.vandenbe...@gmail.com> wrote: > Hi Carlos, > > I'm using 2.1.6. The mysterious node is not in the peers table. Any other > ideas? > One

Re: Consistency Issues

2015-09-30 Thread Laing, Michael
What client are you using? Official java and python clients should not have a LB between them and the C* nodes AFAIK. Why aren't you using 2.1.9? Have you checked for schema agreement amongst all nodes? ml On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen wrote: >

Re: High read latency

2015-09-26 Thread Laing, Michael
Maybe compaction not keeping up - since you are hitting so many sstables? Read heavy... are you using LCS? Plenty of resources... tune to increase memtable size? On Sat, Sep 26, 2015 at 9:19 AM, Eric Stevens wrote: > Since you have most of your reads hitting 5-8 SSTables,

Re: Question about consistency

2015-09-09 Thread Laing, Michael
What are your read repair settings? On Tue, Sep 8, 2015 at 9:28 PM, Eric Plowe wrote: > To further expand. We have two data centers, Miami and Dallas. Dallas is > our disaster recovery data center. The cluster has 12 nodes, 6 in Miami and > 6 in Dallas. The servers in

Re: Question about consistency

2015-09-09 Thread Laing, Michael
_chance: 0.1 > > > On Wednesday, September 9, 2015, Laing, Michael <michael.la...@nytimes.com> > wrote: > >> What are your read repair settings? >> >> On Tue, Sep 8, 2015 at 9:28 PM, Eric Plowe <eric.pl...@gmail.com> wrote: >> >>> To further expand. We have

Re: Question about consistency

2015-09-09 Thread Laing, Michael
I'll give it a try and report back my findings. >> >> Thank you, Michael. >> >> >> On Wednesday, September 9, 2015, Laing, Michael < >> michael.la...@nytimes.com> wrote: >> >>> Perhaps a variation on >>> https://issues.apache.org/jira/bro

Re: Question about consistency

2015-09-09 Thread Laing, Michael
ndeed turn it > off. > > On Wednesday, September 9, 2015, Laing, Michael <michael.la...@nytimes.com> > wrote: > >> "alter table test.test_root WITH speculative_retry = '0.0PERCENTILE';" >> >> seemed to work for me with C* version 2.1.7 >> >> On Wed

Re: Question about consistency

2015-09-09 Thread Laing, Michael
Wiser heads may have to chime in then :) On Wed, Sep 9, 2015 at 3:07 PM, Eric Plowe <eric.pl...@gmail.com> wrote: > So I set speculative_retry to NONE and I encountered the situation about > 30 minutes ago. > > > > On Wednesday, September 9, 2015, Laing, Michael &l

Re: Is Cassandra really Strong consistency?

2015-09-06 Thread Laing, Michael
I think I saw this before. Clocks must be synchronized. On Sun, Sep 6, 2015 at 7:28 AM, ibrahim El-sanosi wrote: > Hi folks, > > Assume we have 4-nodes cluster N1, N2, N3, and N4 and replication factor > is 3. When write CL =ALL and read CL=ONE: > > Client c1 sends
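Why the clocks matter, as a toy model: conflicting writes are reconciled by timestamp (last write wins), so if whichever node or client assigns timestamps has a clock running fast, an earlier write can beat a later one. The Cell type and the 2-second skew are hypothetical, and equal timestamps are not resolved the way Cassandra resolves them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    value: str
    timestamp_us: int  # write timestamp, microseconds since the epoch

def last_write_wins(a: Cell, b: Cell) -> Cell:
    # Simplified reconciliation: keep the cell with the higher timestamp.
    return b if b.timestamp_us > a.timestamp_us else a

# Real time (seconds) at which each write is issued.
first_real_s, second_real_s = 0, 1
# Hypothetical skew: the first writer's clock runs 2 s fast, the second's is accurate.
skew_first_s, skew_second_s = 2, 0

first = Cell("first", (first_real_s + skew_first_s) * 1_000_000)
second = Cell("second", (second_real_s + skew_second_s) * 1_000_000)
print(last_write_wins(first, second).value)  # "first", although "second" came later
```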

Re: Is Cassandra really Strong consistency?

2015-09-06 Thread Laing, Michael
- >> Date: Sun, 6 Sep 2015 13:10:14 +0100 >> Subject: Re: Is Cassandra really Strong consistency? >> From: ibrahimsaba...@gmail.com >> To: user@cassandra.apache.org >> >> >> Do you mean Cassandra does synchronize the clock across all the cluster

Re: Convert joins in RDBMS to Cassandra

2015-09-06 Thread Laing, Michael
Denormalize your data to support the query, e.g.: CREATE TABLE name_by_cust_id (cust_id int, name text, PRIMARY KEY (cust_id)); SELECT name FROM name_by_cust_id WHERE cust_id = 3; For additional queries, similarly denormalize. Refer to https://academy.datastax.com/courses for free online courses covering this

Re: Write request in Cassandra?

2015-08-21 Thread Laing, Michael
https://academy.datastax.com/courses/ds201-cassandra-core-concepts/internal-architecture-replication On Fri, Aug 21, 2015 at 11:53 AM, Laing, Michael michael.la...@nytimes.com wrote: 2 is more correct. On Fri, Aug 21, 2015 at 11:48 AM, ibrahim El-sanosi ibrahimsaba...@gmail.com wrote

Re: Write request in Cassandra?

2015-08-21 Thread Laing, Michael
2 is more correct. On Fri, Aug 21, 2015 at 11:48 AM, ibrahim El-sanosi ibrahimsaba...@gmail.com wrote: Dear folks, I have doubt on how Cassandra performs a write request; I have two scenarios, please read them and ensure which one is correct? Assume we have cluster consists of 4 nodes

Re: Question about how to remove data

2015-08-19 Thread Laing, Michael
Possibly you have snapshots? If so, use nodetool to clear them. On Wed, Aug 19, 2015 at 4:54 PM, Analia Lorenzatto analialorenza...@gmail.com wrote: Hello guys, I have a cassandra cluster 2.1 comprised of 4 nodes. I removed a lot of data in a Column Family, then I ran manually a

Re: Data model suggestions

2015-04-27 Thread Laing, Michael
No - it immediately removes the sstables on all nodes. On Mon, Apr 27, 2015 at 7:53 AM, Ali Akhtar ali.rac...@gmail.com wrote: Wouldn't truncating the table create tombstones? On Mon, Apr 27, 2015 at 11:55 AM, Peer, Oded oded.p...@rsa.com wrote: I recommend truncating the table instead of

Re: Cassandra tombstones being created by updating rows with TTL's

2015-04-21 Thread Laing, Michael
If you never delete except by ttl, and always write with the same ttl (or monotonically increasing), you can set gc_grace_seconds to 0. That's what we do. There have been discussions on the list over the last few years re this topic. ml On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen

Re: Cassandra tombstones being created by updating rows with TTL's

2015-04-21 Thread Laing, Michael
in the situation, for what I read we need to start doing this also. https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair *From:* Laing, Michael [mailto:michael.la...@nytimes.com] *Sent:* 21 April 2015 16:26 *To:* user@cassandra.apache.org *Subject:* Re: Cassandra

Re: Cassandra tombstones being created by updating rows with TTL's

2015-04-21 Thread Laing, Michael
approach? Any ideas? *From:* Laing, Michael [mailto:michael.la...@nytimes.com] *Sent:* 21 April 2015 17:09 *To:* user@cassandra.apache.org *Subject:* Re: Cassandra tombstones being created by updating rows with TTL's Discussions previously on the list show why this is not a problem

Re: [Cassandra 2.0] truncate table

2015-04-09 Thread Laing, Michael
rtfm - truncate creates snapshots by default; they must be cleared on all nodes (e.g. with nodetool clearsnapshot) to recover *disk space* as requested by the OP. On Thu, Apr 9, 2015 at 10:17 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote: You can try doing it from cassandra cli. Set consistency level to All and then truncate.

Re: How to store unique visitors in cassandra

2015-03-31 Thread Laing, Michael
We use Alain's solution as well to make major operational revisions. We have a red team and a blue team in each AWS region, so we just add and drop datacenters to get where we want to be. Pretty simple. ml On Tue, Mar 31, 2015 at 8:16 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: People keep

Re: High latencies for simple queries

2015-03-27 Thread Laing, Michael
I use callback chaining with the python driver and can confirm that it is very fast. You can chain the chains together to perform sequential processing. I do this when retrieving metadata and then the referenced payload for example, when the metadata has been inverted and the payload is larger

Re: Storing bi-temporal data in Cassandra

2015-02-15 Thread Laing, Michael
Perhaps you should learn more about Cassandra before you ask such questions. It's easy if you just look at the readily accessible docs. ml On Sat, Feb 14, 2015 at 6:05 PM, Raj N raj.cassan...@gmail.com wrote: I don't think thats solves my problem. The question really is why can't we use

Re: Adding more nodes causes performance problem

2015-02-09 Thread Laing, Michael
Use token-awareness so you don't have as much coordinator overhead. ml On Mon, Feb 9, 2015 at 5:32 AM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: AFAIK, if you were using RF 3 in a 3 node cluster, so all your nodes had all your data. When the number of nodes started

Re: number of replicas per data center?

2015-01-19 Thread Laing, Michael
Since our workload is spread globally, we spread our nodes across AWS regions as well: 2 nodes per zone, 6 nodes per region (datacenter) (RF 3), 12 nodes total (except during upgrade migrations). We autodeploy into VPCs. If a region goes bad we can route all traffic to another and bring up a

Re: [Consitency on cqlsh command prompt]

2014-12-17 Thread Laing, Michael
http://datastax.github.io/python-driver/api/cassandra.html On Wed, Dec 17, 2014 at 9:27 AM, nitin padalia padalia.ni...@gmail.com wrote: Thanks! Philip/Ryan, Ryan I am using single Datacenter. Philip could you point some link where we could see those enums. -Nitin On Dec 17, 2014 7:14 PM,

Re: Using Cassandra for session tokens

2014-12-01 Thread Laing, Michael
Since the session tokens are random, perhaps computing a shard from each one and using it as the partition key would be a good idea. I would also use uuid v1 to get ordering. With such a small amount of data, only a few shards would be needed. On Mon, Dec 1, 2014 at 10:08 AM, Phil Wise
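A sketch of that layout in Python, under stated assumptions: 8 shards (a hypothetical number standing in for "a few"), the shard derived from a stable hash of the token, and a uuid v1 as the time-ordered clustering value. Python's built-in hash() is avoided because it is salted per process. All names are illustrative.

```python
import hashlib
import secrets
import uuid

NUM_SHARDS = 8  # hypothetical; the post suggests only a few are needed

def shard_for(session_token: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable shard number in [0, num_shards) for a token."""
    digest = hashlib.sha256(session_token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def make_row(session_token: str) -> dict:
    return {
        "shard": shard_for(session_token),  # partition key
        "created": uuid.uuid1(),            # clustering key: time-ordered uuid v1
        "token": session_token,
    }

rows = [make_row(secrets.token_hex(16)) for _ in range(1000)]
print(sorted({r["shard"] for r in rows}))
```

Because the shard is computed from the token itself, a reader holding a token can recompute its partition without a lookup.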

Re: Using Cassandra for session tokens

2014-12-01 Thread Laing, Michael
table as the OP suggested. On Mon Dec 01 2014 at 7:18:51 AM Laing, Michael michael.la...@nytimes.com wrote: Since the session tokens are random, perhaps computing a shard from each one and using it as the partition key would be a good idea. I would also use uuid v1 to get ordering

Re: OOM at Bootstrap Time

2014-10-27 Thread Laing, Michael
so I will try to upgrade before I look into downgrading. On Saturday, October 25, 2014, Laing, Michael michael.la...@nytimes.com wrote: Since no one else has stepped in... We have run clusters with ridiculously small nodes - I have a production cluster in AWS with 4GB nodes each

Re: OOM at Bootstrap Time

2014-10-25 Thread Laing, Michael
Since no one else has stepped in... We have run clusters with ridiculously small nodes - I have a production cluster in AWS with 4GB nodes each with 1 CPU and disk-based instance storage. It works fine but you can see those little puppies struggle... And I ran into problems such as you

Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
), also increasing insert times(!) but that's the way things need to happen in cassandra world, it's okay. ( I am two-three weeks into learning about cassandra). -Subodh On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael michael.la...@nytimes.com wrote: Oh it must be late - I missed the fact

Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
” rather than the exercise in futility of doing a massive number of deletes and updates in place? -- Jack Krupansky *From:* Laing, Michael michael.la...@nytimes.com *Sent:* Monday, September 1, 2014 9:33 AM *To:* user@cassandra.apache.org *Subject:* Re: Help with select IN query in cassandra

Re: EC2 - Performace Question

2014-09-01 Thread Laing, Michael
Is table track_user equivalent to table userpixel? On Monday, September 1, 2014, Eduardo Cusa eduardo.c...@usmediaconsulting.com wrote: Hi All. I have a cluster in Amazon with the following settings: * 2 Nodes M3.Large * Cassandra 2.0.7 * Default installation on ubuntu And I have one table

Re: EC2 - Performace Question

2014-09-01 Thread Laing, Michael
Is there a reason why updating a counter for this information will not work for you? On Monday, September 1, 2014, eduardo.cusa eduardo.c...@usmediaconsulting.com wrote: yes, is the same table, my mistake. On Mon, Sep 1, 2014 at 6:35 PM, Laing, Michael

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Laing, Michael
Actually I think you do want to use scopeId, scopeType as the partition key (and drop row caching until you upgrade to 2.1 where rows are in fact rows and not partitions): CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes ( scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar, timestamp

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Laing, Michael
multiget use cases. Do you have any pointers to blogs or tutorials you've found helpful? Thanks, Todd On Sunday, August 31, 2014, Laing, Michael michael.l...@nytimes.com wrote: Actually I think you do want to use scopeId, scopeType as the partition key (and drop row caching until you

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Are event_time and timestamp essentially representing the same datetime? On Sunday, August 31, 2014, Subodh Nijsure subodh.nijs...@gmail.com wrote: I have following database schema CREATE TABLE sensor_info_table ( asset_id text, event_time timestamp, timestamp timeuuid,

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
between SQL and nosql world. Subodh On Aug 31, 2014 5:33 PM, Laing, Michael michael.la...@nytimes.com wrote: Are event_time and timestamp essentially representing the same datetime? On Sunday, August 31, 2014, Subodh Nijsure subodh.nijs...@gmail.com wrote: I have following database schema

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Oh it must be late - I missed the fact that you didn't want to specify asset_id. The above queries will still work but you have to use 'allow filtering' - generally not a good idea. I'll look again in the morning. On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael michael.la...@nytimes.com wrote

Re: select many rows one time or select many times?

2014-08-01 Thread Laing, Michael
I don't think there is an easy answer to this... A possible approach, based upon the implied dimensions of the problem, would be to maintain a bloom filter over words for each user as a partition key with the user as clustering key. Then a single query would efficiently yield the list of users
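The post is cut off, so the table layout it describes is not recoverable here; this sketch only illustrates the Bloom-filter part, in memory: one filter per user over that user's words, checked for a word to get candidate users. The sizes (1024 bits, 4 hashes), users, and words are hypothetical. A Bloom filter can give false positives but never false negatives, so candidates may still need verifying.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, word: str):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word: str) -> None:
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def might_contain(self, word: str) -> bool:
        # False means definitely absent; True means probably present.
        return all((self.bits >> pos) & 1 for pos in self._positions(word))

words_by_user = {
    "alice": ["cassandra", "python", "bloom"],
    "bob": ["postgres", "java"],
}
filters = {}
for user, words in words_by_user.items():
    bf = BloomFilter()
    for w in words:
        bf.add(w)
    filters[user] = bf

candidates = [user for user, bf in filters.items() if bf.might_contain("cassandra")]
print(candidates)  # includes "alice"; "bob" only on a false positive
```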

Re: Measuring WAN replication latency

2014-07-29 Thread Laing, Michael
I saw this awhile back: With requests possibly coming in from either US region, we need to make sure that the replication of data happens within an acceptable time threshold. This led us to perform an experiment where we wrote 1 million records in one region of a multi-region cluster. We then

Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Laing, Michael
We use IN (keeping the number down). The coordinator does parallel dispatch AND applies ORDER BY to the aggregate results, which we would otherwise have to do ourselves. Anyway, worth it for us. ml On Fri, Jul 25, 2014 at 1:24 PM, Kevin Burton bur...@spinn3r.com wrote: Perhaps the best
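For comparison, roughly the client-side work the coordinator saves: with one query per partition instead of IN, you would have to merge the per-partition results, each already sorted by its clustering column, yourself. A sketch with heapq.merge; the rows are hypothetical.

```python
import heapq

# Hypothetical rows from two partitions, each already sorted by posted_at.
john_rows = [("2014-01-03", "john: post 1"), ("2014-03-10", "john: post 2")]
mary_rows = [("2014-02-14", "mary: post 1"), ("2014-04-01", "mary: post 2")]

# heapq.merge lazily interleaves pre-sorted inputs without re-sorting everything.
merged = list(heapq.merge(john_rows, mary_rows, key=lambda row: row[0]))
for posted_at, title in merged:
    print(posted_at, title)
```

For a descending clustering order, feed descending inputs and pass reverse=True to heapq.merge.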

Re: IN clause with composite primary key?

2014-07-25 Thread Laing, Michael
You may also want to use tuples for the clustering columns: The tuple notation may also be used for IN clauses on CLUSTERING COLUMNS: SELECT * FROM posts WHERE userid='john doe' AND (blog_title, posted_at) IN (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01')) from

Re: How to maintain the N-most-recent versions of a value?

2014-07-18 Thread Laing, Michael
The cql you provided is invalid. You probably meant something like: CREATE TABLE foo ( rowkey text, family text, qualifier text, version int, value blob, PRIMARY KEY ((rowkey, family, qualifier), version)) WITH CLUSTERING ORDER BY (version DESC); We use

Re: Does the default LIMIT applies to automatic paging?

2014-06-24 Thread Laing, Michael
And with python use future.has_more_pages and future.start_fetching_next_page(). On Tue, Jun 24, 2014 at 1:20 AM, DuyHai Doan doanduy...@gmail.com wrote: With the Java Driver, set the fetchSize and use ResultSet.iterator Le 24 juin 2014 01:04, ziju feng pkdog...@gmail.com a écrit : Hi All,

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Laing, Michael
However my extensive benchmarking this week of the python driver from master shows a performance *decrease* when using 'token_aware'. This is on a 12-node, 2-datacenter, RF-3 cluster in AWS. Also why do the work the coordinator will do for you: send all the queries, wait for everything to come

Re: Summarizing Timestamp datatype

2014-06-18 Thread Laing, Michael
. On Tue, Jun 17, 2014 at 9:46 PM, Laing, Michael michael.la...@nytimes.com wrote: If you can arrange to index your rows by: (something else, your timestamp) Then you can select ranges as you wish. This works because something else is the partition key, arrived at by hash (really

Re: Summarizing Timestamp datatype

2014-06-17 Thread Laing, Michael
If you can arrange to index your rows by: (something else, your timestamp) Then you can select ranges as you wish. This works because something else is the partition key, arrived at by hash (really it's a hash key), whereas your timestamp is the clustering key (really it is a range key) which
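A toy in-memory model of that layout, assuming PRIMARY KEY ((something_else), ts): rows are grouped by partition key and kept sorted by the clustering key within each partition, so a range on ts touches only one partition and one contiguous slice. The class and data are hypothetical; select_range roughly mirrors SELECT ... WHERE pk = ? AND ts >= ? AND ts < ?.

```python
import bisect
from collections import defaultdict

class ToyTable:
    """Partition key -> rows sorted by clustering key (ts)."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> sorted [(ts, value), ...]

    def insert(self, partition, ts, value):
        # Keep each partition ordered by its clustering key.
        bisect.insort(self.partitions[partition], (ts, value))

    def select_range(self, partition, start, end):
        """Rows with start <= ts < end, read from a single partition."""
        rows = self.partitions.get(partition, [])
        lo = bisect.bisect_left(rows, (start,))  # (start,) sorts before (start, anything)
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]

t = ToyTable()
t.insert("sensor-1", 105, "b")
t.insert("sensor-1", 100, "a")
t.insert("sensor-1", 110, "c")
t.insert("sensor-2", 103, "x")
print(t.select_range("sensor-1", 100, 110))  # [(100, 'a'), (105, 'b')]
```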

Re: Dynamic Columns in Cassandra 2.X

2014-06-13 Thread Laing, Michael
Just to add 2 more cents... :) The CQL3 protocol is asynchronous. This can provide a substantial throughput increase, according to my benchmarking, when one uses non-blocking techniques. It is also peer-to-peer. Hence the server can generate events to send to the client, e.g. schema changes - in

Re: Large number of row keys in query kills cluster

2014-06-12 Thread Laing, Michael
Just an FYI, my benchmarking of the new python driver, which uses the asynchronous CQL native transport, indicates that one can largely overcome client-to-node latency effects if you employ a suitable level of concurrency and non-blocking techniques. Of course response size and other factors come
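The reasoning behind that can be put as arithmetic (Little's law): completed requests per second equals requests in flight divided by time per request. This is an idealized ceiling that ignores server-side saturation, and the 2 ms round trip is hypothetical.

```python
def throughput_ceiling(in_flight: int, latency_ms: float) -> float:
    """Little's law: requests per second = requests in flight / seconds per request."""
    return in_flight * 1000 / latency_ms

for concurrency in (1, 10, 100):
    print(concurrency, throughput_ceiling(concurrency, latency_ms=2))
# 1 500.0
# 10 5000.0
# 100 50000.0
```

The latency per request does not shrink; concurrency hides it, up to the point where the cluster or client becomes the bottleneck.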

Re: Large number of row keys in query kills cluster

2014-06-10 Thread Laing, Michael
Perhaps if you described both the schema and the query in more detail, we could help... e.g. did the query have an IN clause with 2 keys? Or is the key compound? More detail will help. On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma jer...@barchart.com wrote: I didn't explain clearly - I'm

Re: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int)

2014-06-06 Thread Laing, Michael
select * from test_paging where token(id) > token(0); ml On Fri, Jun 6, 2014 at 1:47 AM, Jonathan Haddad j...@jonhaddad.com wrote: Sorry, the datastax docs are actually a bit better: http://www.datastax.com/documentation/cql/3.0/cql/cql_using/paging_c.html Jon On Thu, Jun 5, 2014 at

python fast table copy/transform (subject updated)

2014-06-06 Thread Laing, Michael
, Marcelo. 2014-06-04 22:28 GMT-03:00 Laing, Michael michael.la...@nytimes.com: BTW you might want to put a LIMIT clause on your SELECT for testing. -ml On Wed, Jun 4, 2014 at 6:04 PM, Laing, Michael michael.la...@nytimes.com wrote: Marcelo, Here is a link to the preview of the python fast

Re: High latency on 5 node Cassandra Cluster

2014-06-04 Thread Laing, Michael
I would first check to see if there was a time synchronization issue among nodes that triggered and/or perpetuated the event. ml On Wed, Jun 4, 2014 at 3:12 AM, Arup Chakrabarti a...@pagerduty.com wrote: Hello. We had some major latency problems yesterday with our 5 node cassandra cluster.

Re: migration to a new model

2014-06-04 Thread Laing, Michael
if you want. Besides, we have some bigger clusters, I could run on the just to test the speed if this is going to help. Regards Marcelo. 2014-06-03 11:40 GMT-03:00 Laing, Michael michael.la...@nytimes.com: Hi Marcelo, I could create a fast copy program by repurposing some python apps

Re: migration to a new model

2014-06-04 Thread Laing, Michael
BTW you might want to put a LIMIT clause on your SELECT for testing. -ml On Wed, Jun 4, 2014 at 6:04 PM, Laing, Michael michael.la...@nytimes.com wrote: Marcelo, Here is a link to the preview of the python fast copy program: https://gist.github.com/michaelplaing/37d89c8f5f09ae779e47

Re: migration to a new model

2014-06-03 Thread Laing, Michael
Hi Marcelo, I could create a fast copy program by repurposing some python apps that I am using for benchmarking the python driver - do you still need this? With high levels of concurrency and multiple subprocess workers, based on my current actual benchmarks, I think I can get well over 1,000

Re: Schema disagreement errors

2014-05-12 Thread Laing, Michael
Upgrade to 2.0.7 fixed this for me. You can also try 'nodetool resetlocalschema' on disagreeing nodes. This worked temporarily for me in 2.0.6. ml On Mon, May 12, 2014 at 3:31 PM, Gaurav Sehgal gsehg...@gmail.com wrote: We have recently started seeing a lot of Schema Disagreement errors. We

Re: Deleting column names

2014-04-22 Thread Laing, Michael
Referring to the original post, I think the confusion is over what a row is in this context: So as far as I understand, the s column is now the *row* key ... Since I have multiple different p, o, c combinations per s, deleting the whole *row* identified by s is no option The s column is in fact

Re: Deleting column names

2014-04-22 Thread Laing, Michael
Your understanding is incorrect - the easiest way to see that is to try it. On Tue, Apr 22, 2014 at 12:00 PM, Sebastian Schmidt isib...@gmail.com wrote: From my understanding, this would delete all entries with the given s. Meaning, if I have inserted (sa, p1, o1, c1) and (sa, p2, o2, c2),

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
I have played with this quite a bit and recommend you set gc_grace_seconds to 0 and use 'nodetool compact [keyspace] [cfname]' on your table. A caveat I have is that we use C* 2.0.6 - but the space we expect to recover is in fact recovered. Actually, since we never delete explicitly (just ttl)

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
At the cost of really quite a lot of compaction, you can temporarily switch to SizeTiered, and when that is completely done (check each node), switch back to Leveled. It's like doing the laundry twice :) I've done this on CFs that were about 5GB but I don't see why it wouldn't work on larger

Re: clearing tombstones?

2014-04-11 Thread Laing, Michael
I've never noticed that setting tombstone_threshold has any effect... at least in 2.0.6. What gets written to the log? On Fri, Apr 11, 2014 at 3:31 PM, DuyHai Doan doanduy...@gmail.com wrote: I was wondering, to remove the tombstones from Sstables created by LCS, why don't we just set

Re: Setting gc_grace_seconds to zero and skipping nodetool repair (was RE: Timeseries with TTL)

2014-04-07 Thread Laing, Michael
” reflects a design bug; it should be automated. Don *From:* Laing, Michael [mailto:michael.la...@nytimes.com] *Sent:* Sunday, April 06, 2014 11:31 AM *To:* user@cassandra.apache.org *Subject:* Re: Timeseries with TTL Since you are using LeveledCompactionStrategy there is no major/minor

Re: Timeseries with TTL

2014-04-06 Thread Laing, Michael
Since you are using LeveledCompactionStrategy there is no major/minor compaction - just compaction. Leveled compaction does more work - your logs don't look unreasonable to me - the real question is whether your nodes can keep up with the IO. SSDs work best. BTW if you never delete and only ttl

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
In your step 4, be sure you create a consistent EBS snapshot. You may have pieces of your sstables that have not actually been flushed all the way to EBS. See https://github.com/alestic/ec2-consistent-snapshot ml On Fri, Mar 28, 2014 at 3:21 PM, Russ Lavoie ussray...@yahoo.com wrote: Thank

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
As I tried to say, EBS snapshots require much care or you get corruption such as you have encountered. Does Cassandra quiesce the file system after a snapshot using fsfreeze or xfs_freeze? Somehow I doubt it... On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad j...@jonhaddad.com wrote: I have

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Laing, Michael
, it's easy to pull just the new tables out via aws-cli tools (s3 sync), to your remote, non-aws server, and not incur the overhead of routinely backing up the entire dataset. For a non trivial database, this matters quite a bit. On Fri, Mar 28, 2014 at 1:21 PM, Laing, Michael michael.la

Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread Laing, Michael
I ran into the same problem some time ago. Upgrading to Cassandra 2, jdk 1.7, and default parameters fixed it. I think the jdk change was the key for my similarly small memory cluster. ml On Sat, Mar 22, 2014 at 1:36 PM, prem yadav ipremya...@gmail.com wrote: Michael, no memory

Re: Kernel keeps killing cassandra process - OOM

2014-03-22 Thread Laing, Michael
guys? I have already tried reducing the number of rpc threads. Also tried reducing the linux kernel overcommit. On Sat, Mar 22, 2014 at 5:44 PM, Laing, Michael michael.la...@nytimes.com wrote: I ran into the same problem some time ago. Upgrading to Cassandra 2, jdk 1.7, and default

Re: Data model for boolean attributes

2014-03-21 Thread Laing, Michael
Of course what you really want is this: create table x( id text, timestamp timeuuid, flag boolean, // other fields primary key (flag, id, timestamp) ) Whoops, now there are only 2 partition key values (true and false)! Not good if you have any reasonable number of rows... Faced with a situation like this

Re: ALLOW FILTERING usage

2014-03-17 Thread Laing, Michael
Your second query is invalid: *Bad Request: Partition KEY part key cannot be restricted by IN relation (only the last part of the partition key can)* ml On Mon, Mar 17, 2014 at 6:56 AM, Tupshin Harper tups...@tupshin.com wrote: It's the difference between reading from only the partitions

Re: Exception in thread event_loop

2014-03-16 Thread Laing, Michael
A possible workaround - not a fix - might be to install libev so the libev event loop is used. See http://datastax.github.io/python-driver/installation.html Also be sure you are running the latest version: 1.0.2 I believe. Your ';' is outside of your 'str' - actually shouldn't be a problem tho.

Re: Cassandra slow on some reads

2014-03-14 Thread Laing, Michael
*If* you do not need to do range queries on your 'timestam' (ts) column - *and* if you can change your schema (big if...), then you could move 'timestam' into the partition key like this (using your notation): PK((key String, timestam int), column1 string, col2 string), list1, list2, list3.
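The trade-off behind the two conditions above can be sketched with nested dicts standing in for partitions (a hypothetical model, not Cassandra's storage). With `(key, timestam)` as the partition key, reading one (key, ts) pair touches exactly one small partition. A range over `timestam`, however, is no longer a slice of one partition: the client has to ask for each candidate ts value separately, which is only feasible because `timestam` is a discrete int.

```python
# Hypothetical in-memory stand-in for
#   PRIMARY KEY ((key, timestam), column1, col2)
# Outer dict: partition key -> partition; inner dict: clustering key -> value.
partitions = {}


def insert(key, timestam, column1, col2, value):
    partitions.setdefault((key, timestam), {})[(column1, col2)] = value


def read_partition(key, timestam):
    """Equality on the whole partition key: exactly one partition is read."""
    return partitions.get((key, timestam), {})


def read_range(key, ts_from, ts_to):
    """A range over timestam spans many partitions, so issue one lookup per
    candidate value. Returns (rows by ts, number of lookups issued)."""
    lookups = 0
    found = {}
    for ts in range(ts_from, ts_to + 1):
        lookups += 1
        rows = read_partition(key, ts)
        if rows:
            found[ts] = rows
    return found, lookups


insert("k1", 100, "a", "x", "r1")
insert("k1", 101, "a", "x", "r2")
insert("k1", 101, "b", "y", "r3")
insert("k2", 100, "a", "x", "r4")

one = read_partition("k1", 101)
many, cost = read_range("k1", 100, 109)
print(one)
print(sorted(many), cost)
```

Ten lookups to find data in two partitions is the price; if range queries on `timestam` are never needed, that price is never paid.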

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
I have no problem doing this with 2.0.5 - what version of C* are you using? Or maybe I don't understand your data model... attach 'creates' if you don't mind. ml On Thu, Mar 13, 2014 at 9:24 AM, David Savage davemssav...@gmail.com wrote: Hi Peter, Thanks for the help, unfortunately I'm not sure

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
at 1:56 PM, Laing, Michael michael.la...@nytimes.com wrote: I have no problem doing this with 2.0.5 - what version of C* are you using? Or maybe I don't understand your data model... attach 'creates' if you don't mind. ml On Thu, Mar 13, 2014 at 9:24 AM, David Savage davemssav...@gmail.com wrote

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
) or PRIMARY KEY ((key1, key2)), any examples would be welcome if you have the time. Kind regards, Dave On Thu, Mar 13, 2014 at 2:56 PM, Laing, Michael michael.la...@nytimes.com wrote: Create your table like this and it will work: CREATE TABLE test.documents (group text, id bigint, data map<text

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
be specified to identify individual rows in a partition. Without clustering columns, one partition is one row. So, it’s a matter of whether you want your rows to be in the same partition or distributed. -- Jack Krupansky *From:* Laing, Michael michael.la...@nytimes.com *Sent:* Thursday
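Jack's point about clustering columns can be illustrated with a toy layout function (hypothetical, plain Python, not Cassandra internals) applied to rows shaped like the `test.documents` table from this thread. With `PRIMARY KEY ((group), id)`, all documents in a group share one partition and are ordered by `id`. With `PRIMARY KEY ((group, id))` there are no clustering columns, so each partition holds exactly one row.

```python
from collections import defaultdict


def layout(rows, partition_cols, clustering_cols):
    """Group rows into partitions; inside a partition, rows are keyed and
    sorted by their clustering-column values."""
    parts = defaultdict(dict)
    for row in rows:
        pk = tuple(row[c] for c in partition_cols)
        ck = tuple(row[c] for c in clustering_cols)
        parts[pk][ck] = row  # simplified upsert: same primary key -> replaced
    return {pk: dict(sorted(rs.items())) for pk, rs in parts.items()}


docs = [
    {"group": "g1", "id": 3, "data": {"k": "c"}},
    {"group": "g1", "id": 1, "data": {"k": "a"}},
    {"group": "g1", "id": 2, "data": {"k": "b"}},
    {"group": "g2", "id": 1, "data": {"k": "d"}},
]

# PRIMARY KEY ((group), id): a group's rows share one partition, ordered by id.
clustered = layout(docs, ["group"], ["id"])
# PRIMARY KEY ((group, id)): no clustering columns, one partition per row.
flat = layout(docs, ["group", "id"], [])

print({pk: list(rs) for pk, rs in clustered.items()})
print({pk: list(rs) for pk, rs in flat.items()})
```

Which layout fits depends on whether the rows should be read together as one partition or spread across the cluster.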

Re: CQL Select Map using an IN relationship

2014-03-13 Thread Laing, Michael
These are my personal opinions, reflecting both my long experience with database systems, and my newness to Cassandra... [tl;dr] The Cassandra contributors, having made its history, tend to describe it in terms of implementation rather than action. And its implementation has a history, all

Re: CQL decimal encoding

2014-02-26 Thread Laing, Michael
Go uses 'zig-zag' encoding; perhaps that is the difference? On Wed, Feb 26, 2014 at 6:52 AM, Peter Lin wool...@gmail.com wrote: You may need to bit shift if that is the case. On Feb 26, 2014, at 2:53 AM, Ben Hood 0x6e6...@gmail.com wrote: Hey Colin, On Tue, Feb
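For comparison, here is a sketch of the CQL `decimal` wire format: a 4-byte big-endian signed scale, followed by the unscaled value as a `varint` in minimal big-endian two's complement (the same bytes Java's `BigInteger.toByteArray()` produces). Alongside it is the zig-zag mapping that Go's `encoding/binary` varints apply before base-128 encoding. This is an illustration under those assumptions, not the encoder any particular driver uses. Negative values are where the two differ: -15 is `0xf1` in two's complement but 29 (`0x1d`) after zig-zag.

```python
import struct
from decimal import Decimal


def cql_varint(n):
    """Minimal big-endian two's-complement bytes for an integer."""
    length = ((n if n >= 0 else ~n).bit_length() + 8) // 8
    return n.to_bytes(length, "big", signed=True)


def cql_decimal(d):
    """CQL decimal: int32 scale, then the unscaled value as a varint.
    value = unscaled * 10**(-scale). Finite Decimals only."""
    sign, digits, exponent = d.as_tuple()
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled
    return struct.pack(">i", -exponent) + cql_varint(unscaled)


def zigzag(n):
    """Zig-zag mapping: 0->0, -1->1, 1->2, -2->3, ... (before base-128)."""
    return 2 * n if n >= 0 else -2 * n - 1


print(cql_decimal(Decimal("1.5")).hex())   # scale 1, unscaled 15
print(cql_decimal(Decimal("-1.5")).hex())  # scale 1, unscaled -15
print(zigzag(-15))
```

A decoder that expects one of these encodings but receives the other will read back wrong values, most visibly for negative numbers.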

Re: Queuing System

2014-02-22 Thread Laing, Michael
We use RabbitMQ for queuing and Cassandra for persistence. RabbitMQ with clustering and/or federation should meet your high-availability needs. Michael On Sat, Feb 22, 2014 at 10:25 AM, DuyHai Doan doanduy...@gmail.com wrote: Jagan, queue-like data structures are known to be one of the

Re: Performance problem with large wide row inserts using CQL

2014-02-20 Thread Laing, Michael
Just to add my 2 cents... We are very happy CQL users, running in production. I have had no problems modeling whatever I have needed to, including problems similar to the examples set forth previously, in CQL. Personally I think it is an excellent improvement to Cassandra, and we have no

Re: GC taking a long time

2014-02-06 Thread Laing, Michael
For the restart issue, see CASSANDRA-6008 (https://issues.apache.org/jira/browse/CASSANDRA-6008) and 6086. On Thu, Feb 6, 2014 at 12:19 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi Robert, The heap and GC are a bit tricky to tune. I recently read a post about heap, explaining how

Re: No deletes - is periodic repair needed? I think not...

2014-01-28 Thread Laing, Michael
the separation of the case 2. (fixed ttl, no repair needed) and 2.a. (variable ttl, repair may be needed). -- Sylvain Unless I am missing something. On Monday, January 27, 2014, Laing, Michael michael.la...@nytimes.com wrote: Thanks Sylvain, Your assumption is correct! So I think I
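A hedged way to see why case 2 (fixed TTL) and case 2.a (variable TTL) differ is to use a deliberately simplified model. Assume last-write-wins across replicas, an expired cell that shadows older data until `gc_grace` passes, and, after that, a purge that leaves nothing behind on that replica (assuming its older copy was already compacted away). Now suppose a replica misses an overwrite and is never repaired. With a variable TTL, the overwrite can expire and be purged while the older value, with its longer TTL, is still live elsewhere, so the old value comes back. With a fixed TTL, the newer write always expires after the older one, so that cannot happen. The class and function names are hypothetical; this is not Cassandra's implementation.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    write_time: int  # write timestamp (seconds, simplified)
    value: str
    ttl: int         # seconds

    @property
    def expires_at(self):
        return self.write_time + self.ttl


def replica_view(cells, now, gc_grace):
    """Newest cell on one replica; an expired cell still shadows older data
    until gc_grace has passed, then it is treated as purged (None)."""
    if not cells:
        return None
    newest = max(cells, key=lambda c: c.write_time)
    if now >= newest.expires_at + gc_grace:
        return None
    return newest


def read(replicas, now, gc_grace):
    """Last-write-wins merge across replicas; an expired winner reads as absent."""
    views = [replica_view(r, now, gc_grace) for r in replicas]
    views = [v for v in views if v is not None]
    if not views:
        return None
    winner = max(views, key=lambda c: c.write_time)
    return winner.value if now < winner.expires_at else None


v1 = Cell(0, "v1", 1000)
# Case 2.a: the overwrite has a shorter TTL; replica B missed it.
variable = [[v1, Cell(10, "v2", 100)], [v1]]
# Case 2: same TTL for every write; replica B again missed the overwrite.
fixed = [[v1, Cell(10, "v2", 1000)], [v1]]

print(read(variable, 200, 50))  # "v1" has come back
print(read(fixed, 200, 50), read(fixed, 1100, 50))
```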

Re: No deletes - is periodic repair needed? I think not...

2014-01-27 Thread Laing, Michael
Thanks Sylvain, Your assumption is correct! So I think I actually have 4 classes:
1. Regular values, no deletes, no overwrites, write heavy, variable TTLs to manage size
2. Regular values, no deletes, some overwrites, read heavy (10 to 1), fixed TTLs to manage size
2.a. Regular values,
