Re: repair strange behavior
Repairs generate new files that then need to be compacted. Maybe that's where the temporary extra volume comes from?

On 21 Apr 2012, at 20:43, Igor i...@4friends.od.ua wrote:

> Hi,
>
> I can't understand the repair behavior in my case. I have a 12-node ring (all 1.0.7):
>
> Address       DC  Rack        Status  State   Load      Owns    Token
> 10.254.237.2  LA  ADS-LA-1    Up      Normal  50.92 GB  0.00%   0
> 10.254.238.2  TX  TX-24-RACK  Up      Normal  33.29 GB  0.00%   1
> 10.254.236.2  VA  ADS-VA-1    Up      Normal  50.07 GB  0.00%   2
> 10.254.93.2   IL  R1          Up      Normal  49.29 GB  0.00%   3
> 10.253.4.2    AZ  R1          Up      Normal  37.83 GB  0.00%   5
> 10.254.180.2  GB  GB-1        Up      Normal  42.86 GB  50.00%  85070591730234615865843651857942052863
> 10.254.191.2  LA  ADS-LA-1    Up      Normal  47.64 GB  0.00%   85070591730234615865843651857942052864
> 10.254.221.2  TX  TX-24-RACK  Up      Normal  43.42 GB  0.00%   85070591730234615865843651857942052865
> 10.254.217.2  VA  ADS-VA-1    Up      Normal  38.44 GB  0.00%   85070591730234615865843651857942052866
> 10.254.94.2   IL  R1          Up      Normal  49.31 GB  0.00%   85070591730234615865843651857942052867
> 10.253.5.2    AZ  R1          Up      Normal  49.01 GB  0.00%   85070591730234615865843651857942052869
> 10.254.179.2  GB  GB-1        Up      Normal  27.08 GB  50.00%  170141183460469231731687303715884105727
>
> I have a single keyspace 'meter' with two column families (one, 'ids', is small; the second is bigger). The strange thing happened today when I tried to run nodetool -h 10.254.180.2 repair -pr meter ids twice, one run after the other. The first repair finished successfully:
>
> INFO 16:33:02,492 [repair #db582370-8bba-11e1--5b777f708bff] ids is fully synced
> INFO 16:33:02,526 [repair #db582370-8bba-11e1--5b777f708bff] session completed successfully
>
> after moving nearly 50 GB of data. I started a second session one hour later:
>
> INFO 17:44:37,842 [repair #aa415d00-8bd9-11e1--5b777f708bff] new session: will sync localhost/10.254.180.2, /10.254.221.2, /10.254.191.2, /10.254.217.2, /10.253.5.2, /10.254.94.2 on range (5,85070591730234615865843651857942052863] for meter.[ids]
>
> What is strange: when the streams for the second repair start, they carry the same or an even bigger total volume, whereas I expected the second run to move less data (or even none at all). Is this OK, or should I fix something?
>
> Thanks!
Re: repair strange behavior
But after repair, all nodes should be in sync regardless of whether the new files have been compacted or not. Do you suggest running a major compaction after repair? I'd like to avoid that.

On 04/22/2012 11:52 AM, Philippe wrote:

> Repairs generate new files that then need to be compacted. Maybe that's where the temporary extra volume comes from?
Server Side Logic/Script - Triggers / StoreProc
I found that triggers are coming in Cassandra 1.2 (https://issues.apache.org/jira/browse/CASSANDRA-1311), but there is no mention of any stored-procedure-like ("StoreProc") pattern. I know this has been discussed many times but has never met with any initiative; even Groovy was staged out of the trunk.

Cassandra is great for logging, and as such it would be infinitely more useful if some logic could be pushed into the Cassandra cluster, nearer to where the data lives, to generate materialized views useful to applications. Server-side scripts/routines could soon prove to be the differentiating factor among distributed databases.

Let me illustrate with a use case. In our application we store time-series data in wide rows, with a TTL set on each point to keep the data from growing beyond acceptable limits. Even so, the data size can be a limiting factor when moving all of it from the cluster node to the querying node, and then on to the application via Thrift, for processing and presentation. Ideally we would process the data on the node where it resides and pass only the materialized view of the data upstream (see the sketch below). This would be trivial if Cassandra implemented some sort of server-side scripting, with CQL semantics to call it.

Is anybody else interested in a similar feature? Is it being worked on? Are there any alternative strategies for this problem?

Praveen
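To make the use case concrete, here is a minimal sketch in plain Java of the kind of rollup that today must run on the querying side after the whole wide row has crossed the wire, and that server-side scripting would let run next to the data. All names here (Point, hourlyAverages, the hourly bucket size) are illustrative assumptions, not anything from the original post.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Illustrative only: the aggregation a client must do today after pulling
    // an entire wide row of (timestamp, value) points over Thrift. With
    // server-side routines, this loop could run on the node that holds the
    // row, and only the small per-hour result would travel upstream.
    public class HourlyRollup {
        static final long HOUR_MS = 3_600_000L;

        // One point from a wide row; field names are illustrative.
        record Point(long timestampMs, double value) {}

        // Collapse raw points into per-hour averages -- the "materialized view".
        static SortedMap<Long, Double> hourlyAverages(List<Point> points) {
            SortedMap<Long, double[]> acc = new TreeMap<>(); // hour bucket -> {sum, count}
            for (Point p : points) {
                long bucket = (p.timestampMs() / HOUR_MS) * HOUR_MS;
                double[] sc = acc.computeIfAbsent(bucket, k -> new double[2]);
                sc[0] += p.value();
                sc[1] += 1;
            }
            SortedMap<Long, Double> out = new TreeMap<>();
            acc.forEach((bucket, sc) -> out.put(bucket, sc[0] / sc[1]));
            return out;
        }

        public static void main(String[] args) {
            List<Point> raw = new ArrayList<>();
            raw.add(new Point(0L, 1.0));                // hour 0
            raw.add(new Point(1_000L, 3.0));            // hour 0
            raw.add(new Point(HOUR_MS + 500L, 10.0));   // hour 1
            System.out.println(hourlyAverages(raw));    // {0=2.0, 3600000=10.0}
        }
    }

The input list grows with the row, while the output is bounded by the number of hour buckets; that asymmetry is exactly why shipping only the materialized view upstream is attractive.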
Re: Server Side Logic/Script - Triggers / StoreProc
Praveen,

We are certainly interested. To get things moving, we implemented an add-on for Cassandra to demonstrate the viability (using AOP): https://github.com/hmsonline/cassandra-triggers

Right now the implementation executes triggers asynchronously, allowing you to implement a Java interface and plug in your own Java class that will get called for every insert. Per the discussion on CASSANDRA-1311, we intend to extend our proof of concept to be able to invoke scripts as well. (Minimally we'll enable JavaScript, but we'll probably allow Ruby and Groovy as well.)

-brian

On Apr 22, 2012, at 12:23 PM, Praveen Baratam wrote:

> I found that triggers are coming in Cassandra 1.2 (https://issues.apache.org/jira/browse/CASSANDRA-1311), but there is no mention of any stored-procedure-like pattern. [...]

--
Brian O'Neill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
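To give a feel for the plugin model Brian describes, here is a minimal sketch assuming a made-up InsertTrigger interface, Mutation class, and dispatcher. None of these names are the real cassandra-triggers API, which may look quite different; treat everything below as a hypothetical shape, not the library's contract.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch only: the interface and Mutation type below are
    // made up for illustration; the real cassandra-triggers API may differ.
    interface InsertTrigger {
        void process(Mutation mutation);
    }

    // Simplified stand-in for the write event that fires the trigger.
    class Mutation {
        final String keyspace, columnFamily, rowKey;
        Mutation(String keyspace, String columnFamily, String rowKey) {
            this.keyspace = keyspace;
            this.columnFamily = columnFamily;
            this.rowKey = rowKey;
        }
    }

    // A user-supplied trigger class, invoked for every insert.
    class AuditTrigger implements InsertTrigger {
        @Override
        public void process(Mutation m) {
            System.out.printf("insert on %s.%s key=%s%n", m.keyspace, m.columnFamily, m.rowKey);
        }
    }

    // Asynchronous dispatch, so trigger work stays off the write path.
    class TriggerDispatcher {
        private final ExecutorService pool = Executors.newFixedThreadPool(4);
        private final List<InsertTrigger> triggers;

        TriggerDispatcher(List<InsertTrigger> triggers) {
            this.triggers = triggers;
        }

        void onInsert(Mutation m) {
            for (InsertTrigger t : triggers) {
                pool.submit(() -> t.process(m)); // each trigger runs on a worker thread
            }
        }

        void shutdown() {
            pool.shutdown(); // lets queued trigger tasks finish, then stops the pool
        }
    }

    public class TriggerDemo {
        public static void main(String[] args) {
            TriggerDispatcher dispatcher = new TriggerDispatcher(List.of(new AuditTrigger()));
            dispatcher.onInsert(new Mutation("meter", "ids", "row-42"));
            dispatcher.shutdown();
        }
    }

Running triggers on a separate pool mirrors the asynchronous behavior described above: the write path hands the mutation off and returns without waiting for the trigger work to complete.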