Re: repair strange behavior

2012-04-22 Thread Philippe
Repairs generate new files that then need to be compacted.
Maybe that's where the temporary extra volume comes from?
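Philippe's explanation can be made concrete. Repair builds Merkle trees over each replica's data and streams every token range whose leaf hashes disagree; since each tree has a fixed number of leaves (2^15 per tree in the 1.0 line, if I recall correctly), a single differing row forces the whole slice of rows behind that leaf to be re-streamed. A back-of-the-envelope sketch, with made-up row and mismatch counts:

```java
// Rough estimate of repair over-streaming caused by Merkle tree granularity.
// Assumptions (hedged): 2^15 leaf ranges per tree, as in the Cassandra 1.0
// line; the row count and mismatch count are hypothetical figures.
public class RepairOverstream {
    public static void main(String[] args) {
        long rows = 100_000_000L;         // hypothetical rows in the node's range
        long leaves = 1L << 15;           // Merkle tree leaf ranges (32768)
        long rowsPerLeaf = rows / leaves; // rows hidden behind each single hash

        // Even one out-of-sync row invalidates its whole leaf, so the entire
        // slice of rows behind that hash is streamed again.
        long mismatchedRows = 1000;       // hypothetical rows that actually differ
        long streamedRows = mismatchedRows * rowsPerLeaf; // worst case: one per leaf

        System.out.println("rows per leaf: " + rowsPerLeaf);
        System.out.println("worst-case rows streamed: " + streamedRows);
    }
}
```

This is why two back-to-back repairs can each move a lot of data: tiny residual differences (or even just timing skew between validation compactions) still invalidate whole leaf ranges.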
On 21 Apr 2012 20:43, Igor i...@4friends.od.ua wrote:

 Hi

 I can't understand the repair behavior in my case. I have a 12-node ring
 (all 1.0.7):

 10.254.237.2    LA  ADS-LA-1    Up Normal  50.92 GB    0.00%   0
 10.254.238.2    TX  TX-24-RACK  Up Normal  33.29 GB    0.00%   1
 10.254.236.2    VA  ADS-VA-1    Up Normal  50.07 GB    0.00%   2
 10.254.93.2     IL  R1          Up Normal  49.29 GB    0.00%   3
 10.253.4.2      AZ  R1          Up Normal  37.83 GB    0.00%   5
 10.254.180.2    GB  GB-1        Up Normal  42.86 GB    50.00%  85070591730234615865843651857942052863
 10.254.191.2    LA  ADS-LA-1    Up Normal  47.64 GB    0.00%   85070591730234615865843651857942052864
 10.254.221.2    TX  TX-24-RACK  Up Normal  43.42 GB    0.00%   85070591730234615865843651857942052865
 10.254.217.2    VA  ADS-VA-1    Up Normal  38.44 GB    0.00%   85070591730234615865843651857942052866
 10.254.94.2     IL  R1          Up Normal  49.31 GB    0.00%   85070591730234615865843651857942052867
 10.253.5.2      AZ  R1          Up Normal  49.01 GB    0.00%   85070591730234615865843651857942052869
 10.254.179.2    GB  GB-1        Up Normal  27.08 GB    50.00%  170141183460469231731687303715884105727

 I have a single keyspace 'meter' and two column families (one, 'ids', is
 small; the second is bigger). The strange thing happened today when I tried
 to run
 nodetool -h 10.254.180.2 repair -pr meter ids
 twice, one after the other. The first repair finished successfully

  INFO 16:33:02,492 [repair #db582370-8bba-11e1--5b777f708bff] ids is fully synced
  INFO 16:33:02,526 [repair #db582370-8bba-11e1--5b777f708bff] session completed successfully

 after moving nearly 50 GB of data. I started a second session one hour later:

 INFO 17:44:37,842 [repair #aa415d00-8bd9-11e1--5b777f708bff] new session: will sync localhost/10.254.180.2, /10.254.221.2, /10.254.191.2, /10.254.217.2, /10.253.5.2, /10.254.94.2 on range (5,85070591730234615865843651857942052863] for meter.[ids]

 What is strange: when the streams for the second repair start, they have
 the same or even bigger total volume. I expected the second run to move
 less data (or even no data at all).

 Is it OK? Or should I fix something?

 Thanks!




Re: repair strange behavior

2012-04-22 Thread Igor
But after a repair, all nodes should be in sync regardless of whether the
new files have been compacted or not.

Are you suggesting a major compaction after repair? I'd like to avoid it.
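One way to see Philippe's point without resorting to a major compaction: the streamed repair data lands in brand-new SSTables whose rows largely overlap rows already on disk, and a node's reported load is the sum of live SSTable sizes, so the same rows are counted twice until compaction merges the overlapping files. A toy illustration (all figures below are hypothetical, not taken from the ring output above):

```java
// Toy illustration: repair streams whole SSTable slices, so the rows they
// contain can duplicate rows already present on disk. The node's reported
// load double-counts those rows until compaction merges the overlapping
// files. All figures here are hypothetical.
public class RepairLoadInflation {
    public static void main(String[] args) {
        double existingGb = 42.86;   // load before repair
        double streamedGb = 50.0;    // data streamed in by repair
        double overlap = 0.9;        // assumed fraction of streamed rows already on disk

        double loadAfterRepair = existingGb + streamedGb;
        // After compaction, duplicated rows collapse back into one copy.
        double loadAfterCompaction = existingGb + streamedGb * (1 - overlap);

        System.out.printf("load right after repair: %.2f GB%n", loadAfterRepair);
        System.out.printf("load after compaction:   %.2f GB%n", loadAfterCompaction);
    }
}
```

The temporary inflation drains away through ordinary minor compactions; no major compaction should be required for the nodes to converge.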

On 04/22/2012 11:52 AM, Philippe wrote:


Repairs generate new files that then need to be compacted.
Maybe that's where the temporary extra volume comes from?






Server Side Logic/Script - Triggers / StoreProc

2012-04-22 Thread Praveen Baratam
I found that triggers are coming in Cassandra 1.2
(https://issues.apache.org/jira/browse/CASSANDRA-1311), but there is no mention
of any stored-procedure-like pattern.

I know this has been discussed many times, but it has never met with any real
initiative. Even Groovy was staged out of the trunk.

Cassandra is great for logging, and as such it would be infinitely more useful
if some logic could be pushed into the Cassandra cluster, closer to the data,
to generate materialized views useful for applications.

Server Side Scripts/Routines in Distributed Databases could soon prove to
be the differentiating factor.

Let me reiterate things with a use case.

In our application we store time-series data in wide rows, with a TTL set on
each point to keep the data from growing beyond acceptable limits. Still, the
data size can make it prohibitively expensive to move all of it from the
cluster node to the querying node, and then on to the application via Thrift,
for processing and presentation.

Ideally we would process the data on the node where it resides and pass only a
materialized view of it upstream. This would be straightforward if Cassandra
implemented some sort of server-side scripting, with CQL semantics to invoke it.
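Until something like that exists, the reduction has to happen client-side after the raw columns cross the wire. The kind of routine one would want pushed server-side is usually small; here is a minimal sketch of a streaming downsampler (the 1-hour bucket width and the averaging are assumptions chosen for illustration; the Thrift column iteration that would feed it is elided):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Client-side fallback for the use case above: instead of materializing every
// raw point and then reducing, reduce incrementally into coarse time buckets
// as points arrive. Bucket width and the choice of averaging are assumptions.
public class Downsampler {
    static final long BUCKET_MS = 3_600_000L; // 1-hour buckets (assumed granularity)

    // Running average per bucket: bucket start -> {sum, count}.
    private final Map<Long, double[]> buckets = new LinkedHashMap<>();

    public void add(long timestampMs, double value) {
        long bucket = timestampMs - (timestampMs % BUCKET_MS);
        double[] acc = buckets.computeIfAbsent(bucket, k -> new double[2]);
        acc[0] += value; // running sum
        acc[1] += 1;     // point count
    }

    public Map<Long, Double> view() {
        Map<Long, Double> out = new LinkedHashMap<>();
        buckets.forEach((k, acc) -> out.put(k, acc[0] / acc[1]));
        return out;
    }

    public static void main(String[] args) {
        Downsampler d = new Downsampler();
        d.add(0L, 1.0);
        d.add(1_000L, 3.0);       // same hour -> averaged together
        d.add(3_600_000L, 10.0);  // next hour -> new bucket
        System.out.println(d.view()); // {0=2.0, 3600000=10.0}
    }
}
```

Running this on the server, next to the data, is exactly what a stored-procedure-like facility would buy: only the handful of bucket averages would cross the wire instead of every raw point.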

Is anybody else interested in a similar feature? Is it being worked on? Are
there any alternative strategies to this problem?

Praveen


Re: Server Side Logic/Script - Triggers / StoreProc

2012-04-22 Thread Brian O'Neill
Praveen,

We are certainly interested. To get things moving we implemented an add-on for 
Cassandra to demonstrate the viability (using AOP):
https://github.com/hmsonline/cassandra-triggers

Right now the implementation executes triggers asynchronously, allowing you to
implement a Java interface and plug in your own Java class that will get called
for every insert.
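For readers unfamiliar with the plug-in model Brian describes, the shape is roughly the following. The interface and method names here are hypothetical illustrations, not cassandra-triggers' actual API; see the GitHub link above for the real one.

```java
// Hypothetical sketch of the plug-in model: you implement an interface and
// register the class, and it is invoked (asynchronously) for every insert.
// Interface and method names are illustrative, NOT cassandra-triggers' API.
public class TriggerSketch {
    public interface InsertTrigger {
        // Called after a column is written.
        void onInsert(String columnFamily, String rowKey,
                      String columnName, byte[] value);
    }

    // A user-supplied trigger that keeps a simple count of inserts seen.
    public static class CountingTrigger implements InsertTrigger {
        public int inserts = 0;
        public void onInsert(String cf, String key, String col, byte[] value) {
            inserts++; // a real trigger might write to an audit column family
        }
    }

    public static void main(String[] args) {
        CountingTrigger t = new CountingTrigger();
        t.onInsert("ids", "row1", "col1", new byte[0]);
        t.onInsert("ids", "row2", "col1", new byte[0]);
        System.out.println("inserts seen: " + t.inserts); // inserts seen: 2
    }
}
```

The appeal of the AOP approach is that the trigger hooks in without patching Cassandra itself, at the cost of firing after the write rather than atomically with it.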

Per the discussion on CASSANDRA-1311, we intend to extend our proof of concept
to invoke scripts as well. (Minimally we'll enable JavaScript, but we'll
probably allow Ruby and Groovy too.)

-brian

On Apr 22, 2012, at 12:23 PM, Praveen Baratam wrote:


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/