Re: trouble setting up initial cluster: Host ID collision between active endpoint

2013-01-24 Thread aaron morton
They both have 0 for their token, and this is stored in their System keyspace. 
Scrub them and start again. 

 But I found that the tokens that were being generated would require way too 
 much memory
Token assignments have nothing to do with memory usage. 
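
For reference, the token generator in the docs does nothing more than divide the
partitioner's range evenly between the nodes; the tokens are plain integers and
have no bearing on memory. A minimal Python sketch (assuming the default
RandomPartitioner and a two-node cluster; the values are illustrative only):

# Evenly spaced initial_token values for RandomPartitioner (range 0..2**127).
def generate_tokens(node_count):
    return [i * (2 ** 127 // node_count) for i in range(node_count)]

for i, token in enumerate(generate_tokens(2)):
    print "node%02d initial_token: %d" % (i + 1, token)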

  m1.micro instances
You are better off using your laptop than micro instances. 
For playing around try m1.large and terminate them when not in use. 
To make life easier use this to make the cluster for you 
http://www.datastax.com/docs/1.2/install/install_ami

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/01/2013, at 5:17 AM, Tim Dunphy bluethu...@gmail.com wrote:

 Hello list,
 
  I really do appreciate the advice I've gotten here as I start building 
 familiarity with Cassandra. Aside from the single-node instance I set up for a 
 developer friend, I've just been playing with a single node in a VM on my 
 laptop and playing around with the cassandra-cli and PHP.
 
 Well, I've decided to set up my first cluster on my Amazon EC2 account and I'm 
 running into an issue getting the nodes to gossip. 
 
 I've set the IPs of the 'node01' and 'node02' EC2 instances in their respective 
 listen_address and rpc_address, and made sure that the 'cluster_name' on both was 
 in agreement.
 
  I believe the problem may be in one of two places: either the seeds or the 
 initial_token setting. 
 
 For the seeds I have it set up as such. I put the IPs of both machines in the 
 'seeds' setting for each, thinking this would be how each node would 
 discover the other:
 
  - seeds: 10.xxx.xxx.248,10.xxx.xxx.123
 
 Initially I tried the tokengen script that I found in the documentation. But 
 I found that the tokens that were being generated would require way too much 
 memory for the m1.micro instances that I'm experimenting with on the Amazon 
 free tier. And according to the docs in the config it is in some cases ok to 
 leave that field blank. So that's what I did on both instances. 
 
 Not sure how much/if this matters but I am using the setting - 
 endpoint_snitch: Ec2Snitch
 
 Finally, when I start up the first node all goes well.
 
 But when I startup the second node I see this exception on both hosts:
 
 node1
 
 INFO 11:02:32,231 Listening for thrift clients...
  INFO 11:02:59,262 Node /10.xxx.xxx.123 is now part of the cluster
  INFO 11:02:59,268 InetAddress /10.xxx.xxx.123 is now UP
 ERROR 11:02:59,270 Exception in thread Thread[GossipStage:1,5,main]
 java.lang.RuntimeException: Host ID collision between active endpoint 
 /10..xxx.248 and /10.xxx.xxx.123 (id=54ce7ccd-1b1d-418e-9861-1c281c078b8f)
 at 
 org.apache.cassandra.locator.TokenMetadata.updateHostId(TokenMetadata.java:227)
 at 
 org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1296)
 at 
 org.apache.cassandra.service.StorageService.onChange(StorageService.java:1157)
 at 
 org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1895)
 at 
 org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:805)
 at 
 org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:883)
 at 
 org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:43)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 
 And on node02 I see:
 
  INFO 11:02:58,817 Starting Messaging Service on port 7000
  INFO 11:02:58,835 Using saved token [0]
  INFO 11:02:58,837 Enqueuing flush of Memtable-local@672636645(84/84 
 serialized/live bytes, 4 ops)
  INFO 11:02:58,838 Writing Memtable-local@672636645(84/84 serialized/live 
 bytes, 4 ops)
  INFO 11:02:58,912 Completed flushing 
 /var/lib/cassandra/data/system/local/system-local-ia-43-Data.db (120 bytes) 
 for commitlog position ReplayPosition(segmentId=1358956977628, position=49266)
  INFO 11:02:58,922 Enqueuing flush of Memtable-local@1007604537(32/32 
 serialized/live bytes, 2 ops)
  INFO 11:02:58,923 Writing Memtable-local@1007604537(32/32 serialized/live 
 bytes, 2 ops)
  INFO 11:02:58,943 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-40-Data.db'),
  
 SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-42-Data.db'),
  
 SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-43-Data.db'),
  
 SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-41-Data.db')]
  INFO 11:02:58,953 Node /10.192.179.248 is now part of the cluster
  INFO 11:02:58,961 InetAddress /10.192.179.248 is now UP
  INFO 11:02:59,003 Completed flushing 
 /var/lib/cassandra/data/system/local/system-local-ia-44-Data.db (90 bytes) 
 for 

Re: Performing simple CQL Query using python db-api 2.0 fails

2013-01-24 Thread aaron morton
How did you create the table? 

Anyways that looks like a bug, I *think* they should go here 
http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/issues/list

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/01/2013, at 7:14 AM, Paul van Hoven paul.van.ho...@googlemail.com wrote:

 I try to access my local cassandra database via python. Therefore I
 installed db-api 2.0 and thrift for accessing the database. Opening
 and closing a connection works fine. But a simple query is not
 working:
 
 The script looks like this:
 
c = conn.cursor()
c.execute("select * from users;")
data = c.fetchall()
print "Query: select * from users; returned the following result:"
print str(data)
 
 
 The table users looks like this:
 cqlsh:demodb> select * from users;
 
  user_name | birth_year | gender | password | session_token | state
 -----------+------------+--------+----------+---------------+-------
     jsmith |   null |   null |   secret |  null |  null
 
 
 
 But when I try to execute it I get the following error:
 Open connection to localhost:9160 on keyspace demodb
 Traceback (most recent call last):
  File 
 /Users/Tom/Freelancing/Company/Python/ApacheCassandra/src/CassandraDemo.py,
 line 56, in module
perfromSimpleCQLQuery()
  File 
 /Users/Tom/Freelancing/Company/Python/ApacheCassandra/src/CassandraDemo.py,
 line 46, in perfromSimpleCQLQuery
c.execute(select * from users;)
  File /Library/Python/2.7/site-packages/cql/cursor.py, line 81, in execute
return self.process_execution_results(response, decoder=decoder)
  File /Library/Python/2.7/site-packages/cql/thrifteries.py, line
 116, in process_execution_results
self.get_metadata_info(self.result[0])
  File /Library/Python/2.7/site-packages/cql/cursor.py, line 97, in
 get_metadata_info
name, nbytes, vtype, ctype = self.get_column_metadata(colid)
  File /Library/Python/2.7/site-packages/cql/cursor.py, line 104, in
 get_column_metadata
return self.decoder.decode_metadata_and_type(column_id)
  File /Library/Python/2.7/site-packages/cql/decoders.py, line 45,
 in decode_metadata_and_type
name = self.name_decode_error(e, namebytes,
 comptype.cql_parameterized_type())
  File /Library/Python/2.7/site-packages/cql/decoders.py, line 29,
 in name_decode_error
% (namebytes, expectedtype, err))
 cql.apivalues.ProgrammingError: column name '\x00\x00\x00' can't be
 deserialized as 'org.apache.cassandra.db.marshal.CompositeType':
 global name 'self' is not defined
 
 I'm not sure if this is the right place to ask, but am I doing
 something wrong here?



multiple reducers with BulkOutputFormat on the same host

2013-01-24 Thread Alexei Bakanov
Hello,

We see that BulkOutputFormat fails to stream data from multiple reduce
instances that run on the same host.
We get the same error messages that issue
https://issues.apache.org/jira/browse/CASSANDRA-4223 tries to address.
Looks like (ip-address + in_out_flag + atomic integer) is not unique
enough for a sessionId when we have multiple JVMs streaming from one
physical host.

We get the problem fixed by setting one reducer per machine in hadoop
config, but it's not an option we want to deploy.

Thanks,
Alexei Bakanov


Re: multiple reducers with BulkOutputFormat on the same host

2013-01-24 Thread Yuki Morishita
Alexei,

You were right.
It was already fixed to use UUID for streaming session and released in 1.2.0.
See https://issues.apache.org/jira/browse/CASSANDRA-4813.
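
As a purely illustrative Python sketch (not the actual Cassandra code) of why the
old scheme collides: two reducer JVMs on the same host each start their own
counter, so they derive identical session ids, while random UUIDs stay unique
across processes.

import uuid

def old_style_session_id(host_ip, is_outgoing, counter):
    # ip-address + in/out flag + per-JVM counter: every JVM starts at 1.
    return "%s-%s-%d" % (host_ip, "out" if is_outgoing else "in", counter)

print old_style_session_id("10.0.0.5", True, 1)   # reducer JVM #1
print old_style_session_id("10.0.0.5", True, 1)   # reducer JVM #2 -> same id
print uuid.uuid4()                                # unique per call
print uuid.uuid4()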


On Thursday, January 24, 2013 at 6:49 AM, Alexei Bakanov wrote:

 Hello,
 
 We see that BulkOutputFormat fails to stream data from multiple reduce
 instances that run on the same host.
 We get the same error messages that issue
 https://issues.apache.org/jira/browse/CASSANDRA-4223 tries to address.
 Looks like (ip-adress + in_out_flag + atomic integer) is not unique
 enough for a sessionId when we have multiple JVMs streaming from one
 physical host.
 
 We get the problem fixed by setting one reducer per machine in hadoop
 config, but it's not an option we want to deploy.
 
 Thanks,
 Alexei Bakanov
 
 




Re: multiple reducers with BulkOutputFormat on the same host

2013-01-24 Thread Alexei Bakanov
Oh, that's nice! Thanks!

On 24 January 2013 13:55, Yuki Morishita mor.y...@gmail.com wrote:
 Alexel,

 You were right.
 It was already fixed to use UUID for streaming session and released in
 1.2.0.
 See https://issues.apache.org/jira/browse/CASSANDRA-4813.

 On Thursday, January 24, 2013 at 6:49 AM, Alexei Bakanov wrote:

 Hello,

 We see that BulkOutputFormat fails to stream data from multiple reduce
 instances that run on the same host.
 We get the same error messages that issue
 https://issues.apache.org/jira/browse/CASSANDRA-4223 tries to address.
 Looks like (ip-adress + in_out_flag + atomic integer) is not unique
 enough for a sessionId when we have multiple JVMs streaming from one
 physical host.

 We get the problem fixed by setting one reducer per machine in hadoop
 config, but it's not an option we want to deploy.

 Thanks,
 Alexei Bakanov




Re: trouble setting up initial cluster: Host ID collision between active endpoint

2013-01-24 Thread Tim Dunphy
Cool, thanks for the advice, Aaron. I actually did get this working before I
read your reply. The trick for me, apparently, was to use the IP of the
first node in the seeds setting of each successive node. But I like the
idea of using larges for an hour or so and terminating them for some basic
experimentation. Also, thanks for pointing me to the DataStax AMIs; I'll be
sure to check them out.
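
For what it's worth, a minimal cassandra.yaml sketch of that arrangement, with
placeholder addresses rather than the real ones from this thread: node01 lists
itself as the seed, and every later node points at node01.

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1"   # node01's private IP, same value on every node
listen_address: 10.0.0.2        # this node's own private IP
rpc_address: 10.0.0.2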

Tim

On Thu, Jan 24, 2013 at 3:45 AM, aaron morton aa...@thelastpickle.comwrote:

 They both have 0 for their token, and this is stored in their System
 keyspace.
 Scrub them and start again.

 But I found that the tokens that were being generated would require way
 too much memory

 Token assignments have nothing to do with memory usage.

  m1.micro instances

 You are better off using your laptop than micro instances.
 For playing around try m1.large and terminate them when not in use.
 To make life easier use this to make the cluster for you
 http://www.datastax.com/docs/1.2/install/install_ami

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 24/01/2013, at 5:17 AM, Tim Dunphy bluethu...@gmail.com wrote:

 Hello list,

  I really do appreciate the advice I've gotten here as I start building
 familiarity with Cassandra. Aside from the single node instance I setup for
 a developer friend, I've just been playing with a single node in a VM on my
 laptop and playing around with the cassandra-cli and PHP.

 Well I've decided to setup my first cluster on my amazon ec2 account and
 I'm running into an issue getting the nodes to gossip.

 I've set the IP's of 'node01' and 'node02' ec2 instances in their
 respective listen_address, rpc_address and made sure that the
 'cluster_name' on both was in agreement.

  I believe the problem may be in one of two places: either the seeds or
 the initial_token setting.

 For the seeds I have it setup as such. I put the IPs for both machines in
 the 'seeds' settings for each, thinking this would be how each node would
 discover each other:

  - seeds: 10.xxx.xxx.248,10.xxx.xxx.123

 Initially I tried the tokengen script that I found in the documentation.
 But I found that the tokens that were being generated would require way too
 much memory for the m1.micro instances that I'm experimenting with on the
 Amazon free tier. And according to the docs in the config it is in some
 cases ok to leave that field blank. So that's what I did on both instances.

 Not sure how much/if this matters but I am using the setting -
 endpoint_snitch: Ec2Snitch

 Finally, when I start up the first node all goes well.

 But when I startup the second node I see this exception on both hosts:

 node1

 INFO 11:02:32,231 Listening for thrift clients...
  INFO 11:02:59,262 Node /10.xxx.xxx.123 is now part of the cluster
  INFO 11:02:59,268 InetAddress /10.xxx.xxx.123 is now UP
 ERROR 11:02:59,270 Exception in thread Thread[GossipStage:1,5,main]
 java.lang.RuntimeException: Host ID collision between active endpoint
 /10..xxx.248 and /10.xxx.xxx.123
 (id=54ce7ccd-1b1d-418e-9861-1c281c078b8f)
 at
 org.apache.cassandra.locator.TokenMetadata.updateHostId(TokenMetadata.java:227)
 at
 org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1296)
 at
 org.apache.cassandra.service.StorageService.onChange(StorageService.java:1157)
 at
 org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1895)
 at
 org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:805)
 at
 org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:883)
 at
 org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:43)
 at
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)
 at java.lang.Thread.run(Unknown Source)

 And on node02 I see:

  INFO 11:02:58,817 Starting Messaging Service on port 7000
  INFO 11:02:58,835 Using saved token [0]
  INFO 11:02:58,837 Enqueuing flush of Memtable-local@672636645(84/84
 serialized/live bytes, 4 ops)
  INFO 11:02:58,838 Writing Memtable-local@672636645(84/84 serialized/live
 bytes, 4 ops)
  INFO 11:02:58,912 Completed flushing
 /var/lib/cassandra/data/system/local/system-local-ia-43-Data.db (120 bytes)
 for commitlog position ReplayPosition(segmentId=1358956977628,
 position=49266)
  INFO 11:02:58,922 Enqueuing flush of Memtable-local@1007604537(32/32
 serialized/live bytes, 2 ops)
  INFO 11:02:58,923 Writing Memtable-local@1007604537(32/32
 serialized/live bytes, 2 ops)
  INFO 11:02:58,943 Compacting
 [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-40-Data.db'),
 

Fwd: {kundera-discuss} Kundera 2.3 released

2013-01-24 Thread Vivek Mishra
-- Forwarded message --
From: Vivek Mishra vivek.mis...@impetus.co.in
Date: Thu, Jan 24, 2013 at 8:29 PM
Subject: {kundera-discuss} Kundera 2.3 released
To: kundera-disc...@googlegroups.com kundera-disc...@googlegroups.com

  Hi All,

We are happy to announce release of Kundera 2.3.

Kundera is a JPA 2.0 compliant, object-datastore mapping library for NoSQL
datastores. The idea behind Kundera is to make working with NoSQL Databases
drop-dead simple and fun.
It currently supports Cassandra, HBase, MongoDB, Redis and relational
databases.

Major Changes:
-
1)  Added Redis (http://redis.io/) to Kundera's supported database list. (
https://github.com/impetus-opensource/Kundera/wiki/Kundera-over-Redis-Connecting-...
)
2)  Cassandra 1.2 migration.
3)  Changes in HBase schema handling.
4)  Stronger query support, like selective column/id search via JPQL.
5)  Enable support for @Transient for embeddedColumns and mappedsuperclass.
6)  Allow setting a record limit on search for MongoDB.
7)  Performance improvements on Cassandra, HBase, MongoDB.


Github Bug Fixes:
--
https://github.com/impetus-opensource/Kundera/issues/163
https://github.com/impetus-opensource/Kundera/issues/162
https://github.com/impetus-opensource/Kundera/issues/154
https://github.com/impetus-opensource/Kundera/issues/141
https://github.com/impetus-opensource/Kundera/issues/133
https://github.com/impetus-opensource/Kundera/issues/131
https://github.com/impetus-opensource/Kundera/issues/127
https://github.com/impetus-opensource/Kundera/issues/122
https://github.com/impetus-opensource/Kundera/issues/121
https://github.com/impetus-opensource/Kundera/issues/117
https://github.com/impetus-opensource/Kundera/issues/84
https://github.com/impetus-opensource/Kundera/issues/67

@kundera-discuss  issues
---
1) Batch operation over Cassandra composite key not working.


We have revamped our wiki, so you might want to have a look at it here:
https://github.com/impetus-opensource/Kundera/wiki

To download, use or contribute to Kundera, visit:
http://github.com/impetus-opensource/Kundera

The latest released tag version is 2.3. Kundera maven libraries are now
available at: https://oss.sonatype.org/content/repositories/releases/com/impetus

Sample codes and examples for using Kundera can be found here:
http://github.com/impetus-opensource/Kundera-Examples

And

https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests


Thank you all for your contributions!

Sincerely,
Kundera Team

--






NOTE: This message may contain information that is confidential,
proprietary, privileged or otherwise protected by law. The message is
intended solely for the named addressee. If received in error, please
destroy and notify the sender. Any use of this email is prohibited when
received in error. Impetus does not represent, warrant and/or guarantee,
that the integrity of this communication has been maintained nor that the
communication is free of errors, virus, interception or interference.

--


Issues with CQLSH in Cassandra 1.2

2013-01-24 Thread Gabriel Ciuloaica

Hi,

I have spent half of the day today trying to get a new Cassandra 
cluster to work. I have set up a single data center cluster, using 
NetworkTopologyStrategy, DC1:3.
I'm using the latest version of the Astyanax client to connect. After many hours 
of debugging, I found out that the problem may be in the cqlsh utility.


So, after the cluster was up and running:
[me@cassandra-node1 cassandra]$ nodetool status
Datacenter: DC-1
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens  Owns (effective)  Host 
ID   Rack
UN  10.11.1.109   59.1 KB256 0.0% 
726689df-edc3-49a0-b680-370953994a8c  RAC2
UN  10.11.1.108   67.49 KB   256 0.0% 
73cd86a9-4efb-4407-9fe8-9a1b3a277af7  RAC1
UN  10.11.1.200   59.84 KB   64  0.0% 
d6d700d4-28aa-4722-b215-a6a7d304b8e7  RAC3


I went to create the keyspace:
1. First I have tried using cqlsh:
create keyspace foo with replication= 
{'class':'NetworkTopologyStrategy','DC1':3};


after this, I have checked that the keyspace was properly created by 
running


cqlsh> select * from system.schema_keyspaces;
 keyspace_name | durable_writes | 
strategy_class   | strategy_options

---++--+
   system_auth |   True | 
org.apache.cassandra.locator.SimpleStrategy | {replication_factor:1}
  foo |   True | 
org.apache.cassandra.locator.NetworkTopologyStrategy | {dc1:3}
system |   True | 
org.apache.cassandra.locator.LocalStrategy | {}
 system_traces |   True | 
org.apache.cassandra.locator.SimpleStrategy | {replication_factor:1}


but if I run nodetool describering foo, it does not show anything in the 
endpoints or endpoint_details fields.


In this situation, the Astyanax client throws a 
NoAvailableHostsException. I have used the following configuration:


withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
    .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)


The first option did not work at all.

2. I've dropped the keyspace created with cqlsh and re-created it with 
cassandra-cli. This time, nodetool describering foo shows 
information in the endpoints and endpoint_details columns, and the 
Astyanax client works properly.


Hope this saves others from spending time figuring out how to work 
around this issue.


Br,
Gabi


Re: Issues with CQLSH in Cassandra 1.2

2013-01-24 Thread Ivan Velykorodnyy
Hi,

Astyanax is not 1.2 compatible yet:
https://github.com/Netflix/astyanax/issues/191
Eran planned to make it in 1.57.x

On Thursday, January 24, 2013, Gabriel Ciuloaica wrote:

  Hi,

 I have spent half of the day today trying to make a new Cassandra cluster
 to work. I have setup a single data center cluster, using
 NetworkTopologyStrategy, DC1:3.
 I'm using latest version of Astyanax client to connect. After many hours
 of debug, I found out that the problem may be in cqlsh utility.

 So, after the cluster was up and running:
 [me@cassandra-node1 cassandra]$ nodetool status
 Datacenter: DC-1
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns (effective)  Host
 ID   Rack
 UN  10.11.1.109   59.1 KB256 0.0%
 726689df-edc3-49a0-b680-370953994a8c  RAC2
 UN  10.11.1.108   67.49 KB   256 0.0%
 73cd86a9-4efb-4407-9fe8-9a1b3a277af7  RAC1
 UN  10.11.1.200   59.84 KB   64  0.0%
 d6d700d4-28aa-4722-b215-a6a7d304b8e7  RAC3

 I went to create the keyspace:
 1. First I have tried using cqlsh:
 create keyspace foo with replication=
 {'class':'NetworkTopologyStrategy','DC1':3};

 after this, I have checked that the keyspace was properly created by
 running

 cqlsh select * from system.schema_keyspaces;
  keyspace_name | durable_writes |
 strategy_class   | strategy_options

 ---++--+
system_auth |   True |
 org.apache.cassandra.locator.SimpleStrategy | {replication_factor:1}
   foo |   True |
 org.apache.cassandra.locator.NetworkTopologyStrategy | {dc1:3}
 system |   True |
 org.apache.cassandra.locator.LocalStrategy | {}
  system_traces |   True |
 org.apache.cassandra.locator.SimpleStrategy | {replication_factor:1}

 but if I run nodetool describering foo, it will not show anything into
 endpoint, or endpoint_details fields.

 In this situation, Astyanax client will throw exception with *
 NoAvailableHostsException*. I have used following configuration:

 withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
 .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
 .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)


 First option did not worked at all.

 2. I've dropped the keyspace crated with cqlsh and re-created with
 cassandra-cli. This time, the nodetool describering foo, shows information
 into endpoint and endpoint_details columns, and also the Astyanax client
 works properly.

 Hope it will avoid others to avoid spending time to figure out how to go
 around this issue.

 Br,
 Gabi



Re: Issues with CQLSH in Cassandra 1.2

2013-01-24 Thread Gabriel Ciuloaica
I do not think that it has anything to do with Astyanax, but after I 
have recreated the keyspace with cassandra-cli, everything is working fine.
Also, as I mention below, even nodetool describering foo 
did not show correct information for the tokens and endpoint_details if 
the keyspace was created with cqlsh.


Thanks,
Gabi

On 1/24/13 9:21 PM, Ivan Velykorodnyy wrote:

Hi,

Astyanax is not 1.2 compatible yet 
https://github.com/Netflix/astyanax/issues/191

Eran planned to make it in 1.57.x

On Thursday, January 24, 2013, Gabriel Ciuloaica wrote:

Hi,

I have spent half of the day today trying to make a new Cassandra
cluster to work. I have setup a single data center cluster, using
NetworkTopologyStrategy, DC1:3.
I'm using latest version of Astyanax client to connect. After many
hours of debug, I found out that the problem may be in cqlsh utility.

So, after the cluster was up and running:
[me@cassandra-node1 cassandra]$ nodetool status
Datacenter: DC-1
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens  Owns (effective)  Host ID
Rack
UN  10.11.1.109   59.1 KB256 0.0%
726689df-edc3-49a0-b680-370953994a8c  RAC2
UN  10.11.1.108   67.49 KB   256 0.0%
73cd86a9-4efb-4407-9fe8-9a1b3a277af7  RAC1
UN  10.11.1.200   59.84 KB   64 0.0%
d6d700d4-28aa-4722-b215-a6a7d304b8e7  RAC3

I went to create the keyspace:
1. First I have tried using cqlsh:
create keyspace foo with replication=
{'class':'NetworkTopologyStrategy','DC1':3};

after this, I have checked that the keyspace was properly created
by running

cqlsh select * from system.schema_keyspaces;
 keyspace_name | durable_writes |
strategy_class   |
strategy_options

---++--+
   system_auth |   True |
org.apache.cassandra.locator.SimpleStrategy |
{replication_factor:1}
  foo |   True |
org.apache.cassandra.locator.NetworkTopologyStrategy |
{dc1:3}

system |   True |
org.apache.cassandra.locator.LocalStrategy
| {}
 system_traces |   True |
org.apache.cassandra.locator.SimpleStrategy |
{replication_factor:1}

but if I run nodetool describering foo, it will not show anything
into endpoint, or endpoint_details fields.

In this situation, Astyanax client will throw exception with
/NoAvailableHostsException/. I have used following configuration:

withAstyanaxConfiguration(new  AstyanaxConfigurationImpl()   
 .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)

 .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)


First option did not worked at all.

2. I've dropped the keyspace crated with cqlsh and re-created with
cassandra-cli. This time, the nodetool describering foo, shows
information into endpoint and endpoint_details columns, and also
the Astyanax client works properly.

Hope it will avoid others to avoid spending time to figure out how
to go around this issue.

Br,
Gabi





Re: Performing simple CQL Query using python db-api 2.0 fails

2013-01-24 Thread Paul van Hoven
The reason for the error was that I opened the connection to the database incorrectly.

I did:
con = cql.connect(host, port, keyspace)

but correct is:
con = cql.connect(host, port, keyspace, cql_version='3.0.0')

Now it works fine. Thanks for reading.
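
For anyone hitting the same traceback later, a minimal end-to-end sketch of the
working version (host, port and keyspace are placeholders; the cql_version
argument is the part that matters):

import cql  # cassandra-dbapi2

# Without cql_version='3.0.0' the driver falls back to CQL 2, which appears to
# be what tripped over the CompositeType column metadata above.
con = cql.connect("localhost", 9160, "demodb", cql_version="3.0.0")
cur = con.cursor()
cur.execute("select * from users;")
print cur.fetchall()
cur.close()
con.close()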

2013/1/24 aaron morton aa...@thelastpickle.com:
 How did you create the table?

 Anyways that looks like a bug, I *think* they should go here
 http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/issues/list

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 24/01/2013, at 7:14 AM, Paul van Hoven paul.van.ho...@googlemail.com
 wrote:

 I try to access my local cassandra database via python. Therefore I
 installed db-api 2.0 and thrift for accessing the database. Opening
 and closing a connection works fine. But a simply query is not
 working:

 The script looks like this:

c = conn.cursor()
c.execute("select * from users;")
data = c.fetchall()
print "Query: select * from users; returned the following result:"
print str(data)


 The table users looks like this:
 cqlsh:demodb> select * from users;

  user_name | birth_year | gender | password | session_token | state
 -----------+------------+--------+----------+---------------+-------
    jsmith |   null |   null |   secret |  null |  null



 But when I try to execute it I get the following error:
 Open connection to localhost:9160 on keyspace demodb
 Traceback (most recent call last):
  File
 /Users/Tom/Freelancing/Company/Python/ApacheCassandra/src/CassandraDemo.py,
 line 56, in module
perfromSimpleCQLQuery()
  File
 /Users/Tom/Freelancing/Company/Python/ApacheCassandra/src/CassandraDemo.py,
 line 46, in perfromSimpleCQLQuery
c.execute(select * from users;)
  File /Library/Python/2.7/site-packages/cql/cursor.py, line 81, in execute
return self.process_execution_results(response, decoder=decoder)
  File /Library/Python/2.7/site-packages/cql/thrifteries.py, line
 116, in process_execution_results
self.get_metadata_info(self.result[0])
  File /Library/Python/2.7/site-packages/cql/cursor.py, line 97, in
 get_metadata_info
name, nbytes, vtype, ctype = self.get_column_metadata(colid)
  File /Library/Python/2.7/site-packages/cql/cursor.py, line 104, in
 get_column_metadata
return self.decoder.decode_metadata_and_type(column_id)
  File /Library/Python/2.7/site-packages/cql/decoders.py, line 45,
 in decode_metadata_and_type
name = self.name_decode_error(e, namebytes,
 comptype.cql_parameterized_type())
  File /Library/Python/2.7/site-packages/cql/decoders.py, line 29,
 in name_decode_error
% (namebytes, expectedtype, err))
 cql.apivalues.ProgrammingError: column name '\x00\x00\x00' can't be
 deserialized as 'org.apache.cassandra.db.marshal.CompositeType':
 global name 'self' is not defined

 I'm not shure if this is the right place to ask for: But am I doing
 here something wrong?




Re: Issues with CQLSH in Cassandra 1.2

2013-01-24 Thread Tyler Hobbs
Gabriel,

It looks like you used DC1 for the datacenter name in your replication
strategy options, while the actual datacenter name was DC-1 (based on the
nodetool status output).  Perhaps that was causing the problem?
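
One quick way to check for that kind of mismatch is to read the stored
strategy_options back and compare them, character for character, with the
datacenter names printed by nodetool status. A rough sketch using the
cassandra-dbapi2 driver from the other thread (pointing it at any one node):

import cql

con = cql.connect("10.11.1.108", 9160, "system", cql_version="3.0.0")
cur = con.cursor()
cur.execute("select keyspace_name, strategy_options from system.schema_keyspaces;")
for keyspace_name, strategy_options in cur.fetchall():
    # The datacenter keys in strategy_options must match the snitch's
    # datacenter names exactly, including case and any dashes.
    print keyspace_name, strategy_options
con.close()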


On Thu, Jan 24, 2013 at 1:57 PM, Gabriel Ciuloaica gciuloa...@gmail.comwrote:

  I do not think that  it has anything to do with Astyanax, but after I
 have recreated the keyspace with cassandra-cli, everything is working fine.
 Also, I have mention below that not even nodetool describering foo, did
 not showed correct information for the tokens, encoding_details, if the
 keyspace was created with cqlsh.

 Thanks,
 Gabi


 On 1/24/13 9:21 PM, Ivan Velykorodnyy wrote:

 Hi,

  Astyanax is not 1.2 compatible yet
 https://github.com/Netflix/astyanax/issues/191
 Eran planned to make it in 1.57.x

On Thursday, January 24, 2013, Gabriel Ciuloaica wrote:

  Hi,

 I have spent half of the day today trying to make a new Cassandra cluster
 to work. I have setup a single data center cluster, using
 NetworkTopologyStrategy, DC1:3.
 I'm using latest version of Astyanax client to connect. After many hours
 of debug, I found out that the problem may be in cqlsh utility.

 So, after the cluster was up and running:
 [me@cassandra-node1 cassandra]$ nodetool status
 Datacenter: DC-1
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns (effective)  Host
 ID   Rack
 UN  10.11.1.109   59.1 KB256 0.0%
 726689df-edc3-49a0-b680-370953994a8c  RAC2
 UN  10.11.1.108   67.49 KB   256 0.0%
 73cd86a9-4efb-4407-9fe8-9a1b3a277af7  RAC1
 UN  10.11.1.200   59.84 KB   64  0.0%
 d6d700d4-28aa-4722-b215-a6a7d304b8e7  RAC3

 I went to create the keyspace:
 1. First I have tried using cqlsh:
 create keyspace foo with replication=
 {'class':'NetworkTopologyStrategy','DC1':3};

 after this, I have checked that the keyspace was properly created by
 running

 cqlsh select * from system.schema_keyspaces;
  keyspace_name | durable_writes |
 strategy_class   | strategy_options

 ---++--+
system_auth |   True |
 org.apache.cassandra.locator.SimpleStrategy | {replication_factor:1}
   foo |   True |
 org.apache.cassandra.locator.NetworkTopologyStrategy | {dc1:3}
 system |   True |
 org.apache.cassandra.locator.LocalStrategy | {}
  system_traces |   True |
 org.apache.cassandra.locator.SimpleStrategy | {replication_factor:1}

 but if I run nodetool describering foo, it will not show anything into
 endpoint, or endpoint_details fields.

 In this situation, Astyanax client will throw exception with *
 NoAvailableHostsException*. I have used following configuration:

 withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
 .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
 .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)


 First option did not worked at all.

 2. I've dropped the keyspace crated with cqlsh and re-created with
 cassandra-cli. This time, the nodetool describering foo, shows information
 into endpoint and endpoint_details columns, and also the Astyanax client
 works properly.

 Hope it will avoid others to avoid spending time to figure out how to go
 around this issue.

 Br,
 Gabi





-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Issues with CQLSH in Cassandra 1.2

2013-01-24 Thread Gabriel Ciuloaica

Hi Tyler,

No, it was just a typo in the email, I changed names of DC in the email 
after copy/paste from output of the tools.
It is quite easy to reproduce (assuming you have a correct configuration 
for NetworkTopologyStrategy, with vNodes(default, 256)):


1. launch cqlsh and create the keyspace

create keyspace foo with replication= 
{'class':'NetworkTopologyStrategy','DC1':3};


2. exit cqlsh, run

nodetool describering foo

you'll see something like this:

TokenRange(start_token:2318224911779291128, 
end_token:2351629206880900296, endpoints:[], rpc_endpoints:[], 
endpoint_details:[])
TokenRange(start_token:-8291638263612363845, 
end_token:-8224756763869823639, endpoints:[], rpc_endpoints:[], 
endpoint_details:[])


3. start  cqlsh,

drop keyspace foo;

4. Exit cqlsh, start cassandra-cli
create keyspace foo with placement_strategy = 'NetworkTopologyStrategy' 
AND strategy_options={DC1:3};


if you run nodetool describering foo you'll see:

TokenRange(start_token:2318224911779291128, 
end_token:2351629206880900296, endpoints:[10.11.1.200, 10.11.1.109, 
10.11.1.108], rpc_endpoints:[10.11.1.200, 10.11.1.109, 10.11.1.108], 
endpoint_details:[EndpointDetails(host:10.11.1.200, datacenter:DC1, 
rack:RAC3), EndpointDetails(host:10.11.1.109, datacenter:DC1, 
rack:RAC2), EndpointDetails(host:10.11.1.108, datacenter:DC1, rack:RAC1)])
TokenRange(start_token:-8291638263612363845, 
end_token:-8224756763869823639, endpoints:[10.11.1.200, 10.11.1.109, 
10.11.1.108], rpc_endpoints:[10.11.1.200, 10.11.1.109, 10.11.1.108], 
endpoint_details:[EndpointDetails(host:10.11.1.200, datacenter:DC1, 
rack:RAC3), EndpointDetails(host:10.11.1.109, datacenter:DC1, 
rack:RAC2), EndpointDetails(host:10.11.1.108, datacenter:DC1, rack:RAC1)])


Br,
Gabi


On 1/24/13 10:22 PM, Tyler Hobbs wrote:

Gabriel,

It looks like you used DC1 for the datacenter name in your 
replication strategy options, while the actual datacenter name was 
DC-1 (based on the nodetool status output).  Perhaps that was 
causing the problem?



On Thu, Jan 24, 2013 at 1:57 PM, Gabriel Ciuloaica 
gciuloa...@gmail.com mailto:gciuloa...@gmail.com wrote:


I do not think that  it has anything to do with Astyanax, but
after I have recreated the keyspace with cassandra-cli, everything
is working fine.
Also, I have mention below that not even nodetool describering
foo, did not showed correct information for the tokens,
encoding_details, if the keyspace was created with cqlsh.

Thanks,
Gabi


On 1/24/13 9:21 PM, Ivan Velykorodnyy wrote:

Hi,

Astyanax is not 1.2 compatible yet
https://github.com/Netflix/astyanax/issues/191
Eran planned to make it in 1.57.x

On Thursday, January 24, 2013, Gabriel Ciuloaica wrote:

Hi,

I have spent half of the day today trying to make a new
Cassandra cluster to work. I have setup a single data center
cluster, using NetworkTopologyStrategy, DC1:3.
I'm using latest version of Astyanax client to connect. After
many hours of debug, I found out that the problem may be in
cqlsh utility.

So, after the cluster was up and running:
[me@cassandra-node1 cassandra]$ nodetool status
Datacenter: DC-1
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens Owns (effective) 
Host ID   Rack

UN  10.11.1.109   59.1 KB256 0.0%
726689df-edc3-49a0-b680-370953994a8c RAC2
UN  10.11.1.108   67.49 KB   256 0.0%
73cd86a9-4efb-4407-9fe8-9a1b3a277af7 RAC1
UN  10.11.1.200   59.84 KB   64 0.0%
d6d700d4-28aa-4722-b215-a6a7d304b8e7 RAC3

I went to create the keyspace:
1. First I have tried using cqlsh:
create keyspace foo with replication=
{'class':'NetworkTopologyStrategy','DC1':3};

after this, I have checked that the keyspace was properly
created by running

cqlsh select * from system.schema_keyspaces;
 keyspace_name | durable_writes | strategy_class |
strategy_options

---++--+
   system_auth |   True |
org.apache.cassandra.locator.SimpleStrategy |
{replication_factor:1}
  foo |   True |
org.apache.cassandra.locator.NetworkTopologyStrategy
| {dc1:3}
system |   True |
org.apache.cassandra.locator.LocalStrategy
| {}
 system_traces |   True |
org.apache.cassandra.locator.SimpleStrategy |
{replication_factor:1}

but if I run nodetool describering foo, it will not show
anything into endpoint, or endpoint_details fields.

In this situation, Astyanax client will 

Re: trouble setting up initial cluster: Host ID collision between active endpoint

2013-01-24 Thread Ben Bromhead
Hi Tim

If you want to check out Cassandra on AWS you should also have a look
www.instaclustr.com.

We are still very much in Beta (so if you come across anything, please let
us know), but if you have a few minutes and want to deploy a cluster in
just a few clicks I highly recommend trying Instaclustr out.

Cheers

Ben Bromhead
*Instaclustr*

On Fri, Jan 25, 2013 at 12:35 AM, Tim Dunphy bluethu...@gmail.com wrote:

 Cool Thanks for the advice Aaron. I actually did get this working before I
 read your reply. The trick apparently for me was to use the IP for the
 first node in the seeds setting of each successive node. But I like the
 idea of using larges for an hour or so and terminating them for some basic
 experimentation.  Also, thanks for pointing me to the Datastax AMIs I'll be
 sure to check them out.

 Tim


 On Thu, Jan 24, 2013 at 3:45 AM, aaron morton aa...@thelastpickle.comwrote:

 They both have 0 for their token, and this is stored in their System
 keyspace.
 Scrub them and start again.

 But I found that the tokens that were being generated would require way
 too much memory

 Token assignments have nothing to do with memory usage.

  m1.micro instances

 You are better off using your laptop than micro instances.
 For playing around try m1.large and terminate them when not in use.
 To make life easier use this to make the cluster for you
 http://www.datastax.com/docs/1.2/install/install_ami

 Cheers

-
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 24/01/2013, at 5:17 AM, Tim Dunphy bluethu...@gmail.com wrote:

 Hello list,

  I really do appreciate the advice I've gotten here as I start building
 familiarity with Cassandra. Aside from the single node instance I setup for
 a developer friend, I've just been playing with a single node in a VM on my
 laptop and playing around with the cassandra-cli and PHP.

 Well I've decided to setup my first cluster on my amazon ec2 account and
 I'm running into an issue getting the nodes to gossip.

 I've set the IP's of 'node01' and 'node02' ec2 instances in their
 respective listen_address, rpc_address and made sure that the
 'cluster_name' on both was in agreement.

  I believe the problem may be in one of two places: either the seeds or
 the initial_token setting.

 For the seeds I have it setup as such. I put the IPs for both machines in
 the 'seeds' settings for each, thinking this would be how each node would
 discover each other:

  - seeds: 10.xxx.xxx.248,10.xxx.xxx.123

 Initially I tried the tokengen script that I found in the documentation.
 But I found that the tokens that were being generated would require way too
 much memory for the m1.micro instances that I'm experimenting with on the
 Amazon free tier. And according to the docs in the config it is in some
 cases ok to leave that field blank. So that's what I did on both instances.

 Not sure how much/if this matters but I am using the setting -
 endpoint_snitch: Ec2Snitch

 Finally, when I start up the first node all goes well.

 But when I startup the second node I see this exception on both hosts:

 node1

 INFO 11:02:32,231 Listening for thrift clients...
  INFO 11:02:59,262 Node /10.xxx.xxx.123 is now part of the cluster
  INFO 11:02:59,268 InetAddress /10.xxx.xxx.123 is now UP
 ERROR 11:02:59,270 Exception in thread Thread[GossipStage:1,5,main]
 java.lang.RuntimeException: Host ID collision between active endpoint
 /10..xxx.248 and /10.xxx.xxx.123
 (id=54ce7ccd-1b1d-418e-9861-1c281c078b8f)
 at
 org.apache.cassandra.locator.TokenMetadata.updateHostId(TokenMetadata.java:227)
 at
 org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1296)
 at
 org.apache.cassandra.service.StorageService.onChange(StorageService.java:1157)
 at
 org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1895)
 at
 org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:805)
 at
 org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:883)
 at
 org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:43)
 at
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)
 at java.lang.Thread.run(Unknown Source)

 And on node02 I see:

  INFO 11:02:58,817 Starting Messaging Service on port 7000
  INFO 11:02:58,835 Using saved token [0]
  INFO 11:02:58,837 Enqueuing flush of Memtable-local@672636645(84/84
 serialized/live bytes, 4 ops)
  INFO 11:02:58,838 Writing Memtable-local@672636645(84/84
 serialized/live bytes, 4 ops)
  INFO 11:02:58,912 Completed flushing
 /var/lib/cassandra/data/system/local/system-local-ia-43-Data.db (120 bytes)
 for commitlog position 

CQL3 and clients for new Cluster

2013-01-24 Thread Matthew Langton
Hi all,

I started looking at Cassandra awhile ago and got used to the Thrift API. I
put it on the back burner for awhile though until now. To get back up to
speed I have read a lot of documentation at the DataStax website, and it
appears that the Thrift API is no longer considered the ideal way to
interface with Cassandra.

So my questions are these:

What is the future of the Thrift API, should I just ignore it going forward
and use CQL?

If CQL is the preferred way to interface with Cassandra, does using any of
the clients listed here:
http://wiki.apache.org/cassandra/ClientOptions provide me any benefits
over using a JDBC driver like the one listed here:
http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/

Thanks,
Matt


Re: CQL3 and clients for new Cluster

2013-01-24 Thread Hiller, Dean
Some of the Mapping libraries can help translate into objects and not have sooo 
much DAO code

PlayOrm has a whole feature list of things that can be helpful.  I am sure 
other high level clients have stuff as well that can speed up development time.
https://github.com/deanhiller/playorm#playorm-feature-list

Dean

From: Matthew Langton mjla...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Thursday, January 24, 2013 4:35 PM
To: user@cassandra.apache.org
Subject: CQL3 and clients for new Cluster

Hi all,

I started looking at Cassandra awhile ago and got used to the Thrift API. I put 
it on the back burner for awhile though until now. To get back up to speed I 
have read a lot of documentation at the DataStax website, and it appears that 
the Thrift API is no longer considered the ideal way to interface with 
Cassandra.

So my questions are these:

What is the future of the Thrift API, should I just ignore it going forward and 
use CQL?

If CQL is the preferred way to interface with Cassandra, does using any of the 
clients listed here: http://wiki.apache.org/cassandra/ClientOptions provide me 
any benefits over using a JDBC like the one listed here 
http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/

Thanks,
Matt




Re: CQL3 and clients for new Cluster

2013-01-24 Thread Aaron Turner
Either CQL or a higher level API running on top of Thrift like
Hector/Astyanax/etc.

Thrift is uh... painful.

On Thu, Jan 24, 2013 at 3:35 PM, Matthew Langton mjla...@gmail.com wrote:
 Hi all,

 I started looking at Cassandra awhile ago and got used to the Thrift API. I
 put it on the back burner for awhile though until now. To get back up to
 speed I have read a lot of documentation at the DataStax website, and it
 appears that the Thrift API is no longer considered the ideal way to
 interface with Cassandra.

 So my questions are these:

 What is the future of the Thrift API, should I just ignore it going forward
 and use CQL?

 If CQL is the preferred way to interface with Cassandra, does using any of
 the clients listed here: http://wiki.apache.org/cassandra/ClientOptions
 provide me any benefits over using a JDBC like the one listed here
 http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/

 Thanks,
 Matt





-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


Re: CQL3 and clients for new Cluster

2013-01-24 Thread Peter Lin
I use both Thrift and CQL.

my biased take is to use CQL for select queries and thrift for
insert/update. I like being able to insert exactly the data type I
want for the column name and value. CQL is more user friendly, but it
lacks the flexibility of thrift in terms of using different data types
for column names and values.

the SQL metaphor only goes so far. My biased opinion: you're not getting
the most out of Cassandra if you're only using CQL.



On Thu, Jan 24, 2013 at 6:38 PM, Aaron Turner synfina...@gmail.com wrote:
 Either CQL or a higher level API running on top of Thrift like
 Hector/Asyntax/etc.

 Thrift is uh... painful.

 On Thu, Jan 24, 2013 at 3:35 PM, Matthew Langton mjla...@gmail.com wrote:
 Hi all,

 I started looking at Cassandra awhile ago and got used to the Thrift API. I
 put it on the back burner for awhile though until now. To get back up to
 speed I have read a lot of documentation at the DataStax website, and it
 appears that the Thrift API is no longer considered the ideal way to
 interface with Cassandra.

 So my questions are these:

 What is the future of the Thrift API, should I just ignore it going forward
 and use CQL?

 If CQL is the preferred way to interface with Cassandra, does using any of
 the clients listed here: http://wiki.apache.org/cassandra/ClientOptions
 provide me any benefits over using a JDBC like the one listed here
 http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/

 Thanks,
 Matt





 --
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix  
 Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
 -- Benjamin Franklin
 carpe diem quam minimum credula postero


Does setstreamthroughput also throttle the network traffic caused by nodetool repair?

2013-01-24 Thread Wei Zhu
In the yaml, it has the following setting

# Throttles all outbound streaming file transfers on this node to the
# given total throughput in Mbps. This is necessary because Cassandra does
# mostly sequential IO when streaming data during bootstrap or repair, which
# can lead to saturating the network connection and degrading rpc performance.
# When unset, the default is 400 Mbps or 50 MB/s.
# stream_throughput_outbound_megabits_per_sec: 400

Is this the same value as if I call

Nodetool setstreamthroughput 

Should I run it on all the nodes in the cluster? Will that throttle the 
network traffic caused by nodetool repair?

Thanks.
-Wei

Re: Issues with CQLSH in Cassandra 1.2

2013-01-24 Thread aaron morton
Can you provide details of the snitch configuration and the number of nodes you 
have? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/01/2013, at 9:39 AM, Gabriel Ciuloaica gciuloa...@gmail.com wrote:

 Hi Tyler,
 
 No, it was just a typo in the email, I changed names of DC in the email after 
 copy/paste from output of the tools.
 It is quite easy to reproduce (assuming you have a correct configuration for 
 NetworkTopologyStrategy, with vNodes(default, 256)):
 
 1. launch cqlsh and create the keyspace
 
 create keyspace foo with replication= 
 {'class':'NetworkTopologyStrategy','DC1':3};
 
 2. exit cqlsh, run
 
 nodetool describering foo
 
 you'll see something like this:
 
 TokenRange(start_token:2318224911779291128, end_token:2351629206880900296, 
 endpoints:[], rpc_endpoints:[], endpoint_details:[])
 TokenRange(start_token:-8291638263612363845, end_token:-8224756763869823639, 
 endpoints:[], rpc_endpoints:[], endpoint_details:[])
 
 3. start  cqlsh, 
 
 drop keyspace foo;
 
 4. Exit cqlsh, start cassandra-cli
 create keyspace foo with placement_strategy = 'NetworkTopologyStrategy' AND 
 strategy_options={DC1};
 
 if you run nodetool describering foo you'll see:
 
 TokenRange(start_token:2318224911779291128, 
 end_token:2351629206880900296, endpoints:[10.11.1.200, 10.11.1.109, 
 10.11.1.108], rpc_endpoints:[10.11.1.200, 10.11.1.109, 10.11.1.108], 
 endpoint_details:[EndpointDetails(host:10.11.1.200, datacenter:DC1, 
 rack:RAC3), EndpointDetails(host:10.11.1.109, datacenter:DC1, rack:RAC2), 
 EndpointDetails(host:10.11.1.108, datacenter:DC1, rack:RAC1)])
 TokenRange(start_token:-8291638263612363845, 
 end_token:-8224756763869823639, endpoints:[10.11.1.200, 10.11.1.109, 
 10.11.1.108], rpc_endpoints:[10.11.1.200, 10.11.1.109, 10.11.1.108], 
 endpoint_details:[EndpointDetails(host:10.11.1.200, datacenter:DC1, 
 rack:RAC3), EndpointDetails(host:10.11.1.109, datacenter:DC1, rack:RAC2), 
 EndpointDetails(host:10.11.1.108, datacenter:DC1, rack:RAC1)])
 
 Br,
 Gabi
 
 
 On 1/24/13 10:22 PM, Tyler Hobbs wrote:
 Gabriel,
 
 It looks like you used DC1 for the datacenter name in your replication 
 strategy options, while the actual datacenter name was DC-1 (based on the 
 nodetool status output).  Perhaps that was causing the problem?
 
 
 On Thu, Jan 24, 2013 at 1:57 PM, Gabriel Ciuloaica gciuloa...@gmail.com 
 wrote:
 I do not think that  it has anything to do with Astyanax, but after I have 
 recreated the keyspace with cassandra-cli, everything is working fine.
 Also, I have mention below that not even nodetool describering foo, did 
 not showed correct information for the tokens, encoding_details, if the 
 keyspace was created with cqlsh.
 
 Thanks,
 Gabi
 
 
 On 1/24/13 9:21 PM, Ivan Velykorodnyy wrote:
 Hi,
 
 Astyanax is not 1.2 compatible yet 
 https://github.com/Netflix/astyanax/issues/191
 Eran planned to make it in 1.57.x
 
On Thursday, January 24, 2013, Gabriel Ciuloaica wrote:
 Hi,
 
 I have spent half of the day today trying to make a new Cassandra cluster 
 to work. I have setup a single data center cluster, using 
 NetworkTopologyStrategy, DC1:3.
 I'm using latest version of Astyanax client to connect. After many hours of 
 debug, I found out that the problem may be in cqlsh utility.
 
 So, after the cluster was up and running:
 [me@cassandra-node1 cassandra]$ nodetool status
 Datacenter: DC-1
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns (effective)  Host ID  
  Rack
 UN  10.11.1.109   59.1 KB256 0.0%  
 726689df-edc3-49a0-b680-370953994a8c  RAC2
 UN  10.11.1.108   67.49 KB   256 0.0%  
 73cd86a9-4efb-4407-9fe8-9a1b3a277af7  RAC1
 UN  10.11.1.200   59.84 KB   64  0.0%  
 d6d700d4-28aa-4722-b215-a6a7d304b8e7  RAC3
 
 I went to create the keyspace:
 1. First I have tried using cqlsh:
 create keyspace foo with replication= 
 {'class':'NetworkTopologyStrategy','DC1':3};
 
 after this, I have checked that the keyspace was properly created by 
 running 
 
 cqlsh select * from system.schema_keyspaces;
  keyspace_name | durable_writes | strategy_class
| strategy_options
 ---++--+
system_auth |   True |  
 org.apache.cassandra.locator.SimpleStrategy | {replication_factor:1}
   foo |   True | 
 org.apache.cassandra.locator.NetworkTopologyStrategy | {dc1:3}
 system |   True |   
 org.apache.cassandra.locator.LocalStrategy | {}
  system_traces |   True |  
 org.apache.cassandra.locator.SimpleStrategy | {replication_factor:1}
 
 but if I run nodetool describering foo, it will not show anything 

Re: Issues with CQLSH in Cassandra 1.2

2013-01-24 Thread Gabriel Ciuloaica

Hi Aaron,

I'm using PropertyFileSnitch, and my cassandra-topology.properties looks 
like this:

# Cassandra Node IP=Data Center:Rack

# default for unknown nodes
default=DC1:RAC1

# all known nodes
10.11.1.108=DC1:RAC1
10.11.1.109=DC1:RAC2
10.11.1.200=DC1:RAC3

Cheers,
Gabi



On 1/25/13 4:38 AM, aaron morton wrote:
Can you provide details of the snitch configuration and the number of 
nodes you have?


Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/01/2013, at 9:39 AM, Gabriel Ciuloaica gciuloa...@gmail.com 
mailto:gciuloa...@gmail.com wrote:



Hi Tyler,

No, it was just a typo in the email, I changed names of DC in the 
email after copy/paste from output of the tools.
It is quite easy to reproduce (assuming you have a correct 
configuration for NetworkTopologyStrategy, with vNodes(default, 256)):


1. launch cqlsh and create the keyspace

create keyspace foo with replication= 
{'class':'NetworkTopologyStrategy','DC1':3};


2. exit cqlsh, run

nodetool describering foo

you'll see something like this:

TokenRange(start_token:2318224911779291128, 
end_token:2351629206880900296, endpoints:[], rpc_endpoints:[], 
endpoint_details:[])
TokenRange(start_token:-8291638263612363845, 
end_token:-8224756763869823639, endpoints:[], rpc_endpoints:[], 
endpoint_details:[])


3. start  cqlsh,

drop keyspace foo;

4. Exit cqlsh, start cassandra-cli
create keyspace foo with placement_strategy = 
'NetworkTopologyStrategy' AND strategy_options={DC1};


if you run nodetool describering foo you'll see:

TokenRange(start_token:2318224911779291128, 
end_token:2351629206880900296, endpoints:[10.11.1.200, 10.11.1.109, 
10.11.1.108], rpc_endpoints:[10.11.1.200, 10.11.1.109, 10.11.1.108], 
endpoint_details:[EndpointDetails(host:10.11.1.200, datacenter:DC1, 
rack:RAC3), EndpointDetails(host:10.11.1.109, datacenter:DC1, 
rack:RAC2), EndpointDetails(host:10.11.1.108, datacenter:DC1, 
rack:RAC1)])
TokenRange(start_token:-8291638263612363845, 
end_token:-8224756763869823639, endpoints:[10.11.1.200, 10.11.1.109, 
10.11.1.108], rpc_endpoints:[10.11.1.200, 10.11.1.109, 10.11.1.108], 
endpoint_details:[EndpointDetails(host:10.11.1.200, datacenter:DC1, 
rack:RAC3), EndpointDetails(host:10.11.1.109, datacenter:DC1, 
rack:RAC2), EndpointDetails(host:10.11.1.108, datacenter:DC1, 
rack:RAC1)])


Br,
Gabi


On 1/24/13 10:22 PM, Tyler Hobbs wrote:

Gabriel,

It looks like you used DC1 for the datacenter name in your 
replication strategy options, while the actual datacenter name was 
DC-1 (based on the nodetool status output).  Perhaps that was 
causing the problem?



On Thu, Jan 24, 2013 at 1:57 PM, Gabriel Ciuloaica 
gciuloa...@gmail.com mailto:gciuloa...@gmail.com wrote:


I do not think that  it has anything to do with Astyanax, but
after I have recreated the keyspace with cassandra-cli,
everything is working fine.
Also, I have mention below that not even nodetool describering
foo, did not showed correct information for the tokens,
encoding_details, if the keyspace was created with cqlsh.

Thanks,
Gabi


On 1/24/13 9:21 PM, Ivan Velykorodnyy wrote:

Hi,

Astyanax is not 1.2 compatible yet
https://github.com/Netflix/astyanax/issues/191
Eran planned to make it in 1.57.x

On Thursday, January 24, 2013, Gabriel Ciuloaica wrote:

Hi,

I have spent half of the day today trying to make a new
Cassandra cluster to work. I have setup a single data
center cluster, using NetworkTopologyStrategy, DC1:3.
I'm using latest version of Astyanax client to connect.
After many hours of debug, I found out that the problem may
be in cqlsh utility.

So, after the cluster was up and running:
[me@cassandra-node1 cassandra]$ nodetool status
Datacenter: DC-1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.11.1.109  59.1 KB    256     0.0%              726689df-edc3-49a0-b680-370953994a8c  RAC2
UN  10.11.1.108  67.49 KB   256     0.0%              73cd86a9-4efb-4407-9fe8-9a1b3a277af7  RAC1
UN  10.11.1.200  59.84 KB   64      0.0%              d6d700d4-28aa-4722-b215-a6a7d304b8e7  RAC3

I went to create the keyspace:
1. First I tried using cqlsh:
create keyspace foo with replication=
{'class':'NetworkTopologyStrategy','DC1':3};

After this, I checked that the keyspace was properly
created by running:

cqlsh> select * from system.schema_keyspaces;

 keyspace_name | durable_writes | strategy_class | strategy_options
---------------+----------------+----------------+------------------
   system_auth |           True |
Re: Cassandra pending compaction tasks keeps increasing

2013-01-24 Thread Wei Zhu
Do you mean 90% of the reads should come from 1 SSTable?

By the way, after I finished migrating the data, I ran nodetool repair -pr on
one of the nodes. Before the repair, all the nodes had the same disk space
usage. After I ran it, the disk space for that node jumped from 135G to 220G,
and there are now more than 15000 pending compaction tasks. After a while,
Cassandra started to throw the exception below and stopped compacting. I had
to restart the node. By the way, we are using 1.1.7. Something doesn't seem right.
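
The sequence described above corresponds roughly to the following commands (the
data directory is taken from the log lines below; adjust paths and hosts for your
own cluster):

    nodetool repair -pr           # repair only this node's primary range
    nodetool compactionstats      # pending tasks climbed past 15000 here
    df -h /ssd/cassandra/data     # disk usage on this node went from ~135G to ~220G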


 INFO [CompactionExecutor:108804] 2013-01-24 22:23:10,427 CompactionTask.java 
(line 109) Compacting 
[SSTableReader(path='/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-753782-Data.db')]
 INFO [CompactionExecutor:108804] 2013-01-24 22:23:11,610 CompactionTask.java 
(line 221) Compacted to 
[/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754996-Data.db,].  
5,259,403 to 5,259,403 (~100% of original) bytes for 1,983 keys at 
4.268730MB/s.  Time: 1,175ms.
 INFO [CompactionExecutor:108805] 2013-01-24 22:23:11,617 CompactionTask.java 
(line 109) Compacting 
[SSTableReader(path='/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754880-Data.db')]
 INFO [CompactionExecutor:108805] 2013-01-24 22:23:12,828 CompactionTask.java 
(line 221) Compacted to 
[/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754997-Data.db,].  
5,272,746 to 5,272,746 (~100% of original) bytes for 1,941 keys at 
4.152339MB/s.  Time: 1,211ms.
ERROR [CompactionExecutor:108806] 2013-01-24 22:23:13,048 
AbstractCassandraDaemon.java (line 135) Exception in thread 
Thread[CompactionExecutor:108806,1,main]
java.lang.StackOverflowError
at java.util.AbstractList$Itr.hasNext(Unknown Source)
at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)


- Original Message -
From: aaron morton aa...@thelastpickle.com
To: user@cassandra.apache.org
Sent: Wednesday, January 23, 2013 2:40:45 PM
Subject: Re: Cassandra pending compaction tasks keeps increasing

The histogram does not look right to me, too many SSTables for an LCS CF. 


It's a symptom, not a cause. If LCS is catching up, though, it should be more like 
the distribution in the linked article. 


Cheers 








- 
Aaron Morton 
Freelance Cassandra Developer 
New Zealand 


@aaronmorton 
http://www.thelastpickle.com 


On 23/01/2013, at 10:57 AM, Jim Cistaro  jcist...@netflix.com  wrote: 




What version are you using? Are you seeing any compaction related assertions in 
the logs? 


Might be https://issues.apache.org/jira/browse/CASSANDRA-4411 


We experienced this problem of the count only decreasing to a certain number 
and then stopping. If the node is idle, it should go to 0. I have not seen it 
overestimate when the real count is zero, only when it is non-zero. 


As for timeouts etc, you will need to look at things like nodetool tpstats to 
see if you have pending transactions queueing up. 
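
For example, something along these lines (replace the host placeholder with the node
to inspect, and look for non-zero Pending or Blocked counts per stage):

    nodetool -h <node-address> tpstats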


Jc 


From: Wei Zhu wz1...@yahoo.com 
Reply-To: user@cassandra.apache.org, Wei Zhu wz1...@yahoo.com 
Date: Tuesday, January 22, 2013 12:56 PM 
To: user@cassandra.apache.org 
Subject: Re: Cassandra pending compaction tasks keeps increasing 






Thanks Aaron and Jim for your replies. The data import is done. We have about 
135G on each node, in roughly 28K SSTables. For normal operation, we only have 
about 90 writes per second, but when I run nodetool compactionstats, the pending 
count remains at 9 and hardly changes. I guess it's just an estimated number. 


When I ran the histogram: 



Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
     1      2644              0             0         0      18660057
     2      8204              0             0         0       9824270
     3     11198              0             0         0       6968475
     4      4269              6             0         0       5510745
     5       517             29             0         0       4595205




You can see that about half of the reads touch 3 SSTables. The majority of read 
latencies are under 5ms; only a dozen are over 10ms. We haven't fully turned on 
reads yet, only 60 reads per second. We saw about 20 read timeouts during the 
past 12 hours, and not a single warning in the Cassandra log. 
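
For reference, the table above is the sort of output you get from cfhistograms; the
keyspace and column family names below are only a guess based on the data file paths
quoted elsewhere in this thread:

    # the SSTables column counts reads that touched 'Offset' sstables
    nodetool cfhistograms zoosk friends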


Is it normal for Cassandra to time out some requests? We set the rpc timeout to 
1s; shouldn't that prevent any of them from timing out? 
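
For reference, the 1s figure above corresponds to the server-side setting in
cassandra.yaml on the 1.1 line (shown here only as a sketch; the shipped default
is larger):

    # cassandra.yaml -- server-side request timeout, in milliseconds
    rpc_timeout_in_ms: 1000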


Thanks. 
-Wei 





From: aaron morton  aa...@thelastpickle.com  
To: user@cassandra.apache.org 
Sent: Monday, January 21, 2013 12:21 AM 
Subject: Re: Cassandra pending compaction tasks keeps increasing 



The main guarantee LCS gives you is that most reads will only touch 1 

Re: Cassandra pending compaction tasks keeps increasing

2013-01-24 Thread Derek Williams
Increasing the stack size in cassandra-env.sh should help you get past the
stack overflow. Doesn't help with your original problem though.
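
A minimal sketch of that change, assuming the stock cassandra-env.sh (the 256k value
below is only a starting point to try, not a recommendation from this thread):

    # conf/cassandra-env.sh -- raise the per-thread stack size from the shipped 180k
    JVM_OPTS="$JVM_OPTS -Xss256k"
    # restart the node afterwards so the new JVM option takes effect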


On Fri, Jan 25, 2013 at 12:00 AM, Wei Zhu wz1...@yahoo.com wrote:

 Well, even after restart, it throws the same exception. I am basically
 stuck. Any suggestions for clearing the pending compaction tasks? Below is the
 end of the stack trace:

  at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$3.iterator(Sets.java:667)
 at com.google.common.collect.Sets$3.size(Sets.java:670)
 at com.google.common.collect.Iterables.size(Iterables.java:80)
 at
 org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:557)
 at
 org.apache.cassandra.db.compaction.CompactionController.init(CompactionController.java:69)
 at
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:105)
 at
 org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 at
 org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
 at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
 Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
 Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
 Source)
 at java.lang.Thread.run(Unknown Source)

 Any suggestion is very much appreciated

 -Wei

 - Original Message -
 From: Wei Zhu wz1...@yahoo.com
 To: user@cassandra.apache.org
 Sent: Thursday, January 24, 2013 10:55:07 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing

 Do you mean 90% of the reads should come from 1 SSTable?

 By the way, after I finished the data migrating, I ran nodetool repair -pr
 on one of the nodes. Before nodetool repair, all the nodes have the same
 disk space usage. After I ran the nodetool repair, the disk space for that
 node jumped from 135G to 220G, also there are more than 15000 pending
 compaction tasks. After a while , Cassandra started to throw the exception
 like below and stop compacting. I had to restart the node. By the way, we
 are using 1.1.7. Something doesn't seem right.


  INFO [CompactionExecutor:108804] 2013-01-24 22:23:10,427
 CompactionTask.java (line 109) Compacting
 [SSTableReader(path='/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-753782-Data.db')]
  INFO [CompactionExecutor:108804] 2013-01-24 22:23:11,610
 CompactionTask.java (line 221) Compacted to
 [/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754996-Data.db,].
  5,259,403 to 5,259,403 (~100% of original) bytes for 1,983 keys at
 4.268730MB/s.  Time: 1,175ms.
  INFO [CompactionExecutor:108805] 2013-01-24 22:23:11,617
 CompactionTask.java (line 109) Compacting
 [SSTableReader(path='/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754880-Data.db')]
  INFO [CompactionExecutor:108805] 2013-01-24 22:23:12,828
 CompactionTask.java (line 221) Compacted to
 [/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754997-Data.db,].
  5,272,746 to 5,272,746 (~100% of original) bytes for 1,941 keys at
 4.152339MB/s.  Time: 1,211ms.
 ERROR [CompactionExecutor:108806] 2013-01-24 22:23:13,048
 AbstractCassandraDaemon.java (line 135) Exception in thread
 Thread[CompactionExecutor:108806,1,main]
 java.lang.StackOverflowError
 at java.util.AbstractList$Itr.hasNext(Unknown Source)
 at
 com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
 at
 com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
 at
 com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
 at
 com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
 at
 com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
 at
 com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
 at
 com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
 at
 com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)


 - Original Message -
 From: aaron morton aa...@thelastpickle.com
 To: user@cassandra.apache.org
 Sent: Wednesday, January 23, 2013 2:40:45 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing

 The histogram does not look right to me, too many SSTables for an LCS CF.


 It's a symptom, not a cause. If LCS is catching up, though, it should be more
 like the distribution in the linked article.


 Cheers








 -
 Aaron Morton
 Freelance 

Re: Cassandra pending compaction tasks keeps increasing

2013-01-24 Thread Wei Zhu
Thanks Derek,
in cassandra-env.sh, it says:

# reduce the per-thread stack size to minimize the impact of Thrift
# thread-per-client.  (Best practice is for client connections to
# be pooled anyway.) Only do so on Linux where it is known to be
# supported.
# u34 and greater need 180k
JVM_OPTS="$JVM_OPTS -Xss180k"

What value should I use? Does Java default to 400K? Maybe I'll try that first. 
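
One way to check what the JVM itself defaults to for thread stacks, assuming a
HotSpot JVM (the value prints in KB; a 0 means the platform default is used; run
it with the same JVM that Cassandra uses):

    java -XX:+PrintFlagsFinal -version | grep ThreadStackSize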

Thanks.
-Wei

- Original Message -
From: Derek Williams de...@fyrie.net
To: user@cassandra.apache.org, Wei Zhu wz1...@yahoo.com
Sent: Thursday, January 24, 2013 11:06:00 PM
Subject: Re: Cassandra pending compaction tasks keeps increasing


Increasing the stack size in cassandra-env.sh should help you get past the 
stack overflow. Doesn't help with your original problem though. 



On Fri, Jan 25, 2013 at 12:00 AM, Wei Zhu  wz1...@yahoo.com  wrote: 


Well, even after restart, it throws the same exception. I am basically 
stuck. Any suggestion to clear the pending compaction tasks? Below is the end 
of stack trace: 

at com.google.common.collect.Sets$1.iterator(Sets.java:578) 
at com.google.common.collect.Sets$1.iterator(Sets.java:578) 
at com.google.common.collect.Sets$1.iterator(Sets.java:578) 
at com.google.common.collect.Sets$1.iterator(Sets.java:578) 
at com.google.common.collect.Sets$3.iterator(Sets.java:667) 
at com.google.common.collect.Sets$3.size(Sets.java:670) 
at com.google.common.collect.Iterables.size(Iterables.java:80) 
at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:557) 
at 
org.apache.cassandra.db.compaction.CompactionController.init(CompactionController.java:69)
 
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:105)
 
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
 
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) 
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) 
at java.util.concurrent.FutureTask.run(Unknown Source) 
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
at java.lang.Thread.run(Unknown Source) 

Any suggestion is very much appreciated 

-Wei 



- Original Message - 
From: Wei Zhu  wz1...@yahoo.com  
To: user@cassandra.apache.org 
Sent: Thursday, January 24, 2013 10:55:07 PM 
Subject: Re: Cassandra pending compaction tasks keeps increasing 

Do you mean 90% of the reads should come from 1 SSTable? 

By the way, after I finished the data migrating, I ran nodetool repair -pr on 
one of the nodes. Before nodetool repair, all the nodes have the same disk 
space usage. After I ran the nodetool repair, the disk space for that node 
jumped from 135G to 220G, also there are more than 15000 pending compaction 
tasks. After a while , Cassandra started to throw the exception like below and 
stop compacting. I had to restart the node. By the way, we are using 1.1.7. 
Something doesn't seem right. 


INFO [CompactionExecutor:108804] 2013-01-24 22:23:10,427 CompactionTask.java 
(line 109) Compacting 
[SSTableReader(path='/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-753782-Data.db')]
 
INFO [CompactionExecutor:108804] 2013-01-24 22:23:11,610 CompactionTask.java 
(line 221) Compacted to 
[/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754996-Data.db,]. 5,259,403 
to 5,259,403 (~100% of original) bytes for 1,983 keys at 4.268730MB/s. Time: 
1,175ms. 
INFO [CompactionExecutor:108805] 2013-01-24 22:23:11,617 CompactionTask.java 
(line 109) Compacting 
[SSTableReader(path='/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754880-Data.db')]
 
INFO [CompactionExecutor:108805] 2013-01-24 22:23:12,828 CompactionTask.java 
(line 221) Compacted to 
[/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754997-Data.db,]. 5,272,746 
to 5,272,746 (~100% of original) bytes for 1,941 keys at 4.152339MB/s. Time: 
1,211ms. 
ERROR [CompactionExecutor:108806] 2013-01-24 22:23:13,048