Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread aaron morton
The row is the unit of replication: all values with the same storage engine row 
key in a KS are on the same nodes. If they were set per CF, this would not hold. 

Not that it would be the end of the world, but that is the first thing that 
comes to mind. 
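A toy sketch (not Cassandra's actual placement code) can make this concrete: with SimpleStrategy-style placement, the replica set depends only on the row key's token, so every CF in a keyspace puts the same key on the same nodes. The ring tokens and node names below are made up for illustration.

```python
import hashlib

def replicas_for_key(row_key, ring, rf=2):
    # The token is derived from the row key alone -- the column family
    # never enters the calculation, so every CF in the keyspace places
    # the same key on the same nodes.
    token = int(hashlib.md5(row_key.encode()).hexdigest(), 16)
    ordered = sorted(ring)                      # list of (token, node_name)
    start = next((i for i, (t, _) in enumerate(ordered) if t >= token), 0)
    # Walk the ring clockwise from the key's token, SimpleStrategy-style.
    return [ordered[(start + i) % len(ordered)][1] for i in range(rf)]

ring = [(2**125, "node-a"), (2**126, "node-b"), (2**127, "node-c")]
print(replicas_for_key("user-42", ring))   # same nodes no matter which CF
```

If the strategy were per CF, two CFs could choose different replica sets for the same key, and the key would no longer live on one fixed set of nodes.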

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/01/2013, at 4:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:

 Although I've known Cassandra for quite a while, this question has only 
 occurred to me recently:
 
 Why are the replica placement strategy and replica factors set at the 
 keyspace level?
 
 Would setting them at the column family level offer more flexibility?
 
 Is this because it's easier for users to manage an application? Or is it 
 related to the internal implementation? Or have I simply overlooked something?



Re: Does setstreamthroughput also throttle the network traffic caused by nodetool repair?

2013-01-28 Thread aaron morton
  Will that throttle the network traffic caused by nodetool repair?
Yes. 


 Should I call it to all the nodes on the cluster?
Or set it in the yaml file.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/01/2013, at 2:31 PM, Wei Zhu wz1...@yahoo.com wrote:

 In the yaml, it has the following setting
 
 # Throttles all outbound streaming file transfers on this node to the
 # given total throughput in Mbps. This is necessary because Cassandra does
 # mostly sequential IO when streaming data during bootstrap or repair, which
 # can lead to saturating the network connection and degrading rpc performance.
 # When unset, the default is 400 Mbps or 50 MB/s.
 # stream_throughput_outbound_megabits_per_sec: 400
 
 Is this the same value as if I call
 
 Nodetool setstreamthroughput 
 
 Should I call it to all the nodes on the cluster? Will that throttle the 
 network traffic caused by nodetool repair?
 
 Thanks.
 -Wei



Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread Manu Zhang

On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:

The row is the unit of replication: all values with the same storage engine row 
key in a KS are on the same nodes. If they were set per CF, this would not hold.

Not that it would be the end of the world, but that is the first thing that 
comes to mind.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/01/2013, at 4:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:


Although I've known Cassandra for quite a while, this question has only 
occurred to me recently:

Why are the replica placement strategy and replica factors set at the keyspace 
level?

Would setting them at the column family level offer more flexibility?

Is this because it's easier for users to manage an application? Or is it 
related to the internal implementation? Or have I simply overlooked something?




Is it important to store rows of different column families that share 
the same row key on the same node? AFAIK, Cassandra doesn't support 
getting all of them in a single call.


Meanwhile, what's the drawback of setting the RPS and RF at the column family 
level?


Another thing that's been confusing me: when we talk about the data model, 
should the row key be considered inside or outside a column family?


Thanks



CQL3 jdbc and Tomcat resource

2013-01-28 Thread Andy Cobley
I tried to add a CQL3 JDBC resource to Tomcat 7 in a context.xml file (in an 
Eclipse project) as follows:

<Resource type="javax.sql.DataSource"
    name="jdbc/CF1"
    factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
    driverClassName="org.apache.cassandra.cql.jdbc.CassandraDriver"
    url="jdbc:cassandra://localhost:9170/Keyspace2"

/>

The JDBC driver is cassandra-jdbc-1.1.2. When Tomcat (7.0.35) restarts it throws a 
series of errors. Is this known, or expected? Removing the resource from 
context.xml allows the server to start correctly.

Andy

Errors are:

Jan 28, 2013 11:26:27 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal 
performance in production environments was not found on the java.library.path: 
.:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java
Jan 28, 2013 11:26:27 AM org.apache.tomcat.util.digester.SetPropertiesRule begin
WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Context} Setting 
property 'source' to 'org.eclipse.jst.jee.server:Count' did not find a matching 
property.
Jan 28, 2013 11:26:27 AM org.apache.tomcat.util.digester.SetPropertiesRule begin
WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Context} Setting 
property 'source' to 'org.eclipse.jst.jee.server:mysqlexample' did not find a 
matching property.
Jan 28, 2013 11:26:27 AM org.apache.tomcat.util.digester.SetPropertiesRule begin
WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Context} Setting 
property 'source' to 'org.eclipse.jst.jee.server:testwebservlet' did not find a 
matching property.
Jan 28, 2013 11:26:27 AM org.apache.tomcat.util.digester.SetPropertiesRule begin
WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Context} Setting 
property 'source' to 'org.eclipse.jst.jee.server:Convert' did not find a 
matching property.
Jan 28, 2013 11:26:27 AM org.apache.tomcat.util.digester.SetPropertiesRule begin
WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Context} Setting 
property 'source' to 'org.eclipse.jst.jee.server:Math' did not find a matching 
property.
Jan 28, 2013 11:26:27 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [http-bio-8080]
Jan 28, 2013 11:26:27 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [ajp-bio-8009]
Jan 28, 2013 11:26:27 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 897 ms
Jan 28, 2013 11:26:27 AM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Jan 28, 2013 11:26:27 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.35
Jan 28, 2013 11:26:28 AM org.apache.catalina.core.ContainerBase startInternal
SEVERE: A child container failed during start
java.util.concurrent.ExecutionException: 
org.apache.catalina.LifecycleException: Failed to start component 
[StandardEngine[Catalina].StandardHost[localhost].StandardContext[/mysqlexample]]
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at 
org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:1123)
at 
org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:800)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559)
at 
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.catalina.LifecycleException: Failed to start component 
[StandardEngine[Catalina].StandardHost[localhost].StandardContext[/mysqlexample]]
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
... 7 more
Caused by: java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at 
org.apache.cassandra.cql.jdbc.CassandraDriver.clinit(CassandraDriver.java:52)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at 
org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:246)
at 
org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:182)
at 
org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:702)
at 
org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:634)
at 

Re: What is the default 'key_validation_class' on secondary INDEX(es)

2013-01-28 Thread Sylvain Lebresne
Your question is missing a what. What do you want to know the default of?

If you are asking about the key_validation_class of the index CF, then it's
the column type that defines it. If you're asking about the index CF
comparator, then in that example it would use a comparator that sorts like
your partitioner (so if you, say, use the RandomPartitioner, the comparator
will sort by MD5).

And no, neither of those can be changed (it would make no sense for the
key_validation_class; as for the comparator, the secondary index
implementation relies pretty much by design on the ordering being the one
of the partitioner).
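A small sketch may help picture "sorted like the partitioner": the function below is a rough stand-in for a RandomPartitioner-style token (MD5 of the key), not the exact token computation, and the key names are made up.

```python
import hashlib

def md5_token(row_key):
    # Rough stand-in for RandomPartitioner's token (MD5 of the key);
    # the real token computation differs in detail.
    return int(hashlib.md5(row_key.encode()).hexdigest(), 16)

keys = ["alice", "bob", "carol"]
# Entries in the index CF come back in token (partitioner) order,
# not in lexical order of the keys:
print(sorted(keys, key=md5_token))
```

This is why index scans return matching row keys in partitioner order rather than in any "natural" sort order of the keys themselves.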

--
Sylvain


On Mon, Jan 28, 2013 at 11:57 AM, Alan Ristić alan.ris...@gmail.com wrote:

 I'm just curious for C* v1.2..

 a) Does the automatic secondary index CF default to the primary CF's 
 key_validation_class 
 (http://www.datastax.com/docs/1.2/configuration/storage_configuration#key-validation-class)
 ('user_id' in the example), or does it decide the key_validation_class by 
 column type (varchar in the 'email' case)?
 *I suppose it defaults.*

 example:
 CREATE TABLE users (
   user_id uuid PRIMARY KEY,
   created_at timestamp,
   email varchar,
   ...
 ) WITH
  comment='Registered user';

 CREATE INDEX ON users (email);

 *b) And if yes, can it be changed on CREATE INDEX?*
 *I suppose not.*

 P.S. I know that the secondary index in the example is not optimal ;)

 Tnx,
 *Alan Ristić*

  *t*: @alanristic http://twitter.com/alanristic
 *m*: 040 423 688



Re: CQL3 jdbc and Tomcat resource

2013-01-28 Thread Andy Cobley
Apologies,

I was missing  a few cassandra jar libs in the tomcat library.

Andy

On 28 Jan 2013, at 11:31, Andy Cobley acob...@computing.dundee.ac.uk wrote:

 I tried to add a CQL3 JDBC resource to Tomcat 7 in a context.xml file (in an 
 Eclipse project) as follows:
 
 <Resource type="javax.sql.DataSource"
     name="jdbc/CF1"
     factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
     driverClassName="org.apache.cassandra.cql.jdbc.CassandraDriver"
     url="jdbc:cassandra://localhost:9170/Keyspace2"
 
 />
 
 The JDBC driver is cassandra-jdbc-1.1.2. When Tomcat (7.0.35) restarts it throws 
 a series of errors. Is this known, or expected? Removing the resource from 
 context.xml allows the server to start correctly.
 
 Andy
 

Re: Denormalization

2013-01-28 Thread chandra Varahala
In my experience you can design main column families and lookup column
families. The main column families hold all of the denormalized data; the
lookup column families hold the row key of the corresponding main column
family row.

For example, the users column family holds all of a user's denormalized
data, and a lookup column family is named something like userByEmail. A
request first goes to userByEmail, which returns the unique key that is the
row key of the User column family; a call to the User column family then
returns all the data. The other lookup column families work the same way.
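The two-step read described above can be sketched with in-memory stand-ins for the two column families. The names mirror the mail's example (users, userByEmail); the data and the helper are illustrative, not real Cassandra client calls.

```python
# In-memory stand-ins for the main CF and the lookup CF.
users = {
    "uuid-1": {"email": "ann@example.com", "name": "Ann"},
}
user_by_email = {"ann@example.com": "uuid-1"}   # lookup CF: email -> row key

def get_user_by_email(email):
    row_key = user_by_email.get(email)              # read 1: lookup CF
    return users.get(row_key) if row_key else None  # read 2: main CF

print(get_user_by_email("ann@example.com"))
```

The trade-off is one extra read per lookup in exchange for keeping all of a user's data denormalized under a single row key.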

-
Chandra



On Sun, Jan 27, 2013 at 8:53 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

 Agreed, was just making sure others knew ;).

 Dean

 From: Edward Capriolo edlinuxg...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Sunday, January 27, 2013 6:51 PM
 To: user@cassandra.apache.org
 Subject: Re: Denormalization
 Subject: Re: Denormalization

 When I said that writes were cheap, I meant that in the normal case
 people are making 2-10 inserts for what in a relational database might be one.
 30K inserts is certainly not cheap.

 Your use case with 30,000 inserts is probably a special case. Most
 directory services that I am aware of OpenLDAP, Active Directory, Sun
 Directory server do eventually consistent master/slave and multi-master
 replication. So no worries about having to background something. You just
 want the replication to be fast enough so that when you call the employee
 about to be fired into the office, that by the time he leaves and gets home
 he can not VPN to rm -rf / your main file server :)


 On Sun, Jan 27, 2013 at 7:57 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
 Sometimes this is true, sometimes not. We have a use case with an admin tool
 where we chose to do this denormalization for ACL permission checks, to make
 permission checks extremely fast.  That said, we have one issue with one
 object that has too many children (30,000), so when someone gives a user
 access to this one object with 30,000 children, we end up with a bad
 60-second wait, and users ended up getting frustrated and trying to
 cancel (our plan, since admin activity hardly ever happens, is to do it on a
 background thread, return immediately to the user, and tell him his
 changes will take effect in 1 minute).  After all, admin changes are
 infrequent anyway.  This example demonstrates how sometimes it can
 almost burn you.

 I guess my real point is that it really depends on your use cases ;). In a lot
 of cases denormalization can work, but in some cases it burns you, so you have
 to balance it all. In 90% of our cases our denormalization is working great,
 and for this one case we need to background the permission change, as we still
 LOVE the performance of our ACL checks.

 P.S. 30,000 writes in Cassandra is not cheap when done from one server ;)
 but in general parallelized writes are very fast for something like 500.

 Later,
 Dean

 From: Edward Capriolo edlinuxg...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Sunday, January 27, 2013 5:50 PM
 To: user@cassandra.apache.org
 Subject: Re: Denormalization
 Subject: Re: Denormalization

 One technique is, on the client side, to build a tool that takes the event
 and produces N mutations. In C* writes are cheap, so essentially you re-write
 everything on all changes.
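The "one event, N mutations" pattern can be sketched as below. The CF names and the email-only update are made up for illustration; a real client would batch these mutations rather than print them.

```python
def mutations_for_user_update(user_id, new_email, denormalized_cfs):
    # Fan one logical change out into one mutation per CF that holds
    # a copy of the user's email: (column_family, row_key, columns).
    return [(cf, user_id, {"email": new_email}) for cf in denormalized_cfs]

cfs = ["users", "users_by_group", "users_by_site"]
for mutation in mutations_for_user_update("uuid-1", "new@example.com", cfs):
    print(mutation)
```

Keeping this fan-out in one client-side helper means every code path that updates a user goes through the same list of denormalized copies, so none is forgotten.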

 On Sun, Jan 27, 2013 at 4:03 PM, Fredrik Stigbäck 
 fredrik.l.stigb...@sitevision.se wrote:
 Hi.
 Since denormalized data is a first-class citizen in Cassandra, how does one
 handle updating denormalized data?
 E.g. if we have a USER CF with name, email, etc., denormalize the user
 data into many other CFs, and then
 update the information about a user (name, email, ...), what is the best
 way to update those user data properties,
 which might be spread out over many CFs and many rows?

 Regards
 /Fredrik





RE: Accessing Metadata of Column Familes

2013-01-28 Thread Rishabh Agrawal
I found following issues while working on Cassandra version 1.2, CQL 3 and 
Thrift protocol 19.35.0.

Case 1:
Using CQL I created a table t1 with columns col1 and col2 with col1 being my 
primary key.

When I access the same data using the CLI, I see that col1 is adopted as the row 
key and col2 as another column. I then inserted a value into another column (col3) 
in the same row using the CLI. Now when I query the same table again from CQL, 
I am unable to find col3.

Case 2:

Using the CLI, I created table t2, then added a row key row1 and two columns 
(keys) col1 and col2 with some values in each. When I access t2 from CQL I 
find the following result set with three columns:

  key  | column1 | value
  row1 | col1    | val1
  row1 | col2    | val2


This behavior raises certain questions:


* What is the reason for such a schema anomaly, or is this a problem?

* Which schema should be deemed correct or consistent?

* How do I access the metadata for these?



Thanks and Regards
Rishabh Agrawal


From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
Sent: Monday, January 28, 2013 12:57 PM
To: user@cassandra.apache.org
Subject: RE: Accessing Metadata of Column Familes

You can get the storage attributes from the system keyspace (under /data/system/).

From: Rishabh Agrawal [mailto:rishabh.agra...@impetus.co.in]
Sent: Monday, January 28, 2013 12:42 PM
To: user@cassandra.apache.org
Subject: RE: Accessing Metadata of Column Familes

Thanks for the reply.

I do not want to go the API route. I wish to access the files and column families 
which store the metadata information.

From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
Sent: Monday, January 28, 2013 12:25 PM
To: user@cassandra.apache.org
Subject: RE: Accessing Metadata of Column Familes

Which API are you using?
If you are using Hector use ColumnFamilyDefinition.

Regards
Harshvardhan OJha

From: Rishabh Agrawal [mailto:rishabh.agra...@impetus.co.in]
Sent: Monday, January 28, 2013 12:16 PM
To: user@cassandra.apache.org
Subject: Accessing Metadata of Column Familes

Hello,

I wish to access metadata information on column families. How can I do it? Any 
ideas?

Thanks and Regards
Rishabh Agrawal









NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.
The contents of this email, including the attachments, are PRIVILEGED AND 
CONFIDENTIAL to the intended recipient at the email address to which it has 
been addressed. If you receive it in error, please notify the sender 
immediately by return email and then permanently delete it from your system. 
The unauthorized use, distribution, copying or alteration of this email, 
including the attachments, is strictly forbidden. Please note that neither 
MakeMyTrip nor the sender accepts any responsibility for viruses and it is your 
responsibility to scan the email and attachments (if any). No contracts may be 
concluded on behalf of MakeMyTrip by means of email communications.









Re: Accessing Metadata of Column Familes

2013-01-28 Thread Brian O'Neill
Through CQL, you see the logical schema.
Through CLI, you see the physical schema.

This may help:
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts

-brian

On Mon, Jan 28, 2013 at 7:26 AM, Rishabh Agrawal
rishabh.agra...@impetus.co.in wrote:
 I found following issues while working on Cassandra version 1.2, CQL 3 and
 Thrift protocol 19.35.0.



 Case 1:

 Using CQL I created a table t1 with columns col1 and col2 with col1 being my
 primary key.



 When I access the same data using the CLI, I see that col1 is adopted as the
 row key and col2 as another column. I then inserted a value into another
 column (col3) in the same row using the CLI. Now when I query the same table
 again from CQL I am unable to find col3.



 Case 2:



 Using CLI, I have created table t2. Now I added a row key  row1 and two
 columns (keys)  col1 and col2 with some values in each. When I access t2
 from CQL I find following resultset with three columns:



   key  | column1 | value
  row1 | col1    | val1
  row1 | col2    | val2





 This behavior raises certain questions:



 · What is the reason for such schema anomaly or is this a problem?

 · Which schema should be deemed as correct or consistent?

 · How to access meta data on the same?





 Thanks and Regards

 Rishabh Agrawal





 From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
 Sent: Monday, January 28, 2013 12:57 PM


 To: user@cassandra.apache.org
 Subject: RE: Accessing Metadata of Column Familes



 You can get storage attributes from /data/system/ keyspace.



 From: Rishabh Agrawal [mailto:rishabh.agra...@impetus.co.in]
 Sent: Monday, January 28, 2013 12:42 PM
 To: user@cassandra.apache.org
 Subject: RE: Accessing Metadata of Column Familes



 Thank for the reply.



 I do not want to go by API route. I wish to access files and column families
 which store meta data information



 From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
 Sent: Monday, January 28, 2013 12:25 PM
 To: user@cassandra.apache.org
 Subject: RE: Accessing Metadata of Column Familes



 Which API are you using?

 If you are using Hector use ColumnFamilyDefinition.



 Regards

 Harshvardhan OJha



 From: Rishabh Agrawal [mailto:rishabh.agra...@impetus.co.in]
 Sent: Monday, January 28, 2013 12:16 PM
 To: user@cassandra.apache.org
 Subject: Accessing Metadata of Column Familes



 Hello,



 I wish to access metadata information on column families. How can I do it?
 Any ideas?



 Thanks and Regards

 Rishabh Agrawal





 









 

data not shown up after some time

2013-01-28 Thread Matthias Zeilinger
Hi,

I'm a simple operations guy and new to Cassandra.
I have the problem that one of our applications is writing data into Cassandra 
(but not deleting it, because we should have a 90-day TTL).
The application operates in 1 KS with 5 CFs. My current setup:

3-node cluster, and the KS has an RF of 3 (I know it's not the best setup)

I can now see the problem that after 10 days most (nearly all) data is no longer 
shown in the CLI, and our application cannot see the data either.
I assume it has something to do with gc_grace_seconds, which is set to 10 
days.

I have read much documentation about tombstones, but our application doesn't 
perform deletes.
How can I see in the CLI whether a row key has any tombstones?

Could it be that there are some ghost tombstones?

Thx for your help

Br,
Matthias Zeilinger
Production Operation - Shared Services

P: +43 (0) 50 858-31185
M: +43 (0) 664 85-34459
E: matthias.zeilin...@bwinparty.com

bwin.party services (Austria) GmbH
Marxergasse 1B
A-1030 Vienna

www.bwinparty.com



[RELEASE] Apache Cassandra 1.2.1 released

2013-01-28 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.1.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is the first maintenance/bug fix release[1] on the 1.2 series. As
always, please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/yIiHD (CHANGES.txt)
[2]: http://goo.gl/GA9Cz (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


RE: data not shown up after some time

2013-01-28 Thread Viktor Jevdokimov
Are you sure your app is setting the TTL correctly?
TTL is in seconds; for 90 days it has to be 90*24*60*60 = 7776000.
What if you set 777600 by accident (10 times less)? That would be 9 days, 
almost exactly what you see.
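The arithmetic above is easy to check:

```python
DAY = 24 * 60 * 60          # seconds per day
ttl_90_days = 90 * DAY
print(ttl_90_days)          # 7776000
print(777600 / DAY)         # the mistyped value is exactly 9.0 days
```

A one-digit slip in the TTL shrinks it by a factor of ten, which matches the ~10-day disappearance described in the original mail.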

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider (http://twitter.com/#!/adforminsider)
Take a ride with Adform's Rich Media Suite (http://vimeo.com/adform/richmedia)



Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: Matthias Zeilinger [mailto:matthias.zeilin...@bwinparty.com]
Sent: Monday, January 28, 2013 15:57
To: user@cassandra.apache.org
Subject: data not shown up after some time

Hi,

I´m a simple operations guy and new to Cassandra.
I have the problem that one of our applications is writing data into Cassandra 
(but not deleting it, because we should have a 90-day TTL).
The application operates in 1 KS with 5 CFs. My current setup:

3 node cluster, and the KS has an RF of 3 (I know it´s not the best setup)

I can now see the problem that after 10 days most (nearly all) of the data no 
longer shows up in the cli, and our application cannot see the data either.
I assume it has something to do with gc_grace_seconds, which is set to 10 
days.

I have read a lot of documentation about tombstones, but our application doesn´t 
perform deletes.
How can I see in the cli whether a row key has any tombstones or not?

Could it be that there are some ghost tombstones?

Thx for your help

Br,
Matthias Zeilinger
Production Operation - Shared Services

P: +43 (0) 50 858-31185
M: +43 (0) 664 85-34459
E: matthias.zeilin...@bwinparty.com

bwin.party services (Austria) GmbH
Marxergasse 1B
A-1030 Vienna

www.bwinparty.com


RE: data not shown up after some time

2013-01-28 Thread Matthias Zeilinger
Hi,

No I have checked the TTL: 7776000

Very interesting: if I do a simple list cf; the data is shown, but if I 
do a get cf where index='testvalue'; it returns 0 Row Returned.

How can that be?

Br,
Matthias Zeilinger
Production Operation - Shared Services

P: +43 (0) 50 858-31185
M: +43 (0) 664 85-34459
E: matthias.zeilin...@bwinparty.com

bwin.party services (Austria) GmbH
Marxergasse 1B
A-1030 Vienna

www.bwinparty.com

From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
Sent: Montag, 28. Jänner 2013 15:25
To: user@cassandra.apache.org
Subject: RE: data not shown up after some time

Are you sure your app is setting TTL correctly?
TTL is in seconds. For 90 days it have to be 90*24*60*60=7776000.
What If you set by accident 777600 (10 times less) - that will be 9 days, 
almost what you see.



unsubscribe

2013-01-28 Thread Olivier Devos
unsubscribe

 


Re: Cassandra timeout whereas it is not much busy

2013-01-28 Thread Nicolas Lalevée
I did some testing, and I have a theory.

First, it seems we have a lot of CFs, and two are particularly hungry for RAM, 
consuming quite a big amount of RAM for their bloom filters. Cassandra does not 
force the flush of the memtables if it has more than 6G of Xmx (luckily for 
us, this is the maximum reasonable amount we can give).
Since our machines have 8G, this leaves quite little room for the disk cache. 
Thanks to this systemtap script [1], I have seen that the hit ratio is about 
10%.

Then I tested with an Xmx of 4G. The %wa drops down. The disk cache hit ratio 
rises to 80%. On the other hand, flushing happens very often. I cannot say how 
often, since I have too many CFs to graph them all. But of the ones I do graph, 
none of their memtables goes above 10M, whereas they usually go up to 200M.

I have not tested further, since it is quite obvious that the machines need 
more RAM. And they're about to receive more.

But I guess that if I put more write and read pressure on them, still with an 
Xmx of 4G, the %wa would still be quite low, but the flushing would be even 
more intensive. And I guess that it would go wrong. From what I could read, 
there seems to be a contention issue around the flushing (the switchlock?). 
Cassandra would then be slow, but without using the entire CPU. I would be in the 
strange situation I was in when I reported my issue in this thread.
Does my theory make sense?

Nicolas

[1] http://sourceware.org/systemtap/wiki/WSCacheHitRate
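The memory arithmetic behind this theory is simple to lay out (a sketch only; the 8G machine size and hit ratios come from the message above, and the subtraction ignores other processes and off-heap memory):

```python
# With fixed physical RAM, every GB given to the JVM heap is a GB taken
# away from the OS page cache that serves sstable reads.
machine_ram_gb = 8

for heap_gb, observed_hit_ratio in [(6, 0.10), (4, 0.80)]:
    cache_gb = machine_ram_gb - heap_gb   # rough page-cache budget
    print(f"Xmx={heap_gb}G -> ~{cache_gb}G for page cache, "
          f"observed hit ratio: {observed_hit_ratio:.0%}")
```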

Le 23 janv. 2013 à 18:35, Nicolas Lalevée nicolas.lale...@hibnet.org a écrit :

 Le 22 janv. 2013 à 21:50, Rob Coli rc...@palominodb.com a écrit :
 
 On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
 nicolas.lale...@hibnet.org wrote:
 Here is the long story.
 After some long useless staring at the monitoring graphs, I gave a try to
 using the openjdk 6b24 rather than openjdk 7u9
 
 OpenJDK 6 and 7 are both counter-recommended with regards to
 Cassandra. I've heard reports of mysterious behavior like the behavior
 you describe, when using OpenJDK 7.
 
 Try using the Sun/Oracle JVM? Is your JNA working?
 
 JNA is working.
 I tried both oracle-jdk6 and oracle-jdk7, no difference with openjdk6. And 
 since ubuntu is only maintaining openjdk, we'll stick with it until oracle's 
 one proven better.
 oracle vs openjdk, I tested for now under normal pressure though.
 
 What amazes me is that however much I google it and ask around, I still don't 
 know for sure the difference between the openjdk and oracle's jdk…
 
 Nicolas
 






Re: unsubscribe

2013-01-28 Thread Eric Evans
http://i.imgur.com/2ch9L.gif

On Mon, Jan 28, 2013 at 8:36 AM, Olivier Devos olde...@gmail.com wrote:
 unsubscribe




-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: unsubscribe

2013-01-28 Thread Alain RODRIGUEZ
You can try it a third time, or instead try writing to
user-unsubscr...@cassandra.apache.org

Alain


2013/1/28 Olivier Devos olde...@gmail.com

 unsubscribe




Re: Unavaliable Exception

2013-01-28 Thread Everton Lima
Thanks for replies.

2013/1/25 Michael Kjellman mkjell...@barracuda.com

 More nodes!

 On Jan 25, 2013, at 7:21 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 fwiw, I have a mixed ubuntu 11.10 / 12.04 6 node cluster (AWS m1.xlarge).

 The load average is always between 0 and 5 on the 11.10 nodes, while the 12.04
 nodes show a load between 2 and 20 all the time.

 I have the same configuration on each node, and the average request latency
 is a bit better on the 12.04 nodes (which have the higher load).

 htop shows about the same cpu used on any of these nodes, as does iostat
 (same iowait, user, system...).

 I am not sure why this load increases, but I am not sure either that this
 is a problem.

 Alain


 2013/1/25 Everton Lima peitin.inu...@gmail.com

 Hello,
 I was storing a lot of data in a cluster of 7 nodes with replication factor 3.
 The Load (cpu) variable, which can be viewed in

 opscenter - cluster - List view

 is going up over 15.

 Is there any way to minimize this?

 Thanks

 --
 Everton Lima Aleixo
 Bacharel em Ciência da Computação pela UFG
 Mestrando em Ciência da Computação pela UFG
 Programador no LUPA





-- 
Everton Lima Aleixo
Bacharel em Ciência da Computação pela UFG
Mestrando em Ciência da Computação pela UFG
Programador no LUPA


cluster issues

2013-01-28 Thread S C



One of our nodes in a 3-node cluster drifted by ~20-25 seconds. While I figured 
this out pretty quickly, I have a few questions I am looking for answers to.
We can always be proactive in keeping the time in sync, but is there any way to 
recover from a time drift (in a reactive manner)? Since it was a lab 
environment, I dropped the KS (deleted the data directory). Are there any other 
scenarios that would lead a cluster to look like the one below? Note: the actual 
topology of the cluster is ONE Cassandra node and TWO Analytics nodes.

On 192.168.2.100:
Address         DC          Rack   Status  State   Load       Owns    Token
                                                                      113427455640312821154458202477256070485
192.168.2.100   Cassandra   rack1  Up      Normal  601.34 MB  33.33%  0
192.168.2.101   Analytics   rack1  Down    Normal  149.75 MB  33.33%  56713727820156410577229101238628035242
192.168.2.102   Analytics   rack1  Down    Normal  ?          33.33%  113427455640312821154458202477256070485

On 192.168.2.101:
Address         DC          Rack   Status  State   Load       Owns    Token
                                                                      113427455640312821154458202477256070485
192.168.2.100   Analytics   rack1  Down    Normal  ?          33.33%  0
192.168.2.101   Analytics   rack1  Up      Normal  158.59 MB  33.33%  56713727820156410577229101238628035242
192.168.2.102   Analytics   rack1  Down    Normal  ?          33.33%  113427455640312821154458202477256070485

On 192.168.2.102:
Address         DC          Rack   Status  State   Load       Owns    Token
                                                                      113427455640312821154458202477256070485
192.168.2.100   Analytics   rack1  Down    Normal  ?          33.33%  0
192.168.2.101   Analytics   rack1  Down    Normal  ?          33.33%  56713727820156410577229101238628035242
192.168.2.102   Analytics   rack1  Up      Normal  117.02 MB  33.33%  113427455640312821154458202477256070485

Appreciate your valuable inputs.
Thanks,
SC
  

JDBC, Select * Cql2 vs Cql3 problem ?

2013-01-28 Thread Andy Cobley
I have the following code in my app using the JDBC (cassandra-jdbc-1.1.2.jar) 
drivers to CQL:

try {
    rs = stmt.executeQuery("SELECT * FROM users");
} catch (Exception et) {
    System.out.println("Can not execute statement " + et);
}

When connecting to a CQL2 server (Cassandra 1.1.5) the code works as expected, 
returning a result set. When connecting to CQL3 (Cassandra 1.2) I catch the 
following exception:

Can not execute statement java.lang.NullPointerException

The Select statement (Select * from users) does work from CQLSH as expected. 
Is there a problem with my code, or something else?

Andy C
School of Computing
University of Dundee.




The University of Dundee is a Scottish Registered Charity, No. SC015096.




Re: Cassandra pending compaction tasks keeps increasing

2013-01-28 Thread Wei Zhu
Any thoughts?

Thanks.
-Wei

- Original Message -
From: Wei Zhu wz1...@yahoo.com
To: user@cassandra.apache.org
Sent: Friday, January 25, 2013 10:09:37 PM
Subject: Re: Cassandra pending compaction tasks keeps increasing


To recap the problem: 
1.1.6 on SSD, 5 nodes, RF = 3, one CF only. 
After the data load, all 5 nodes initially had very even data sizes (135G each). I 
ran nodetool repair -pr on node 1, which has replicas on nodes 2 and 3 since 
we set RF = 3. 
It appears that a huge amount of data got transferred. Node 1 has 220G; nodes 2 
and 3 have around 170G. Pending LCS tasks on node 1 number 15K, and nodes 2 and 3 
have around 7K each. 
Questions: 

* Why does nodetool repair increase the data size that much? It's not likely 
that much data needs to be repaired. Will that happen for all subsequent 
repairs? 
* How can I make LCS run faster? After almost a day, the LCS task count has 
only dropped by 1000. I am afraid it will never catch up. We set: 

    * compaction_throughput_mb_per_sec = 500 
    * multithreaded_compaction: true 

Both disk and CPU utilization are less than 10%. I understand LCS is single 
threaded; is there any chance to speed it up? 

* We use the default SSTable size of 5M. Will increasing the SSTable size 
help? What will happen if I change the setting after the data is loaded? 

Any suggestion is very much appreciated. 

-Wei 

- Original Message -

From: Wei Zhu wz1...@yahoo.com 
To: user@cassandra.apache.org 
Sent: Thursday, January 24, 2013 11:46:04 PM 
Subject: Re: Cassandra pending compaction tasks keeps increasing 

I believe I am running into this one: 

https://issues.apache.org/jira/browse/CASSANDRA-4765 

By the way, I am using 1.1.6 (I thought I was using 1.1.7) and this one is fixed 
in 1.1.7. 

- Original Message -

From: Wei Zhu wz1...@yahoo.com 
To: user@cassandra.apache.org 
Sent: Thursday, January 24, 2013 11:18:59 PM 
Subject: Re: Cassandra pending compaction tasks keeps increasing 

Thanks Derek, 
in the cassandra-env.sh, it says 

# reduce the per-thread stack size to minimize the impact of Thrift 
# thread-per-client. (Best practice is for client connections to 
# be pooled anyway.) Only do so on Linux where it is known to be 
# supported. 
# u34 and greater need 180k 
JVM_OPTS=$JVM_OPTS -Xss180k 

What value should I use? Java defaults at 400K? Maybe try that first. 

Thanks. 
-Wei 

- Original Message - 
From: Derek Williams de...@fyrie.net 
To: user@cassandra.apache.org, Wei Zhu wz1...@yahoo.com 
Sent: Thursday, January 24, 2013 11:06:00 PM 
Subject: Re: Cassandra pending compaction tasks keeps increasing 


Increasing the stack size in cassandra-env.sh should help you get past the 
stack overflow. Doesn't help with your original problem though. 



On Fri, Jan 25, 2013 at 12:00 AM, Wei Zhu  wz1...@yahoo.com  wrote: 


Well, even after a restart, it throws the same exception. I am basically 
stuck. Any suggestion to clear the pending compaction tasks? Below is the end 
of the stack trace: 

at com.google.common.collect.Sets$1.iterator(Sets.java:578) 
at com.google.common.collect.Sets$1.iterator(Sets.java:578) 
at com.google.common.collect.Sets$1.iterator(Sets.java:578) 
at com.google.common.collect.Sets$1.iterator(Sets.java:578) 
at com.google.common.collect.Sets$3.iterator(Sets.java:667) 
at com.google.common.collect.Sets$3.size(Sets.java:670) 
at com.google.common.collect.Iterables.size(Iterables.java:80) 
at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:557) 
at 
org.apache.cassandra.db.compaction.CompactionController.init(CompactionController.java:69)
 
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:105)
 
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
 
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) 
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) 
at java.util.concurrent.FutureTask.run(Unknown Source) 
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
at java.lang.Thread.run(Unknown Source) 

Any suggestion is very much appreciated 

-Wei 



- Original Message - 
From: Wei Zhu  wz1...@yahoo.com  
To: user@cassandra.apache.org 
Sent: Thursday, January 24, 2013 10:55:07 PM 
Subject: Re: Cassandra pending compaction tasks keeps increasing 

Do you mean 90% of the reads should come from 1 SSTable? 

By the way, after I finished the data migrating, I ran nodetool repair -pr on 
one of the nodes. Before nodetool repair, all the nodes have the same disk 
space usage. After I ran the nodetool repair, the disk space for that node 
jumped from 135G to 220G, also there are more than 15000 pending compaction 
tasks. After a 

cql: show tables in a keyspace

2013-01-28 Thread Paul van Hoven
Is there some way in CQL to get a list of all tables or column
families that belong to a keyspace, like show tables in SQL?


Re: cql: show tables in a keyspace

2013-01-28 Thread Brian O'Neill

cqlsh> use <keyspace>;
cqlsh:cirrus> describe tables;

For more info:
cqlsh> help describe

-brian


---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 •
healthmarketscience.com

This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or
the person responsible to deliver it to the intended recipient, please
contact the sender at the email above and delete this email and any
attachments and destroy any copies thereof. Any review, retransmission,
dissemination, copying or other use of, or taking any action in reliance
upon, this information by persons or entities other than the intended
recipient is strictly prohibited.
 






On 1/28/13 2:27 PM, Paul van Hoven paul.van.ho...@googlemail.com wrote:

Is there some way in CQL to get a list of all tables or column
families that belong to a keyspace, like show tables in SQL?




Re: Cassandra pending compaction tasks keeps increasing

2013-01-28 Thread Wei Zhu
Two fundamental questions: 

* Why did nodetool repair bring in so much data? A lot of SSTables were 
created, and disk space almost doubled. 
* Why do level compactions run so slowly? We turned off throttling 
completely and don't see much utilization of the SSD or CPU. One example: 
0.7MB/s on SSD? That is insane. Anything I can do to speed it up? 

* 1,837,023,925 to 1,836,694,446 (~99% of original) bytes for 1,686,604 
keys at 0.717223MB/s. Time: 2,442,208ms. 
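As a sanity check, the 0.717223 MB/s figure in that log line is just output bytes divided by elapsed time, so the logging is internally consistent; the problem is the rate itself, not the reporting:

```python
out_bytes = 1_836_694_446        # bytes written by the compaction (from the log line)
elapsed_s = 2_442_208 / 1000     # 2,442,208 ms

rate_mb_s = out_bytes / elapsed_s / (1024 * 1024)
print(f"{rate_mb_s:.6f} MB/s")   # ~0.717223, matching the log line
```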

Thanks. 
-Wei 

Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread aaron morton
 
 Another thing that's been confusing me is that when we talk about the data 
 model should the row key be inside or outside a column family?
My mental model is:

cluster == database
keyspace == table
row == a row in a table
CF == a family of columns in one row

(I think that's different to others, but it works for me)

 Is it important to store rows of different column families that share the 
 same row key to the same node?
It makes the failure model a little easier to understand, e.g. everything keyed 
by user amorton is either available or not. 

 Meanwhile, what's the drawback of setting RPS and RF at column family level?
Other than it's baked in?

We process all mutations for a row at the same time. If you write to 4 CFs 
with the same row key, that is considered one mutation for one row. That one 
RowMutation is directed to the replicas using the ReplicationStrategy and 
atomically applied to the commit log. 

If you have RS per CF that one mutation would be split into 4, which would then 
be sent to different replicas. Even if they went to the same replicas they 
would be written to the commit log as different mutations. 

So if you have RS per CF you lose atomic commits for writes to the same row.
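A toy sketch (not Cassandra's actual code) of the property being described: one hash of the row key picks the replica set for the whole keyspace, so every CF's data for that key lands on the same nodes. The md5-based placement and node names are made up for illustration:

```python
import hashlib

nodes = ["n1", "n2", "n3", "n4", "n5"]
rf = 3

def replicas(row_key: str) -> list[str]:
    # SimpleStrategy-style placement: hash the key, then take RF consecutive
    # nodes on the ring. Note the CF name is deliberately absent.
    start = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]

# Every CF under the same keyspace maps the key to the identical replica set,
# which is what lets a multi-CF RowMutation be applied atomically.
placement = {cf: replicas("amorton") for cf in ("users", "user_index", "user_stats")}
assert len({tuple(v) for v in placement.values()}) == 1
print(placement["users"])
```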

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/01/2013, at 11:22 PM, Manu Zhang owenzhang1...@gmail.com wrote:

 On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
 The row is the unit of replication, all values with the same storage engine 
 row key in a KS are on the same nodes. if they were per CF this would not 
 hold.
 
 Not that it would be the end of the world, but that is the first thing that 
 comes to mind.
 
 Cheers
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 27/01/2013, at 4:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:
 
 Although I've got to know Cassandra for quite a while, this question only 
 has occurred to me recently:
 
 Why are the replica placement strategy and replica factors set at the 
 keyspace level?
 
 Would setting them at the column family level offer more flexibility?
 
 Is this because it's easier for users to manage an application? Or related 
 to internal implementation? Or is it just that I've overlooked something?
 
 
 Is it important to store rows of different column families that share the 
 same row key on the same node? AFAIK, Cassandra doesn't support getting all of 
 them in a single call.
 
 Meanwhile, what's the drawback of setting RPS and RF at column family level?
 
 Another thing that's been confusing me is that when we talk about the data 
 model should the row key be inside or outside a column family?
 
 Thanks
 



Re: Node selection when both partition key and secondary index field constrained?

2013-01-28 Thread aaron morton
It uses the index...

cqlsh:dev tracing on;
Now tracing requests.
cqlsh:dev 
cqlsh:dev 
cqlsh:dev SELECT id, flag from foo WHERE TOKEN(id) > '-9939393' AND TOKEN(id) <= '0' AND flag=true;

Tracing session: 128cab90-6982-11e2-8cd1-51eaa232562e

 activity                                           | timestamp    | source    | source_elapsed
----------------------------------------------------+--------------+-----------+----------------
 execute_cql3_query                                 | 08:36:55,244 | 127.0.0.1 |              0
 Parsing statement                                  | 08:36:55,244 | 127.0.0.1 |            600
 Peparing statement                                 | 08:36:55,245 | 127.0.0.1 |           1408
 Determining replicas to query                      | 08:36:55,246 | 127.0.0.1 |           1924
 Executing indexed scan for (max(-9939393), max(0)] | 08:36:55,247 | 127.0.0.1 |           2956
 Executing single-partition query on foo.flag_index | 08:36:55,247 | 127.0.0.1 |           3192
 Acquiring sstable references                       | 08:36:55,247 | 127.0.0.1 |           3220
 Merging memtable contents                          | 08:36:55,247 | 127.0.0.1 |           3265
 Scanned 0 rows and matched 0                       | 08:36:55,247 | 127.0.0.1 |           3396
 Request complete                                   | 08:36:55,247 | 127.0.0.1 |           3644


It reads from the secondary index and discards keys that are outside of the 
token range. 
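The coordinator behaviour being described can be sketched roughly like this (the md5-based token function and the tiny in-memory "index" are stand-ins for the real partitioner and secondary index; the half-ring range is illustrative):

```python
import hashlib

def token(key: str) -> int:
    # Stand-in for the partitioner: a deterministic signed 64-bit token per key.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)

# rows: key -> flag; the "secondary index" is just the set of keys with flag=True
rows = {"a": True, "b": False, "c": True, "d": True, "e": True}
index_hits = [k for k, flag in rows.items() if flag]

# The index scan produces the candidates; the TOKEN() predicate is applied
# afterwards as a filter, discarding keys outside the requested range.
lo, hi = -2**63, 0
result = [k for k in index_hits if lo < token(k) <= hi]
print(result)
```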

Cheers
 

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/01/2013, at 4:24 PM, Mike Sample mike.sam...@gmail.com wrote:

 Does the following FAQ entry hold even when the partition key is also 
 constrained in the query (by token())?
 
 http://wiki.apache.org/cassandra/SecondaryIndexes:
 ==
Q: How does choice of Consistency Level affect cluster availability when 
 using secondary indexes?
 
A: Because secondary indexes are distributed, you must have CL nodes 
 available for all token ranges in the cluster in order to complete a query. 
 For example, with RF = 3, when two out of three consecutive nodes in the ring 
 are unavailable, all secondary index queries at CL = QUORUM will fail, 
 however secondary index queries at CL = ONE will succeed. This is true 
 regardless of cluster size.
 ==
 
 For example:
 
 CREATE TABLE foo (
 id uuid,  
 seq_num bigint, 
 flag boolean, 
 some_other_data blob,
 PRIMARY KEY (id,seq_num) 
 );
 
 CREATE INDEX flag_index ON foo (flag);
 
 SELECT id, flag from foo WHERE TOKEN(id) > '-9939393' AND TOKEN(id) <= '0' 
 AND flag=true;
 
 Would the above query with LOCAL_QUORUM succeed given the following? I.e. is 
 the token range used to first trim node selection?
 
 * the cluster has 18 nodes
 * foo is in a keyspace with a replication factor of 3 for that data center
 * 2 nodes in one of the replication groups are down
 * the token range in the query is not in the range of the down nodes
 
 
 Thanks in advance!



Re: Issues with CQLSH in Cassandra 1.2

2013-01-28 Thread aaron morton
I was able to replicate it…

$ bin/nodetool -h 127.0.0.1 -p 7100  describering foo
Schema Version:253da4a3-e277-35b5-8d04-dbeeb3c9508e
TokenRange: 
TokenRange(start_token:3074457345618258602, 
end_token:-9223372036854775808, endpoints:[], rpc_endpoints:[], 
endpoint_details:[])
TokenRange(start_token:-3074457345618258603, 
end_token:3074457345618258602, endpoints:[], rpc_endpoints:[], 
endpoint_details:[])
TokenRange(start_token:-9223372036854775808, 
end_token:-3074457345618258603, endpoints:[], rpc_endpoints:[], 
endpoint_details:[])


Will dig into it later on to see if it's a bug. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 25/01/2013, at 5:35 PM, Gabriel Ciuloaica gciuloa...@gmail.com wrote:

 Hi Aaron,
 
 I'm using PropertyFileSnitch, an my cassandra-topology.propertis looks like 
 this:
 
 # Cassandra Node IP=Data Center:Rack
 
 # default for unknown nodes
 default=DC1:RAC1
 
 # all known nodes
   10.11.1.108=DC1:RAC1
   10.11.1.109=DC1:RAC2
   10.11.1.200=DC1:RAC3
 
 Cheers,
 Gabi
 
 
 
 
 On 1/25/13 4:38 AM, aaron morton wrote:
 Can you provide details of the snitch configuration and the number of nodes 
 you have? 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 25/01/2013, at 9:39 AM, Gabriel Ciuloaica gciuloa...@gmail.com wrote:
 
 Hi Tyler,
 
 No, it was just a typo in the email; I changed the names of the DC in the email 
 after copy/pasting from the output of the tools.
 It is quite easy to reproduce (assuming you have a correct configuration 
 for NetworkTopologyStrategy, with vNodes(default, 256)):
 
 1. launch cqlsh and create the keyspace
 
 create keyspace foo with replication= 
 {'class':'NetworkTopologyStrategy','DC1':3};
 
 2. exit cqlsh, run
 
 nodetool describering foo
 
 you'll see something like this:
 
 TokenRange(start_token:2318224911779291128, end_token:2351629206880900296, 
 endpoints:[], rpc_endpoints:[], endpoint_details:[])
 TokenRange(start_token:-8291638263612363845, 
 end_token:-8224756763869823639, endpoints:[], rpc_endpoints:[], 
 endpoint_details:[])
 
 3. start  cqlsh, 
 
 drop keyspace foo;
 
 4. Exit cqlsh, start cassandra-cli
 create keyspace foo with placement_strategy = 'NetworkTopologyStrategy' AND 
 strategy_options={DC1};
 
 if you run nodetool describering foo you'll see:
 
 TokenRange(start_token:2318224911779291128, 
 end_token:2351629206880900296, endpoints:[10.11.1.200, 10.11.1.109, 
 10.11.1.108], rpc_endpoints:[10.11.1.200, 10.11.1.109, 10.11.1.108], 
 endpoint_details:[EndpointDetails(host:10.11.1.200, datacenter:DC1, 
 rack:RAC3), EndpointDetails(host:10.11.1.109, datacenter:DC1, rack:RAC2), 
 EndpointDetails(host:10.11.1.108, datacenter:DC1, rack:RAC1)])
 TokenRange(start_token:-8291638263612363845, 
 end_token:-8224756763869823639, endpoints:[10.11.1.200, 10.11.1.109, 
 10.11.1.108], rpc_endpoints:[10.11.1.200, 10.11.1.109, 10.11.1.108], 
 endpoint_details:[EndpointDetails(host:10.11.1.200, datacenter:DC1, 
 rack:RAC3), EndpointDetails(host:10.11.1.109, datacenter:DC1, rack:RAC2), 
 EndpointDetails(host:10.11.1.108, datacenter:DC1, rack:RAC1)])
 
 Br,
 Gabi
 
 
 On 1/24/13 10:22 PM, Tyler Hobbs wrote:
 Gabriel,
 
 It looks like you used DC1 for the datacenter name in your replication 
 strategy options, while the actual datacenter name was DC-1 (based on 
 the nodetool status output).  Perhaps that was causing the problem?
 
 
 On Thu, Jan 24, 2013 at 1:57 PM, Gabriel Ciuloaica gciuloa...@gmail.com 
 wrote:
 I do not think that it has anything to do with Astyanax, but after I have 
 recreated the keyspace with cassandra-cli, everything is working fine.
 Also, I have mentioned below that not even nodetool describering foo 
 showed correct information for the tokens and endpoint_details if the 
 keyspace was created with cqlsh.
 
 Thanks,
 Gabi
 
 
 On 1/24/13 9:21 PM, Ivan Velykorodnyy wrote:
 Hi,
 
 Astyanax is not 1.2 compatible yet 
 https://github.com/Netflix/astyanax/issues/191
 Eran planned to make it in 1.57.x
 
 On Thursday, January 24, 2013, Gabriel Ciuloaica wrote:
 Hi,
 
 I have spent half of the day today trying to make a new Cassandra cluster 
 to work. I have setup a single data center cluster, using 
 NetworkTopologyStrategy, DC1:3.
 I'm using latest version of Astyanax client to connect. After many hours 
 of debug, I found out that the problem may be in cqlsh utility.
 
 So, after the cluster was up and running:
 [me@cassandra-node1 cassandra]$ nodetool status
 Datacenter: DC-1
 ==
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load   Tokens  Owns (effective)  Host ID
Rack
 UN  10.11.1.109   59.1 KB256 0.0%  
 726689df-edc3-49a0-b680-370953994a8c  RAC2
 UN  10.11.1.108   67.49 KB  

Re: Cassandra pending compaction tasks keeps increasing

2013-01-28 Thread Derek Williams
I could be wrong about this, but when repair is run, it isn't just values
that are streamed between nodes, it's entire sstables. This causes a lot of
duplicate data to be written which was already correct on the node, which
needs to be compacted away.

As for speeding it up, no idea.


On Mon, Jan 28, 2013 at 12:16 PM, Wei Zhu wz1...@yahoo.com wrote:

 Any thoughts?

 Thanks.
 -Wei

 - Original Message -
 From: Wei Zhu wz1...@yahoo.com
 To: user@cassandra.apache.org
 Sent: Friday, January 25, 2013 10:09:37 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing


 To recap the problem,
 1.1.6 on SSD, 5 nodes, RF = 3, one CF only.
 After data load, initially all 5 nodes have very even data size (135G,
 each). I ran nodetool repair -pr on node 1 which have replicates on node 2,
 node 3 since we set RF = 3.
 It appears that huge amount of data got transferred. Node 1 has 220G, node
 2, 3 have around 170G. Pending LCS task on node 1 is 15K and node 2, 3 have
 around 7K each.
 Questions:

  * Why does nodetool repair increase the data size that much? It's not
  likely that much data needs to be repaired. Will that happen for all the
  subsequent repairs?
 * How to make LCS run faster? After almost a day, the LCS tasks only
 dropped by 1000. I am afraid it will never catch up. We set


 * compaction_throughput_mb_per_sec = 500
 * multithreaded_compaction: true


 Both Disk and CPU util are less than 10%. I understand LCS is single
 threaded, any chance to speed it up?


 * We use default SSTable size as 5M, Will increase the size of SSTable
 help? What will happen if I change the setting after the data is loaded.

 Any suggestion is very much appreciated.

 -Wei

 - Original Message -

 From: Wei Zhu wz1...@yahoo.com
 To: user@cassandra.apache.org
 Sent: Thursday, January 24, 2013 11:46:04 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing

 I believe I am running into this one:

 https://issues.apache.org/jira/browse/CASSANDRA-4765

  By the way, I am using 1.1.6 (I thought I was using 1.1.7) and this one is
 fixed in 1.1.7.

 - Original Message -

 From: Wei Zhu wz1...@yahoo.com
 To: user@cassandra.apache.org
 Sent: Thursday, January 24, 2013 11:18:59 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing

 Thanks Derek,
 in the cassandra-env.sh, it says

 # reduce the per-thread stack size to minimize the impact of Thrift
 # thread-per-client. (Best practice is for client connections to
 # be pooled anyway.) Only do so on Linux where it is known to be
 # supported.
 # u34 and greater need 180k
 JVM_OPTS=$JVM_OPTS -Xss180k

 What value should I use? Java defaults at 400K? Maybe try that first.

 Thanks.
 -Wei

 - Original Message -
 From: Derek Williams de...@fyrie.net
 To: user@cassandra.apache.org, Wei Zhu wz1...@yahoo.com
 Sent: Thursday, January 24, 2013 11:06:00 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing


 Increasing the stack size in cassandra-env.sh should help you get past the
 stack overflow. Doesn't help with your original problem though.



 On Fri, Jan 25, 2013 at 12:00 AM, Wei Zhu  wz1...@yahoo.com  wrote:


 Well, even after restart, it throws the the same exception. I am basically
 stuck. Any suggestion to clear the pending compaction tasks? Below is the
 end of stack trace:

 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$3.iterator(Sets.java:667)
 at com.google.common.collect.Sets$3.size(Sets.java:670)
 at com.google.common.collect.Iterables.size(Iterables.java:80)
 at
 org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:557)
 at
 org.apache.cassandra.db.compaction.CompactionController.init(CompactionController.java:69)
 at
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:105)
 at
 org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 at
 org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)

 Any suggestion is very much appreciated

 -Wei



 - Original Message -
 From: Wei Zhu  wz1...@yahoo.com 
 To: user@cassandra.apache.org
 Sent: Thursday, January 24, 2013 10:55:07 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing

 Do you mean 90% of 

Re: cql: show tables in a keyspace

2013-01-28 Thread Theo Hultberg
the DESCRIBE family of commands in cqlsh are wrappers around queries to the
system keyspace, so if you want to inspect what keyspaces and tables exist
from your application you can do something like:

SELECT columnfamily_name, comment
FROM system.schema_columnfamilies
WHERE keyspace_name = 'test';

or

SELECT * FROM system.schema_keyspaces;

T#


On Mon, Jan 28, 2013 at 8:35 PM, Brian O'Neill b...@alumni.brown.eduwrote:


 cqlsh> use keyspace;
 cqlsh:cirrus> describe tables;

 For more info:
 cqlsh> help describe

 -brian


 ---
 Brian O'Neill
 Lead Architect, Software Development
 Health Market Science
 The Science of Better Results
 2700 Horizon Drive • King of Prussia, PA • 19406
 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 •
 healthmarketscience.com








 On 1/28/13 2:27 PM, Paul van Hoven paul.van.ho...@googlemail.com
 wrote:

 Is there some way in cql to get a list of all tables or column
 families that belong to a keyspace, like show tables in sql?





Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread Hiller, Dean
If you write to 4 CF's with the same row key that is considered one
mutation

Hm, I never considered this, never knew either.(very un-intuitive from
a user perspective IMHO).  So if I write to CF Users with rowkey=dean
and to CF Schedules with rowkey=dean, it is actually one row?  (it's so
un-intuitive that I had to ask to make sure I am reading that correctly).

I guess I really don't have that case since most of my row keys are GUID's
anyways, but very interesting and unexpected (not sure I really mind, was
just taken aback)
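To make the grouping concrete, here is a toy sketch of one-mutation-per-row-key batching (a simplified model, not Cassandra's actual RowMutation class):

```python
from collections import defaultdict

def group_writes_by_row_key(writes):
    """writes: iterable of (row_key, column_family, column, value) tuples.
    Returns one 'mutation' per row key, spanning all CFs touched."""
    mutations = defaultdict(list)
    for row_key, cf, column, value in writes:
        mutations[row_key].append((cf, column, value))
    return dict(mutations)

writes = [("dean", "Users", "name", "Dean"),
          ("dean", "Schedules", "monday", "standup")]
mutations = group_writes_by_row_key(writes)
# Both CF updates ride in the single mutation keyed by "dean", which is
# why they can be routed to the same replicas and applied atomically.
```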

PS. Not sure I ever minded losing atomic commits to the same row across
CF's as I never expected it in the first place having used cassandra for
more than a year.(must have missed that several times in the
documentation).

Thanks,
Dean

On 1/28/13 12:41 PM, aaron morton aa...@thelastpickle.com wrote:

 
 Another thing that's been confusing me is that when we talk about the
data model should the row key be inside or outside a column family?
My mental model is:

cluster == database
keyspace == table
row == a row in a table
CF == a family of columns in one row

(I think that's different to others, but it works for me)

 Is it important to store rows of different column families that share
the same row key to the same node?
Makes the failure models a little easier to understand, e.g. every
key for user amorton is either available or not.

 Meanwhile, what's the drawback of setting RPS and RF at column family
level?
Other than it's baked in?

We process all mutations for a row at the same time. If you write to 4
CF's with the same row key that is considered one mutation, for one row.
That one RowMutation is directed to the replicas using the
ReplicationStratagy and atomically applied to the commit log.

If you have RS per CF that one mutation would be split into 4, which
would then be sent to different replicas. Even if they went to the same
replicas they would be written to the commit log as different mutations.

So if you have RS per CF you lose atomic commits for writes to the same
row.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/01/2013, at 11:22 PM, Manu Zhang owenzhang1...@gmail.com wrote:

 On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
 The row is the unit of replication, all values with the same storage
engine row key in a KS are on the same nodes. if they were per CF this
would not hold.
 
 Not that it would be the end of the world, but that is the first thing
that comes to mind.
 
 Cheers
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 27/01/2013, at 4:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:
 
 Although I've got to know Cassandra for quite a while, this question
only has occurred to me recently:
 
 Why are the replica placement strategy and replica factors set at the
keyspace level?
 
 Would setting them at the column family level offer more flexibility?
 
 Is this because it's easier for user to manage an application? Or
related to internal implementation? Or it's just that I've overlooked
something?
 
 
 Is it important to store rows of different column families that share
the same row key to the same node? AFAIK, Cassandra doesn't support get
all of them in a single call.
 
 Meanwhile, what's the drawback of setting RPS and RF at column family
level?
 
 Another thing that's been confusing me is that when we talk about the
data model should the row key be inside or outside a column family?
 
 Thanks
 




Problem on node join the ring

2013-01-28 Thread Daning Wang
I added a new node to the ring (version 1.1.6); after more than 30 hours it is
still in the 'Joining' state:

Address         DC          Rack   Status  State    Load      Effective-Ownership  Token
                                                                                   141784319550391026443072753096570088105
10.28.78.123    datacenter1 rack1  Up      Normal   18.73 GB  50.00%               0
10.4.17.138     datacenter1 rack1  Up      Normal   15 GB     39.29%               24305883351495604533098186245126300818
10.93.95.51     datacenter1 rack1  Up      Normal   17.96 GB  41.67%               42535295865117307932921825928971026432
10.170.1.26     datacenter1 rack1  Up      Joining  6.89 GB   0.00%                56713727820156410577229101238628035242
10.6.115.239    datacenter1 rack1  Up      Normal   20.3 GB   50.00%               85070591730234615865843651857942052864
10.28.20.200    datacenter1 rack1  Up      Normal   22.68 GB  60.71%               127605887595351923798765477786913079296
10.240.113.171  datacenter1 rack1  Up      Normal   18.4 GB   58.33%               141784319550391026443072753096570088105


Since after a while the CPU usage goes down to 0, it looks like it is stuck. I
have restarted the server several times in the last 30 hours. When the server
has just started you can see streaming in 'nodetool netstats', but after a few
minutes there is no streaming anymore.

I have turned on debug logging; this is what it is doing now (the CPU is pretty
much idle), with no error messages.

Please help, I can provide more info if needed.

Thanks in advance,


DEBUG [MutationStage:17] 2013-01-28 12:47:59,618
RowMutationVerbHandler.java (line 44) Applying RowMutation(keyspace='dsat',
key='52f5298affbb8bf0', modifications=[ColumnFamily(dsatcache
[_meta:false:278@1359406079725000!3888000,])])
DEBUG [MutationStage:17] 2013-01-28 12:47:59,618 Table.java (line 395)
applying mutation of row 52f5298affbb8bf0
DEBUG [MutationStage:17] 2013-01-28 12:47:59,618
RowMutationVerbHandler.java (line 56) RowMutation(keyspace='dsat',
key='52f5298affbb8bf0', modifications=[ColumnFamily(dsatcache
[_meta:false:278@1359406079725000!3888000,])]) applied.  Sending response
to 571645593@/10.28.78.123
DEBUG [MutationStage:26] 2013-01-28 12:47:59,623
RowMutationVerbHandler.java (line 44) Applying RowMutation(keyspace='dsat',
key='57f700499922964b', modifications=[ColumnFamily(dsatcache
[cache_type:false:8@1359406079730002,path:false:30@1359406079730001
,top_node:false:22@135940607973,v0:false:976@1359406079730003
!3888000,])])
DEBUG [MutationStage:26] 2013-01-28 12:47:59,623 Table.java (line 395)
applying mutation of row 57f700499922964b
DEBUG [MutationStage:26] 2013-01-28 12:47:59,623 Table.java (line 429)
mutating indexed column top_node value
6d617474626f7574726f732e74756d626c722e636f6d
DEBUG [MutationStage:26] 2013-01-28 12:47:59,623 CollationController.java
(line 78) collectTimeOrderedData
DEBUG [MutationStage:26] 2013-01-28 12:47:59,623 Table.java (line 453)
Pre-mutation index row is null
DEBUG [MutationStage:26] 2013-01-28 12:47:59,624 KeysIndex.java (line 119)
applying index row mattboutros.tumblr.com in
ColumnFamily(dsatcache.dsatcache_top_node_idx
[57f700499922964b:false:0@135940607973,])
DEBUG [MutationStage:26] 2013-01-28 12:47:59,624
RowMutationVerbHandler.java (line 56) RowMutation(keyspace='dsat',
key='57f700499922964b', modifications=[ColumnFamily(dsatcache
[cache_type:false:8@1359406079730002,path:false:30@1359406079730001
,top_node:false:22@135940607973,v0:false:976@1359406079730003!3888000,])])
applied.  Sending response to 710680715@/10.28.20.200
DEBUG [MutationStage:22] 2013-01-28 12:47:59,624
RowMutationVerbHandler.java (line 44) Applying RowMutation(keyspace='dsat',
key='57f700499922964b', modifications=[ColumnFamily(dsatcache
[_meta:false:278@1359406079731000!3888000,])])
DEBUG [MutationStage:22] 2013-01-28 12:47:59,624 Table.java (line 395)
applying mutation of row 57f700499922964b
DEBUG [MutationStage:22] 2013-01-28 12:47:59,624
RowMutationVerbHandler.java (line 56) RowMutation(keyspace='dsat',
key='57f700499922964b', modifications=[ColumnFamily(dsatcache
[_meta:false:278@1359406079731000!3888000,])]) applied.  Sending response
to 710680719@/10.28.20.200
DEBUG [MutationStage:25] 2013-01-28 12:47:59,652
RowMutationVerbHandler.java (line 44) Applying RowMutation(keyspace='dsat',
key='2a50083d5332071f', modifications=[ColumnFamily(dsatcache
[cache_type:false:8@1359406079692002,path:false:26@1359406079692001
,top_node:false:18@1359406079692000,v0:false:583@1359406079692003
!3888000,])])
DEBUG [MutationStage:25] 2013-01-28 12:47:59,652 Table.java (line 395)
applying mutation of row 2a50083d5332071f
DEBUG [MutationStage:25] 2013-01-28 12:47:59,652 Table.java (line 429)
mutating indexed column top_node value 772e706163696669632d72652e636f6d
DEBUG [MutationStage:25] 2013-01-28 12:47:59,652 CollationController.java
(line 78) collectTimeOrderedData
DEBUG [MutationStage:25] 2013-01-28 12:47:59,652 Table.java (line 453)
Pre-mutation index row is null
DEBUG [MutationStage:25] 2013-01-28 

Understanding Virtual Nodes on Cassandra 1.2

2013-01-28 Thread Zhong Li
Hi All,

Virtual nodes is a great feature. After searching some documents on the DataStax 
website and some old tickets, it seems that it works for the random partitioner only 
and leaves the order-preserving partitioner out of luck. I may have misunderstood, 
please correct me. If it doesn't support the order-preserving partitioner, would it 
be possible to add support for multiple initial_token(s) for the order-preserving 
partitioner, or to allow adding virtual nodes manually? 
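
For reference, in 1.2 virtual nodes are enabled per node in cassandra.yaml by setting num_tokens instead of initial_token (the values here are illustrative):

```yaml
# cassandra.yaml: enable vnodes by giving each node many tokens.
num_tokens: 256
# With vnodes enabled, leave initial_token unset:
# initial_token:
```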

Thanks,

Zhong 

JNA not found.

2013-01-28 Thread Tim Dunphy
Hey List,

 I just downloaded 1.2.1 and have set it up across my cluster, when I
noticed the following notice:

 INFO 18:14:53,828 JNA not found. Native methods will be disabled.

So I downloaded jna.jar from git hub and moved it to the cassandra /lib
directory. I changed mod to 755 as per the datastax docs. I've also tried
installing the jna package (via yum, I am using centos 6.2). Nothing seems
to do the trick, I keep getting this message. What can I do to get
cassandra 1.2.1 to recognize JNA?

Thanks
Tim
-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: What is the default 'key_validation_class' on secondary INDEX(es)

2013-01-28 Thread Alan Ristić
2013/1/28 Sylvain Lebresne sylv...@datastax.com

 If you are asking for the key_validation_class of the Index CF, then it's
 the column type that defines it


Sylvain, that was the one I meant, great. Tnx for explanation.


*Alan Ristić*

*m*: 040 423 688


1.2 Authentication

2013-01-28 Thread Daning Wang
We were using SimpleAuthenticator on 1.1.x, it worked fine.

While testing 1.2, I have put the classes under example/simple_authentication
in a jar and copied it to the lib directory, and the class is loaded. However,
when I try to connect with the correct user/password, it gives me this error:

./cqlsh s2.dsat103-e1a -u  -p 
Traceback (most recent call last):
  File ./cqlsh, line 2262, in <module>
main(*read_options(sys.argv[1:], os.environ))
  File ./cqlsh, line 2248, in main
display_float_precision=options.float_precision)
  File ./cqlsh, line 483, in __init__
cql_version=cqlver, transport=transport)
  File ./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/connection.py,
line 143, in connect
  File ./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/connection.py,
line 59, in __init__
  File ./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/thrifteries.py,
line 157, in establish_connection
  File
./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py,
line 455, in login
  File
./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py,
line 476, in recv_login
cql.cassandra.ttypes.AuthenticationException:
AuthenticationException(why=User  doesn't exist - create it with
CREATE USER query first)


What does create it with CREATE USER query first mean?

I put debug information in SimpleAuthenticator class, that showed
authentication is passed in the authenticate() method.
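
For reference, that error message comes from the new CQL3 user management in 1.2: the built-in authentication expects users to be created with a CQL query along these lines (syntax per the 1.2 docs; the user name and password here are placeholders, and whether this applies to a custom SimpleAuthenticator is exactly the open question):

```sql
-- Run as the default superuser:
CREATE USER myuser WITH PASSWORD 'mypassword' NOSUPERUSER;
```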

Thanks,

Daning


Re: Node selection when both partition key and secondary index field constrained?

2013-01-28 Thread Mike Sample
Thanks Aaron.   So basically it's merging the results of 2 separate queries:
Indexed scan (token-range) intersect foo.flag_index=true, where the
latter query hits the entire cluster as per the secondary index FAQ
entry.   Thus the overall query would fail if LOCAL_QUORUM was requested,
RF=3 and 2 nodes in a given replication group were down. Darn.  Is there
any way of efficiently getting around this (i.e. scope the query to just the
nodes in the token range)?




On Mon, Jan 28, 2013 at 11:44 AM, aaron morton aa...@thelastpickle.comwrote:

 It uses the index...

 cqlsh:dev> tracing on;
 Now tracing requests.
 cqlsh:dev>
 cqlsh:dev>
 cqlsh:dev> SELECT id, flag from foo WHERE TOKEN(id) > '-9939393' AND
 TOKEN(id) <= '0' AND flag=true;

 Tracing session: 128cab90-6982-11e2-8cd1-51eaa232562e

  activity                                           | timestamp    | source    | source_elapsed
 ----------------------------------------------------+--------------+-----------+----------------
  execute_cql3_query                                 | 08:36:55,244 | 127.0.0.1 |              0
  Parsing statement                                  | 08:36:55,244 | 127.0.0.1 |            600
  Preparing statement                                | 08:36:55,245 | 127.0.0.1 |           1408
  Determining replicas to query                      | 08:36:55,246 | 127.0.0.1 |           1924
  Executing indexed scan for (max(-9939393), max(0)] | 08:36:55,247 | 127.0.0.1 |           2956
  Executing single-partition query on foo.flag_index | 08:36:55,247 | 127.0.0.1 |           3192
  Acquiring sstable references                       | 08:36:55,247 | 127.0.0.1 |           3220
  Merging memtable contents                          | 08:36:55,247 | 127.0.0.1 |           3265
  Scanned 0 rows and matched 0                       | 08:36:55,247 | 127.0.0.1 |           3396
  Request complete                                   | 08:36:55,247 | 127.0.0.1 |           3644


 It reads from the secondary index and discards keys that are outside of
 the token range.

 Cheers


 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 28/01/2013, at 4:24 PM, Mike Sample mike.sam...@gmail.com wrote:

  Does the following FAQ entry hold even when the partition key is also
 constrained in the query (by token())?
 
  http://wiki.apache.org/cassandra/SecondaryIndexes:
  ==
 Q: How does choice of Consistency Level affect cluster availability
 when using secondary indexes?
 
 A: Because secondary indexes are distributed, you must have CL nodes
 available for all token ranges in the cluster in order to complete a query.
 For example, with RF = 3, when two out of three consecutive nodes in the
 ring are unavailable, all secondary index queries at CL = QUORUM will fail,
 however secondary index queries at CL = ONE will succeed. This is true
 regardless of cluster size.
  ==
 
  For example:
 
  CREATE TABLE foo (
  id uuid,
  seq_num bigint,
  flag boolean,
  some_other_data blob,
  PRIMARY KEY (id,seq_num)
  );
 
  CREATE INDEX flag_index ON foo (flag);
 
  SELECT id, flag from foo WHERE TOKEN(id) > '-9939393' AND TOKEN(id) <=
 '0' AND flag=true;
 
  Would the above query with LOCAL_QUORUM succeed given the following? I.e.
 is the token range used to first trim node selection?
 
  * the cluster has 18 nodes
  * foo is in a keyspace with a replication factor of 3 for that data
 center
  * 2 nodes in one of the replication groups are down
  * the token range in the query is not in the range of the down nodes
 
 
  Thanks in advance!




Re: JNA not found.

2013-01-28 Thread Tim Dunphy
I went to github to try to download jna again. I downloaded version 3.5.1

[root@cassandra-node01 cassandrahome]# ls -l lib/jna-3.5.1.jar
-rw-r--r-- 1 root root 692603 Jan 28 21:57 lib/jna-3.5.1.jar

I noticed in the datastax docs that java 7 was not recommended so I
downgraded to java 6

[root@cassandra-node01 cassandrahome]# java -version
java version 1.6.0_34
Java(TM) SE Runtime Environment (build 1.6.0_34-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.9-b04, mixed mode)

And now if I try to start cassandra with that library it fails with this
message:

[root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
-Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
 INFO 22:00:14,318 Logging initialized
 INFO 22:00:14,333 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
VM/1.6.0_34
 INFO 22:00:14,334 Heap size: 301727744/302776320
 INFO 22:00:14,334 Classpath:
/etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/cassandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna-3.5.1.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
Killed

I moved the library back out of the lib directory and Cassandra starts again,
albeit without JNA working, naturally.


Both my cassandra and java installs are tarball installs.

Thanks
Tim

On Mon, Jan 28, 2013 at 6:29 PM, Tim Dunphy bluethu...@gmail.com wrote:

 Hey List,

  I just downloaded 1.2.1 and have set it up across my cluster, when I
 noticed the following notice:

  INFO 18:14:53,828 JNA not found. Native methods will be disabled.

 So I downloaded jna.jar from git hub and moved it to the cassandra /lib
 directory. I changed mod to 755 as per the datastax docs. I've also tried
 installing the jna package (via yum, I am using centos 6.2). Nothing seems
 to do the trick, I keep getting this message. What can I do to get
 cassandra 1.2.1 to recognize JNA?

 Thanks
 Tim
 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B




-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


RE: Accessing Metadata of Column Families

2013-01-28 Thread Rishabh Agrawal
Any points on the same.

- Rishabh
- Reply message -
From: Rishabh Agrawal rishabh.agra...@impetus.co.in
To: user@cassandra.apache.org user@cassandra.apache.org
Subject: Accessing Metadata of Column Families
Date: Mon, Jan 28, 2013 5:56 pm

I found following issues while working on Cassandra version 1.2, CQL 3 and 
Thrift protocol 19.35.0.

Case 1:
Using CQL I created a table t1 with columns col1 and col2 with col1 being my 
primary key.

When I access same data using CLI, I see col1 gets adopted as rowkey and col2 
being another column. Now I have inserted value in another column (col3) in 
same row using CLI.  Now when I query same table again from CQL I am unable to 
find col3.

Case 2:

Using the CLI, I have created table t2. Now I added a row key row1 and two columns 
(keys) col1 and col2 with some values in each. When I access t2 from CQL I 
find the following resultset with three columns:

  key  | column1 | value
  row1 | col1    | val1
  row1 | col2    | val2


This behavior raises certain questions:


* What is the reason for such a schema anomaly, or is this a problem?

* Which schema should be deemed as correct or consistent?

* How to access meta data on the same?



Thanks and Regards
Rishabh Agrawal


From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
Sent: Monday, January 28, 2013 12:57 PM
To: user@cassandra.apache.org
Subject: RE: Accessing Metadata of Column Familes

You can get storage attributes from /data/system/ keyspace.
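As of 1.2 the schema is also exposed as CQL-queryable system tables, which can be easier than reading the files under the data directory. A sketch (the keyspace name mykeyspace is a placeholder):

```sql
-- Run in cqlsh against Cassandra 1.2; 'mykeyspace' is a placeholder.
SELECT keyspace_name, columnfamily_name
FROM system.schema_columnfamilies
WHERE keyspace_name = 'mykeyspace';
```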

From: Rishabh Agrawal [mailto:rishabh.agra...@impetus.co.in]
Sent: Monday, January 28, 2013 12:42 PM
To: user@cassandra.apache.org
Subject: RE: Accessing Metadata of Column Familes

Thank for the reply.

I do not want to go down the API route. I wish to access the files and column 
families which store the metadata information.

From: Harshvardhan Ojha [mailto:harshvardhan.o...@makemytrip.com]
Sent: Monday, January 28, 2013 12:25 PM
To: user@cassandra.apache.org
Subject: RE: Accessing Metadata of Column Familes

Which API are you using?
If you are using Hector use ColumnFamilyDefinition.

Regards
Harshvardhan OJha

From: Rishabh Agrawal [mailto:rishabh.agra...@impetus.co.in]
Sent: Monday, January 28, 2013 12:16 PM
To: user@cassandra.apache.org
Subject: Accessing Metadata of Column Familes

Hello,

I wish to access metadata information on column families. How can I do it? Any 
ideas?

Thanks and Regards
Rishabh Agrawal









NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.
The contents of this email, including the attachments, are PRIVILEGED AND 
CONFIDENTIAL to the intended recipient at the email address to which it has 
been addressed. If you receive it in error, please notify the sender 
immediately by return email and then permanently delete it from your system. 
The unauthorized use, distribution, copying or alteration of this email, 
including the attachments, is strictly forbidden. Please note that neither 
MakeMyTrip nor the sender accepts any responsibility for viruses and it is your 
responsibility to scan the email and attachments (if any). No contracts may be 
concluded on behalf of MakeMyTrip by means of email communications.









Re: data not shown up after some time

2013-01-28 Thread aaron morton
If you are seeing failed secondary index reads you may be seeing this 
https://issues.apache.org/jira/browse/CASSANDRA-5079

Cheers
  
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 3:31 AM, Matthias Zeilinger 
matthias.zeilin...@bwinparty.com wrote:

 Hi,
  
 No I have checked the TTL: 7776000
  
 Very interesting is, if I do a simple “list cf;” the data is shown, but if 
 I do a “get cf where index=’testvalue’;” it returns “0 Row Returned”.
  
 How can that be?
  
 Br,
 Matthias Zeilinger
 Production Operation – Shared Services
  
 P: +43 (0) 50 858-31185
 M: +43 (0) 664 85-34459
 E: matthias.zeilin...@bwinparty.com
  
 bwin.party services (Austria) GmbH
 Marxergasse 1B
 A-1030 Vienna
  
 www.bwinparty.com
  
 From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] 
 Sent: Montag, 28. Jänner 2013 15:25
 To: user@cassandra.apache.org
 Subject: RE: data not shown up after some time
  
 Are you sure your app is setting TTL correctly?
 TTL is in seconds. For 90 days it have to be 90*24*60*60=7776000.
 What if you set 777600 by accident (10 times less) – that will be 9 days, 
 almost what you see.
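That arithmetic is easy to get off by a factor of ten; a tiny sanity check (a hypothetical helper, not part of any driver) catches it:

```python
def ttl_days(days):
    """Return a Cassandra TTL in seconds for the given number of days."""
    return days * 24 * 60 * 60

# 90 days is the intended TTL; dropping one zero yields roughly 9 days,
# which matches data vanishing after ~10 days.
assert ttl_days(90) == 7776000
assert ttl_days(9) == 777600
```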
  
 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer
  
 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider
 Take a ride with Adform's Rich Media Suite
 
 Disclaimer: The information contained in this message and attachments is 
 intended solely for the attention and use of the named addressee and may be 
 confidential. If you are not the intended recipient, you are reminded that 
 the information remains the property of the sender. You must not use, 
 disclose, distribute, copy, print or rely on this e-mail. If you have 
 received this message in error, please contact the sender immediately and 
 irrevocably delete this message and any copies.
  
 From: Matthias Zeilinger [mailto:matthias.zeilin...@bwinparty.com] 
 Sent: Monday, January 28, 2013 15:57
 To: user@cassandra.apache.org
 Subject: data not shown up after some time
  
 Hi,
  
 I'm a simple operations guy and new to Cassandra.
 I have the problem that one of our applications is writing data into Cassandra 
 (but not deleting it, because we should have a 90 day TTL).
 The application operates in 1 KS with 5 CF. My current setup:
  
 3 node cluster and KS has a RF of 3 (I know it's not the best setup)
  
 I can see now the problem that after 10 days most (nearly all) data are not 
 showing anymore in the cli and also our application cannot see the data.
 I assume that it has something to do with the gc_grace_seconds, it is set to 
 10 days.
  
 I have read a lot of documentation about tombstones, but our application doesn't 
 perform deletes.
 How can I see in the cli if a row key has any tombstones or not?
  
 Could it be that there are some ghost tombstones?
  
 Thx for your help
  
 Br,
 Matthias Zeilinger
 Production Operation – Shared Services
  
 P: +43 (0) 50 858-31185
 M: +43 (0) 664 85-34459
 E: matthias.zeilin...@bwinparty.com
  
 bwin.party services (Austria) GmbH
 Marxergasse 1B
 A-1030 Vienna
  
 www.bwinparty.com
  



getting error for decimal type data

2013-01-28 Thread Kuldeep Mishra
While I am trying to list column family data using cassandra-cli I am
getting the following problem for decimal type data;
any suggestion will be appreciated.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:80)
at java.math.BigDecimal.getValueString(BigDecimal.java:2885)
at java.math.BigDecimal.toPlainString(BigDecimal.java:2869)
at
org.apache.cassandra.cql.jdbc.JdbcDecimal.getString(JdbcDecimal.java:72)
at
org.apache.cassandra.db.marshal.DecimalType.getString(DecimalType.java:62)
at
org.apache.cassandra.cli.CliClient.printSliceList(CliClient.java:2873)
at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1486)
at
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)
at
org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:210)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:337)
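The trace points at BigDecimal.toPlainString materializing every digit of a value with a huge exponent. A toy illustration of that failure mode (in Python, using format(..., 'f') as an analogue of toPlainString; not the actual Cassandra code path):

```python
from decimal import Decimal

# A compact value whose plain-notation rendering is enormous: a stored
# decimal like this forces the CLI to build a giant string on the heap.
d = Decimal("1E+100000")

plain = format(d, "f")        # expands the exponent, like toPlainString()
assert len(plain) == 100001   # '1' followed by 100000 zeros
```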


-- 
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199


Re: Cassandra timeout whereas it is not much busy

2013-01-28 Thread aaron morton
  From what I could read there seems to be a contention issue around the 
 flushing (the switchlock ?). Cassandra would then be slow, but not using 
 the entire cpu. I would be in the strange situation I was where I reported my 
 issue in this thread.
 Does my theory make sense?
If you are seeing contention around the switch lock you will see a pattern in 
the logs where a Writing… message is immediately followed by an Enqueuing… 
message. This happens when the flush_queue is full and the thread flushing 
(either because of memory, commit log or snapshot etc) is waiting. 

See the comments for memtable_flush_queue_size in the yaml file. 

If you increase the value you will flush more frequently, as C* leaves memory 
free to handle the case where the queue is full. 

If you have spare IO you could consider increasing memtable_flush_writers
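For reference, the relevant settings live in cassandra.yaml; a sketch with illustrative values (check your version's defaults before changing anything):

```yaml
# cassandra.yaml -- illustrative values, not recommendations
memtable_flush_queue_size: 4   # memtables allowed to queue for flushing before writes block
memtable_flush_writers: 1      # defaults to one per data directory; raise if IO is spare
```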

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 4:19 AM, Nicolas Lalevée nicolas.lale...@hibnet.org wrote:

 I did some testing, I have a theory.
 
 First, it seems we have a lot of CFs, and two are particularly hungry for 
 RAM, consuming quite a big amount of it for the bloom filters. Cassandra 
 does not force the flush of the memtables if it has more than 6G of Xmx 
 (luckily for us, this is the maximum reasonable we can give).
 Since our machines have 8G, this leaves quite little room for the disk 
 cache. Thanks to this systemtap script [1], I have seen that the hit ratio is 
 about 10%.
 
 Then I tested with an Xmx of 4G. The %wa drops down, and the disk cache hit 
 ratio rises to 80%. On the other hand, flushing happens very often. I cannot 
 say how often, since I have too many CFs to graph them all. But of the ones I 
 graph, none of their memtables goes above 10M, whereas they usually go up to 
 200M.
 
 I have not tested further, since it is quite obvious that the machines need 
 more RAM. And they're about to receive more.
 
 But I guess that if I put more write and read pressure on, still with an 
 Xmx of 4G, the %wa would still be quite low, but the flushing would be even 
 more intensive. And I guess that it would go wrong. From what I could read 
 there seems to be a contention issue around the flushing (the switchlock?). 
 Cassandra would then be slow, but not using the entire cpu. I would be back in 
 the strange situation I was in when I reported my issue in this thread.
 Does my theory make sense?
 
 Nicolas
 
 [1] http://sourceware.org/systemtap/wiki/WSCacheHitRate
 
 Le 23 janv. 2013 à 18:35, Nicolas Lalevée nicolas.lale...@hibnet.org a 
 écrit :
 
 Le 22 janv. 2013 à 21:50, Rob Coli rc...@palominodb.com a écrit :
 
 On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
 nicolas.lale...@hibnet.org wrote:
 Here is the long story.
 After some long useless staring at the monitoring graphs, I gave a try to
 using the openjdk 6b24 rather than openjdk 7u9
 
 OpenJDK 6 and 7 are both counter-recommended with regards to
 Cassandra. I've heard reports of mysterious behavior like the behavior
 you describe, when using OpenJDK 7.
 
 Try using the Sun/Oracle JVM? Is your JNA working?
 
 JNA is working.
 I tried both oracle-jdk6 and oracle-jdk7; no difference from openjdk6. And 
 since Ubuntu only maintains openjdk, we'll stick with it until Oracle's is 
 proven better.
 oracle vs openjdk, I tested for now under normal pressure though.
 
 What amazes me is that however much I google it and ask around, I still don't 
 know for sure the difference between the openjdk and oracle's jdk…
 
 Nicolas
 
 



Re: cluster issues

2013-01-28 Thread aaron morton
 We can always be proactive in keeping the time sync. But, Is there any way to 
 recover from a time drift (in a reactive manner)? Since it was a lab 
 environment, I dropped the KS (deleted data directory)
There is a way to remove future-dated columns, but it is not for the 
faint-hearted. 

Basically:
1) Drop the gc_grace_seconds to 0
2) Delete the column with a timestamp way in the future, so it is guaranteed to 
be higher than the value you want to delete. 
3) Flush the CF
4) Compact all the SSTables that contain the row. The easiest way to do that is 
a major compaction, but we normally advise not to do that because it creates 
one big file. You can also do a user defined compaction. 
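For step 2, the delete timestamp has to out-bid the future-dated writes. Client timestamps are conventionally microseconds since the epoch, so a sketch of computing a safely-high value (the helper name is made up for illustration) looks like:

```python
import time

def future_delete_timestamp(years_ahead=10):
    """A microsecond timestamp far enough ahead to beat any drifted write."""
    return int((time.time() + years_ahead * 365 * 24 * 3600) * 1_000_000)

ts = future_delete_timestamp()
assert ts > int(time.time() * 1_000_000)  # strictly later than "now"
```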

 Are there any other scenarios that would lead a cluster to look like the below? 
 Note: Actual topology of the cluster - ONE Cassandra node and TWO Analytics 
 nodes.
What snitch are you using?
If you have the property file snitch do all nodes have the same configuration ?

There is a lot of sickness there. If possible I would scrub and start again. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 6:29 AM, S C as...@outlook.com wrote:

 One of our nodes in a 3 node cluster drifted by ~20-25 seconds. While I 
 figured this out pretty quickly, I have a few questions that I am looking for 
 answers to.
 
 We can always be proactive in keeping the time sync. But, Is there any way to 
 recover from a time drift (in a reactive manner)? Since it was a lab 
 environment, I dropped the KS (deleted data directory).
 Are there any other scenarios that would lead a cluster to look like the below? 
 Note: Actual topology of the cluster - ONE Cassandra node and TWO Analytics 
 nodes.
 
 
 On 192.168.2.100
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070485 
 192.168.2.100  Cassandra   rack1   Up Normal  601.34 MB   33.33%  
 0   
 192.168.2.101  Analytics   rack1   Down   Normal  149.75 MB   33.33%  
 56713727820156410577229101238628035242  
 192.168.2.102  Analytics   rack1   Down   Normal  ?   33.33%  
 113427455640312821154458202477256070485   
 
 On 192.168.2.101
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070485 
 192.168.2.100  Analytics   rack1   Down   Normal  ?   33.33%  
 0   
 192.168.2.101  Analytics   rack1   Up Normal  158.59 MB   33.33%  
 56713727820156410577229101238628035242  
 192.168.2.102  Analytics   rack1   Down   Normal  ?   33.33%  
 113427455640312821154458202477256070485
 
 On 192.168.2.102
 Address DC  RackStatus State   LoadOwns   
  Token   
   
  113427455640312821154458202477256070485 
 192.168.2.100  Analytics   rack1   Down   Normal  ?   33.33%  
 0   
 192.168.2.101  Analytics   rack1   Down   Normal  ?   33.33%  
 56713727820156410577229101238628035242  
 192.168.2.102  Analytics   rack1   Up Normal  117.02 MB   33.33%  
 113427455640312821154458202477256070485 
 
 
 Appreciate your valuable inputs.
 
 Thanks,
 SC



Re: JDBC, Select * Cql2 vs Cql3 problem ?

2013-01-28 Thread aaron morton
What is your table spec ? 
Do you have the full stack trace from the exception ? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 8:15 AM, Andy Cobley acob...@computing.dundee.ac.uk wrote:

 I have the following code in my app using the JDBC (cassandra-jdbc-1.1.2.jar) 
 drivers to CQL:
 
 try {
   rs = stmt.executeQuery("SELECT * FROM users");
 } catch (Exception et) {
   System.out.println("Can not execute statement " + et);
 }
 
 When connecting to a CQL2 server (cassandra 1.1.5) the code works as expected, 
 returning a result set. When connecting to CQL3 (Cassandra 1.2) I catch the 
 following exception:
 
 Can not execute statement java.lang.NullPointerException
 
 The Select statement (SELECT * FROM users) does work from CQLSH as expected. 
 Is there a problem with my code, or something else?
 
 Andy C
 School of Computing
 University of Dundee.
 
 
 
 The University of Dundee is a Scottish Registered Charity, No. SC015096.



RE: data not shown up after some time

2013-01-28 Thread Matthias Zeilinger
How can I check for these secondary index read failures?
Is it in the system.log, or via nodetool?

Br,
Matthias Zeilinger
Production Operation - Shared Services

P: +43 (0) 50 858-31185
M: +43 (0) 664 85-34459
E: matthias.zeilin...@bwinparty.com

bwin.party services (Austria) GmbH
Marxergasse 1B
A-1030 Vienna

www.bwinparty.com

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Dienstag, 29. Jänner 2013 08:04
To: user@cassandra.apache.org
Subject: Re: data not shown up after some time

If you are seeing failed secondary index reads you may be seeing this 
https://issues.apache.org/jira/browse/CASSANDRA-5079

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com





Re: Cassandra pending compaction tasks keeps increasing

2013-01-28 Thread aaron morton
 * Why nodetool repair increases the data size that much? It's not likely 
 that much data needs to be repaired. Will that happen for all the subsequent 
 repair?
Repair only detects differences in entire rows. If you have very wide rows then 
small differences in rows can result in a large amount of streaming. 
Streaming creates new SSTables on the receiving side, which then need to be 
compacted. So repair often results in compaction doing its thing for a while. 
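A toy illustration of why whole-row granularity inflates streaming (this is not Cassandra's actual Merkle-tree code; the names are invented):

```python
import hashlib

def row_digest(row):
    """Hash an entire row; repair compares digests, not individual columns."""
    return hashlib.md5(repr(sorted(row.items())).encode()).hexdigest()

wide_row_a = {"c%d" % i: "v" for i in range(10000)}
wide_row_b = dict(wide_row_a)
wide_row_b["c42"] = "stale"   # one column out of 10000 differs

# The digests differ, so repair streams the whole wide row, not one column.
assert row_digest(wide_row_a) != row_digest(wide_row_b)
```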

 * How to make LCS run faster? After almost a day, the LCS tasks only 
 dropped by 1000. I am afraid it will never catch up. We set

This is going to be tricky to diagnose, sorry for asking silly questions...

Do you have wide rows? Are you seeing logging about Compacting wide rows? 
Are you seeing GC activity logged, or CPU steal on a VM? 
Have you tried disabling multithreaded_compaction? 
Are you using Key Caches? Have you tried disabling 
compaction_preheat_key_cache?
Can you enable DEBUG level logging and make the logs available?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 8:59 AM, Derek Williams de...@fyrie.net wrote:

 I could be wrong about this, but when repair is run, it isn't just values 
 that are streamed between nodes, it's entire sstables. This causes a lot of 
 duplicate data to be written which was already correct on the node, which 
 needs to be compacted away.
 
 As for speeding it up, no idea.
 
 
 On Mon, Jan 28, 2013 at 12:16 PM, Wei Zhu wz1...@yahoo.com wrote:
 Any thoughts?
 
 Thanks.
 -Wei
 
 - Original Message -
 From: Wei Zhu wz1...@yahoo.com
 To: user@cassandra.apache.org
 Sent: Friday, January 25, 2013 10:09:37 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing
 
 
 To recap the problem,
 1.1.6 on SSD, 5 nodes, RF = 3, one CF only.
 After data load, initially all 5 nodes had very even data sizes (135G each). 
 I ran nodetool repair -pr on node 1, which has replicas on nodes 2 and 3 
 since we set RF = 3.
 It appears that a huge amount of data got transferred. Node 1 has 220G; nodes 2 
 and 3 have around 170G. Pending LCS tasks on node 1 are at 15K, and nodes 2 and 
 3 have around 7K each.
 Questions:
 
 * Why nodetool repair increases the data size that much? It's not likely 
 that much data needs to be repaired. Will that happen for all the subsequent 
 repair?
 * How to make LCS run faster? After almost a day, the LCS tasks only 
 dropped by 1000. I am afraid it will never catch up. We set
 
 
 * compaction_throughput_mb_per_sec = 500
 * multithreaded_compaction: true
 
 
 Both Disk and CPU util are less than 10%. I understand LCS is single 
 threaded, any chance to speed it up?
 
 
 * We use default SSTable size as 5M, Will increase the size of SSTable 
 help? What will happen if I change the setting after the data is loaded.
 
 Any suggestion is very much appreciated.
 
 -Wei
 
 - Original Message -
 
 From: Wei Zhu wz1...@yahoo.com
 To: user@cassandra.apache.org
 Sent: Thursday, January 24, 2013 11:46:04 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing
 
 I believe I am running into this one:
 
 https://issues.apache.org/jira/browse/CASSANDRA-4765
 
 By the way, I am using 1.1.6 (I thought I was using 1.1.7) and this one is 
 fixed in 1.1.7.
 
 - Original Message -
 
 From: Wei Zhu wz1...@yahoo.com
 To: user@cassandra.apache.org
 Sent: Thursday, January 24, 2013 11:18:59 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing
 
 Thanks Derek,
 in the cassandra-env.sh, it says
 
 # reduce the per-thread stack size to minimize the impact of Thrift
 # thread-per-client. (Best practice is for client connections to
 # be pooled anyway.) Only do so on Linux where it is known to be
 # supported.
 # u34 and greater need 180k
 JVM_OPTS="$JVM_OPTS -Xss180k"
 
 What value should I use? Java defaults at 400K? Maybe try that first.
 
 Thanks.
 -Wei
 
 - Original Message -
 From: Derek Williams de...@fyrie.net
 To: user@cassandra.apache.org, Wei Zhu wz1...@yahoo.com
 Sent: Thursday, January 24, 2013 11:06:00 PM
 Subject: Re: Cassandra pending compaction tasks keeps increasing
 
 
 Increasing the stack size in cassandra-env.sh should help you get past the 
 stack overflow. Doesn't help with your original problem though.
 
 
 
 On Fri, Jan 25, 2013 at 12:00 AM, Wei Zhu  wz1...@yahoo.com  wrote:
 
 
 Well, even after restart, it throws the the same exception. I am basically 
 stuck. Any suggestion to clear the pending compaction tasks? Below is the end 
 of stack trace:
 
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$1.iterator(Sets.java:578)
 at com.google.common.collect.Sets$3.iterator(Sets.java:667)
 at 

Re: why set replica placement strategy at keyspace level ?

2013-01-28 Thread aaron morton

  So If I write to CF Users with rowkey=dean
 and to CF Schedules with rowkey=dean, it is actually one row?
In my mental model that's correct. 
A RowMutation is a row key and a collection of (internal) ColumnFamilies which 
contain the columns to write for a single CF. 

This is the thing that is committed to the log, and then the changes in the 
ColumnFamilies are applied to each CF in an isolated way. 

 .(must have missed that several times in the
 documentation).
http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 9:28 AM, Hiller, Dean dean.hil...@nrel.gov wrote:

 If you write to 4 CF's with the same row key that is considered one
 mutation
 
 Hm, I never considered this, never knew it either (very un-intuitive from
 a user perspective IMHO).  So if I write to CF Users with rowkey=dean
 and to CF Schedules with rowkey=dean, it is actually one row?  (It's so
 un-intuitive that I had to ask to make sure I am reading that correctly.)
 
 I guess I really don't have that case since most of my row keys are GUID's
 anyways, but very interesting and unexpected (not sure I really mind, was
 just taken aback)
 
 Ps. Not sure I ever minded losing atomic commits to the same row across
 CF's, as I never expected it in the first place, having used cassandra for
 more than a year (must have missed that several times in the
 documentation).
 
 Thanks,
 Dean
 
 On 1/28/13 12:41 PM, aaron morton aa...@thelastpickle.com wrote:
 
 
 Another thing that's been confusing me is that when we talk about the
 data model should the row key be inside or outside a column family?
 My mental model is:
 
 cluster == database
 keyspace == table
 row == a row in a table
 CF == a family of columns in one row
 
 (I think that's different to others, but it works for me)
 
 Is it important to store rows of different column families that share
 the same row key to the same node?
 Makes the failure models a little easier to understand, e.g. every
 key for user amorton is either available or not.
 
 Meanwhile, what's the drawback of setting RPS and RF at column family
 level?
 Other than it's baked in?
 
 We process all mutations for a row at the same time. If you write to 4
 CF's with the same row key that is considered one mutation, for one row.
 That one RowMutation is directed to the replicas using the
 ReplicationStrategy and atomically applied to the commit log.
 
 If you have RS per CF that one mutation would be split into 4, which
 would then be sent to different replicas. Even if they went to the same
 replicas they would be written to the commit log as different mutations.
 
 So if you have RS per CF you lose atomic commits for writes to the same
 row.
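A toy model of the point (the names and hashing here are invented, not Cassandra internals): with one placement function per keyspace, every CF's write for a row key lands on the same replicas; give each CF its own strategy and the sets can diverge:

```python
import hashlib

def replicas(key, nodes, rf=3, offset=0):
    """Pick rf consecutive nodes on a toy ring, starting from a key hash."""
    start = (int(hashlib.md5(key.encode()).hexdigest(), 16) + offset) % len(nodes)
    return {nodes[(start + i) % len(nodes)] for i in range(rf)}

nodes = ["n1", "n2", "n3", "n4", "n5"]

# Keyspace-level placement: Users and Schedules share one function, so
# row key "dean" maps to the identical replica set in both CFs.
assert replicas("dean", nodes) == replicas("dean", nodes)

# A hypothetical per-CF strategy (modelled as a different offset) splits
# what was a single RowMutation across different replica sets.
assert replicas("dean", nodes) != replicas("dean", nodes, offset=1)
```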
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 28/01/2013, at 11:22 PM, Manu Zhang owenzhang1...@gmail.com wrote:
 
 On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
 The row is the unit of replication, all values with the same storage
 engine row key in a KS are on the same nodes. if they were per CF this
 would not hold.
 
 Not that it would be the end of the world, but that is the first thing
 that comes to mind.
 
 Cheers
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 27/01/2013, at 4:15 PM, Manu Zhang owenzhang1...@gmail.com wrote:
 
 Although I've got to know Cassandra for quite a while, this question
 only has occurred to me recently:
 
 Why are the replica placement strategy and replica factors set at the
 keyspace level?
 
 Would setting them at the column family level offers more flexibility?
 
 Is this because it's easier for user to manage an application? Or
 related to internal implementation? Or it's just that I've overlooked
 something?
 
 
 Is it important to store rows of different column families that share
 the same row key to the same node? AFAIK, Cassandra doesn't support get
 all of them in a single call.
 
 Meanwhile, what's the drawback of setting RPS and RF at column family
 level?
 
 Another thing that's been confusing me is that when we talk about the
 data model should the row key be inside or outside a column family?
 
 Thanks
 
 
 



Re: Problem on node join the ring

2013-01-28 Thread aaron morton
  there is no streaming anymore
Nodes only bootstrap once, when they are first started. 

  I have turned on the debug, this is what it is doing now (cpu is pretty much 
  idle), with no error messages. 
Looks like it is receiving writes and reads, looks like it's part of the ring. 

Is this ring output from the joining node or from one of the others? Do the 
other nodes see this node as up or joining? 

When starting the node was there a log line with Bootstrap variables ? 

Anyway, I would try running nodetool repair -pr on the joining node. If you 
are not using QUORUM / QUORUM you may be getting inconsistent results now. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 9:51 AM, Daning Wang dan...@netseer.com wrote:

 I added a new node to the ring (version 1.1.6); after more than 30 hours, it is 
 still in the 'Joining' state.
 
 Address DC  RackStatus State   Load
 Effective-Ownership Token   
   
  141784319550391026443072753096570088105 
 10.28.78.123datacenter1 rack1   Up Normal  18.73 GB50.00% 
  0   
 10.4.17.138 datacenter1 rack1   Up Normal  15 GB   39.29% 
  24305883351495604533098186245126300818  
 10.93.95.51 datacenter1 rack1   Up Normal  17.96 GB41.67% 
  42535295865117307932921825928971026432  
 10.170.1.26 datacenter1 rack1   Up Joining 6.89 GB 0.00%  
  56713727820156410577229101238628035242  
 10.6.115.239datacenter1 rack1   Up Normal  20.3 GB 50.00% 
  85070591730234615865843651857942052864  
 10.28.20.200datacenter1 rack1   Up Normal  22.68 GB60.71% 
  127605887595351923798765477786913079296 
 10.240.113.171  datacenter1 rack1   Up Normal  18.4 GB 58.33% 
  141784319550391026443072753096570088105  
 
 
 Since after a while the cpu usage goes down to 0, it looks like it is stuck. I 
 have restarted the server several times in the last 30 hours. When the server 
 has just started you can see streaming in 'nodetool netstats', but after a few 
 minutes there is no streaming anymore.
 
 I have turned on the debug; this is what it is doing now (cpu is pretty much 
 idle), with no error messages. 
 
 Please help, I can provide more info if needed.
 
 Thanks in advance,
 
 
 DEBUG [MutationStage:17] 2013-01-28 12:47:59,618 RowMutationVerbHandler.java 
 (line 44) Applying RowMutation(keyspace='dsat', key='52f5298affbb8bf0', 
 modifications=[ColumnFamily(dsatcache 
 [_meta:false:278@1359406079725000!3888000,])])
 DEBUG [MutationStage:17] 2013-01-28 12:47:59,618 Table.java (line 395) 
 applying mutation of row 52f5298affbb8bf0
 DEBUG [MutationStage:17] 2013-01-28 12:47:59,618 RowMutationVerbHandler.java 
 (line 56) RowMutation(keyspace='dsat', key='52f5298affbb8bf0', 
 modifications=[ColumnFamily(dsatcache 
 [_meta:false:278@1359406079725000!3888000,])]) applied.  Sending response to 
 571645593@/10.28.78.123
 DEBUG [MutationStage:26] 2013-01-28 12:47:59,623 RowMutationVerbHandler.java 
 (line 44) Applying RowMutation(keyspace='dsat', key='57f700499922964b', 
 modifications=[ColumnFamily(dsatcache 
 [cache_type:false:8@1359406079730002,path:false:30@1359406079730001,top_node:false:22@135940607973,v0:false:976@1359406079730003!3888000,])])
 DEBUG [MutationStage:26] 2013-01-28 12:47:59,623 Table.java (line 395) 
 applying mutation of row 57f700499922964b
 DEBUG [MutationStage:26] 2013-01-28 12:47:59,623 Table.java (line 429) 
 mutating indexed column top_node value 
 6d617474626f7574726f732e74756d626c722e636f6d
 DEBUG [MutationStage:26] 2013-01-28 12:47:59,623 CollationController.java 
 (line 78) collectTimeOrderedData
 DEBUG [MutationStage:26] 2013-01-28 12:47:59,623 Table.java (line 453) 
 Pre-mutation index row is null
 DEBUG [MutationStage:26] 2013-01-28 12:47:59,624 KeysIndex.java (line 119) 
 applying index row mattboutros.tumblr.com in 
 ColumnFamily(dsatcache.dsatcache_top_node_idx 
 [57f700499922964b:false:0@135940607973,])
 DEBUG [MutationStage:26] 2013-01-28 12:47:59,624 RowMutationVerbHandler.java 
 (line 56) RowMutation(keyspace='dsat', key='57f700499922964b', 
 modifications=[ColumnFamily(dsatcache 
 [cache_type:false:8@1359406079730002,path:false:30@1359406079730001,top_node:false:22@135940607973,v0:false:976@1359406079730003!3888000,])])
  applied.  Sending response to 710680715@/10.28.20.200
 DEBUG [MutationStage:22] 2013-01-28 12:47:59,624 RowMutationVerbHandler.java 
 (line 44) Applying RowMutation(keyspace='dsat', key='57f700499922964b', 
 modifications=[ColumnFamily(dsatcache 
 [_meta:false:278@1359406079731000!3888000,])])
 DEBUG [MutationStage:22] 2013-01-28