Re: Question about node tool repair

2014-01-22 Thread Artur Kronenberg

About repairs,

we encountered a similar problem with our setup where repairs would take 
ages to complete. Based on your setup you can try loading data into page 
cache before running repairs. Depending on how much data you can hold in 
cache, this will speed up your repairs massively.
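For concreteness, a minimal sketch of one way to pre-warm the page cache:
sequentially read every file under the data directory so the OS caches it.
The path is an assumption - adjust to your installation (plain cat to
/dev/null over the data files works just as well):

    import java.io.*;

    public class WarmPageCache {
        public static void main(String[] args) throws IOException {
            warm(new File("/var/lib/cassandra/data")); // assumed default data directory
        }

        static void warm(File f) throws IOException {
            if (f.isDirectory()) {
                File[] children = f.listFiles();
                if (children != null)
                    for (File c : children) warm(c);
            } else {
                // reading the file end to end pulls its pages into the OS page cache
                byte[] buf = new byte[1 << 20];
                InputStream in = new BufferedInputStream(new FileInputStream(f));
                try {
                    while (in.read(buf) != -1) { /* discard */ }
                } finally {
                    in.close();
                }
            }
        }
    }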


-- artur

On 21/01/14 20:33, Logendran, Dharsan (Dharsan) wrote:


Thanks Rob,

Dharsan

*From:*Robert Coli [mailto:rc...@eventbrite.com]
*Sent:* January-21-14 2:26 PM
*To:* user@cassandra.apache.org
*Subject:* Re: Question about node tool repair

On Mon, Jan 20, 2014 at 2:47 PM, Logendran, Dharsan (Dharsan) 
dharsan.logend...@alcatel-lucent.com 
mailto:dharsan.logend...@alcatel-lucent.com wrote:


We have a two-node cluster with a replication factor of 2. The db
has more than 2500 column families (tables). A nodetool repair -pr
on an empty database (only one or two tables have a little data) takes about 30
hours to complete. We are using Cassandra version 2.0.4. Is there
any way for us to speed this up?


Cassandra 2.0.2 made aspects of repair serial and therefore logically 
much slower as a function of replication factor. Yours is not the 
first report I have heard of >= 2.0.2 era repair being unreasonably slow.


https://issues.apache.org/jira/browse/CASSANDRA-5950

You can use -par (not at all confusingly named with -pr!) to get the 
old parallel behavior.
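For reference, the two flags being contrasted here, as they exist in 2.0.x:

    nodetool repair -pr    # repair only the node's primary range
    nodetool repair -par   # force the old parallel repair behavior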


Cassandra 2.1 has this ticket to improve repair with vnodes.

https://issues.apache.org/jira/browse/CASSANDRA-5220

But really you should strongly consider how much you need to run 
repair, and at the very least probably increase gc_grace_seconds from the
unreasonably low default of 10 days to 32 days, and then run your 
repair on the first of each month.
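A sketch of that change, assuming the DataStax java-driver 2.0 and a
hypothetical keyspace/table (32 days = 2764800 seconds; repeat per table, or
issue the same ALTER from cqlsh):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class RaiseGcGrace {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();
            // 32 days, up from the default 10 days (864000 seconds)
            session.execute("ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 2764800");
            cluster.close();
        }
    }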


https://issues.apache.org/jira/browse/CASSANDRA-5850

IMO it is just a complete and total error if repair of an actually 
empty database is anything but a NO-OP. I would file a JIRA ticket, 
were I you.


=Rob





Re: Long GC due to promotion failures

2014-01-22 Thread Jason Wee
 SSTable count: 365

Your SSTable count is too high... I don't know what the best count
should be, but in my experience anything below 20 is good. Is your
compaction running?

I have read a few blogs on how to read cfhistograms, but never really
understood it fully. Anyone care to explain using the OP's attached cfhistogram?

Taking a wild shot: perhaps try a different build, Oracle JDK 1.6u25
perhaps?

HTH

Jason




On Tue, Jan 21, 2014 at 4:02 PM, John Watson j...@disqus.com wrote:

 Pretty reliable, at some point, nodes will have super long GCs.
 Followed by https://issues.apache.org/jira/browse/CASSANDRA-6592

 Lovely log messages:

   9030.798: [ParNew (0: promotion failure size = 4194306)  (2:
 promotion failure size = 4194306)  (4: promotion failure size =
 4194306)  (promotion failed)
   Total time for which application threads were stopped: 23.2659990 seconds

 Full gc.log until just before restarting the node (see another 32s GC
 near the end): https://gist.github.com/dctrwatson/f04896c215fa2418b1d9

 Here's a graph of GC time, where we can see an increase 30 minutes
 prior (indicator that the issue will happen soon):
 http://dl.dropboxusercontent.com/s/q4dr7dle023w9ih/render.png

 Graph of various Heap usage:
 http://dl.dropboxusercontent.com/s/e8kd8go25ihbmkl/download.png

 Running compactions in the same time frame:
 http://dl.dropboxusercontent.com/s/li9tggk4r2l3u4b/render%20(1).png

 CPU, IO, ops and latencies:

 https://dl.dropboxusercontent.com/s/yh9osm9urplikb7/2014-01-20%20at%2011.46%20PM%202x.png

 cfhistograms/cfstats:
 https://gist.github.com/dctrwatson/9a08b38d0258ae434b15

 Cassandra 1.2.13
 Oracle JDK 1.6u45

 JVM opts:

 MAX_HEAP_SIZE=8G
 HEAP_NEW_SIZE=1536M

 Tried HEAP_NEW_SIZE of 768M, 800M, 1000M and 1600M
 Tried default -XX:SurvivorRatio=8 and -XX:SurvivorRatio=4
 Tried default -XX:MaxTenuringThreshold=1 and -XX:MaxTenuringThreshold=2

 All still eventually ran into long GC.

 Hardware for all 3 nodes:

 (2) E5520 @ 2.27Ghz (8 cores w/ HT) [16 cores]
 (6) 4GB RAM [24G RAM]
 (1) 500GB 7.2k for commitlog
 (2) 400G SSD for data (configured as separate data directories)



Re: Exception in thread main java.lang.NoClassDefFoundError

2014-01-22 Thread Jason Wee
NoClassDefFoundError: org/apache/cassandra/service/CassandraDaemon

This states it very clearly: the class is not found on the classpath. You
are evidently not using the packaged cassandra distribution, so
you need to find which jar (or build directory) contains this class and check
whether it is included in your classpath.

Jason
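A quick, generic way to check from plain Java whether (and from where) the
class resolves on a given classpath - a diagnostic sketch, not Cassandra code:

    import java.security.CodeSource;

    public class ClasspathCheck {
        public static void main(String[] args) {
            String name = "org.apache.cassandra.service.CassandraDaemon";
            try {
                Class<?> c = Class.forName(name);
                CodeSource cs = c.getProtectionDomain().getCodeSource();
                System.out.println(name + " loaded from "
                        + (cs == null ? "the bootstrap classpath" : cs.getLocation()));
            } catch (ClassNotFoundException e) {
                System.out.println(name + " is NOT on the classpath");
            }
        }
    }

Run it with the same classpath the cassandra script computes, e.g.
java -cp "$CLASSPATH" ClasspathCheck.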


On Tue, Jan 21, 2014 at 9:23 AM, Le Xu sharonx...@gmail.com wrote:

 Hello!
 I got this error while trying out Cassandra 1.2.13. The error message
 looks like:

 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/cassandra/service/CassandraDaemon
 Caused by: java.lang.ClassNotFoundException:
 org.apache.cassandra.service.CassandraDaemon
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class:
 org.apache.cassandra.service.CassandraDaemon. Program will exit.

 I checked JAVA_HOME and CASSANDRA_HOME and they are both set but I still
 got the error.

 However, based on Brian's reply in this thread:
 http://mail-archives.apache.org/mod_mbox/cassandra-user/201307.mbox/%3CCAJHHpg3Lf9tyxwgZNEN3cKH=p9xwms0w4rzqbpt8oriaq9r...@mail.gmail.com%3E
 I followed the step and printed out the $CLASSPATH  variable and got :

 /home/lexu1/scale/apache-cassandra-1.2.13-src//conf:/home/lexu1/scale/apache-cassandra-1.2.13-src//build/classes/main:/home/lexu1/scale/apache-cassandra-1.2.13-src//build/classes/thrift:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/antlr-3.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/avro-1.4.0-fixes.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/avro-1.4.0-sources-fixes.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/commons-cli-1.1.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/commons-codec-1.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/commons-lang-2.6.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/compress-lzf-0.8.4.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/concurrentlinkedhashmap-lru-1.3.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/guava-13.0.1.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/high-scale-lib-1.1.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jackson-core-asl-1.9.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jackson-mapper-asl-1.9.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jamm-0.2.5.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jbcrypt-0.3m.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jline-1.0.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/json-simple-1.1.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/libthrift-0.7.0.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/log4j-1.2.16.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/lz4-1.1.0.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/metrics-core-2.2.0.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/netty-3.6.6.Final.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/servlet-api-2.5-20081211.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/slf4j-api-1.7.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/slf4j-log4j12-1.7.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/snakeyaml-1.6.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/snappy-java-1.0.5.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/snaptree-0.1.jar

 It includes apache-cassandra-1.2.13-src//build/classes/thrift but not
 service. Does the location of CassandraDaemon seems to be the problem? If
 it is, then how do I fix the problem?

 Thanks!

 Le



Introducing farsandra: A different way to integration test with c*

2014-01-22 Thread Edward Capriolo
The repo:
https://github.com/edwardcapriolo/farsandra

The code:
   Farsandra fs = new Farsandra();
fs.withVersion("2.0.4");
fs.withCleanInstanceOnStart(true);
fs.withInstanceName("1");
fs.withCreateConfigurationFiles(true);
fs.withHost("localhost");
fs.withSeeds(Arrays.asList("localhost"));
fs.start();

The story:
For a while I have been developing applications that use Apache Cassandra
as their data store. Personally I am more of an end-to-end test person than
a mock test person. For years I have relied heavily on Hector's embedded
cassandra to bring up Cassandra in a sane way inside a java project.

The concept of Farsandra is to keep Cassandra close (in end to end tests
and not mocked away) but keep your classpath closer (running cassandra
embedded should be seamless and not mess with your client classpath).

Recently there has been much fragmentation with Hector, Astyanax, CQL, and
multiple Cassandra releases. Bringing up an embedded test is much harder
than it needs to be.

Cassandra's core methods (get, put, slice over thrift) have been
wire-compatible from version 0.7 to current. However, the Java libraries for
thrift and things like guava differ across Cassandra versions. This
creates a large number of issues when trying to use your favourite client
with one or more versions of Cassandra (sometimes a thrift mismatch
kills the entire integration and you can't test anything).

Farsandra is much like https://github.com/pcmanus/ccm in that it launches
Cassandra instances remotely inside a sub-process. Farsandra is done in
java not python, making it easier to use with java development.

I will not go so far as to say Farsandra solves all problems. In fact it has its
own challenges (building yaml configurations across versions, fetching
binary cassandra from the internet), but it opens up new opportunities to
develop complicated multi-node testing scenarios that are impossible with
embedded cassandra, whose code is not re-entrant!

Have fun.
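For anyone wanting to drop this into a test suite, a minimal JUnit sketch
around the snippet above. Only the calls shown in this mail are used; startup
waiting and teardown are project-specific, so check the README rather than
trusting the placeholders here:

    import java.util.Arrays;
    import org.junit.BeforeClass;
    import org.junit.Test;
    // plus the Farsandra import for whichever version you pull in

    public class CassandraEndToEndTest {
        @BeforeClass
        public static void startCassandra() throws Exception {
            Farsandra fs = new Farsandra();
            fs.withVersion("2.0.4");
            fs.withCleanInstanceOnStart(true);
            fs.withInstanceName("1");
            fs.withCreateConfigurationFiles(true);
            fs.withHost("localhost");
            fs.withSeeds(Arrays.asList("localhost"));
            fs.start();
            Thread.sleep(10000); // crude wait for the node to come up; poll in real tests
        }

        @Test
        public void clusterIsReachable() {
            // connect with your client of choice (Hector, Astyanax, raw thrift, ...)
            // against localhost and run end-to-end assertions here
        }
    }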


Re: Moving from relational to Cassandra, how to handle intra-table relationships?

2014-01-22 Thread chandra Varahala
Hello,

You can implement relations in a couple of ways: JSON/XML, or CQL collection
classes.

Thanks
Chandra


On Tue, Jan 21, 2014 at 8:58 PM, Les Hartzman lhartz...@gmail.com wrote:

 True. Fortunately though in this application, the data is
 write-once/read-many. So that is one bullet I would dodge!

 Les


 On Tue, Jan 21, 2014 at 5:34 PM, Patricia Gorla 
 patri...@thelastpickle.com wrote:

 Hey,

 One thing to keep in mind if you want to go the serialized JSON route, is
 that you will need to read out the data each time you want to do an update.

 Cheers,
 Patricia


 On Tuesday, January 21, 2014, Les Hartzman lhartz...@gmail.com wrote:

 Hi,

 I'm looking to move from a relational DB to Cassandra. I just found that
 there are intra-table relationships in one table where the ids of the
 related rows are saved in a 'parent' row.

 How can these kinds of relationships be handled in Cassandra? I'm
 thinking that if the individual rows need to live on their own, perhaps I
 should store the data as serialized JSON in its own column of the parent.

 All thoughts appreciated!

 Thanks.

 Les



 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com http://thelastpickle.com





Re: Long GC due to promotion failures

2014-01-22 Thread Lee Mighdoll
I don't recommend PrintFLSStatistics=1, it makes the gc logs hard to
mechanically parse.  Because of that, I can't easily tell whether you're in
the same situation we found.  But just in case, try setting
+CMSClassUnloadingEnabled.  There's an issue related to JMX in DSE that
prevents effective old gen collection in some cases.  The flag's low
overhead, and very effective if that's your problem too.

Cheers,
Lee
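For anyone trying this, the flag goes where the other JVM options live - a
sketch of the usual conf/cassandra-env.sh placement:

    JVM_OPTS="$JVM_OPTS -XX:+CMSClassUnloadingEnabled"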


On Tue, Jan 21, 2014 at 12:02 AM, John Watson j...@disqus.com wrote:

 Pretty reliable, at some point, nodes will have super long GCs.
 Followed by https://issues.apache.org/jira/browse/CASSANDRA-6592

 Lovely log messages:

   9030.798: [ParNew (0: promotion failure size = 4194306)  (2:
 promotion failure size = 4194306)  (4: promotion failure size =
 4194306)  (promotion failed)
   Total time for which application threads were stopped: 23.2659990 seconds

 Full gc.log until just before restarting the node (see another 32s GC
 near the end): https://gist.github.com/dctrwatson/f04896c215fa2418b1d9

  Here's a graph of GC time, where we can see an increase 30 minutes
 prior (indicator that the issue will happen soon):
 http://dl.dropboxusercontent.com/s/q4dr7dle023w9ih/render.png

 Graph of various Heap usage:
 http://dl.dropboxusercontent.com/s/e8kd8go25ihbmkl/download.png

 Running compactions in the same time frame:
 http://dl.dropboxusercontent.com/s/li9tggk4r2l3u4b/render%20(1).png

 CPU, IO, ops and latencies:

 https://dl.dropboxusercontent.com/s/yh9osm9urplikb7/2014-01-20%20at%2011.46%20PM%202x.png

 cfhistograms/cfstats:
 https://gist.github.com/dctrwatson/9a08b38d0258ae434b15

 Cassandra 1.2.13
 Oracle JDK 1.6u45

 JVM opts:

 MAX_HEAP_SIZE=8G
 HEAP_NEW_SIZE=1536M

 Tried HEAP_NEW_SIZE of 768M, 800M, 1000M and 1600M
 Tried default -XX:SurvivorRatio=8 and -XX:SurvivorRatio=4
 Tried default -XX:MaxTenuringThreshold=1 and -XX:MaxTenuringThreshold=2

 All still eventually ran into long GC.

 Hardware for all 3 nodes:

 (2) E5520 @ 2.27Ghz (8 cores w/ HT) [16 cores]
 (6) 4GB RAM [24G RAM]
 (1) 500GB 7.2k for commitlog
 (2) 400G SSD for data (configured as separate data directories)



Re: Datamodel for a highscore list

2014-01-22 Thread Colin
Read the user's score, increment it, update the friends' lists, update the user with
the new high score

Would that work?

--
Colin 
+1 320 221 9531

 

 On Jan 22, 2014, at 11:44 AM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:
 
 Hi!
 
 I'm a little worried about the data model I have come up with for handling 
 highscores.
 
 I have a lot of users. Each user has a number of friends. I need a highscore 
 list per friend list.
 
 I would like to have it optimized for reading the highscores as opposed to 
 setting a new highscore as the use case would suggest I would need to read 
 the list a lot more than I would need to write new highscores.
 
 Currently I have the following tables:
 CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo int, 
 PRIMARY KEY(userId))
 CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY 
 KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
 ... and a table for friends - for the purpose of this mail assume everyone 
 is friends with everyone else
 
 Reading the highscore list for a given user is easy. SELECT * FROM highscores 
 WHERE userId = id.
 
 Problem is setting a new highscore.
 1. I need to read-before-write to get the old score
 2. I'm screwed if something goes wrong and the old score gets overwritten 
 before all the friends' highscore lists get updated - and it is a highly 
 visible error because the same user is on the highscore list multiple times.
 
 I would very much appreciate some feedback and/or alternatives to how to 
 solve this with Cassandra.
 
 
 Thanks,
 Kasper


Help me to find Compatable JDBC jar for Apache Cassandra 2.0.4

2014-01-22 Thread Chiranjeevi Ravilla
Hi  All,

I am using Apache Cassandra 2.0.4 with cassandra-jdbc-1.2.5.jar. I am
trying to run a sample java program and am getting the error below. Please tell me
whether I am using the right JDBC driver, or suggest a supported JDBC driver.


log4j:WARN No appenders could be found for logger 
(org.apache.cassandra.cql.jdbc.CassandraDriver).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/cassandra/cql/jdbc/AbstractJdbcType
at 
org.apache.cassandra.cql.jdbc.CassandraConnection.<init>(CassandraConnection.java:146)
at 
org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:92)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:233)
at CassandraTest.main(CassandraTest.java:12)
Caused by: java.lang.ClassNotFoundException: 
org.apache.cassandra.cql.jdbc.AbstractJdbcType
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)


Thanks in Advance..

--Chiru

Re: Datamodel for a highscore list

2014-01-22 Thread Jon Ribbens
On Wed, Jan 22, 2014 at 06:44:20PM +0100, Kasper Middelboe Petersen wrote:
I'm a little worried about the data model I have come up with for handling
highscores.
I have a lot of users. Each user has a number of friends. I need a
highscore list per friend list.
I would like to have it optimized for reading the highscores as opposed to
setting a new highscore as the use case would suggest I would need to read
the list a lot more than I would need to write new highscores.
Currently I have the following tables:
CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
int, PRIMARY KEY(userId))
CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
... and a table for friends - for the purpose of this mail assume
everyone is friends with everyone else
Reading the highscore list for a given user is easy. SELECT * FROM
highscores WHERE userId = id.
Problem is setting a new highscore.
1. I need to read-before-write to get the old score
2. I'm screwed if something goes wrong and the old score gets overwritten
before all the friends' highscore lists get updated - and it is a highly
visible error because the same user is on the highscore list multiple times.
I would very much appreciate some feedback and/or alternatives to how to
solve this with Cassandra.

Is friendship symmetrical? Why not just store the scores in the friend
list like so:

CREATE TABLE friends (
  userID  uuid,
  friendIDuuid,
  namevarchar,
  score   int,
  PRIMARY KEY (userID, friendID)
);

and then simply sort the friends by score in your application code?

When you update a user's score, you just do something like:

  UPDATE friends SET score=x WHERE userID IN (all,my,friends) AND friendID=myID;

It should be quite efficient unless you have people with truly
ludicrous numbers of 'friends' ;-)
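A sketch of the read side of that suggestion, assuming the DataStax
java-driver 2.0 and a hypothetical keyspace name - fetch one partition, then
do the application-side sort Jon describes, highest score first:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.UUID;
    import com.datastax.driver.core.*;

    public class FriendScores {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace"); // keyspace assumed

            PreparedStatement ps = session.prepare(
                    "SELECT friendid, name, score FROM friends WHERE userid = ?");
            List<Row> rows = new ArrayList<Row>(
                    session.execute(ps.bind(UUID.fromString(args[0]))).all());

            // client-side sort: descending by score
            Collections.sort(rows, new Comparator<Row>() {
                public int compare(Row a, Row b) {
                    int sa = a.getInt("score"), sb = b.getInt("score");
                    return sa > sb ? -1 : (sa == sb ? 0 : 1);
                }
            });
            for (Row r : rows)
                System.out.println(r.getString("name") + ": " + r.getInt("score"));
            cluster.close();
        }
    }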


Re: Datamodel for a highscore list

2014-01-22 Thread Kasper Middelboe Petersen
I can think of two cases where something bad would happen in this case:
1. Something bad happens after the increment but before some or all of the
update friend list is finished
2. Someone spams two scores at the same time creating a race condition
where one of them could have a score that is not yet updated (or the old
score, depending on if the increment of the highscore is done before or
after the friend updates)

Both are unlikely to happen often, but I'm going to have quite
a few users using the system and it would be bound to happen and I would
really like to avoid having data corruption (especially of the kind that is
also obvious to the users) if it can at all be avoided.

Also, should it happen, there is no way to either detect it or clean it up.


On Wed, Jan 22, 2014 at 6:48 PM, Colin colpcl...@gmail.com wrote:

 Read the user's score, increment it, update the friends' lists, update the user with
 the new high score

 Would that work?

 --
 Colin
 +1 320 221 9531



  On Jan 22, 2014, at 11:44 AM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:
 
  Hi!
 
  I'm a little worried about the data model I have come up with for
 handling highscores.
 
  I have a lot of users. Each user has a number of friends. I need a
  highscore list per friend list.
 
  I would like to have it optimized for reading the highscores as opposed
 to setting a new highscore as the use case would suggest I would need to
  read the list a lot more than I would need to write new highscores.
 
  Currently I have the following tables:
  CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
 int, PRIMARY KEY(userId))
  CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
 KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
   ... and a table for friends - for the purpose of this mail assume
 everyone is friends with everyone else
 
  Reading the highscore list for a given user is easy. SELECT * FROM
 highscores WHERE userId = id.
 
  Problem is setting a new highscore.
  1. I need to read-before-write to get the old score
  2. I'm screwed if something goes wrong and the old score gets
  overwritten before all the friends' highscore lists get updated - and it is
  a highly visible error because the same user is on the highscore list multiple
  times.
 
  I would very much appreciate some feedback and/or alternatives to how to
 solve this with Cassandra.
 
 
  Thanks,
  Kasper



Re: Datamodel for a highscore list

2014-01-22 Thread Edward Capriolo
It is a tricky type of problem because some ways of doing it involve
iterative scans.
This presentation discusses a solution for top-k:

http://www.slideshare.net/planetcassandra/jonathan-halliday


On Wed, Jan 22, 2014 at 12:48 PM, Colin colpcl...@gmail.com wrote:

 Read the user's score, increment it, update the friends' lists, update the user with
 the new high score

 Would that work?

 --
 Colin
 +1 320 221 9531



  On Jan 22, 2014, at 11:44 AM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:
 
  Hi!
 
  I'm a little worried about the data model I have come up with for
 handling highscores.
 
  I have a lot of users. Each user has a number of friends. I need a
  highscore list per friend list.
 
  I would like to have it optimized for reading the highscores as opposed
 to setting a new highscore as the use case would suggest I would need to
  read the list a lot more than I would need to write new highscores.
 
  Currently I have the following tables:
  CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
 int, PRIMARY KEY(userId))
  CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
 KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
   ... and a table for friends - for the purpose of this mail assume
 everyone is friends with everyone else
 
  Reading the highscore list for a given user is easy. SELECT * FROM
 highscores WHERE userId = id.
 
  Problem is setting a new highscore.
  1. I need to read-before-write to get the old score
  2. I'm screwed if something goes wrong and the old score gets
  overwritten before all the friends' highscore lists get updated - and it is
  a highly visible error because the same user is on the highscore list multiple
  times.
 
  I would very much appreciate some feedback and/or alternatives to how to
 solve this with Cassandra.
 
 
  Thanks,
  Kasper



Re: Datamodel for a highscore list

2014-01-22 Thread Colin Clark
How many users and how many games?

--
Colin
+1 320 221 9531



On Jan 22, 2014, at 10:59 AM, Kasper Middelboe Petersen 
kas...@sybogames.com wrote:

I can think of two cases where something bad would happen in this case:
1. Something bad happens after the increment but before some or all of the
update friend list is finished
2. Someone spams two scores at the same time creating a race condition
where one of them could have a score that is not yet updated (or the old
score, depending on if the increment of the highscore is done before or
after the friend updates)

 Both are unlikely to happen often, but I'm going to have quite
a few users using the system and it would be bound to happen and I would
really like to avoid having data corruption (especially of the kind that is
also obvious to the users) if it can at all be avoided.

 Also, should it happen, there is no way to either detect it or clean it up.


On Wed, Jan 22, 2014 at 6:48 PM, Colin colpcl...@gmail.com wrote:

 Read the user's score, increment it, update the friends' lists, update the user with
 the new high score

 Would that work?

 --
 Colin
 +1 320 221 9531



  On Jan 22, 2014, at 11:44 AM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:
 
  Hi!
 
  I'm a little worried about the data model I have come up with for
 handling highscores.
 
  I have a lot of users. Each user has a number of friends. I need a
  highscore list per friend list.
 
  I would like to have it optimized for reading the highscores as opposed
 to setting a new highscore as the use case would suggest I would need to
  read the list a lot more than I would need to write new highscores.
 
  Currently I have the following tables:
  CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
 int, PRIMARY KEY(userId))
  CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
 KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
   ... and a table for friends - for the purpose of this mail assume
 everyone is friends with everyone else
 
  Reading the highscore list for a given user is easy. SELECT * FROM
 highscores WHERE userId = id.
 
  Problem is setting a new highscore.
  1. I need to read-before-write to get the old score
  2. I'm screwed if something goes wrong and the old score gets
  overwritten before all the friends' highscore lists get updated - and it is
  a highly visible error because the same user is on the highscore list multiple
  times.
 
  I would very much appreciate some feedback and/or alternatives to how to
 solve this with Cassandra.
 
 
  Thanks,
  Kasper



Re: Datamodel for a highscore list

2014-01-22 Thread Kasper Middelboe Petersen
Yes, friendship is symmetrical.

This could work for my problem right now, but I'm afraid it would just be
postponing the problem slightly until something like big tournaments (which
are coming) raises the same problem again.


On Wed, Jan 22, 2014 at 6:58 PM, Jon Ribbens 
jon-cassan...@unequivocal.co.uk wrote:

 On Wed, Jan 22, 2014 at 06:44:20PM +0100, Kasper Middelboe Petersen wrote:
 I'm a little worried about the data model I have come up with for
 handling
 highscores.
 I have a lot of users. Each user has a number of friends. I need a
  highscore list per friend list.
 I would like to have it optimized for reading the highscores as
 opposed to
 setting a new highscore as the use case would suggest I would need to
 read
  the list a lot more than I would need to write new highscores.
 Currently I have the following tables:
 CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
 int, PRIMARY KEY(userId))
 CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
 KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
  ... and a table for friends - for the purpose of this mail assume
 everyone is friends with everyone else
 Reading the highscore list for a given user is easy. SELECT * FROM
 highscores WHERE userId = id.
 Problem is setting a new highscore.
 1. I need to read-before-write to get the old score
 2. I'm screwed if something goes wrong and the old score gets
 overwritten
  before all the friends' highscore lists get updated - and it is a
  highly
  visible error because the same user is on the highscore list multiple times.
 I would very much appreciate some feedback and/or alternatives to how
 to
 solve this with Cassandra.

 Is friendship symmetrical? Why not just store the scores in the friend
 list like so:

 CREATE TABLE friends (
   userID  uuid,
   friendIDuuid,
   namevarchar,
   score   int,
   PRIMARY KEY (userID, friendID)
 );

 and then simply sort the friends by score in your application code?

 When you update a user's score, you just do something like:

   UPDATE friends SET score=x WHERE userID IN (all,my,friends) AND
 friendID=myID;

 It should be quite efficient unless you have people with truly
 ludicrous numbers of 'friends' ;-)



Re: Datamodel for a highscore list

2014-01-22 Thread Kasper Middelboe Petersen
Many millions of users. Just the one game - I might have some different scores I
need to keep track of, but I very much hope to be able to use the same
approach for those as for the high score mentioned here.


On Wed, Jan 22, 2014 at 7:08 PM, Colin Clark co...@clark.ws wrote:

 How many users and how many games?


 --
 Colin
 +1 320 221 9531



 On Jan 22, 2014, at 10:59 AM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:

 I can think of two cases where something bad would happen in this case:
 1. Something bad happens after the increment but before some or all of the
 update friend list is finished
 2. Someone spams two scores at the same time creating a race condition
 where one of them could have a score that is not yet updated (or the old
 score, depending on if the increment of the highscore is done before or
 after the friend updates)

  Both are unlikely to happen often, but I'm going to have quite
 a few users using the system and it would be bound to happen and I would
 really like to avoid having data corruption (especially of the kind that is
 also obvious to the users) if it can at all be avoided.

  Also, should it happen, there is no way to either detect it or clean it up.


 On Wed, Jan 22, 2014 at 6:48 PM, Colin colpcl...@gmail.com wrote:

 Read the user's score, increment it, update the friends' lists, update the user with
 the new high score

 Would that work?

 --
 Colin
 +1 320 221 9531



  On Jan 22, 2014, at 11:44 AM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:
 
  Hi!
 
  I'm a little worried about the data model I have come up with for
 handling highscores.
 
  I have a lot of users. Each user has a number of friends. I need a
  highscore list per friend list.
 
  I would like to have it optimized for reading the highscores as opposed
 to setting a new highscore as the use case would suggest I would need to
  read the list a lot more than I would need to write new highscores.
 
  Currently I have the following tables:
  CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo
 int, PRIMARY KEY(userId))
  CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY
 KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
   ... and a table for friends - for the purpose of this mail assume
 everyone is friends with everyone else
 
  Reading the highscore list for a given user is easy. SELECT * FROM
 highscores WHERE userId = id.
 
  Problem is setting a new highscore.
  1. I need to read-before-write to get the old score
  2. I'm screwed if something goes wrong and the old score gets
  overwritten before all the friends' highscore lists get updated - and it is
  a highly visible error because the same user is on the highscore list multiple
  times.
 
  I would very much appreciate some feedback and/or alternatives to how
 to solve this with Cassandra.
 
 
  Thanks,
  Kasper





Re: Datamodel for a highscore list

2014-01-22 Thread Colin
One way might be to use userid as the row key, and then put all of the friends with
their scores on the same row.  You could even key each column entry like this:

Score:username or Id

This way the columns would come back sorted when reading the high scores for 
the group.

To update, set the score in that user's row after reading it for update.

So each row would look like this

Rowkey - userid
Columns would be userid:score followed by friendid:score

This way, you could also get a global high score list

Each user would have their own row

If multiple games, create userid+gameid as rowkey

Might this work?
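In CQL3 terms that wide-row layout would look roughly like the sketch below
(names hypothetical): one partition per user, clustered by score, so a read
comes back already sorted.

    CREATE TABLE highscore_by_user (
        userid   uuid,
        score    int,
        friendid uuid,
        PRIMARY KEY (userid, score, friendid)
    ) WITH CLUSTERING ORDER BY (score DESC);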


--
Colin 
+1 320 221 9531

 

 On Jan 22, 2014, at 11:13 AM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:
 
  Many millions of users. Just the one game - I might have some different scores I 
 need to keep track of, but I very much hope to be able to use the same 
 approach for those as for the high score mentioned here.
 
 
 On Wed, Jan 22, 2014 at 7:08 PM, Colin Clark co...@clark.ws wrote:
 How many users and how many games?
 
 
 --
 Colin 
 +1 320 221 9531
 
  
 
 On Jan 22, 2014, at 10:59 AM, Kasper Middelboe Petersen 
 kas...@sybogames.com wrote:
 
 I can think of two cases where something bad would happen in this case:
 1. Something bad happens after the increment but before some or all of the 
 update friend list is finished
 2. Someone spams two scores at the same time creating a race condition 
 where one of them could have a score that is not yet updated (or the old 
 score, depending on if the increment of the highscore is done before or 
 after the friend updates)
 
  Both are unlikely to happen often, but I'm going to have quite 
 a few users using the system and it would be bound to happen and I would 
 really like to avoid having data corruption (especially of the kind that is 
 also obvious to the users) if it can at all be avoided.
 
  Also, should it happen, there is no way to either detect it or clean it up.
 
 
 On Wed, Jan 22, 2014 at 6:48 PM, Colin colpcl...@gmail.com wrote:
  Read the user's score, increment it, update the friends' lists, update the user with 
  the new high score
 
 Would that work?
 
 --
 Colin
 +1 320 221 9531
 
 
 
  On Jan 22, 2014, at 11:44 AM, Kasper Middelboe Petersen 
  kas...@sybogames.com wrote:
 
  Hi!
 
  I'm a little worried about the data model I have come up with for 
  handling highscores.
 
  I have a lot of users. Each user has a number of friends. I need a 
   highscore list per friend list.
 
  I would like to have it optimized for reading the highscores as opposed 
  to setting a new highscore as the use case would suggest I would need to 
   read the list a lot more than I would need to write new highscores.
 
  Currently I have the following tables:
  CREATE TABLE user (userId uuid, name varchar, highscore int, bestcombo 
  int, PRIMARY KEY(userId))
  CREATE TABLE highscore (userId uuid, score int, name varchar, PRIMARY 
  KEY(userId, score, name)) WITH CLUSTERING ORDER BY (score DESC);
   ... and a table for friends - for the purpose of this mail assume 
  everyone is friends with everyone else
 
  Reading the highscore list for a given user is easy. SELECT * FROM 
  highscores WHERE userId = id.
 
  Problem is setting a new highscore.
  1. I need to read-before-write to get the old score
  2. I'm screwed if something goes wrong and the old score gets 
   overwritten before all the friends' highscore lists get updated - and it 
   is a highly visible error because the same user is on the highscore list 
   multiple times.
 
  I would very much appreciate some feedback and/or alternatives to how to 
  solve this with Cassandra.
 
 
  Thanks,
  Kasper
 
 


Re: Moving from relational to Cassandra, how to handle intra-table relationships?

2014-01-22 Thread Les Hartzman
Hmm. Hadn't thought about using a collection. Might be able to get away
with a map. Have to find out more about the origins of these relationships.

I don't think XML gives any advantage over JSON, but it is another
possibility.

Les


On Wed, Jan 22, 2014 at 7:43 AM, chandra Varahala 
hadoopandcassan...@gmail.com wrote:

 Hello,

 You can implement relations  in couple of ways, JSON/XML and CQL
 collection Classes.

 Thanks
 Chandra


 On Tue, Jan 21, 2014 at 8:58 PM, Les Hartzman lhartz...@gmail.com wrote:

 True. Fortunately though in this application, the data is
 write-once/read-many. So that is one bullet I would dodge!

 Les


 On Tue, Jan 21, 2014 at 5:34 PM, Patricia Gorla 
 patri...@thelastpickle.com wrote:

 Hey,

 One thing to keep in mind if you want to go the serialized JSON route,
 is that you will need to read out the data each time you want to do an
 update.

 Cheers,
 Patricia


 On Tuesday, January 21, 2014, Les Hartzman lhartz...@gmail.com wrote:

 Hi,

 I'm looking to move from a relational DB to Cassandra. I just found
 that there are intra-table relationships in one table where the ids of the
 related rows are saved in a 'parent' row.

 How can these kinds of relationships be handled in Cassandra? I'm
 thinking that if the individual rows need to live on their own, perhaps I
 should store the data as serialized JSON in its own column of the parent.

 All thoughts appreciated!

 Thanks.

 Les



 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com http://thelastpickle.com






Fatal error with limit 10000

2014-01-22 Thread Robert Wille
I have a table with a bunch of records that have 10,000 keys per partition
key (not sure if that's the right terminology). Here's the schema:

CREATE TABLE bdn_index_pub (
    tshard VARCHAR,
    pord INT,
    ord INT,
    hpath VARCHAR,
    page BIGINT,
    PRIMARY KEY (tshard, pord)
) WITH gc_grace_seconds = 0;


In most records, there are 10,000 pord values for each tshard value.

If I run a query without a limit clause in cqlsh, I get this:

select count(*) from bdn_index_pub where tshard = '-16:12';

 count
-------
 10000

(1 rows)

Default LIMIT of 10000 was used. Specify your own LIMIT clause to get more
results.

If I run a query with a limit clause > 10000, I get this:

select count(*) from bdn_index_pub where tshard = '-16:12' limit 10001;
TSocket read 0 bytes

Any query I run thereafter gives me the TSocket read 0 bytes error.

I am running 2.0.4. I'm pretty sure this didn't happen in 2.0.2.

Any reason why I can't use a limit > 10000?

Thanks in advance

Robert
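The error itself looks like a server-side problem worth a JIRA, but if the
immediate goal is just to read all 10,000 rows, driver-side paging avoids big
LIMITs entirely - a sketch assuming the DataStax java-driver 2.0, whose native
protocol paging works against 2.0.x:

    import com.datastax.driver.core.*;

    public class ReadShard {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace"); // keyspace assumed

            Statement q = new SimpleStatement(
                    "SELECT pord, page FROM bdn_index_pub WHERE tshard = '-16:12'");
            q.setFetchSize(1000); // stream the partition in pages of 1000 rows

            int count = 0;
            for (Row row : session.execute(q)) // iteration fetches pages transparently
                count++;
            System.out.println(count + " rows");
            cluster.close();
        }
    }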





Re: Help me to find Compatable JDBC jar for Apache Cassandra 2.0.4

2014-01-22 Thread chandra Varahala
Did you put these jars in the classpath?

cassandra-all-1.x.x.jar
guava
jackson-core-asl
jackson-mapper-asl
libthrift
snappy
slf4j-api
metrics-core
netty


thanks
Chandra
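Once the jars are in place, a minimal connect sketch for cassandra-jdbc (the
URL layout, port and keyspace/table names are assumptions - adjust to your
setup):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CassandraJdbcTest {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:cassandra://localhost:9160/my_keyspace"); // thrift port assumed
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT * FROM my_table LIMIT 10");
            while (rs.next())
                System.out.println(rs.getString(1));
            conn.close();
        }
    }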


On Wed, Jan 22, 2014 at 12:52 PM, Colin Clark co...@clark.ws wrote:

 Is the jar on the path?  Is CASSANDRA_HOME set correctly?

 Looks like cassandra can't find the jar - verify its existence by searching.

 --
 Colin
 +1 320 221 9531



 On Jan 22, 2014, at 11:50 AM, Chiranjeevi Ravilla rccassandr...@gmail.com
 wrote:

 Hi  All,

 I am using Apache Cassandra 2.0.4 with cassandra-jdbc-1.2.5.jar. I
 am trying to run a sample java program and am getting the error below. Please
 tell me whether I am using the right JDBC driver, or suggest a supported
 JDBC driver.


 log4j:WARN No appenders could be found for logger
 (org.apache.cassandra.cql.jdbc.CassandraDriver).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
 more info.
 Exception in thread "main" java.lang.NoClassDefFoundError:
 org/apache/cassandra/cql/jdbc/AbstractJdbcType
 at
 org.apache.cassandra.cql.jdbc.CassandraConnection.<init>(CassandraConnection.java:146)
  at
 org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:92)
 at java.sql.DriverManager.getConnection(DriverManager.java:571)
  at java.sql.DriverManager.getConnection(DriverManager.java:233)
 at CassandraTest.main(CassandraTest.java:12)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.cassandra.cql.jdbc.AbstractJdbcType
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)


 Thanks in Advance..

 --Chiru




Re: Long GC due to promotion failures

2014-01-22 Thread John Watson
I thought PrintFLSStatistics was necessary for determining heap
fragmentation? Or is it possible to see that without it as well?

Perm-Gen stays steady, but I'll enable it anyway to see if it has any effect.

Thanks,

John

On Wed, Jan 22, 2014 at 8:34 AM, Lee Mighdoll l...@underneath.ca wrote:
 I don't recommend PrintFLSStatistics=1, it makes the gc logs hard to
 mechanically parse.  Because of that, I can't easily tell whether you're in
 the same situation we found.  But just in case, try setting
 +CMSClassUnloadingEnabled.  There's an issue related to JMX in DSE that
 prevents effective old gen collection in some cases.  The flag's low
 overhead, and very effective if that's your problem too.

 Cheers,
 Lee


 On Tue, Jan 21, 2014 at 12:02 AM, John Watson j...@disqus.com wrote:

 Pretty reliable, at some point, nodes will have super long GCs.
 Followed by https://issues.apache.org/jira/browse/CASSANDRA-6592

 Lovely log messages:

   9030.798: [ParNew (0: promotion failure size = 4194306)  (2:
 promotion failure size = 4194306)  (4: promotion failure size =
 4194306)  (promotion failed)
   Total time for which application threads were stopped: 23.2659990
 seconds

 Full gc.log until just before restarting the node (see another 32s GC
 near the end): https://gist.github.com/dctrwatson/f04896c215fa2418b1d9

  Here's a graph of GC time, where we can see an increase 30 minutes
 prior (indicator that the issue will happen soon):
 http://dl.dropboxusercontent.com/s/q4dr7dle023w9ih/render.png

 Graph of various Heap usage:
 http://dl.dropboxusercontent.com/s/e8kd8go25ihbmkl/download.png

 Running compactions in the same time frame:
 http://dl.dropboxusercontent.com/s/li9tggk4r2l3u4b/render%20(1).png

 CPU, IO, ops and latencies:

 https://dl.dropboxusercontent.com/s/yh9osm9urplikb7/2014-01-20%20at%2011.46%20PM%202x.png

 cfhistograms/cfstats:
 https://gist.github.com/dctrwatson/9a08b38d0258ae434b15

 Cassandra 1.2.13
 Oracle JDK 1.6u45

 JVM opts:

 MAX_HEAP_SIZE=8G
 HEAP_NEW_SIZE=1536M

 Tried HEAP_NEW_SIZE of 768M, 800M, 1000M and 1600M
 Tried default -XX:SurvivorRatio=8 and -XX:SurvivorRatio=4
 Tried default -XX:MaxTenuringThreshold=1 and
 -XX:MaxTenuringThreshold=2

 All still eventually ran into long GC.

 Hardware for all 3 nodes:

 (2) E5520 @ 2.27Ghz (8 cores w/ HT) [16 cores]
 (6) 4GB RAM [24G RAM]
 (1) 500GB 7.2k for commitlog
 (2) 400G SSD for data (configured as separate data directories)




Re: Long GC due to promotion failures

2014-01-22 Thread John Watson
LCS does create a lot of SSTables unfortunately. The nodes are keeping
up on compactions though.

This started after starting to read from a CF that has tombstones in its rows.

What's even more concerning is that it's continuing even after stopping
reads and dropping that CF.

On Wed, Jan 22, 2014 at 3:02 AM, Jason Wee peich...@gmail.com wrote:
 SSTable count: 365

 Your SSTable count is too high... I don't know what the best count should
 be, but in my experience anything below 20 is good. Is your compaction
 running?

 I have read a few blogs on how to read cfhistograms, but never really
 understood it fully. Anyone care to explain using the OP's attached cfhistogram?

 Taking a wild shot: perhaps try a different build, Oracle JDK 1.6u25
 perhaps?

 HTH

 Jason




 On Tue, Jan 21, 2014 at 4:02 PM, John Watson j...@disqus.com wrote:

 Pretty reliable, at some point, nodes will have super long GCs.
 Followed by https://issues.apache.org/jira/browse/CASSANDRA-6592

 Lovely log messages:

   9030.798: [ParNew (0: promotion failure size = 4194306)  (2:
 promotion failure size = 4194306)  (4: promotion failure size =
 4194306)  (promotion failed)
   Total time for which application threads were stopped: 23.2659990
 seconds

 Full gc.log until just before restarting the node (see another 32s GC
 near the end): https://gist.github.com/dctrwatson/f04896c215fa2418b1d9

  Here's a graph of GC time, where we can see an increase 30 minutes
 prior (indicator that the issue will happen soon):
 http://dl.dropboxusercontent.com/s/q4dr7dle023w9ih/render.png

 Graph of various Heap usage:
 http://dl.dropboxusercontent.com/s/e8kd8go25ihbmkl/download.png

 Running compactions in the same time frame:
 http://dl.dropboxusercontent.com/s/li9tggk4r2l3u4b/render%20(1).png

 CPU, IO, ops and latencies:

 https://dl.dropboxusercontent.com/s/yh9osm9urplikb7/2014-01-20%20at%2011.46%20PM%202x.png

 cfhistograms/cfstats:
 https://gist.github.com/dctrwatson/9a08b38d0258ae434b15

 Cassandra 1.2.13
 Oracle JDK 1.6u45

 JVM opts:

 MAX_HEAP_SIZE=8G
 HEAP_NEW_SIZE=1536M

 Tried HEAP_NEW_SIZE of 768M, 800M, 1000M and 1600M
 Tried default -XX:SurvivorRatio=8 and -XX:SurvivorRatio=4
 Tried default -XX:MaxTenuringThreshold=1 and
 -XX:MaxTenuringThreshold=2

 All still eventually ran into long GC.

 Hardware for all 3 nodes:

 (2) E5520 @ 2.27Ghz (8 cores w/ HT) [16 cores]
 (6) 4GB RAM [24G RAM]
 (1) 500GB 7.2k for commitlog
 (2) 400G SSD for data (configured as separate data directories)




Re: Extremely long GC

2014-01-22 Thread Yogi Nerella
Hi,

Can you share the GC logs for the systems you are running into problems on?

Yogi
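For reference, the flags that typically produce a useful gc.log - a sketch of
conf/cassandra-env.sh lines (log path is an assumption):

    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"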


On Wed, Jan 22, 2014 at 6:50 AM, Joel Samuelsson
samuelsson.j...@gmail.comwrote:

 Hello,

 We've been having problems with long GC pauses and can't seem to get rid
 of them.

 Our latest test is on a clean machine with Ubuntu 12.04 LTS, Java 1.7.0_45
 and JNA installed.
 It is a single node cluster with most settings being default, the only
 things changed are ip-addresses, cluster name and partitioner (to
 RandomPartitioner).
 We are running Cassandra 2.0.4.
 We are running on a virtual machine with Xen.
 We have 16GB of ram and default memory settings for C* (i.e. heap size of
 4GB). CPU specified as 8 cores by our provider.

 Right now, we have no data on the machine and no requests to it at all.
 Still we get ParNew GCs like the following:
 INFO [ScheduledTasks:1] 2014-01-18 10:54:42,286 GCInspector.java (line
 116) GC for ParNew: 464 ms for 1 collections, 102838776 used; max is
 4106223616

 While this may not be extremely long, on other machines with the same
 setup but some data (around 12GB) and around 10 read requests/s (i.e.
 basically no load) we have seen ParNew GC for 20 minutes or more. During
 this time, the machine goes down completely (I can't even ssh to it). The
 requests are mostly from OpsCenter and the rows requested are not extremely
 large (typically less than 1KB).

 We have tried a lot of different things to solve these issues since we've
 been having them for a long time including:
 - Upgrading Cassandra to new versions
 - Upgrading Java to new versions
 - Printing promotion failures in GC-log (no failures found!)
 - Different sizes of heap and heap space for different GC spaces (Eden
 etc.)
 - Different versions of Ubuntu
 - Running on Amazon EC2 instead of the provider we are using now (not with
 Datastax AMI)

 Something that may be a clue is that when running the DataStax Community
 AMI on Amazon we haven't seen the GC yet (it's been running for a week or
 so). Just to be clear, another test on Amazon EC2 mentioned above (without
 the Datastax AMI) shows the GC freezes.

 If any other information is needed, just let me know.

 Best regards,
 Joel Samuelsson



Re: Exception in thread main java.lang.NoClassDefFoundError

2014-01-22 Thread Yogi Nerella
I think you are building from source. Were there any build failures
before you started the test?

Can you rebuild and try again? Please also provide the command line you are
using.



On Wed, Jan 22, 2014 at 3:06 AM, Jason Wee peich...@gmail.com wrote:

 NoClassDefFoundError: org/apache/cassandra/service/CassandraDaemon

 This states it very clearly: the class is not found on the classpath. You
 are evidently not using the packaged cassandra distribution, so
 you need to find which jar (or build directory) contains this class and check
 whether it is included in your classpath.

 Jason


 On Tue, Jan 21, 2014 at 9:23 AM, Le Xu sharonx...@gmail.com wrote:

 Hello!
 I got this error while trying out Cassandra 1.2.13. The error message
 looks like:

 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/cassandra/service/CassandraDaemon
 Caused by: java.lang.ClassNotFoundException:
 org.apache.cassandra.service.CassandraDaemon
 at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
 Could not find the main class:
 org.apache.cassandra.service.CassandraDaemon. Program will exit.

 I checked JAVA_HOME and CASSANDRA_HOME and they are both set but I still
 got the error.

 However, based on Brian's reply in this thread:
 http://mail-archives.apache.org/mod_mbox/cassandra-user/201307.mbox/%3CCAJHHpg3Lf9tyxwgZNEN3cKH=p9xwms0w4rzqbpt8oriaq9r...@mail.gmail.com%3E
 I followed the step and printed out the $CLASSPATH  variable and got :

 /home/lexu1/scale/apache-cassandra-1.2.13-src//conf:/home/lexu1/scale/apache-cassandra-1.2.13-src//build/classes/main:/home/lexu1/scale/apache-cassandra-1.2.13-src//build/classes/thrift:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/antlr-3.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/avro-1.4.0-fixes.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/avro-1.4.0-sources-fixes.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/commons-cli-1.1.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/commons-codec-1.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/commons-lang-2.6.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/compress-lzf-0.8.4.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/concurrentlinkedhashmap-lru-1.3.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/guava-13.0.1.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/high-scale-lib-1.1.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jackson-core-asl-1.9.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jackson-mapper-asl-1.9.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jamm-0.2.5.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jbcrypt-0.3m.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/jline-1.0.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/json-simple-1.1.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/libthrift-0.7.0.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/log4j-1.2.16.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/lz4-1.1.0.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/metrics-core-2.2.0.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/netty-3.6.6.Final.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/servlet-api-2.5-20081211.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/slf4j-api-1.7.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/slf4j-log4j12-1.7.2.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/snakeyaml-1.6.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/snappy-java-1.0.5.jar:/home/lexu1/scale/apache-cassandra-1.2.13-src//lib/snaptree-0.1.jar

 It includes apache-cassandra-1.2.13-src//build/classes/thrift but not
 service. Does the location of CassandraDaemon seems to be the problem? If
 it is, then how do I fix the problem?

 Thanks!

 Le





token for agent

2014-01-22 Thread Daniel Curry
  I was wondering: how important is it for a three-node cluster to have a
node whose token is zero?




3 NODES
---
   0 -- Node 1
 56713727820156410577229101238628035242 -- Node 2
113427455640312821154458202477256070484-- Node 3


Will this prevent the agents from connecting?

3 NODES
---
170141183460469231731687303715884105728 -- Node 1
 56713727820156410577229101238628035242  -- Node 2
113427455640312821154458202477256070484  -- Node 3


Thank you

--
Daniel Curry
Sr Linux Systems Administrator
Arrayent, Inc.
2317 Broadway Street, Suite 20
Redwood City, CA 94063
dan...@arrayent.com



Re: token for agent

2014-01-22 Thread Andrey Ilinykh
No. There is nothing special about the value 0.
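The balanced RandomPartitioner tokens for n nodes are simply i * 2**127 / n
for i = 0..n-1, so node 1's 0 is just the i = 0 case - a sketch:

    import java.math.BigInteger;

    public class Tokens {
        public static void main(String[] args) {
            int n = 3; // number of nodes
            BigInteger range = BigInteger.valueOf(2).pow(127); // RandomPartitioner range
            for (int i = 0; i < n; i++)
                System.out.println("Node " + (i + 1) + ": "
                        + range.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
        }
    }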


On Wed, Jan 22, 2014 at 1:30 PM, Daniel Curry daniel.cu...@arrayent.comwrote:

    I was wondering: how important is it for a three-node cluster to have a
  node whose token is zero?



 3 NODES
 ---
0 -- Node 1
  56713727820156410577229101238628035242 -- Node 2
 113427455640312821154458202477256070484-- Node 3


  Will this prevent the agents from connecting?

 3 NODES
 ---
 170141183460469231731687303715884105728 -- Node 1
  56713727820156410577229101238628035242  -- Node 2
 113427455640312821154458202477256070484  -- Node 3


 Thank you

 --
 Daniel Curry
 Sr Linux Systems Administrator
 Arrayent, Inc.
 2317 Broadway Street, Suite 20
 Redwood City, CA 94063
 dan...@arrayent.com




Re: Long GC due to promotion failures

2014-01-22 Thread Lee Mighdoll
On Wed, Jan 22, 2014 at 11:35 AM, John Watson j...@disqus.com wrote:

 I thought PrintFLSStatistics was necessary for determining heap
 fragmentation? Or is it possible to see that without it as well?


I've found that easier parsing is more important than tracking indicators
of fragmentation.

Perm-Gen stays steady, but I'll enable it anyway to see if it has any
 effect.


I know CMSClassUnloadingEnabled is talked about in the context of permgen,
but it has an effect on the main heap as well.  In our case we saw DSE
nodes lock up for 10 minutes at a time trying to clear old gen; cured when
we added the flag.  It's a shot in the dark without more analysis, but just
turning on the flag is easy enough.

Cheers,
Lee


RE: cassandra read performance jumps from one row to next

2014-01-22 Thread NEWBROUGH, JONATHAN

Trying to find out why a cassandra read is taking so long, I used tracing and
limited the number of rows. Strangely, when I query 600 rows, I get results in
~50 milliseconds. But 610 rows takes nearly 1 second!

cqlsh> select containerdefinitionid from containerdefinition limit 600;
... lots of output ...

Tracing session: 6b506cd0-83bc-11e3-96e8-e182571757d7

 activity                                                                                         | timestamp    | source        | source_elapsed
--------------------------------------------------------------------------------------------------+--------------+---------------+----------------
 execute_cql3_query                                                                               | 15:25:02,878 | 130.4.147.116 |              0
 Parsing statement                                                                                | 15:25:02,878 | 130.4.147.116 |             39
 Preparing statement                                                                              | 15:25:02,878 | 130.4.147.116 |            101
 Determining replicas to query                                                                    | 15:25:02,878 | 130.4.147.116 |            152
 Executing seq scan across 1 sstables for [min(-9223372036854775808), min(-9223372036854775808)]  | 15:25:02,879 | 130.4.147.116 |           1021
 Scanned 755 rows and matched 755                                                                 | 15:25:02,933 | 130.4.147.116 |          55169
 Request complete                                                                                 | 15:25:02,934 | 130.4.147.116 |          56300

cqlsh> select containerdefinitionid from containerdefinition limit 610;
... just about the same output and trace info, except...

 Scanned 766 rows and matched 766                                                                 | 15:25:58,908 | 130.4.147.116 |         739141
There seems to be nothing unusual about the data in those particular rows:
- values are similar to those before and after
- using the COPY command I can export the whole table and import it on a
  different cluster, and performance is fine
- these rows are the first example, but there seem to be other places where
  query time jumps as well; the whole table is only ~3000 rows but takes
  ~15sec to list all primary keys

There does seem to be something unusual about the data STORAGE:
- a snapshot copied to another cluster and imported gives the same results
  with the same limits
- COPYing the data to CSV and then into another cluster does not; performance
  is great

Have tried compaction, repair, reindex, cleanup and refresh. No effect.

I realize I could fix this by copying the data out and in, but I'm trying to
figure out what is going on here to avoid it happening in production on a
table too big to fix with COPY.
Table has 17 columns, 3 indices, TEXT primary key, two LIST columns and two 
TIMESTAMP columns; the rest are TEXT. Can reproduce issue with both 
SimpleStrategy and DC-aware replication. Can reproduce with 4 copies of data on 
4 servers, 2 copies on 2 servers and 1 copy on 2 servers (so doesn't matter if 
query is performed locally or involves multiple servers). Cassandra-1.2 with 
cqlsh.
Any ideas? Suggestions?



Any Limits on number of items in a collection column type

2014-01-22 Thread Manoj Khangaonkar
Hi,

On C* 2.0.0. 3 Node cluster.

I have a column daycount list<bigint>. The column is storing a count.
Every few secs a new count is appended. The total count for the day is the
sum of all items in the list.

My application logs indicate I wrote about 11 items to the column for a
particular row. Assume the row key is day_timestamp.

But when I do a read on the column I get back a list with only 43000 items.
Checked with both java driver and CQL.

There are no errors or exceptions anywhere.

There is this statement in the wiki: "Collection values may not be larger
than 64K". I assume this refers to one item in a collection.

Has anyone else seen an issue like this ?

regards

MJ


-- 
http://khangaonkar.blogspot.com/


Re: Any Limits on number of items in a collection column type

2014-01-22 Thread Robert Wille
Yes, I've experienced this as well. It looks like you're getting the number
of items inserted mod 64K.
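
To illustrate the arithmetic (the insert count in the original mail looks
garbled, so ~108,500 appends is assumed here purely for the sake of the
example):

    public class CollectionLimitDemo {
        public static void main(String[] args) {
            int inserted = 108_536; // hypothetical append count, not from the mail
            int cap = 64 * 1024;    // 65536: the 64K item limit on collections
            // A read appears to return (items inserted) mod 64K:
            System.out.println(inserted % cap); // prints 43000
        }
    }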

From:  Manoj Khangaonkar khangaon...@gmail.com
Reply-To:  user@cassandra.apache.org
Date:  Wednesday, January 22, 2014 at 7:17 PM
To:  user@cassandra.apache.org
Subject:  Any Limits on number of items in a collection column type

Hi,

On C* 2.0.0. 3 Node cluster.

I have a column daycount list<BigInt>. The column is storing a count. Every
few secs a new count is appended. The total count for the day is the sum of
all items in the list.

My application logs indicate I wrote about 11 items to the column for a
particular  row. Assume row key is day_timestamp.

But when I do a read on the column I get back a list with only 43000 items.
Checked with both java driver and CQL.

There are no errors or exceptions anywhere.

There is this statement in the WIKI: "Collection values may not be larger
than 64K". I assume this refers to 1 item in a collection.

Has anyone else seen an issue like this ?

regards

MJ


-- 
http://khangaonkar.blogspot.com/




Re: Introducing farsandra: A different way to integration test with c*

2014-01-22 Thread Jonathan Ellis
Nice work, Ed.  Personally, I do find it more productive to write
system tests in Python (dtest builds on ccm to provide a number of
utilities that cut down on the boilerplate [1]), but I can understand
that others will feel differently and more testing can only improve
Cassandra.

Thanks!

[1] https://github.com/riptano/cassandra-dtest

On Wed, Jan 22, 2014 at 7:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
 The repo:
 https://github.com/edwardcapriolo/farsandra

 The code:
 Farsandra fs = new Farsandra();
 fs.withVersion("2.0.4");
 fs.withCleanInstanceOnStart(true);
 fs.withInstanceName("1");
 fs.withCreateConfigurationFiles(true);
 fs.withHost("localhost");
 fs.withSeeds(Arrays.asList("localhost"));
 fs.start();

 The story:
 For a while I have been developing applications that use Apache Cassandra as
 their data store. Personally I am more of an end-to-end test person than a
 mock test person. For years I have relied heavily on Hector's embedded
 cassandra to bring up Cassandra in a sane way inside a java project.

 The concept of Farsandra is to keep Cassandra close (in end to end tests and
 not mocked away) but keep your classpath closer (running cassandra embedded
 should be seamless and not mess with your client classpath).

 Recently there has been much fragmentation with Hector, Astyanax, CQL, and
 multiple Cassandra releases. Bringing up an embedded test is much harder
 than it needs to be.

 Cassandra's core methods get, put, slice over thrift have been
 wire-compatible from version 0.7 to current. However Java libraries for
 thrift and things like guava differ across the Cassandra versions. This
 causes a large number of issues when trying to use your favourite client
 with your 1 or more versions of Cassandra (sometimes a thrift mismatch
 kills the entire integration and you can't test anything).

 Farsandra is much like https://github.com/pcmanus/ccm in that it launches
 Cassandra instances remotely inside a sub-process. Farsandra is done in java
 not python, making it easier to use with java development.

 I will not go and say Farsandra solves all problems; in fact it has its own
 challenges (building yaml configurations across versions, fetching binary
 cassandra from the internet), but it opens up new opportunities to develop
 complicated multi-node testing scenarios which are impossible when embedded
 cassandra code is not re-entrant!

 Have fun.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Any Limits on number of items in a collection column type

2014-01-22 Thread Robert Wille
I didn't read your question properly. Collections are limited to 64K items,
not 64K bytes per item.

From:  Manoj Khangaonkar khangaon...@gmail.com
Reply-To:  user@cassandra.apache.org
Date:  Wednesday, January 22, 2014 at 7:17 PM
To:  user@cassandra.apache.org
Subject:  Any Limits on number of items in a collection column type

Hi,

On C* 2.0.0. 3 Node cluster.

I have a column daycount list<BigInt>. The column is storing a count. Every
few secs a new count is appended. The total count for the day is the sum of
all items in the list.

My application logs indicate I wrote about 11 items to the column for a
particular  row. Assume row key is day_timestamp.

But when I do a read on the column I get back a list with only 43000 items.
Checked with both java driver and CQL.

There are no errors or exceptions anywhere.

There is this statement in the WIKI: "Collection values may not be larger
than 64K". I assume this refers to 1 item in a collection.

Has anyone else seen an issue like this ?

regards

MJ


-- 
http://khangaonkar.blogspot.com/




Re: Any Limits on number of items in a collection column type

2014-01-22 Thread Manoj Khangaonkar
Thanks. I guess I can work around this by maintaining hour_counts (which will
have fewer items) and adding up the hour counts to get the day counts.
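
A minimal sketch of that workaround with the DataStax Java driver, assuming
a hypothetical table counts(day text, hour int, hour_counts list<bigint>,
PRIMARY KEY (day, hour)); the schema and names are made up for illustration:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class HourBucketCounts {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace"); // name assumed

            // Append each sample to the current hour's list instead of one
            // day-wide list; each hour bucket stays far below the 64K cap.
            session.execute("UPDATE counts SET hour_counts = hour_counts + [42]"
                + " WHERE day = '2014-01-22' AND hour = 13");

            // The day total is the sum over the (at most 24) hour buckets.
            long dayTotal = 0;
            ResultSet rs = session.execute(
                "SELECT hour_counts FROM counts WHERE day = '2014-01-22'");
            for (Row row : rs) {
                for (Long c : row.getList("hour_counts", Long.class)) {
                    dayTotal += c;
                }
            }
            System.out.println("day total: " + dayTotal);
            cluster.close();
        }
    }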

regards


On Wed, Jan 22, 2014 at 7:15 PM, Robert Wille rwi...@fold3.com wrote:

 I didn’t read your question properly. Collections are limited to 64K
 items, not 64K bytes per item.

 From: Manoj Khangaonkar khangaon...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, January 22, 2014 at 7:17 PM
 To: user@cassandra.apache.org
 Subject: Any Limits on number of items in a collection column type

 Hi,

 On C* 2.0.0. 3 Node cluster.

 I have a column daycount list<BigInt>. The column is storing a count.
 Every few secs a new count is appended. The total count for the day is the
 sum of all items in the list.

 My application logs indicate I wrote about 11 items to the column for
 a particular  row. Assume row key is day_timestamp.

 But when I do a read on the column I get back a list with only 43000
 items. Checked with both java driver and CQL.

 There are no errors or exceptions anywhere.

 There is this statement in the WIKI: "Collection values may not be larger
 than 64K". I assume this refers to 1 item in a collection.

 Has anyone else seen an issue like this ?

 regards

 MJ


 --
 http://khangaonkar.blogspot.com/




-- 
http://khangaonkar.blogspot.com/


./cqlsh not working

2014-01-22 Thread Chamila Wijayarathna
Hi all,
I downloaded the 1.2.13 version and ran ./cqlsh inside the bin folder, but it
says "bash: ./cqlsh: Permission denied"; when I ran it with sudo it says
"Command not found".
After I ran chmod u+x cqlsh and then tried ./cqlsh, it says "Can't locate
transport factory function cqlshlib.tfactory.regular_transport_factory". What
is the problem here?

Thank You.
-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.


Re: Introducing farsandra: A different way to integration test with c*

2014-01-22 Thread Edward Capriolo
Right,

This does not have to be thought of as a replacement for ccm or dtest.

The particular problems I tend to have are:

When trying to use the Hive and Cassandra storage handler, Cassandra and Hive
had incompatible versions of antlr. Short of rebuilding one or both, it
cannot be resolved.

I have had a version of Astyanax that was built against thrift 0.7.x while
Cassandra was using thrift 0.9.x. So if I can get the Cassandra server off
the classpath, the conflict goes away.

You could do a dtest-like scenario or a ccm-style thing as well. It is a 100%
java (minus the fork) solution. That has some wins, but may not be worth
re-writing something you already have.
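
For example, a JUnit test could fork a node like this. This is a sketch
only, reusing the fluent calls from the snippet above; the readiness wait is
a crude stand-in and the teardown is hand-waved, since no stop API is shown
here:

    import java.util.Arrays;
    import org.junit.AfterClass;
    import org.junit.BeforeClass;
    import org.junit.Test;

    public class FarsandraIT {
        private static Farsandra fs;

        @BeforeClass
        public static void startCassandra() throws Exception {
            fs = new Farsandra();
            fs.withVersion("2.0.4");
            fs.withCleanInstanceOnStart(true);
            fs.withInstanceName("1");
            fs.withCreateConfigurationFiles(true);
            fs.withHost("localhost");
            fs.withSeeds(Arrays.asList("localhost"));
            fs.start();           // forks cassandra as a separate process
            Thread.sleep(10_000); // crude wait for the node to come up
        }

        @Test
        public void clientCanConnect() {
            // Connect with whatever client the project uses; because the
            // server runs out-of-process, its thrift/guava jars never
            // touch this test's classpath.
        }

        @AfterClass
        public static void stopCassandra() {
            // Hypothetical: cleanup of the forked process would go here.
        }
    }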

Edward




On Wed, Jan 22, 2014 at 10:11 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Nice work, Ed.  Personally, I do find it more productive to write
 system tests in Python (dtest builds on ccm to provide a number of
 utilities that cut down on the boilerplate [1]), but I can understand
 that others will feel differently and more testing can only improve
 Cassandra.

 Thanks!

 [1] https://github.com/riptano/cassandra-dtest

 On Wed, Jan 22, 2014 at 7:06 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
  The repo:
  https://github.com/edwardcapriolo/farsandra
 
  The code:
  Farsandra fs = new Farsandra();
  fs.withVersion("2.0.4");
  fs.withCleanInstanceOnStart(true);
  fs.withInstanceName("1");
  fs.withCreateConfigurationFiles(true);
  fs.withHost("localhost");
  fs.withSeeds(Arrays.asList("localhost"));
  fs.start();
 
  The story:
  For a while I have been developing applications that use Apache Cassandra
  as their data store. Personally I am more of an end-to-end test person
  than a mock test person. For years I have relied heavily on Hector's
  embedded cassandra to bring up Cassandra in a sane way inside a java
  project.

  The concept of Farsandra is to keep Cassandra close (in end to end tests
  and not mocked away) but keep your classpath closer (running cassandra
  embedded should be seamless and not mess with your client classpath).

  Recently there has been much fragmentation with Hector, Astyanax, CQL, and
  multiple Cassandra releases. Bringing up an embedded test is much harder
  than it needs to be.

  Cassandra's core methods get, put, slice over thrift have been
  wire-compatible from version 0.7 to current. However Java libraries for
  thrift and things like guava differ across the Cassandra versions. This
  causes a large number of issues when trying to use your favourite client
  with your 1 or more versions of Cassandra (sometimes a thrift mismatch
  kills the entire integration and you can't test anything).

  Farsandra is much like https://github.com/pcmanus/ccm in that it launches
  Cassandra instances remotely inside a sub-process. Farsandra is done in
  java not python, making it easier to use with java development.

  I will not go and say Farsandra solves all problems; in fact it has its
  own challenges (building yaml configurations across versions, fetching
  binary cassandra from the internet), but it opens up new opportunities to
  develop complicated multi-node testing scenarios which are impossible when
  embedded cassandra code is not re-entrant!
 
  Have fun.



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: ./cqlsh not working

2014-01-22 Thread Jason Wee
That just means it cannot find the required library. Why don't you install
the cassandra package for your distribution?
http://rpm.datastax.com/community/noarch/cassandra12-1.2.13-1.noarch.rpm
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/install/installRHEL_t.html?pagename=docsversion=1.2file=install/install_rpm

That should save you a lot of trouble.

Jason


On Thu, Jan 23, 2014 at 12:02 PM, Chamila Wijayarathna 
cdwijayarat...@gmail.com wrote:

 Hi all,
 I downloaded the 1.2.13 version and ran ./cqlsh inside the bin folder, but
 it says "bash: ./cqlsh: Permission denied"; when I ran it with sudo it says
 "Command not found".
 After I ran chmod u+x cqlsh and then tried ./cqlsh, it says "Can't locate
 transport factory function cqlshlib.tfactory.regular_transport_factory".
 What is the problem here?

 Thank You.
 --
 *Chamila Dilshan Wijayarathna,*
 SMIEEE, SMIESL,
 Undergraduate,
 Department of Computer Science and Engineering,
 University of Moratuwa.



Re: Extremely long GC

2014-01-22 Thread Joel Samuelsson
Here is one example. 12GB data, no load besides OpsCenter and perhaps 1-2
requests per minute.
INFO [ScheduledTasks:1] 2013-12-29 01:03:25,381 GCInspector.java (line 119)
GC for ParNew: 426400 ms for 1 collections, 2253360864 used; max is
4114612224


2014/1/22 Yogi Nerella ynerella...@gmail.com

 Hi,

 Can you share the GC logs for the systems you are running into problems with?

 Yogi


 On Wed, Jan 22, 2014 at 6:50 AM, Joel Samuelsson 
 samuelsson.j...@gmail.com wrote:

 Hello,

 We've been having problems with long GC pauses and can't seem to get rid
 of them.

 Our latest test is on a clean machine with Ubuntu 12.04 LTS, Java
 1.7.0_45 and JNA installed.
 It is a single node cluster with most settings being default, the only
 things changed are ip-addresses, cluster name and partitioner (to
 RandomPartitioner).
 We are running Cassandra 2.0.4.
 We are running on a virtual machine with Xen.
 We have 16GB of ram and default memory settings for C* (i.e. heap size of
 4GB). CPU specified as 8 cores by our provider.

 Right now, we have no data on the machine and no requests to it at all.
 Still we get ParNew GCs like the following:
 INFO [ScheduledTasks:1] 2014-01-18 10:54:42,286 GCInspector.java (line
 116) GC for ParNew: 464 ms for 1 collections, 102838776 used; max is
 4106223616

 While this may not be extremely long, on other machines with the same
 setup but some data (around 12GB) and around 10 read requests/s (i.e.
 basically no load) we have seen ParNew GC for 20 minutes or more. During
 this time, the machine goes down completely (I can't even ssh to it). The
 requests are mostly from OpsCenter and the rows requested are not extremely
 large (typically less than 1KB).

 We have tried a lot of different things to solve these issues since we've
 been having them for a long time including:
 - Upgrading Cassandra to new versions
 - Upgrading Java to new versions
 - Printing promotion failures in GC-log (no failures found!)
 - Different sizes of heap and heap space for different GC spaces (Eden
 etc.)
 - Different versions of Ubuntu
 - Running on Amazon EC2 instead of the provider we are using now (not
 with Datastax AMI)

 Something that may be a clue is that when running the DataStax Community
 AMI on Amazon we haven't seen the GC yet (it's been running for a week or
 so). Just to be clear, another test on Amazon EC2 mentioned above (without
 the Datastax AMI) shows the GC freezes.

 If any other information is needed, just let me know.

 Best regards,
 Joel Samuelsson





Row cache vs. OS buffer cache

2014-01-22 Thread Katriel Traum
Hello list,

I was wondering if anyone has any pointers or advice regarding using the row
cache vs. leaving it up to the OS buffer cache.

I run cassandra 1.1 and 1.2 with JNA, so off-heap row cache is an option.
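
For what it's worth, a sketch of what turning the row cache on looks like in
the 1.2 era (table and keyspace names hypothetical; this also needs
row_cache_size_in_mb > 0 in cassandra.yaml, and the native transport enabled
for the driver below):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class EnableRowCache {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace"); // name assumed
            // 1.2-era per-table caching values: 'all', 'keys_only',
            // 'rows_only', 'none'. 'rows_only' enables the row cache for
            // this table; the default provider is the off-heap
            // SerializingCacheProvider, which wants JNA.
            session.execute("ALTER TABLE hot_table WITH caching = 'rows_only'");
            cluster.close();
        }
    }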

Any input appreciated.
Katriel