Visiting Auckland

2011-06-16 Thread aaron morton
So long as the Volcanic Ash stays away I'll be visiting Auckland next week on 
the 23rd and 24th. 

Drop me an email if you would like to meet to talk about things Cassandra. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com



Re: Cassandra JVM GC settings

2011-06-16 Thread aaron morton
It would help if you can provide some log messages from the GCInspector so 
people can see how much GC is going on. 
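For example, lines of the form

 INFO [GC inspection] 2011-06-16 14:22:59,562 GCInspector.java (line 110) GC 
 for ParNew: 385 ms, 26859800 reclaimed leaving 117789112 used; max is ...

show which collector ran, how long the pause was, and how much heap was still in use afterwards.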


Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17 Jun 2011, at 02:46, Sebastien Coutu wrote:

 Hi Everyone,
 
 I'm seeing Cassandra GC a lot and I would like to tune the Young space and 
 the Tenured space. Would anyone have recommendations on the NewRatio or 
 NewSize/MaxNewSize to use for an environment where Cassandra has several 
 column families and we are doing a mixed load of reads and writes? The JVM 
 has 8G of heap space assigned to it and there are 9 nodes in this cluster.
 
 Thanks for the comments!
 
 Sébastien Coutu



Re: client API

2011-06-16 Thread aaron morton
The Thrift Java compiler creates code that is not compliant with Java 5.

https://issues.apache.org/jira/browse/THRIFT-1170

So you may have trouble getting the thrift API to run. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17 Jun 2011, at 03:14, karim abbouh wrote:

 I use JDK 1.6 to install and launch Cassandra on a Linux platform, but can I 
 use JDK 1.5 for my Cassandra client?



Re: Docs: Token Selection

2011-06-16 Thread aaron morton
 But, I'm thinking about using OldNetworkTopStrat. 

NetworkTopologyStrategy is where it's at. 

A
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17 Jun 2011, at 01:39, AJ wrote:

 Thanks Eric!  I've finally got it!  I feel like I've just been initiated or 
 something by discovering this secret.  I kid!
 
 But, I'm thinking about using OldNetworkTopStrat.  Do you, or anyone else, 
 know if the same rules for token assignment apply to ONTS?
 
 
 On 6/16/2011 7:21 AM, Eric tamme wrote:
 AJ,
 
 sorry, I seem to have missed the original email on this thread.  As Aaron
 said, when computing tokens for multiple data centers, you should
 compute them independently for each data center - as if it were its
 own Cassandra cluster.
 
 You can have overlapping token ranges between multiple data centers,
 but no two nodes can have the same token, so for subsequent data
 centers I just increment the tokens.
 
 For two data centers with two nodes each using RandomPartitioner, 
 calculate the tokens for the first DC normally, but in the second data
 center, increment the tokens by one.
 
 In DC 1
 node 1 = 0
 node 2 = 85070591730234615865843651857942052864
 
 In DC 2
 node 1 = 1
 node 2 =  85070591730234615865843651857942052865
 
 For RowMutations this will give each data center a local set of nodes
 that it can write to for complete coverage of the entire token space.
 If you are using NetworkTopologyStrategy for replication, it will give
 an offset mirror replication between the two data centers so that your
 replicas will not get pinned to a node in the remote DC.  There are
 other ways to select the tokens, but the increment method is the
 simplest to manage and continue to grow with.
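 
 For illustration, a minimal sketch (not from the original message) of the increment
 method in Java, assuming RandomPartitioner's 2**127 token space:
 
import java.math.BigInteger;

public class TokenCalc {
    public static void main(String[] args) {
        BigInteger range = BigInteger.valueOf(2).pow(127); // RandomPartitioner token space
        int nodesPerDc = 2;
        for (int dc = 0; dc < 2; dc++) {
            for (int i = 0; i < nodesPerDc; i++) {
                // evenly spaced token, offset by the DC index so no two nodes share a token
                BigInteger token = range.multiply(BigInteger.valueOf(i))
                                        .divide(BigInteger.valueOf(nodesPerDc))
                                        .add(BigInteger.valueOf(dc));
                System.out.println("DC " + (dc + 1) + " node " + (i + 1) + " = " + token);
            }
        }
    }
}
 
 This prints the four tokens listed above: 0 and 85070591730234615865843651857942052864
 for DC 1, and the same values plus one for DC 2.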
 
 Hope that helps.
 
 -Eric
 
 



Re: Easy way to overload a single node on purpose?

2011-06-17 Thread aaron morton
The short answer to the problem you saw is to monitor the disk space. Also monitor 
client side logs for errors. Running out of commit log space does not stop the 
node from doing reads, so it can still be considered up. 

One node's view of its own UP'ness is not as important as the other nodes' (or 
clients') view of it.  For example...

A node will appear UP in the ring view of another node if it is participating 
in gossip messages and its application state is normal. But a node will appear 
UP in its own view of the ring most of the time (assuming it is not bootstrapping, leaving 
etc and it has joined the ring). This applies even if its gossip service has 
been disabled.

To a client a node will appear down if it is not responding to RPC requests. 
But it could still be part of the cluster, appear UP to other nodes and be 
responding to read and/or write. 

So to monitor that a node is running in some form you can...

- you should be monitoring the TP stats anyway, so you know the node is in some 
running state 
- check that you can connect as a client to each node and do some simple call 
(see the sketch below). Either a read/write, or describe_ring() which will execute 
locally, or describe_schema_versions() which will call all live nodes. A read/write will 
only verify that the node can act as a coordinator, not that it can read/write 
itself. 
- monitor the other nodes view of each node using nodetool ring. 

Now that I've written that I'm not 100% sold on it, but it will do for now :)
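
As a rough sketch of the "connect as a client and do some simple call" check, using the
raw Thrift API (0.7/0.8-era class and method names; host, port and so on are placeholders):

import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class NodeCheck {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));

        // describe_schema_versions() calls all live nodes; more than one entry in the
        // result means the cluster disagrees on the schema.
        Map<String, List<String>> versions = client.describe_schema_versions();
        System.out.println("schema versions: " + versions);

        transport.close();
    }
}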
 
Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17 Jun 2011, at 10:25, Suan Aik Yeo wrote:

  Having a ping column can work if every key is replicated to every node. It 
  would tell you the cluster is working, sort of. Once the number of nodes is 
  greater than the RF, it tells you a subset of the nodes works.
 
 The way our check works is that each node checks itself, so in this context 
 we're not concerned about whether the cluster is up, but that each 
 individual node is up.
  
 So the symptoms I saw, the node actually going down etc, were probably due 
 to many different events happening at the time, and will be very hard to 
 recreate?
 
 On Thu, Jun 16, 2011 at 6:16 AM, aaron morton aa...@thelastpickle.com wrote:
  DEBUG 14:36:55,546 ... timed out
 
 Is logged when the coordinator times out waiting for the replicas to respond, 
 the timeout setting is rpc_timeout in the yaml file. This results in the 
 client getting a TimedOutException.
 
AFAIK there is no global "everything is good / bad" flag to check. e.g. AFAIK 
a node will not mark itself down if it runs out of disk space.  So you need 
to monitor the free disk space and alert on that.
 
 Having a ping column can work if every key is replicated to every node. It 
 would tell you the cluster is working, sort of. Once the number of nodes is 
 greater than the RF, it tells you a subset of the nodes works.
 
If you google around you'll find discussions about monitoring with munin, 
ganglia, Cloudkick and OpsCenter.
 
If you install mx4j you can access the JMX metrics via HTTP.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 16 Jun 2011, at 10:38, Suan Aik Yeo wrote:
 
  Here's a weird one... what's the best way to get a Cassandra node into a 
  half-crashed state?
 
  We have a 3-node cluster running 0.7.5. A few days ago this happened 
  organically to node1 - the partition the commitlog was on was 100% full and 
  there was a "No space left on device" error, and after a while, although 
  the cluster and node1 were still up, to the other nodes it was down, and 
  messages like:
  DEBUG 14:36:55,546 ... timed out
  started to show up in its debug logs.
 
  We have a tool to indicate to the load balancer that a Cassandra node is 
  down, but it didn't detect it that time. Now I'm having trouble 
  purposefully getting the node back to that state, so that I can try other 
  monitoring methods. I've tried to fill up the commitlog partition with 
  other files, and although I get the "No space left on device" error, the 
  node still doesn't go down and show the other symptoms it showed before.
 
  Also, if anyone could recommend a good way for a node itself to detect that 
  it's in such a state I'd be interested in that too. Currently what we're 
  doing is making a describe_cluster_name() thrift call, but that still 
  worked when the node was down. I'm thinking of something like 
  reading/writing to a fixed value in a keyspace as a check... Unfortunately 
  Java-based solutions are out of the question.
 
 
  Thanks,
  Suan
 
 



Re: cassandra crash

2011-06-17 Thread aaron morton
What do you mean by crash ? 

If there was some sort of error in cassandra (including java running out of 
heap space) it will appear in the logs. Are there any error messages in the log?

If there was some sort of JVM error it will be written to std error and 
probably end up on std out / the console. If you are using a packaged distro it will 
probably be in /var/log/cassandra/output.log

Cheers 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17 Jun 2011, at 19:18, Donna Li wrote:

 All:
 Can you find any exception in the last log line below? Would cassandra crash 
 when memory is not enough? There are some other applications running alongside 
 cassandra, and they may use a lot of memory.
  
  
  
 From: Donna Li 
 Sent: 17 June 2011, 9:58
 To: user@cassandra.apache.org
 Subject: cassandra crash
  
 All:
 Why did cassandra crash after printing the following log?
  
 INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 
 SSTableDeletingReference.java (line 104) Deleted 
 /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-206-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 
 SSTableDeletingReference.java (line 104) Deleted 
 /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-207-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 
 SSTableDeletingReference.java (line 104) Deleted 
 /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-137-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 
 SSTableDeletingReference.java (line 104) Deleted 
 /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-205-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 
 SSTableDeletingReference.java (line 104) Deleted 
 /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-139-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 
 SSTableDeletingReference.java (line 104) Deleted 
 /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-138-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 
 SSTableDeletingReference.java (line 104) Deleted 
 /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-208-Data.db
  INFO [GC inspection] 2011-06-16 14:22:59,562 GCInspector.java (line 110) GC 
 for ParNew: 385 ms, 26859800 reclaimed leaving 117789112 used; max is 
 118784
  
  
 Best Regards
 Donna li



Re: Cassandra.yaml

2011-06-19 Thread aaron morton
The change to remove the calls to DatabaseDescriptor was in this commit on 
the 0.8 branch
https://github.com/apache/cassandra/commit/fe122c8c7d9ca0f002d5f394b4414dc91f278d1f

It looks like it did not make it over to the 0.8.0 branch 
https://github.com/apache/cassandra/blob/cassandra-0.8.0/src/java/org/apache/cassandra/config/CFMetaData.java#L642

It is in trunk and the current trunk builds. Can you try the nightly 
here 
https://builds.apache.org/job/Cassandra-0.8/

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17 Jun 2011, at 20:52, Vivek Mishra wrote:

 Thanks Aaron. But I tried it with 0.8.0 release only!
  
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Friday, June 17, 2011 1:55 PM
 To: user@cassandra.apache.org
 Subject: Re: Cassandra.yaml
  
 sounds like 
 https://issues.apache.org/jira/browse/CASSANDRA-2694
  
 Cheers
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
  
 On 17 Jun 2011, at 20:10, Vivek Mishra wrote:
 
 
 Hi Sasha,
 This is what I am trying. I can sense this is happening in the JDBC driver 
 stuff.
 
    // (requires imports of java.sql.DriverManager and java.sql.SQLException)
    public static void main(String[] args) {
        try {
            java.sql.Connection con = null;
            Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
            con = DriverManager
                    .getConnection("jdbc:cassandra:root/root@localhost:9160/Key1");
            System.out.println(con != null);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
 
 Getting following error:
 org.apache.cassandra.config.ConfigurationException: Cannot locate 
 cassandra.yaml
at 
 org.apache.cassandra.config.DatabaseDescriptor.getStorageConfigURL(DatabaseDescriptor.java:111)
 at 
 org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:121)
at 
 org.apache.cassandra.config.CFMetaData.fromThrift(CFMetaData.java:642)
at 
 org.apache.cassandra.cql.jdbc.ColumnDecoder.<init>(ColumnDecoder.java:61)
at 
 org.apache.cassandra.cql.jdbc.Connection.execute(Connection.java:142)
at 
 org.apache.cassandra.cql.jdbc.Connection.execute(Connection.java:124)
at 
 org.apache.cassandra.cql.jdbc.CassandraConnection.<init>(CassandraConnection.java:83)
at 
 org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:86)
at java.sql.DriverManager.getConnection(Unknown Source)
at java.sql.DriverManager.getConnection(Unknown Source)
 
 
 
 Ideally it should find it. Not sure what the issue is.
 
 -Vivek
 
 -Original Message-
 From: Sasha Dolgy [mailto:sdo...@gmail.com]
 Sent: Friday, June 17, 2011 1:31 PM
 To: user@cassandra.apache.org
 Subject: Re: Cassandra.yaml
 
 Hi Vivek,
 
 When I write client code in Java, using Hector, I don't specify a 
 cassandra.yaml ... I specify the host(s) and keyspace I want to connect to.  
 Alternately, I specify the host(s) and create the keyspace if the one I would 
 like to use doesn't exist (a new cluster, for example).  At no point do I use a 
 yaml file with my client code.
 
 The conf/cassandra.yaml is there to tell the cassandra server how to behave / 
 operate when it starts ...
 
 -sd
 
 On Fri, Jun 17, 2011 at 9:55 AM, Vivek Mishra vivek.mis...@impetus.co.in 
 wrote:
 
  
 I have a query:
  
 I have my Cassandra server running on my local machine and it has
 loaded Cassandra specific settings from
  
 apache-cassandra-0.8.0-src/apache-cassandra-0.8.0-src/conf/cassandra.y
 aml
  
 Now if I am writing a java program to connect to this server, why do I
 need to provide a new cassandra.yaml file again, even if the server is
 already up and running?
  
 Especially when I can create keyspaces and column families programmatically - isn't it 
 some type of redundancy?
  
 Might be my query is a bit irrelevant.
  
 -Vivek
 
 
 

Re: MemoryMeter uninitialized (jamm not specified as java agent)

2011-06-19 Thread aaron morton
What do you get for 

$ java -version
java version 1.6.0_24
Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02-334, mixed mode)


Also you can check if the wrapper has correctly detected things with 

ps aux | grep javaagent

The args to the java process should include 
-javaagent:bin/../lib/jamm-0.2.2.jar 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17 Jun 2011, at 22:18, Rene Kochen wrote:

 Since using cassandra 0.8, I see the following warning:
  
 WARN 12:05:59,807 MemoryMeter uninitialized (jamm not specified as java 
 agent); assuming liveRatio of 10.0.  Usually this means cassandra-env.sh 
 disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE 
 instead
  
 I'm using the Sun JRE.
  
 What can I do to resolve this? What are the consequences of this warning?
  
 Thanx,
  
 Rene



Re: Re : last record rowId

2011-06-19 Thread aaron morton
The get_range_slices() API call allows you to iterate over the keys in the DB. 
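
As a rough sketch with the raw Thrift API (assuming a connected Cassandra.Client is passed
in, plus the org.apache.cassandra.thrift generated classes; the column family name is a
placeholder):

// page over row keys; with RandomPartitioner they come back in token order, not key order
static void printKeys(Cassandra.Client client) throws Exception {
    ByteBuffer empty = ByteBuffer.wrap(new byte[0]);

    SlicePredicate predicate = new SlicePredicate();
    predicate.setSlice_range(new SliceRange(empty, empty, false, 1)); // only the keys are needed

    KeyRange range = new KeyRange();   // count defaults to 100 rows per call
    range.setStart_key(empty);
    range.setEnd_key(empty);

    List<KeySlice> page = client.get_range_slices(
            new ColumnParent("MyColumnFamily"), predicate, range, ConsistencyLevel.ONE);
    for (KeySlice ks : page) {
        System.out.println(ByteBufferUtil.string(ks.key));  // ks.key is the row key
    }
    // to fetch the next page, set start_key to the last key returned and call again
}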

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18 Jun 2011, at 05:00, karim abbouh wrote:

 Is there any way to remember the keys (rowIds) inserted in the cassandra database?
 B.R
 
 From: Jonathan Ellis jbel...@gmail.com
 To: user@cassandra.apache.org
 Cc: karim abbouh karim_...@yahoo.fr
 Sent: Wednesday, 15 June 2011, 18:05
 Subject: Re: last record rowId
 
 You're better served using UUIDs than numeric row IDs for surrogate
 keys.  (Of course natural keys work fine too.)
 
 On Wed, Jun 15, 2011 at 9:16 AM, Utku Can Topçu u...@topcu.gen.tr wrote:
  As far as I can tell, this functionality doesn't exist.
 
  However you can use such a method to insert the rowId into another column
  within a separate row, and request the latest column.
  I think this would work for you. However every insert would need a get
  request, which I think would be a performance issue.
 
  Regards,
  Utku
 
  On Wed, Jun 15, 2011 at 11:14 AM, karim abbouh karim_...@yahoo.fr wrote:
 
  In my java application, when we try to insert we need to know the last rowId
  at all times in order to insert the new record at rowId+1, so we have to save
  this rowId in a file.
  Is there another way to know the last record's rowId?
  thanks
  B.R
 
 
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 
 



Re: Error trying to move a node - 0.7

2011-06-19 Thread aaron morton
I *think* someone had a similar problem once before, moving a node that was the 
only node in a DC. 

What version are you using?

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17 Jun 2011, at 07:42, Ben Frank wrote:

 Hi All,
   I'm getting the following error when trying to move a nodes token:
 
 nodetool -h 145.6.92.82 -p 18080 move 56713727820156410577229101238628035242
 cassandra.in.sh executing for environment DEV1
 Exception in thread main java.lang.AssertionError
at
 org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:393)
at
 org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:418)
at
 org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:94)
at
 org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:807)
at
 org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:773)
at
 org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1468)
at
 org.apache.cassandra.service.StorageService.move(StorageService.java:1605)
at
 org.apache.cassandra.service.StorageService.move(StorageService.java:1580)
 .
 .
 .
 
 my ring looks like this:
 
 Address Status State   LoadOwnsToken
 
 113427455640312821154458202477256070484
 145.6.99.80  Up Normal  1.63 GB 36.05%
 4629135223504085509237477504287125589
 145.6.92.82  Up Normal  2.86 GB 1.09%
 6479163079760931522618457053473150444
 145.6.99.81  Up Normal  2.01 GB 62.86%
 113427455640312821154458202477256070484
 
 
 '80' and '81' are configured to be in the East coast data center and '82' is
 in the West
 
 Can anyone shed any light on what might be going on here?
 
 -Ben



Re: framed transport and buffered transport

2011-06-20 Thread aaron morton
From CHANGES.txt:
https://github.com/apache/cassandra/blob/cassandra-0.8.0/CHANGES.txt#L687

  make framed transport the default so malformed requests can't OOM the
  server (CASSANDRA-475)


btw, you *really* should upgrade.
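
For reference, a hedged sketch of the client-side difference with the raw Thrift API
(0.7/0.8-era class names):

TSocket socket = new TSocket("127.0.0.1", 9160);

// buffered style (the old 0.6 default):
// TTransport transport = socket;

// framed style (the default in later releases, and the only one supported in 0.8+):
TTransport transport = new TFramedTransport(socket);
transport.open();
Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));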

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 20 Jun 2011, at 15:07, Donna Li wrote:

 
 My cassandra version is 0.6.3, what is the advantage of framed transport?
 
 -----Original Message-----
 From: Jonathan Ellis [mailto:jbel...@gmail.com] 
 Sent: 20 June 2011, 10:56
 To: user@cassandra.apache.org
 Subject: Re: framed transport and buffered transport
 
 The most important difference is that only framed is supported in 0.8+
 
 On Sun, Jun 19, 2011 at 9:27 PM, Donna Li donna...@utstar.com wrote:
 All:
 
  What is the difference between framed transport and buffered transport? And what
 are the advantages and disadvantages of the two?
 
 
 
 
 
 Thanks
 
 Donna li
 
 
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Secondary indexes performance

2011-06-21 Thread aaron morton
Can you provide some more information on the query you are running ? How many 
terms are you selecting with? 

How long does it take to return 1024 rows ? IMHO that's a reasonably big slice 
to get.  

The server will pick the most selective equality predicate, and then filter the 
results from that using the other predicates.  

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jun 2011, at 09:04, Wojciech Pietrzok wrote:

 Hello,
 
 I've noticed that queries using secondary indexes seem to be getting
 rather slow.
 Right now I've got a Column Family with 4 indexed columns (plus 5-6
 non indexed columns, column values are small), and around 1.5-2
 million rows. I'm using the pycassa client, and a query using the
 get_indexed_slices method that returns over 10k rows (in batches of
 1024 rows) can take up to 30 seconds. Is that normal? Seems too long to
 me.
 
 Maybe there's a way to tune Cassandra config for better secondary
 indexes performance?
 
 Using Cassandra 0.7.6
 
 -- 
 KosciaK



Re: OOM during restart

2011-06-21 Thread aaron morton
AFAIK the node will not announce itself in the ring until the log replay is 
complete, so it will not get the schema update until after log replay. If 
possible I'd avoid making the schema change until you have solved this problem.

My theory on OOM during log replay is that the high speed inserts are a good 
way of finding out if the maximum memory required by the schema is too big to 
fit in the JVM. How big is the max JVM heap size and do you have a lot of CFs?

The simple solution is to either (temporarily) increase the JVM heap size or 
move the log files so that the server can process only one at a time. The JVM 
option -Dcassandra.join_ring=false will stop the node from joining the cluster and 
stop other nodes sending requests to it until you have sorted it out. 
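
(A sketch, not from the original reply: with the standard cassandra-env.sh that would be a
line like the one below; remove it again once the node is sorted out.)

JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"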

Hope that helps. 
  
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jun 2011, at 10:24, Gabriel Ki wrote:

 Hi,
 
 Cassandra: 7.6-2
 I was restarting a node and ran into OOM while replaying the commit log.  I 
 am not able to bring the node up again.
 
 DEBUG 15:11:43,501 forceFlush requested but everything is clean  
   For this I don't know what to do.
 java.lang.OutOfMemoryError: Java heap space
 at 
 org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:123)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:395)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:76)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2238)
 at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:166)
 at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
 at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:189)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 
 Any help will be appreciated.   
 
 If I update the schema while a node is down, the new schema is loaded before 
 the flushing when the node is brought up again, correct?  
 
 Thanks,
 -gabe



Re: Create columnFamily

2011-06-21 Thread aaron morton
You've set a comparator for the super column names, but not the sub columns. 
e.g. 

[default@dev] set data['31']['address']['city']='noida';
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'city' as hex 
bytes
[default@dev] set data['31']['address'][utf8('city')]='noida';
Value inserted.
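
Alternatively (a sketch, not tested here) you can declare the sub column comparator when
creating the column family, so plain string literals work at both levels:

[default@key1] create column family supusers with column_type=Super and comparator=UTF8Type and subcomparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type;
[default@key1] set supusers['31']['address']['city']='noida';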

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jun 2011, at 19:06, Vivek Mishra wrote:

 I understand that I might be missing something on my end. But somehow I 
 cannot get this working using Cassandra-cli:
  
 [default@key1] create column family supusers with comparator=UTF8Type and 
 default_validation_class=UTF8Type and key_validation_class=UTF8Type and 
 column_type=Super;
  
 59e2e950-9bd4-11e0--242d50cf1fbf
 Waiting for schema agreement...
 ... schemas agree across the cluster
  
 SuperColumn family got created.
  
 Issued 
 [default@key1] assume supusers keys as ascii;
  
 But still it is failing for:
  
 [default@key1] set supusers['31']['address']['city']='noida';
  
 org.apache.cassandra.db.marshal.MarshalException: cannot parse 'city' as hex 
 bytes
  
  
 Please suggest, what am I doing incorrect here?
  
  
 
 



Re: Flushing behavior in Cassandra 0.8

2011-06-21 Thread aaron morton
The new memtable_total_space_in_mb option is kicking in 

https://github.com/apache/cassandra/blob/cassandra-0.8.0/NEWS.txt#L34
http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jun 2011, at 22:12, Rene Kochen wrote:

 I'm trying to understand the flushing behavior in Cassandra 0.8.
  
 When I create rows, after a few seconds, I see the following line in the log:
  
 INFO 11:18:46,470 flushing high-traffic column family 
 ColumnFamilyStore(table='Traxis', columnFamily='Customers')
 INFO 11:18:46,471 Enqueuing flush of 
 Memtable-Customers@14306556(697958/50059836 serialized/live bytes, 30346 ops)
 INFO 11:18:46,472 Writing Memtable-Customers@14306556(697958/50059836 
 serialized/live bytes, 30346 ops)
 INFO 11:18:47,415 Completed flushing 
 C:\Cassandra\Storage\data\Traxis\Customers-g-1-Data.db (4157370 bytes)
  
 The super column family is configured as follows:
  
 Memtable thresholds: 0.2953125/63/1440 (millions of ops/MB/minutes)
  
 I don't think any of the three thresholds should have triggered the flush?
  
 Thanks,
  
 Rene



Re: CommitLog replay

2011-06-21 Thread aaron morton
use nodetool cfstats or "show keyspaces;" in cassandra-cli to see the flush 
settings; the default is (I think) 60 minutes, 0.1 million ops or 1/16th of the 
heap size when the CF was created.

But under 0.8 there is an automagical global memory manager, see
https://github.com/apache/cassandra/blob/cassandra-0.8.0/NEWS.txt#L34
http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jun 2011, at 01:51, Stephen Pope wrote:

 I've only got one cf, and haven't changed the default flush expiry period. 
 I'm not sure whether the node had fully started or not. I had to restart my data 
 insertion (for other reasons), so I can check the system log upon restart 
 when the data is finished inserting.
 
 Do you know off-hand how long the default flush expiry period is?
 
 Cheers,
 Steve
 
 -Original Message-
 From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller
 Sent: Tuesday, June 21, 2011 9:13 AM
 To: user@cassandra.apache.org
 Subject: Re: CommitLog replay
 
 I’ve got a single node deployment of 0.8 set up on my windows box. When I
 insert a bunch of data into it, the commitlogs directory doesn’t clear upon
 completion (should it?).
 
 It is expected that commit logs are retained for a while, and that
 there is replay going on when restarting a node. The main way to ensure
 that a smaller amount of commit log is active at any given moment is
 to ensure that all column families are flushed sufficiently often. This
 is because when column families are flushed, they no longer
 necessitate the retention of the commit logs that contain the writes
 that were just flushed.
 
 Pay attention to whether you maybe have some CFs that are written
 very rarely and won't flush until the flush expiry period.
 
 As a result, when I stop and restart Cassandra it
 replays all the commitlogs, then starts compacting (which seems like it’s
 taking a long time). While it’s compacting it won’t talk to my test client.
 
 That it starts compacting is expected if the data flushed as a result
 of the commit log replay triggers compactions. However, compaction does
 not imply that the node refuses to talk to clients.
 
 Are you sure the node has fully started? it should log when it starts
 up the thrift interface - check system.log.
 
 -- 
 / Peter Schuller



Re: Compressing data types

2011-06-21 Thread aaron morton
Also 
https://issues.apache.org/jira/browse/HADOOP-7206
Now part of brisk
http://www.datastax.com/dev/blog/brisk-1-0-beta-2-released

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jun 2011, at 04:04, Vijay wrote:

 You might want to watch https://issues.apache.org/jira/browse/CASSANDRA-47
 
 Regards,
 /VJ
 
 
 
 On Tue, Jun 21, 2011 at 5:14 AM, Timo Nentwig timo.nent...@toptarif.de 
 wrote:
 Hi!
 
 Just wondering why this doesn't already exist: wouldn't it make sense to have
 decorating data types that compress (gzip, snappy) other data types (esp. 
 UTF8Type,
 AsciiType) transparently?
 
 -tcn
 



Re: Storing files in blob into Cassandra

2011-06-22 Thread aaron morton
 If the Cassandra JVM is down, Tomcat and Httpd will continue to handle 
 requests. And Pelops will redirect these requests to another Cassandra node 
 on another server (maybe am I wrong with this assertion).

I was thinking of the server being turned off / broken / rebooting / 
disconnected from the network / taken out of rotation for maintenance. There 
are lots of reasons for a server to not be doing what it should be. 


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jun 2011, at 23:10, Damien Picard wrote:

 
 
 2011/6/22 aaron morton aa...@thelastpickle.com
 I think I have to detail my configuration. On every server of my cluster, I 
 deploy :
  - a Cassandra node
  - a Tomcat instance
  - the webapp, deployed on Tomcat
  - Apache httpd, in front of Tomcat with mod_jakarta
 
 You will have a bunch of services on the machine competing with each other 
 for resources (cpu, memory and network IO). It's not an approach I would 
 take. 
 
 You will also tightly couple the front end HTTP capacity to the DB capacity. 
 e.g. consider what happens when a cassandra node is down for a while, what 
 does this mean for your ability to accept http connections?
 If the Cassandra JVM is down, Tomcat and Httpd will continue to handle 
 requests. And Pelops will redirect these requests to another Cassandra node 
 on another server (maybe am I wrong with this assertion).
  
 Requests from your web app may go to the local cassandra node, but thats just 
 the coordinator. They will be forwarded onto the replicas that contain the 
 data.  
 Yes, but as you noted before, this node can be down, so I will configure 
 Pelops to redistribute requests to another node. So there is no strong coupling 
 between Cassandra and Tomcat; it will work as if they were on different 
 servers. 
 
 Data are stored with RandomPartitionner, replication factor is 2.
 
 RF 3 is the minimum RF you need to use for QUORUM to be less than the RF. 
 Thank you for this advice ; I will reconsider  the RF, but for this time, I 
 use only CL.ONE, not QUORUM. But it could change in a near future.
 
 In such case, do you advise me to store files in Cassandra ?
 
 Depends on your scale, workload and performance requirements. I would do some 
 tests about how much data you expect to hold and what sort of workloads you 
 need to support.  Personally I think files are best kept in a file system, 
 until a compelling reason is found to do other wise. 
 Thank you. I think that having the files available across the cluster without 
 needing something like a distributed file system is a compelling reason to store files in 
 Cassandra. I don't want to add another complex component to my architecture.
 
 Hope that helps. 
 
 It does ! A lot ! Thank you. 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 22 Jun 2011, at 20:23, Damien Picard wrote:
 
 store your images / documents / etc. somewhere and reference them
 in Cassandra.  That's the consensus that's been bandied about on this
 list quite frequently
 
 Thank you for your answers.
 
 I think I have to detail my configuration. On every server of my cluster, I 
 deploy :
  - a Cassandra node
  - a Tomcat instance
  - the webapp, deployed on Tomcat
  - Apache httpd, in front of Tomcat with mod_jakarta
 
 In front of these, I use a Round-Robin DNS load balancer which balances 
 requests across every httpd.
 Every Tomcat instance can access every Cassandra node, allowing them to deal 
 with every request.
 Data are stored with RandomPartitionner, replication factor is 2.
 
 In my case, it would be very easy to store images in Cassandra because these 
 images will be accessible everywhere in my cluster. If I store images in 
 FileSystem, I have to replicate them manually (probably with a distributed 
 filesystem) on every server (quite complicated). This is why I prefer to 
 store files into Cassandra.
 
 According to Sylvain, the main thing to know is the max size of a file. Since 
 this is for the web, I can define the max file size as 10 MB 
 (HTTP POST max size) without disappointing my users. Furthermore, most of 
 these files will not exceed 2 or 3 MB. In such a case, do you advise me to 
 store files in Cassandra ?
 
 Thank you.
 
 2011/6/22 Sylvain Lebresne sylv...@datastax.com
 Let's be more precise in saying that this all depends on the
 expected size of the documents. If you know that the documents
 will be in the few hundred kilobytes range on average and
 no more than a few megabytes (say < 5MB, even though there is
 no magic number), then storing them as blobs will work perfectly
 fine (which is not saying storing them externally with metadata in
 Cassandra won't, but using blobs can be simpler in some cases).
 
 I've very successfully stored tons of images as blobs in Cassandra.
 I just knew they couldn't get super big because the system wasn't
 allowing it.
 
 The point with the size being that each time

Re: Secondary indexes performance

2011-06-22 Thread aaron morton
 it will probably be better to denormalize and store
 some precomputed data

Yes, if you know there are queries you need to serve it is better to support 
those directly in the data model. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jun 2011, at 23:52, Wojciech Pietrzok wrote:

 OK, got some results (below).
 2 nodes, one on localhost, second on LAN, reading with
 ConsistencyLevel.ONE, buffer_size=512 rows (that's how many rows
 pycassa will get on one connection, than it will use last row_id as
 start row for next query)
 
 Queries types:
 1) get_range - just added limit of 1024 rows
 2) get_indexed_slices ASCII - one term: on indexed column with ASCII type
 3) get_indexed_slices INT - one term: on indexed column with INT type
 4) get_indexed_slices ASCII  + GTE, LTE on indexed INT - three terms:
 on indexed column with INT type + LTE, GTE on indexed column with INT
 type
 5) get_indexed_slices 2 terms, ASCII - two terms, both columns
 indexed, with ASCII type
 6) get_indexed_slices ASCII + GTE, LTE on non indexed INT - like 4)
 but LTE, GTE are on non-indexed column
 
 3 runs for each set of queries, on successive runs times were better.
 Times are in seconds
 
 
 But if you say that 1024 rows is a reasonably big slice (not to mention
 over 10k rows) it will probably be better to denormalize and store
 some precomputed data.
 
 
 Results:
 
 # Run 1
 PERF: [a] get_range: 0.58[s]
 PERF: [a] get_indexed_slices ASCII: 3.96[s]
 PERF: [a] get_indexed_slices INT: 1.82[s]
 PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT: 1.31[s] #
 314 returned
 PERF: [cr] get_indexed_slices ASCII: 1.13[s]
 PERF: [cr] get_indexed_slices 2 terms, ASCII: 8.69[s]
 
 # Run 2, same queries
 PERF: [a] get_range: 0.33[s]
 PERF: [a] get_indexed_slices ASCII: 0.36[s]
 PERF: [a] get_indexed_slices INT: 5.39[s]
 PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 5.42[s] #
 314 returned
 PERF: [cr] get_indexed_slices ASCII: 0.55[s]
 PERF: [cr] get_indexed_slices 2 terms, ASCII: 3.57[s]
 
 # Run 3, same queries
 PERF: [a] get_range: 0.18[s]
 PERF: [a] get_indexed_slices ASCII: 0.39[s]
 PERF: [a] get_indexed_slices INT: 0.83[s]
 PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 0.85[s] #
 314 returned
 PERF: [cr] get_indexed_slices ASCII: 0.39[s]
 PERF: [cr] get_indexed_slices 2 terms, ASCII: 3.36[s]
 
 # changed some terms, so always 1024 returned are returned
 # Run 1
 PERF: [a] get_range: 0.31[s]
 PERF: [a] get_indexed_slices ASCII: 3.14[s]
 PERF: [a] get_indexed_slices INT: 0.70[s]
 PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 4.72[s]
 PERF: [cr] get_indexed_slices ASCII: 0.73[s]
 PERF: [cr] get_indexed_slices 2 terms, ASCII: 0.85[s]
 PERF: [cr] get_indexed_slices ASCII + GTE, LTE on non indexed INT : 2.17[s]
 
 # Run 2, same queries
 PERF: [a] get_range: 0.20[s]
 PERF: [a] get_indexed_slices ASCII: 0.60[s]
 PERF: [a] get_indexed_slices INT: 1.22[s]
 PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 1.27[s]
 PERF: [cr] get_indexed_slices ASCII: 0.48[s]
 PERF: [cr] get_indexed_slices 2 terms, ASCII: 0.50[s]
 PERF: [cr] get_indexed_slices ASCII + GTE, LTE on non indexed INT : 2.22[s]
 
 # Run 3, same queries
 PERF: [a] get_range: 0.25[s]
 PERF: [a] get_indexed_slices ASCII: 0.44[s]
 PERF: [a] get_indexed_slices INT: 0.89[s]
 PERF: [a] get_indexed_slices INT + GTE, LTE on indexed INT : 6.58[s]
 PERF: [cr] get_indexed_slices ASCII: 1.18[s]
 PERF: [cr] get_indexed_slices 2 terms, ASCII: 0.50[s]
 PERF: [cr] get_indexed_slices ASCII + GTE, LTE on non indexed INT : 2.09[s]
 
 
 
 
 2011/6/21 aaron morton aa...@thelastpickle.com:
 Can you provide some more information on the query you are running ? How 
 many terms are you selecting with?
 
 How long does it take to return 1024 rows ? IMHO thats a reasonably big 
 slice to get.
 
 The server will pick the most selective equality predicate, and then filter 
 the results from that using the other predicates.
 
 Cheers
 
 
 -- 
 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  KosciaK mail: kosci...@gmail.com
www : http://kosciak.net/
 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-



Re: insufficient space to compact even the two smallest files, aborting

2011-06-22 Thread aaron morton
Setting them to 2 and 2 means compaction can only ever compact 2 files at a time, 
so it will be worse off.

Lets the try following:

- restore the compactions settings to the default 4 and 32
- run `ls -lah` in the data dir and grab the output
- run `nodetool flush` this will trigger minor compaction once the memtables 
have been flushed
- check the logs for messages from 'CompactionManager'
- when done grab the output from  `ls -lah` again. 

Hope that helps. 

 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 02:04, Héctor Izquierdo Seliva wrote:

 Hi All. I set the compaction threshold at minimum 2, maximum 2 and try
 to run compact, but it's not doing anything. There are over 69 sstables
 now, read performance is horrible, and it's taking an insane amount of
 space. Maybe I don't quite get how the new per bucket stuff works, but I
 think this is not normal behaviour.
 
 El lun, 13-06-2011 a las 10:32 -0500, Jonathan Ellis escribió:
 As Terje already said in this thread, the threshold is per bucket
 (group of similarly sized sstables) not per CF.
 
 2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com:
 I was already way over the minimum. There were 12 sstables. Also, is
 there any reason why scrub got stuck? I did not see anything in the
 logs. Via jmx I saw that the scrubbed bytes were equal to one of the
 sstables' size, and it stuck there for a couple of hours.
 
 El lun, 13-06-2011 a las 22:55 +0900, Terje Marthinussen escribió:
 That most likely happened just because after scrub you had new files
 and got over the 4 file minimum limit.
 
 https://issues.apache.org/jira/browse/CASSANDRA-2697
 
 Is the bug report.
 
 
 
 
 
 
 
 
 
 



Re: Strange Connection error of nodetool

2011-06-22 Thread aaron morton
Check the list here 
http://wiki.apache.org/cassandra/JmxGotchas

I *think* the jmx server tells the client to connect back on another host/port.

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jun 2011, at 21:02, 박상길 wrote:

 Hi.
 
 I'm running 5 cassandra nodes. Say, the addresses are 112.234.123.111 ~ 
 112.234.123.115; the real address is different. 
 When I run nodetool, the one node at address 112.234.123.112 fails to 
 connect, showing an error message like this. 
 
 iPark:~ hayarobi$ nodetool --host 112.234.123.112 ring
 Error connection to remote JMX agent!
 java.rmi.ConnectException: Connection refused to host: 122.234.123.112; 
 nested exception is: 
 
 The host it tries to connect to differs! I had queried 112.* but 
 nodetool tried to connect to 122.*. It happens on just one machine. All other 
 machines work fine. 
 And I can connect to 112.234.123.112 by cassandra-cli or other tools using 
 other port (such as 22 of ssh, 80 of http). It has trouble only on nodetool.
 
 Does anyone has an idea? 
 
 I'll paste the full stack trace below.
 
 iPark:~ hayarobi$ nodetool --host 112.234.123.111 ring
 Address Status State   LoadOwnsToken  
  
   
 136112946768375 
 112.234.123.111  Up Normal  725.01 KB   20.00%  0 
   
 112.234.123.112  Up Normal  725.93 KB   20.00%  
 340282366920938000  
 112.234.123.113  Up Normal  728.2 KB20.00%  
 680564733841877000  
 112.234.123.114  Up Normal  713.1 KB20.00%  
 102084710076282 
 112.234.123.115  Up Normal  722.67 KB   20.00%  
 136112946768375 
 iPark:~ hayarobi$ nodetool --host 112.234.123.115 ring
 Address Status State   LoadOwnsToken  
  
   
 136112946768375 
 112.234.123.111  Up Normal  725.01 KB   20.00%  0 
   
 112.234.123.112  Up Normal  725.93 KB   20.00%  
 340282366920938000  
 112.234.123.113  Up Normal  728.2 KB20.00%  
 680564733841877000  
 112.234.123.114  Up Normal  713.1 KB20.00%  
 102084710076282 
 112.234.123.115  Up Normal  722.67 KB   20.00%  
 136112946768375 
 iPark:~ hayarobi$ nodetool --host 112.234.123.114 ring
 Address Status State   LoadOwnsToken  
  
   
 136112946768375 
 112.234.123.111  Up Normal  725.01 KB   20.00%  0 
   
 112.234.123.112  Up Normal  725.93 KB   20.00%  
 340282366920938000  
 112.234.123.113  Up Normal  728.2 KB20.00%  
 680564733841877000  
 112.234.123.114  Up Normal  713.1 KB20.00%  
 102084710076282 
 112.234.123.115  Up Normal  722.67 KB   20.00%  
 136112946768375 
 iPark:~ hayarobi$ nodetool --host 112.234.123.112 ring
 Error connection to remote JMX agent!
 java.rmi.ConnectException: Connection refused to host: 122.234.123.112; 
 nested exception is: 
java.net.ConnectException: Connection refused
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601)
at 
 sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110)
at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown 
 Source)
at 
 javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2327)
at 
 javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:279)
at 
 javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:137)
at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:107)
at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:511)
 Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200

Re: unsubscribe

2011-06-22 Thread aaron morton
http://wiki.apache.org/cassandra/FAQ#unsubscribe


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 06:02, Carey Hollenbeck wrote:

 unsubscribe
  
 From: William Oberman [mailto:ober...@civicscience.com] 
 Sent: Wednesday, June 22, 2011 1:46 PM
 To: user@cassandra.apache.org
 Subject: Re: rpm from 0.7.x - 0.8?
  
 Thanks Jonathan.  I'm sure it's been true for everyone else as well, but the 
 rolling upgrade seems to have worked like a charm for me (other than the JMX 
 port # changing initial confusion).
 
 One minor thing that probably particular to my case: when I removed the old 
 package, it unlinked my symlink /var/lib/cassandra/data (rather than edit the 
 cassandra config, I symlinked my amazon disk to where cassandra expected it). 
  At first I thought I had lost all of my data, but after restoring the link, 
 everything was happy.
 
 will
 
 On Wed, Jun 22, 2011 at 12:34 PM, Jonathan Ellis jbel...@gmail.com wrote:
 Doesn't matter.  auto_bootstrap only applies to first start ever.
 
 On Wed, Jun 22, 2011 at 10:48 AM, William Oberman
 ober...@civicscience.com wrote:
  I have a question about auto_bootstrap.  When I originally brought up the
  cluser, I did:
  -seed with auto_boot = false
  -1,2,3 with auto_boot = true
 
  Now that I'm doing a rolling upgrade, do I set them all to auto_boot =
  true?  Or does the seed stay false?  Or should I mark them all false?  I
  have manually set tokens on all of the.
 
  The doc confused me:
  Set to 'true' to make new [non-seed] nodes automatically migrate the right
  data to themselves. (If no InitialToken is specified, they will pick one
  such that they will get half the range of the most-loaded node.) If a node
  starts up without bootstrapping, it will mark itself bootstrapped so that
  you can't subsequently accidently bootstrap a node with data on it. (You can
  reset this by wiping your data and commitlog directories.)
  Default is: 'false', so that new clusters don't bootstrap immediately. You
  should turn this on when you start adding new nodes to a cluster that
  already has data on it.
 
  I'm not adding new nodes, but the cluster does have data on it...
 
  will
 
  On Wed, Jun 22, 2011 at 11:39 AM, William Oberman ober...@civicscience.com
  wrote:
 
  I just did a remove then install, and it seems to work.
 
  For those of you out there with JMX issues, the default port moved from
  8080 to 7199 (which includes the internal default to nodetool).  I was
  confused why nodetool ring would fail on some boxes and not others.  I had
  to add -p depending on the version of nodetool
 
  will
 
  On Wed, Jun 22, 2011 at 10:15 AM, William Oberman
  ober...@civicscience.com wrote:
 
  I'm running 0.7.4 from rpm (riptano).  If I do a yum upgrade, it's trying
  to do 0.7.6.  To get 0.8.x I have to do install apache-cassandra08.  But
  that is going to install two copies.
 
  Is there a semi-official way of properly upgrading to 0.8 via rpm?
 
  --
  Will Oberman
  Civic Science, Inc.
  3030 Penn Avenue., First Floor
  Pittsburgh, PA 15201
  (M) 412-480-7835
  (E) ober...@civicscience.com
 
 
 
  --
  Will Oberman
  Civic Science, Inc.
  3030 Penn Avenue., First Floor
  Pittsburgh, PA 15201
  (M) 412-480-7835
  (E) ober...@civicscience.com
 
 
 
  --
  Will Oberman
  Civic Science, Inc.
  3030 Penn Avenue., First Floor
  Pittsburgh, PA 15201
  (M) 412-480-7835
  (E) ober...@civicscience.com
 
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 
 
 
 -- 
 Will Oberman
 Civic Science, Inc.
 3030 Penn Avenue., First Floor
 Pittsburgh, PA 15201
 (M) 412-480-7835
 (E) ober...@civicscience.com



Re: Atomicity Strategies

2011-06-22 Thread aaron morton
Atomic on a single machine yes. 
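
As a hedged sketch of the single-key case discussed below (raw Thrift API, 0.8-era names;
the client, column family and key are placeholders), where every mutation in the batch
targets the same row key:

long ts = System.currentTimeMillis() * 1000;   // microseconds, the usual convention
ByteBuffer rowKey = ByteBufferUtil.bytes("user-42");

List<Mutation> mutations = new ArrayList<Mutation>();
for (String colName : new String[] { "email", "city" }) {
    Column col = new Column();
    col.setName(ByteBufferUtil.bytes(colName));
    col.setValue(ByteBufferUtil.bytes("some value"));
    col.setTimestamp(ts);
    Mutation m = new Mutation();
    m.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(col));
    mutations.add(m);
}

// one row key -> one column family -> its mutations; applied atomically per key
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
        Collections.singletonMap(rowKey, Collections.singletonMap("Users", mutations));
client.batch_mutate(mutationMap, ConsistencyLevel.QUORUM);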

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 09:42, AJ wrote:

 On 4/9/2011 7:52 PM, aaron morton wrote:
 My understanding of what they did with locking (based on the examples) was 
 to achieve a level of transaction isolation 
 http://en.wikipedia.org/wiki/Isolation_(database_systems) 
 http://en.wikipedia.org/wiki/Isolation_%28database_systems%29
 
 I think the issue here is more about atomicity 
 http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
 
 We cannot guarantee that all or none of the mutations in your batch are 
 completed. There is some work in this area though 
 https://issues.apache.org/jira/browse/CASSANDRA-1684
 
 
 Just to be clear, you are speaking in the general sense, right?  The batch 
 mutate link you provide says that in the case that ALL the mutates of the 
 batch are for the SAME key (row), then the whole batch is atomic:
 
As a special case, mutations against a single key are atomic but not 
 isolated.
 
 So, is it true that if I want to update multiple columns for one key, then it 
 will be an all or nothing update for the whole batch if using batch update?  
 But, if your batch mutate contains mutates for more than one key, then all 
 the updates for one key will be atomic, followed by all the updates for the 
 next key being atomic, and so on.  Correct?
 
 Thanks!
 



Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-22 Thread aaron morton
 1. Is it feasible to run directly against a Cassandra data directory restored 
 from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS 
 snapshot).

I don't have experience with the EBS snapshot, but I've never been a fan of OS 
level snapshots that are not coordinated with the DB layer. 

 2. Noting the wiki's consistent Cassandra backups advice; if I schedule 
 nodetool snapshots across the cluster, should the relative age of the 
 'sibling' snapshots be a concern? How far apart can they be before its a 
 problem? (seconds? minutes? hours?)

Consider the snapshot to be from the time of the first one. 

Previous discussion on AWS backup 
http://www.mail-archive.com/user@cassandra.apache.org/msg12831.html

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 10:48, Thoku Hansen wrote:

 I have a couple of questions regarding the coordination of Cassandra nodetool 
 snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore 
 strategy.
 
 Background: I have a cluster running in EC2. Its nodes are configured like so:
 
 * Instance type: m1.xlarge
 * Cassandra commit log writing to RAID-0 ephemeral storage
 * Cassandra data writing to an EBS volume.
 
 Note: there is a lot of conflicting information/advice about using Cassandra 
 in EC2 w.r.t ephemeral vs. EBS. The above configuration seems to work well 
 for my application. I only described this to provide context for my EBS 
 snapshotting question. With respect, I hope not to debate Cassandra 
 performance for ephemeral vs. EBS in this thread!
 
 I am setting up a process that performs regular EBS (-S3) snapshots for the 
 purpose of backing up Cassandra plus other data.
 I presume this will need to be coordinated with regular Cassandra (nodetool) 
 snapshots also.
 
 My questions:
 1. Is it feasible to run directly against a Cassandra data directory restored 
 from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS 
 snapshot).
 2. Noting the wiki's consistent Cassandra backups advice; if I schedule 
 nodetool snapshots across the cluster, should the relative age of the 
 'sibling' snapshots be a concern? How far apart can they be before its a 
 problem? (seconds? minutes? hours?)
 
 My motivation for these two questions: I'm trying to figure out how much 
 effort needs to be put into:
 * Time-coordinated scheduling of nodetool snapshots across the cluster
 * Automation of the process of determining the most appropriate set of 
 nodetool snapshots to use when restoring a cluster.
 
 Thanks!



Re: Decorator Algorithm

2011-06-23 Thread aaron morton
Various places in the code call IPartitioner.decorateKey() which returns a 
DecoratedKey<T>, which contains both the original key and the Token<T>.

The RandomPartitioner uses MD5 to hash the key ByteBuffer and create a BigInteger; 
the OPP converts the key into a UTF-8 encoded String.  
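
As a rough sketch of that key-to-token step (this is approximately what RandomPartitioner
does internally; the key value is just an example):

import java.math.BigInteger;
import java.security.MessageDigest;

public class KeyToToken {
    public static void main(String[] args) throws Exception {
        byte[] key = "some row key".getBytes("UTF-8");
        byte[] hash = MessageDigest.getInstance("MD5").digest(key);
        BigInteger token = new BigInteger(hash).abs();   // token in 0 .. 2**127
        System.out.println(token);
    }
}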

Using the token to find which endpoints contain replicas is done by the 
AbstractReplicationStrategy.calculateNaturalEndpoints() implementations. 

Does that help? 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 19:58, Jonathan Colby wrote:

 Hi -
 
 I'd like to understand more how the token is hashed with the key to determine 
 on which node the data is stored - called decorating in cassandra speak.
 
 Can anyone share any documentation on this or describe this more in detail?   
 Yes, I could look at the code, but I was hoping to be able to read more about 
 how it works first.
 
 thanks.



Re: insufficient space to compact even the two smallest files, aborting

2011-06-23 Thread aaron morton
Missed that in the history, cheers. 
A
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 20:26, Sylvain Lebresne wrote:

 As Jonathan said earlier, you are hitting
 https://issues.apache.org/jira/browse/CASSANDRA-2765
 
 This will be fixed in 0.8.1 that is currently under a vote and should be
 released soon (let's say beginning of next week, maybe sooner).
 
 --
 Sylvain
 
 2011/6/23 Héctor Izquierdo Seliva izquie...@strands.com:
 Hi Aaron. Reverted back to 4-32. Did the flush but it did not trigger
 any minor compaction. Ran compact by hand, and it picked only two
 sstables.
 
 Here's the ls before:
 
 http://pastebin.com/xDtvVZvA
 
 And this is the ls after:
 
 http://pastebin.com/DcpbGvK6
 
 Any suggestions?
 
 
 
 El jue, 23-06-2011 a las 10:55 +1200, aaron morton escribió:
 Setting them to 2 and 2 means compaction can only ever compact 2 files at 
 time, so it will be worse off.
 
 Lets the try following:
 
 - restore the compactions settings to the default 4 and 32
 - run `ls -lah` in the data dir and grab the output
 - run `nodetool flush` this will trigger minor compaction once the 
 memtables have been flushed
 - check the logs for messages from 'CompactionManager'
 - when done grab the output from  `ls -lah` again.
 
 Hope that helps.
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 23 Jun 2011, at 02:04, Héctor Izquierdo Seliva wrote:
 
 Hi All. I set the compaction threshold at minimum 2, maximum 2 and try
 to run compact, but it's not doing anything. There are over 69 sstables
 now, read performance is horrible, and it's taking an insane amount of
 space. Maybe I don't quite get how the new per bucket stuff works, but I
 think this is not normal behaviour.
 
 El lun, 13-06-2011 a las 10:32 -0500, Jonathan Ellis escribió:
 As Terje already said in this thread, the threshold is per bucket
 (group of similarly sized sstables) not per CF.
 
 2011/6/13 Héctor Izquierdo Seliva izquie...@strands.com:
 I was already way over the minimum. There were 12 sstables. Also, is
 there any reason why scrub got stuck? I did not see anything in the
 logs. Via jmx I saw that the scrubbed bytes were equal to one of the
 sstables size, and it stuck there for a couple hours .
 
 El lun, 13-06-2011 a las 22:55 +0900, Terje Marthinussen escribió:
 That most likely happened just because after scrub you had new files
 and got over the 4 file minimum limit.
 
 https://issues.apache.org/jira/browse/CASSANDRA-2697
 
 Is the bug report.
 
 
 
 
 
 
 
 
 
 
 
 
 
 



Re: get_range_slices result

2011-06-23 Thread aaron morton
Not sure what your question is. 

Does this help ? http://wiki.apache.org/cassandra/FAQ#range_rp

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 21:59, karim abbouh wrote:

 how can the get_range_slices() function return keys in sorted order?
 BR



Re: RAID or no RAID

2011-06-27 Thread aaron morton
RAID0 so you have one big volume. 

For performance (Cassandra does not stripe SSTables across the data dirs); 
otherwise you'll have fragmentation and won't be able to utilise all your space.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28 Jun 2011, at 11:46, mcasandra wrote:

 Which one is preferred RAID0 or spreading data files accross various disks on
 the same node? I like RAID0 but what would be the most convincing argument
 to put additional RAID controller card in the machine?
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RAID-or-no-RAID-tp6522904p6522904.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Clock skew

2011-06-27 Thread aaron morton
The timestamp is set by the client, not the server. The one exception to this 
rule is CounterColumnType operations. 

If you are in a situation where you need better timing than you can get with 
ntp you should try to design around it. 
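
To be explicit about where that timestamp comes from, here is a minimal sketch 
(my illustration, not library code) of the helper most clients effectively use; 
all Cassandra ever sees is this number, so a skewed client clock can make an 
older write win at read time.

public final class WriteTimestamp
{
    private WriteTimestamp() {}

    // Microseconds since the epoch, taken from the local (client) clock.
    public static long now()
    {
        return System.currentTimeMillis() * 1000;
    }
}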
 
Hope that helps. 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28 Jun 2011, at 10:03, A J wrote:

 During writes, the timestamp field in the column is the system-time of
 that node (correct me if that is not the case and the system-time of
 the co-ordinator is what gets applied to all the replicas).
 During reads, the latest write wins.
 
 What if there is a clock skew ? It could lead to a stale write
 over-riding the actual latest write, just because the clock of that
 node is ahead of the other node. Right ?



Re: RAID or no RAID

2011-06-27 Thread aaron morton
 Not sure what the intended purpose is, but we've mostly used it as an 
 emergency disk-capacity-increase option

That's what I've used it for.  

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28 Jun 2011, at 15:55, Dan Kuebrich wrote:

 Not sure what the intended purpose is, but we've mostly used it as an 
 emergency disk-capacity-increase option.  It's not as good as raid because 
 each disk size is counted individually (a compacted sstable can only be on 
 one disk) so compaction size limits aren't expanded as one might expect.
 
 On Mon, Jun 27, 2011 at 11:30 PM, mcasandra mohitanch...@gmail.com wrote:
 I thought there is an option to give multiple data dirs in cassandra.yaml.
 What's the purpose of that?
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RAID-or-no-RAID-tp6522904p6523523.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.
 



Re: remove all the columns of a key in a column family

2011-06-28 Thread aaron morton
That error is thrown if you send a Deletion with a predicate that has neither 
columns nor a SliceRange. 

Send a Deletion that does not have a predicate. 
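
As a sketch with the raw Thrift API (the column family and key names below are 
made up, and an already open Cassandra.Client is assumed), the whole row delete 
looks something like this:

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.*;

public class RowDeleteSketch
{
    public static void deleteRow(Cassandra.Client client, String cf, String rowKey) throws Exception
    {
        // No super_column and no predicate on the Deletion means "delete the whole row".
        Deletion deletion = new Deletion();
        deletion.setTimestamp(System.currentTimeMillis() * 1000);

        Mutation mutation = new Mutation();
        mutation.setDeletion(deletion);

        ByteBuffer key = ByteBuffer.wrap(rowKey.getBytes("UTF-8"));
        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
            new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
        mutationMap.put(key, Collections.singletonMap(cf, Arrays.asList(mutation)));

        client.batch_mutate(mutationMap, ConsistencyLevel.QUORUM);
    }
}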

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28 Jun 2011, at 18:11, Donna Li wrote:

 To delete all the columns for row send a Mutation where the Deletion has 
 neither a super_column or predicate 
 I test, but throw the exception “A SlicePredicate must be given a list of 
 Columns, a SliceRange, or both”
  
 Best Regards
 Donna li
  
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: 28 June 2011 12:30
 To: user@cassandra.apache.org
 Subject: Re: remove all the columns of a key in a column family
  
 AFAIK that is still not supported. 
  
 To delete all the columns for row send a Mutation where the Deletion has 
 neither a super_column or predicate 
  
 Cheers
  
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
  
 On 28 Jun 2011, at 15:50, Donna Li wrote:
 
 
  
 Cassandra version is 0.7.2, when I use batch_mutate, the following exception 
 throw “TException:Deletion does not yet support SliceRange predicates”, which 
 version support delete the whole row of a key?
  
  
 Best Regards
 Donna li
  
 From: Donna Li 
 Sent: 28 June 2011 10:59
 To: user@cassandra.apache.org
 Subject: remove all the columns of a key in a column family
  
 All:
 Can I remove all the columns of a key in a column family under the condition 
 that not know what columns the column family has?
  
  
 Best Regards
 Donna li
  



Re: Truncate introspection

2011-06-28 Thread aaron morton
Drop CF takes a snapshot of the CF first, and then marks SSTables on disk as 
compacted so they will be safely deleted later. Finally it removes the CF from 
the meta data. 

If you see the SSTables on disk, you should see 0 length .compacted files for 
every one of them. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28 Jun 2011, at 20:00, David Boxenhorn wrote:

 Does drop work in a similar way?
 
 When I drop a CF and add it back with a different schema, it seems to work.
 
 But I notice that in between the drop and adding it back, when the CLI
 tells me the CF doesn't exist, the old data is still there.
 
 I've been assuming that this works, but just wanted to make sure...
 
 On Tue, Jun 28, 2011 at 12:56 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Each node (independently) has logic that guarantees that any writes
 processed before the truncate, will be wiped out.
 
 This does not mean that each node will wipe out the same data, or even
 that each node will process the truncate (which would result in a
 timedoutexception).
 
 It also does not mean you can't have writes immediately after the
 truncate that would race w/ a truncate, check for zero sstables
 procedure.
 
 On Mon, Jun 27, 2011 at 3:35 PM, Ethan Rowe et...@the-rowes.com wrote:
 If those went to zero, it would certainly tell me something happened.  :)  I
 guess watching that would be a way of seeing something was going on.
 Is the truncate itself propagating a ring-wide marker or anything so the CF
 is logically empty before being physically removed?  That's the impression
 I got from the docs but it wasn't totally clear to me.
 
 On Mon, Jun 27, 2011 at 3:33 PM, Jonathan Ellis jbel...@gmail.com wrote:
 
 There's a JMX method to get the number of sstables in a CF, is that
 what you're looking for?
 
 On Mon, Jun 27, 2011 at 1:04 PM, Ethan Rowe et...@the-rowes.com wrote:
 Is there any straightforward means of seeing what's going on after
 issuing a
 truncate (on 0.7.5)?  I'm not seeing evidence that anything actually
 happened.  I've disabled read repair on the column family in question
 and
 don't have anything actively reading/writing at present, apart from my
 one-off tests to see if rows have disappeared.
 Thanks in advance.
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 
 
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 



Re: Re : Re : get_range_slices result

2011-06-28 Thread aaron morton
First thing is you really should upgrade from 0.6, the current release is 0.8. 

Info on time uuid's
http://wiki.apache.org/cassandra/FAQ#working_with_timeuuid_in_java

If you are using a higher level client like Hector or Pelops it will take care 
of encoding for you. 
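
For what it's worth, the helper on that FAQ page is only a couple of lines; it 
builds a version 1 (time based) UUID using the com.eaio.uuid library that ships 
with Cassandra. Roughly (check the FAQ for the authoritative version):

import java.util.UUID;

public class TimeUUIDHelper
{
    // A new, unique time based UUID from the local clock, suitable as a column
    // name under a TimeUUIDType comparator.
    public static UUID getTimeUUID()
    {
        return UUID.fromString(new com.eaio.uuid.UUID().toString());
    }
}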

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28 Jun 2011, at 22:20, karim abbouh wrote:

 can i have an example for using TimeUUIDType as comparator in a client 
 java code.
 
 De : karim abbouh karim_...@yahoo.fr
 À : user@cassandra.apache.org user@cassandra.apache.org
 Envoyé le : Lundi 27 Juin 2011 17h59
 Objet : Re : Re : get_range_slices result
 
 i used TimeUUIDType as type in storage-conf.xml file
  ColumnFamily Name=table CompareWith=TimeUUIDType /
 
 and i used it as comparator in my java code,
 but in the execution i get exception : 
 Erreur --java.io.UnsupportedEncodingException: TimeUUIDType
 
 
 how can i write it?
 
 BR
 
 De : David Boxenhorn da...@citypath.com
 À : user@cassandra.apache.org
 Cc : karim abbouh karim_...@yahoo.fr
 Envoyé le : Vendredi 24 Juin 2011 11h25
 Objet : Re: Re : get_range_slices result
 
 You can get the best of both worlds by repeating the key in a column,
 and creating a secondary index on that column.
 
 On Fri, Jun 24, 2011 at 1:16 PM, Sylvain Lebresne sylv...@datastax.com 
 wrote:
  On Fri, Jun 24, 2011 at 10:21 AM, karim abbouh karim_...@yahoo.fr wrote:
  i want get_range_slices() function returns records sorted(orded)  by the
  key(rowId) used during the insertion.
  is it possible?
 
  You will have to use the OrderPreservingPartitioner. This is no
  without inconvenience however.
  See for instance
  http://wiki.apache.org/cassandra/StorageConfiguration#line-100 or
  http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
  that give more details on the pros and cons (the short version being
  that the main advantage of
  OrderPreservingPartitioner is what you're asking for, but it's main
  drawback is that load-balancing
  the cluster will likely be very very hard).
 
  In general the advice is to stick with RandomPartitioner and design a
  data model that avoids needing
  range slices (or at least needing that the result is sorted). This is
  very often not too hard and more
  efficient, and much more simpler than to deal with the load balancing
  problems of OrderPreservingPartitioner.
 
  --
  Sylvain
 
 
  
  De : aaron morton aa...@thelastpickle.com
  À : user@cassandra.apache.org
  Envoyé le : Jeudi 23 Juin 2011 20h30
  Objet : Re: get_range_slices result
 
  Not sure what your question is.
  Does this help ? http://wiki.apache.org/cassandra/FAQ#range_rp
  Cheers
  -
  Aaron Morton
  Freelance Cassandra Developer
  @aaronmorton
  http://www.thelastpickle.com
  On 23 Jun 2011, at 21:59, karim abbouh wrote:
 
  how can get_range_slices() function returns sorting key ?
  BR
 
 
 
 
 
 
 
 
 



Re: Query indexed column with key filter‏

2011-06-28 Thread aaron morton
Currently these are two different types of query: using a key range is 
equivalent to the get_range_slices() API function, and column clauses are a 
get_indexed_slices() call. So you would be asking for a potentially painful 
join between the two.

Creating a column with the same value as the key sounds reasonable. 
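
A sketch of that workaround with the raw Thrift API (the CF name, the indexed 
status column and the key_copy column are all made up for illustration, and an 
open Cassandra.Client is assumed): the EQ expression drives the secondary index 
and the extra expression filters on the column that duplicates the key.

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import org.apache.cassandra.thrift.*;

public class IndexWithKeyFilterSketch
{
    public static List<KeySlice> activeRowsFrom(Cassandra.Client client, String fromKey) throws Exception
    {
        IndexExpression byStatus = new IndexExpression(
            ByteBuffer.wrap("status".getBytes("UTF-8")),
            IndexOperator.EQ,
            ByteBuffer.wrap("active".getBytes("UTF-8")));

        // key_copy holds the same value as the row key, so this behaves like "KEY >= fromKey".
        IndexExpression byKeyCopy = new IndexExpression(
            ByteBuffer.wrap("key_copy".getBytes("UTF-8")),
            IndexOperator.GTE,
            ByteBuffer.wrap(fromKey.getBytes("UTF-8")));

        IndexClause clause = new IndexClause(
            Arrays.asList(byStatus, byKeyCopy),
            ByteBuffer.wrap(new byte[0]),   // start_key
            100);                           // max rows per call

        // return every column of each matching row
        SlicePredicate allColumns = new SlicePredicate();
        allColumns.setSlice_range(new SliceRange(
            ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 1000));

        return client.get_indexed_slices(
            new ColumnParent("Users"), clause, allColumns, ConsistencyLevel.QUORUM);
    }
}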

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29 Jun 2011, at 05:31, Daning wrote:

 I found this code
 
 // Start and finish keys, *and* column relations (KEY > foo AND KEY < bar and name1 = value1).
 if (select.isKeyRange() && (select.getKeyFinish() != null) && (select.getColumnRelations().size() > 0))
 throw new InvalidRequestException("You cannot combine key range and by-column clauses in a SELECT");
 
 in
 
 http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/cql/QueryProcessor.java
 
 
 This operation is exactly what I want - query by column then filter by key. I 
 want to know why this query is not supported, and what's the good work around 
 for it? At this moment my workaound is to create a column which is exactly 
 same as key.
 
 Thanks,
 
 Daning



Re: Server-side CQL parameters substitution

2011-06-28 Thread aaron morton
see https://issues.apache.org/jira/browse/CASSANDRA-2475

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29 Jun 2011, at 08:45, Michal Augustýn wrote:

 Hi all,
 
 in most SQL implementations, it's possible to declare parameters in
 SQL command text (i.e. SELECT * FROM T WHERE Id=@myId). Then the
 client application sends this SQL command and parameters values
 separately - the server is responsible for the parameters
 substitution.
 
 In CQL API (~the execute_cql_query method), we must compose the
 command (~substitute the parameters) in client application, the same
 code must be re-implemented in all drivers (Java, Python, Node.js,
 .NET, ...) respectively. And that's IMHO tedious and error prone.
 
 So do you/we plane to improve CQL API in this way?
 
 Thanks!
 
 Augi
 
 P.S.: Yes, I'm working on .NET driver and I'm too lazy to implement
 client-side parameters substitution ;-)



Re: custom reconciling columns?

2011-06-28 Thread aaron morton
Can you provide some more info:

- how big are the rows, e.g. number of columns and column size  ? 
- how much data are you asking for ? 
- what sort of read query are you using ? 
- what sort of numbers are you seeing ?
- are you deleting columns or using TTL ? 

I would consider issues with the data churn, data model and query before 
looking at serialisation. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29 Jun 2011, at 10:37, Yang wrote:

 I can see that as my user history grows, the read time grows proportionally (or 
 faster than linearly).
 If my business requirements ask me to keep a month's history for each user, 
 it could become too slow. I was suspecting that it's actually the 
 serializing and deserializing that's taking time (I can definitely see it's cpu 
 bound)
 
 
 
 On Tue, Jun 28, 2011 at 3:04 PM, aaron morton aa...@thelastpickle.com wrote:
 There is no facility to do custom reconciliation for a column. An append 
 style operation would run into many of the same problems as the Counter type, 
 e.g. not every node may get an append and there is a chance for lost appends 
 unless you go to all the trouble Counter's do.
 
 I would go with using a row for the user and columns for each item. Then you 
 can have fast no look writes.
 
 What problems are you seeing with the reads ?
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 29 Jun 2011, at 04:20, Yang wrote:
 
  for example, if I have an application that needs to read off a user 
  browsing history, and I model the user ID as the key,
  and the history data within the row. with current approach, I could model 
  each visit as  a column,
  the possible issue is that *possibly* (I'm still doing a lot of profiling 
  on this to verify) that a lot of time is spent on serialization into the 
  message and out of the
  message, plus I do not need the full features provided by the column : for 
  example I do not need a timestamp on each visit, etc,
  so it might be faster to put the entire history in a blob, and each visit 
  only takes up a few bytes in the blob, and
  my code manipulates the blob.
 
  problem is, I still need to avoid the read-before-write, so I send only the 
  latest visit, and let cassandra do the reconcile, which appends the
  visit to the blob, so this needs custom reconcile behavior.
 
  is there a way to incorporate such custom reconcile under current code 
  framework? (I see custom sorting, but no custom reconcile)
 
  thanks
  yang
 
 



Re: Cannot set column value to zero

2011-06-29 Thread aaron morton
The extra (...) after the column name in the describe keyspace output is only 
there if the column comparator is BytesType; the client tries to format the 
data as UTF8. 

Don't forget truncate is doing snapshots, so check the snapshots dir and delete 
things if you are using it a lot for testing. 

The 0 == 1 thing does not ring any bells. Let us know if it happens again. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 02:13, dnalls...@taz.qinetiq.com wrote:

 I had a strange problem recently where I was unable to set the value of a 
 column
 to '0' (it always returned '1') but setting it to other values worked fine:
 
 [default@Test] set Urls['rowkey']['status']='1';
 Value inserted.
 [default@Test] get Urls['rowkey'];
 = (column=status, value=1, timestamp=1309189541891000)
 Returned 1 results.
 
 [default@Test] set Urls['rowkey']['status']='0';
 Value inserted.
 [default@Test] get Urls['rowkey'];
 = (column=status, value=1, timestamp=1309189551407616)
 Returned 1 results.
 
 This was on a one-node test cluster (v0.7.6) with no other clients; setting
 other values (e.g. '9') worked fine. However, attempting to set the value back
 to '0' always resulted in a value of '1'.
 
 I noticed this shortly after truncating the CF.
 
 The column family was shown as follows below. One thing that looks odd is that
 on other test clusters the Column Name is followed by a reference to
 the index, e.g. Column Name: status (737461747573) - but here it isn't.
 
 I was wondering if there was some interaction between truncating the CF and 
 the
 use of a KEYS index? (Presumably it would be safer to delete all data
 directories in order to wipe the cluster during experimentation, rather than
 truncating?)
 
 Unfortunately I'm not sure how to recreate the situation as this was a test
 machine on which I played around with various configurations - but maybe
 someone has seen a similar problem elsewhere? In the end I had to wipe the 
 data
 and start again, and all seemed fine, although the index reference is still
 absent as mentioned above.
 
 [default@Test] describe keyspace;
 Keyspace: Test:
 ...
ColumnFamily: Foo
  default_validation_class: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 0.0/14400
  Memtable thresholds: 0.5/128/60 (millions of ops/minutes/MB)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: [Foo.737461747573]
  Column Metadata:
Column Name: status
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Type: KEYS
 ...
 
 



Re: hadoop results

2011-06-29 Thread aaron morton
How about get_slice() with reversed == true and count = 1 to get the highest 
time UUID? 

Or you can also store a column with a magic name that holds the value of the 
timeuuid that is the current metric to use. 
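
As a sketch with the raw Thrift API (using the stats CF from your description; 
an open Cassandra.Client is assumed), the reversed slice looks something like:

import java.nio.ByteBuffer;
import java.util.List;
import org.apache.cassandra.thrift.*;

public class LatestStatSketch
{
    public static ColumnOrSuperColumn latest(Cassandra.Client client, String statName) throws Exception
    {
        ByteBuffer empty = ByteBuffer.wrap(new byte[0]);

        // reversed == true with count == 1 returns only the highest TimeUUID column
        SliceRange range = new SliceRange(empty, empty, true, 1);
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);

        List<ColumnOrSuperColumn> cols = client.get_slice(
            ByteBuffer.wrap(statName.getBytes("UTF-8")),
            new ColumnParent("stats"),
            predicate,
            ConsistencyLevel.QUORUM);

        return cols.isEmpty() ? null : cols.get(0);
    }
}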

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 06:35, William Oberman wrote:

 I'll start with my question: given a CF with comparator TimeUUIDType, what is 
 the most efficient way to get the greatest column's value?
 
 Context: I've been running cassandra for a couple of months now, so obviously 
 it's time to start layering more on top :-)  In my test environment, I 
 managed to get pig/hadoop running, and developed a few scripts to collect 
 metrics I've been missing since I switched from MySQL to cassandra (including 
 the ever useful select count(*) from table equivalent).  
 
 I was hoping to dump the results of this processing back into cassandra for 
 use in other tools/processes.  My initial thought was: new CF called stats 
 with comparator TimeUUIDType.  The basic idea being I'd store:
 stat_name - time stat was computed (as UUID) - value
 That way I can also see a historical perspective of any given stat for 
 auditing (and for cumulative stats to see trends).  The stat_name itself is a 
 URI that is composed of what and any constraints on the what (including 
 an optional time range, if the stat supports it).  E.g. 
 ClassOfSomething/ID/MetricName/OptionalTimeRange (or something, still 
 deciding on the format of the URI).  But, right now, the only way I know to 
 get the current stat value would be to iterate over all columns (the 
 TimeUUIDs) and then return the last one.
 
 Thanks for any tips,
 
 will



Re: Chunking if size 64MB

2011-06-29 Thread aaron morton
AFAIK there is no server side chunking of column values.

This link http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage is 
just suggesting in the app you do not store more than 64MB per column. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 30 Jun 2011, at 07:25, A J wrote:

 From what I read, Cassandra allows a single column value to be up-to
 2GB but would chunk the data if greater than 64MB.
 Is the chunking transparent to the application or does the app need to
 know if/how/when the chunking happened for a specific column value
 that happened to be > 64MB.
 
 Thank you.



Re: SimpleAuthenticator

2011-06-30 Thread aaron morton
cassandra.in.sh is old skool 0.6 series, 0.7 series uses cassandra-env.sh. The 
packages put it in /etc/cassandra.
 
This works for me at the end of cassandra-env.sh 

JVM_OPTS=$JVM_OPTS -Dpasswd.properties=/etc/cassandra/passwd.properties
JVM_OPTS=$JVM_OPTS -Daccess.properties=/etc/cassandra/access.properties

btw at a minimum you should upgrade from 0.7.2 to 0.7.6-2 see 
https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/NEWS.txt#L61

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 1 Jul 2011, at 02:20, Earl Barnes wrote:

 Hi,
  
 I am encountering an error while trying to set up simple authentication in a 
 test environment. 
  
 BACKGROUND
 Cassandra Version: ReleaseVersion: 0.7.2-0ubuntu4~lucid1
 OS Level: Linux cassandra1 2.6.32-32-server #62-Ubuntu SMP Wed Apr 20 
 22:07:43 UTC 2011 x86_64 GNU/Linux
 2 node cluster
 Properties file exist in the following directory:
 
   /etc/cassandra/access.properties
   /etc/cassandra/passwd.properties
 The authenticator element in the /etc/cassandra/cassandra.yaml file is set to:
 authenticator: org.apache.cassandra.auth.SimpleAuthenticator
 The authority element in the /etc/cassandra/cassandra.yaml file is set to:
 authority: org.apache.cassandra.auth.SimpleAuthority
  
 The cassandra.in.sh file located in /usr/share/cassandra has been updated to 
 show the location of the properties files in the following manner:
  
 # Location of access.properties and passwd.properties
 JVM_OPTS=
 -Dpasswd.properties=/etc/cassandra/passwd.properties
 -Daccess.properties=/etc/cassandra/access.properties
  
 Also, the destination of the configuration directory:
 CASSANDRA_CONF=/etc/cassandra
  
 ERROR
 After setting DEBUG mode, I get the following error message in the system.log:
  
  INFO [main] 2011-06-30 10:12:01,365 AbstractCassandraDaemon.java (line 249) 
 Cassandra shutting down...
  INFO [main] 2011-06-30 10:12:01,366 CassandraDaemon.java (line 159) Stop 
 listening to thrift clients
  INFO [main] 2011-06-30 10:13:14,186 AbstractCassandraDaemon.java (line 77) 
 Logging initialized
  INFO [main] 2011-06-30 10:13:14,196 AbstractCassandraDaemon.java (line 97) 
 Heap size: 510263296/511311872
  WARN [main] 2011-06-30 10:13:14,227 CLibrary.java (line 93) Obsolete version 
 of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later
  WARN [main] 2011-06-30 10:13:14,227 CLibrary.java (line 93) Obsolete version 
 of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later
  WARN [main] 2011-06-30 10:13:14,228 CLibrary.java (line 125) Unknown 
 mlockall error 0
  INFO [main] 2011-06-30 10:13:14,234 DatabaseDescriptor.java (line 121) 
 Loading settings from file:/etc/cassandra/cassandra.yaml
  INFO [main] 2011-06-30 10:13:14,337 DatabaseDescriptor.java (line 181) 
 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 ERROR [main] 2011-06-30 10:13:14,342 DatabaseDescriptor.java (line 405) Fatal 
 configuration error
 org.apache.cassandra.config.ConfigurationException: When using 
 org.apache.cassandra.auth.SimpleAuthenticator passwd.properties properties 
 must be defined.
 at 
 org.apache.cassandra.auth.SimpleAuthenticator.validateConfiguration(SimpleAuthenticator.java:148)
 at 
 org.apache.cassandra.config.DatabaseDescriptor.clinit(DatabaseDescriptor.java:200)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:100)
 at 
 org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:217)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at 
 org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
 Data from the output.log:
  
  INFO 10:12:01,365 Cassandra shutting down...
  INFO 10:12:01,366 Stop listening to thrift clients
  INFO 10:13:14,186 Logging initialized
  INFO 10:13:14,196 Heap size: 510263296/511311872
  WARN 10:13:14,227 Obsolete version of JNA present; unable to read errno. 
 Upgrade to JNA 3.2.7 or later
  WARN 10:13:14,227 Obsolete version of JNA present; unable to read errno. 
 Upgrade to JNA 3.2.7 or later
  WARN 10:13:14,228 Unknown mlockall error 0
  INFO 10:13:14,234 Loading settings from file:/etc/cassandra/cassandra.yaml
  INFO 10:13:14,337 DiskAccessMode 'auto' determined to be mmap, 
 indexAccessMode is mmap
 ERROR 10:13:14,342 Fatal configuration error
 org.apache.cassandra.config.ConfigurationException: When using 
 org.apache.cassandra.auth.SimpleAuthenticator passwd.properties properties 
 must be defined.
 at 
 org.apache.cassandra.auth.SimpleAuthenticator.validateConfiguration(SimpleAuthenticator.java:148

Re: Repair doesn't work after upgrading to 0.8.1

2011-06-30 Thread aaron morton
This seems to be a known issue related to 
https://issues.apache.org/jira/browse/CASSANDRA-2818 e.g. 
https://issues.apache.org/jira/browse/CASSANDRA-2768

There was some discussion on IRC today; driftx said the simple fix was a full 
cluster restart, or perhaps a rolling restart with the 2818 patch applied may 
work. 

Starting with -Dcassandra.load_ring_state=false causes the node to rediscover 
the ring, which may help (just a guess really). But if there is bad node state 
being passed around in gossip it will just get the bad state again.  

Anyone else ?


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 1 Jul 2011, at 09:11, Héctor Izquierdo Seliva wrote:

 Hi all,
 
 I have upgraded all my cluster to 0.8.1. Today one of the disks in one
 of the nodes died. After replacing the disk I tried running repair, but
 this message appears:
 
 INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30
 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80
 from repair because it is on version 0.7 or sooner. You should consider
 updating this node before running repair again.
 INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30
 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76
 from repair because it is on version 0.7 or sooner. You should consider
 updating this node before running repair again.
 INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30
 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80
 from repair because it is on version 0.7 or sooner. You should consider
 updating this node before running repair again.
 INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30
 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.77
 from repair because it is on version 0.7 or sooner. You should consider
 updating this node before running repair again.
 INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30
 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76
 from repair because it is on version 0.7 or sooner. You should consider
 updating this node before running repair again.
 INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30
 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair
 with for sbs on
 (170141183460469231731687303715884105727,28356863910078205288614550619314017621]:
  manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098 completed.
 INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30
 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.79
 from repair because it is on version 0.7 or sooner. You should consider
 updating this node before running repair again.
 INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30
 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair
 with for sbs on
 (141784319550391026443072753096570088105,170141183460469231731687303715884105727]:
  manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf completed.
 INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30
 20:36:25,087 AntiEntropyService.java (line 782) No neighbors to repair
 with for sbs on
 (113427455640312821154458202477256070484,141784319550391026443072753096570088105]:
  manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a completed.
 
 What can I do?
 



Re: incomplete schema sync for new node

2011-07-03 Thread aaron morton
First, move off 0.7.2 if you can. While you may not get hit with this  
https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/NEWS.txt#L61 you may 
have trouble with this https://issues.apache.org/jira/browse/CASSANDRA-2554

For background read the sections on Starting Up and on Concurrency here 
http://wiki.apache.org/cassandra/LiveSchemaUpdates

 java.lang.RuntimeException: java.lang.RuntimeException: Could not reach 
 schema agreement with /50.0.0.3 in 6ms

 
Means you have split-brain schemas in your cluster. Use describe cluster in the 
cli to see how many versions of the schema you have out there. 

The exception is thrown when the placement strategy (Simple or OldNTS) is 
trying to calculate the Natural Endpoints for a Token  
(AbstractReplicationStrategy.calculateNaturalEndpoints()). This can happen when 
reading/writing a key, or in your case when the node is bootstrapping and 
trying to work out which endpoints are responsible for the token ranges. AFAIK 
adding the migrations is an online process: the server is up and running while 
they are being added. So if anything that requires a valid schema happens while 
the schema is invalid, you will get the error. 

All the "Previous version mismatch. cannot apply." errors in the log for node 
50.0.0.4 mean it got a migration from someone but the migration was received 
out of order. The current version on the node is not the version that was 
present when this migration was applied. 
 
The simple answer is stop doing what you're doing; it sounds dangerous and 
inefficient to me. AFAIK it's not what the schema migrations were designed to 
do, and moving from RF 1 to 3 will increase the repair workload, aside from the 
risks of changing RF up and down a lot.  

The long answer may be to always apply the schema changes on the same node; 
check there is a single version of the schema before adding a new one; and take 
a look at monkeying around with the Schema and Migrations CF's in the System KS 
to delete migrations you want skipped. 

Am frowning and tutting and stroking my beard. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 2 Jul 2011, at 12:58, Jeremy Stribling wrote:

 Oops, forgot to mention that we're using Cassandra 0.7.2.
 
 On 07/01/2011 05:46 PM, Jeremy Stribling wrote:
 Hi all,
 
 I'm running into a problem with Cassandra, where a new node coming up seems 
 to only get an incomplete set of schema mutations when bootstrapping, and as 
 a result hits an IllegalStateException: replication factor (3) exceeds 
 number of endpoints (2) error.
 
 I will describe the sequence of events below as I see them, but first I need 
 to warn you that I run Cassandra in a very non-standard way.  I embed it in 
 a JVM, along with Zookeeper, and other classes for a product we are working 
 on.  We need to bring nodes up and down dynamically in our product, 
 including going from one node to three nodes, and back down to one, at any 
 time.  If we ever drop below three nodes, we have code that sets the 
 replication factor of our keyspaces to 1; similarly, whenever we have three 
 or more nodes, we change the replication factor to 3.  I know this is 
 frowned upon by the community, but we're stuck with doing it this way for 
 now.
 
 Ok, here's the scenario:
 
 1) Node 50.0.0.4 bootstraps into a cluster consisting of nodes 50.0.0.2 and 
 50.0.0.3.
 2) Once 50.0.0.4 is fully bootstrapped, we change the replication factor for 
 our two keyspaces to 3.
 3) Then node 50.0.0.2 is taken down permanently, and we change the 
 replication factor back down to 1.
 4) We then remove node 50.0.0.2's tokens using the removeToken call on node 
 50.0.0.3.
 5) Then we start node 50.0.0.5, and have it join the cluster using 50.0.0.3 
 and 50.0.0.4 as seeds.
 6) 50.0.0.5 starts receiving schema mutations to get it up to speed; the 
 last one it receives (7d51e757-a40b-11e0-a98d-65ed1eced995) has the 
 replication factor at 3.  However, there should be more schema updates after 
 this that never arrive (you can see them arrive at 50.0.0.4 while it is 
 bootstrapping).
 7) Minutes after receiving this last mutation, node 50.0.0.5 hits the 
 IllegalStateException I've listed above, and I think for that reason never 
 successfully joins the cluster.
 
 My question is why doesn't node 50.0.0.5 receive the schema updates that 
 follow 7d51e757-a40b-11e0-a98d-65ed1eced995?  (For example, 
 8fc8820d-a40c-11e0-9eaf-6720e49624c2 is present in 50.0.0.4's log and sets 
 the replication factor back down to 1.)
 
 I've put logs for nodes 50.0.0.3/4/5 at 
 http://pdos.csail.mit.edu/~strib/cassandra_logs.tgz .  The logs are pretty 
 messy because they includes log messages from both Zookeeper and our product 
 code -- sorry about that.  Also, I think the clock on node 50.0.0.4 is a few 
 minutes ahead of the other nodes' clocks.
 
 I also noticed in 50.0.0.4's log the following exceptions:
 
 2011-07-01 18:00:49,832

Re: flushing issue

2011-07-04 Thread aaron morton
When you say using CassandraServer do you mean an embedded cassandra server ? 
What process did you use to add the Keyspaces ? Adding a KS via the thrift API 
should take care of everything.

The simple test is stop the server and the clients, start the server again and 
see if the KS is defined by using nodetool cfstats. 

Cheers 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 4 Jul 2011, at 22:28, Vivek Mishra wrote:

 Hi,
 I know, I might be missing something here.
 I am currently facing 1 issue.
  
 I have 2 cassandra clients(1. Using CassandraServer 2. Using 
 Cassandra.Client) running connecting to same host.
  
 I have created Keyspace K1, K2 using client1(e.g. CassandraServer), but 
 somehow those keyspaces are not available with Client2(e.g. Cassandra.Client).
  
 I have also tried by flusing StorageService.instance.ForceFlush to tables. 
 But that also didn’t work.
  
  
  
 Any help/Suggestion?
  
 
 



Re: copy data from multi-node cluster to single node

2011-07-04 Thread aaron morton
 How do you change the name of a cluster?  The FAQ instructions do not seem to 
 work for me - are they still valid for 0.7.5?
 Is the backup / restore mechanism going to work, or is there a better/simpler 
 to copy data from multi-node to single-node?

Bug fixed on 0.7.6 
https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L21

Also you should move to 0.7.6 to get the Gossip fix 
https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L6

When it comes to moving the data back to a single node I would:
- run repair
- snapshot prod node
- clear all data including the system KS data from the dev node
- copy the snapshot data for only your KS to the dev node into the correct 
directory, e.g. data/my-keyspace . 
- start the dev node
- add your KS, the node will now load the data

Ignoring the system data means the dev node can sort its cluster name and 
token out using the yaml file. 

Even with 3 nodes and RF 3 it's impossible to ever say that one node has a 
complete copy of the data. Running repair will make it more likely, but the 
node could drop a mutation message during the repair or drop off gossip for a few 
seconds. If you really want to have *everything* from the prod cluster then 
copy the data from all 3 nodes onto the dev node and compact it down. 

Hope that helps. 
  
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 Jul 2011, at 03:05, Ross Black wrote:

 Hi,
 
 I am using Cassandra 0.7.5 on Linux machines.
 
 I am trying to backup data from a multi-node cluster (3 nodes) and restore it 
 into a single node cluster that has a different name (for development 
 testing).
 
 The multi-node cluster is backed up using clustertool global_snapshot, and 
 then I copy the snapshot from a single node and replace the data directory in 
 the single node.
 The multi-node cluster has a replication factor of 3, so I assume that 
 restoring any node from the multi-node cluster will be the same.
 When started up this fails with a node name mismatch.
 
 I have tried removing all the Location* files in the data directory (as per 
 http://wiki.apache.org/cassandra/FAQ#clustername_mismatch) but the single 
 node then fails with an error message:
 org.apache.cassandra.config.ConfigurationException: Found system table files, 
 but they couldn't be loaded. Did you change the partitioner?
 
 
 How do you change the name of a cluster?  The FAQ instructions do not seem to 
 work for me - are they still valid for 0.7.5?
 Is the backup / restore mechanism going to work, or is there a better/simpler 
 to copy data from multi-node to single-node?
 
 Thanks,
 Ross
 



Re: copy data from multi-node cluster to single node

2011-07-05 Thread aaron morton
 Is it possible the snapshots from different nodes have the same name?
The directory name will be made up of the current timestamp on the machine and 
the optional name passed via the command line. 

The SSTables from different nodes may have name collisions. If you are 
aggregating data from multiple nodes onto one you will need to manually update 
them. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 Jul 2011, at 14:59, Zhu Han wrote:

 On Tue, Jul 5, 2011 at 8:58 AM, aaron morton aa...@thelastpickle.com wrote:
 How do you change the name of a cluster?  The FAQ instructions do not seem 
 to work for me - are they still valid for 0.7.5?
 Is the backup / restore mechanism going to work, or is there a 
 better/simpler to copy data from multi-node to single-node?
 
 Bug fixed on 0.7.6 
 https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L21
 
 Also you should move to 0.7.6 to get the Gossip fix 
 https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/CHANGES.txt#L6
 
 When it comes to moving the data back to a single node I would:
 - run repair
 - snapshot prod node
 - clear all data including the system KS data from the dev node
 - copy the snapshot data for only your KS to the dev node into the correct 
 directory, e.g. data/my-keyspace . 
 - start the dev node
 - add your KS, the node will now load the data
 
 Ignoring the system data means the dev node can sort it's cluster name and 
 token out using the yaml file. 
 
 Even with 3 nodes and RF 3 it's impossible to ever say that one node has a 
 complete copy of the data. Running repair will make it more likely, but the 
 node could drop a mutation message during the repair or drop off gossip for 
 few seconds. If you really want to have *everything* from the prod cluster 
 then copy the data from all 3 nodes onto the dev node and compact it down. 
 
 Is it possible the snapshots from different nodes have the same name?
  
 
 Hope that helps. 
   
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 5 Jul 2011, at 03:05, Ross Black wrote:
 
 Hi,
 
 I am using Cassandra 0.7.5 on Linux machines.
 
 I am trying to backup data from a multi-node cluster (3 nodes) and restore 
 it into a single node cluster that has a different name (for development 
 testing).
 
 The multi-node cluster is backed up using clustertool global_snapshot, and 
 then I copy the snapshot from a single node and replace the data directory 
 in the single node.
 The multi-node cluster has a replication factor of 3, so I assume that 
 restoring any node from the multi-node cluster will be the same.
 When started up this fails with a node name mismatch.
 
 I have tried removing all the Location* files in the data directory (as per 
 http://wiki.apache.org/cassandra/FAQ#clustername_mismatch) but the single 
 node then fails with an error message:
 org.apache.cassandra.config.ConfigurationException: Found system table 
 files, but they couldn't be loaded. Did you change the partitioner?
 
 
 How do you change the name of a cluster?  The FAQ instructions do not seem 
 to work for me - are they still valid for 0.7.5?
 Is the backup / restore mechanism going to work, or is there a 
 better/simpler to copy data from multi-node to single-node?
 
 Thanks,
 Ross
 
 
 



Re: Problems Iterating over tokens in 0.7.5

2011-07-06 Thread Aaron Morton
If you still have problems send through some details of where you get incorrect 
results.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2011, at 3:23 AM, Anand Somani meatfor...@gmail.com wrote:

 Hi,
 
 Using thrift and get_range_slices call with tokenrange. Using Random 
 Partionioner. Have only tried this on  0.7.5
 Used to work in 0.6.4 or earlier version for me , but I notice that it does 
 not work for me anymore. The need is to iterate over a token range to do some 
 bookkeeping. 
 The logic is use 
 TokenRange from describe_ring 
 and then for each range 
 set the start and end token
 get a batch of rows using get_range_slices
 Then use the last token from the batch to set the start_token and repeat (get 
 the next batch). iterate until no more to get (or last from new batch is same 
 as last from previous batch)
 Now this works when in a test I insert n records and then for iterating use a 
 batch size m such that m > n. As soon as I use m < n, I get incorrect count 
 or an infinite loop where the range seems to repeat.
 
 Anybody seen this issue or am I using it incorrectly for newer versions of 
 cassandra? I will also look up how this is done in Hector, but in the 
 meantime if somebody has seen this behavior, please do respond.
 
 Thanks
 Anand
 
  


Re: deleting keys

2011-07-06 Thread Aaron Morton
See http://wiki.apache.org/cassandra/FAQ#range_ghosts

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2011, at 3:46 AM, karim abbouh karim_...@yahoo.fr wrote:

 i use  get_range_slice to get the list of keys,
 then i call  client.remove(keyspace, key, columnFamily, timestamp, 
 ConsistencyLevel.ALL);
 to delete the record
 but i still have the keys.
 why?
 can i do it otherwise?


Re: Details of 'nodetool move'

2011-07-06 Thread Aaron Morton
Use move when you need to change the token distribution, e.g. To re-balance the 
ring.

During decommission, writes that would go to the old node will also (I think, 
maybe instead of) go to the node that will later be responsible for the old 
node's range. Same thing when a node enters the ring: it will also be sent 
writes while it is bootstrapping.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2011, at 10:35 AM, A J s5a...@gmail.com wrote:

 Hello,
 Where can I find details of nodetool move. Most places just mention
 that 'move the target node to a given Token. Moving is essentially a
 convenience over decommission + bootstrap.'
 
 Stuff like, when do I need to do and on what nodes? What is the value
 of 'new token' to be provided ? What happens if there is a mis-match
 between 'new token' in nodetool move command and initial_token in
 cassandra.yaml file.
 What happens when nodetool move is not successful. Does Cassandra know
 where to look for data (some data might still be on the old node and
 some on new) ?
 Repercussions of not running nodetool move or running it incorrectly ?
 Does a Read Repair take care of move for that specific key in question ?
 Does anti-entropy somehow take care of move ?
 
 Thanks.


Re: memory

2011-07-06 Thread Aaron Morton
That advice is a little out of date, especially in the future world of 0.8 
memory management; see 
http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6/07/2011, at 5:51 PM, Donna Li donna...@utstar.com wrote:

 All:
 For a rough rule of thumb, Cassandra's internal datastructures will require 
 about memtable_throughput_in_mb * 3 * number of hot CFs + 1G + internal 
 caches.
  
 Why cassandra need so much memory? What is the 1G memory used for?
  
  
 Best Regards
 Donna li


Re: result sorted by keys in reversed

2011-07-06 Thread Aaron Morton
It's not currently supported via the API. But I *think* it's technically 
possible: the code could page backwards using the index sampling, the same way 
it does for columns. 

Best advice is to raise a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA (maybe do a search first, 
someone else may have requested it)

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7/07/2011, at 1:39 AM, Monnom Monprenom accountfor...@yahoo.fr wrote:

 Hi,
 
 I am using get_range_slice and I get the results sorted by keys, Is it 
 possible to have the results also sorted by keys but in reverse (from the 
 biggest to the smallest)?


Re: commitlogs not draining

2011-07-06 Thread Aaron Morton
When you run drain the node will log something like "node drained" when it is 
done.

The commit log should be empty; any data in the log may be due to changes in 
the system tables after the drain. Can you raise a ticket and include the 
commit logs left behind and any relevant log messages?

The non draining logs may be this 
https://issues.apache.org/jira/browse/CASSANDRA-2829 .

If a nodetool flush does not clear them, a restart will.

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7/07/2011, at 12:26 PM, Scott Dworkis s...@mylife.com wrote:

 couple questions about commitlogs and the nodetool drain operator:
 
 * in 0.6, after invoking a drain, the commitlog directory would be empty. in 
 0.8, it seems to contain 2 files, a 44 byte .header file and 270 byte .log 
 file.  do these indicate a fully drained commitlog?
 
 * i have a couple nodes for which the commitlogs do not seem to be draining 
 at all... they remain several hundred k or meg in size.  are they corrupt? if 
 the data is not precious, is there some way to clear and reset them to work 
 around this?
 
 also, i see this in system.log:
 
 /data/var/log/cassandra/system.log.1:DEBUG [COMMIT-LOG-WRITER] 2011-07-06 
 11:04:10,076 CommitLog.java (line 473) Not safe to delete commit log 
 CommitLogSegment(/data/var/lib/cassandra/commitlog/CommitLog-1309288064262.log);
  dirty is LocationInfo (0), ; hasNext: true
 
 -scott


Re: Running hadoop jobs against data in remote data center

2011-07-06 Thread Aaron Morton
See 
http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers
 and 
http://www.datastax.com/docs/0.8/brisk/about_brisk#about-the-brisk-architecture

It's possible to run multi DC and use LOCAL_QUORUM consistency level in your 
production centre to allow the prod code to get on with its life without 
worrying about the other DC.
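
As a sketch (CF and column names are made up, and an open Cassandra.Client 
pointed at a node in the production DC is assumed), a LOCAL_QUORUM write only 
waits on replicas in the local data centre while NetworkTopologyStrategy still 
replicates to the analytics DC in the background:

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.*;

public class LocalQuorumWriteSketch
{
    public static void write(Cassandra.Client client) throws Exception
    {
        Column column = new Column();
        column.setName(ByteBuffer.wrap("page_views".getBytes("UTF-8")));
        column.setValue(ByteBuffer.wrap("42".getBytes("UTF-8")));
        column.setTimestamp(System.currentTimeMillis() * 1000);

        // Only replicas in the coordinator's data centre count towards the quorum.
        client.insert(
            ByteBuffer.wrap("some-row".getBytes("UTF-8")),
            new ColumnParent("MyColumnFamily"),
            column,
            ConsistencyLevel.LOCAL_QUORUM);
    }
}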

Hope that helps.


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7/07/2011, at 1:29 PM, Jason Baker ja...@apture.com wrote:

 I'm just setting up a Cassandra cluster for my company.  For a variety of 
 reasons, we have the servers that run our hadoop jobs in our local office and 
 our production machines in a collocated data center.  We don't want to run 
 hadoop jobs against cassandra servers on the other side of the US from us, 
 not to mention that we don't want them impacting performance in production.  
 What's the best way to handle this?
 
 My first instinct is to add some servers locally to the node and use 
 NetworkTopologyStrategy.  This way, the servers automatically get updated 
 with the latest changes, and we get a bit of extra redundancy for our 
 production machine.  Of course, the glaring weakness of this strategy is that 
 our stats servers aren't in a datacenter with any kind of production 
 guarantees.  The network connection is relatively slow and unreliable, the 
 servers may go out at any time, and I generally don't want to tie our 
 production performance or reliability to these servers.
 
 Is this as dumb an idea as I suspect it is, or can this be made to work?  :-)
 
 Are there any better ways to accomplish what I'm trying to accomplish?


Re: Pig pulling an older value from cassandra

2011-07-08 Thread aaron morton
Jeremy did you get anywhere with this ? 

If you are reading at CL ONE Read Repair will run in the background, so it may 
only be visible to subsequent reads. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 6 Jul 2011, at 20:52, Jeremy Hanna wrote:

 I'm seeing some strange behavior and not sure how it is possible.  We updated 
 some data using a pig script and that wrote back to cassandra.  We get the 
 value and list the value on the Cassandra CLI and it's the updated value - 
 from MARKET to market.  However, when doing a pig script to filter by the 
 known good values, we are left with about 42k rows that still have MARKET.  
 If we list a subset of them, get the key, and get/list them on the CLI, they 
 are lowercase market. 
 
 Anyone have any suggestions as to how this might be possible?  Our read 
 repair chance is set to 1.0. 
 
 Jeremy



Re: Re : result sorted by keys in reversed

2011-07-08 Thread aaron morton
 Is it possible to have same results sorting in reversed by another method 
 without get_range_slice in JAVA ?

Sorry I don't understand your question.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Jul 2011, at 01:56, Monnom Monprenom wrote:

 Thanks,
 
 Is it possible to have same results sorting in reversed by another method 
 without get_range_slice in JAVA ?
 
 De : Aaron Morton aa...@thelastpickle.com
 À : user@cassandra.apache.org user@cassandra.apache.org
 Envoyé le : Jeudi 7 Juillet 2011 2h52
 Objet : Re: result sorted by keys in reversed
 
 It's not currently supported via the api. But I *think* it's technically 
 possible, the code could  page backwards using the index sampling the same 
 way it does for columns. 
 
 Best advice is to raise a ticket on 
 https://issues.apache.org/jira/browse/CASSANDRA (maybe do a search first, 
 someone else may have requested it)
 
 Cheers
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 7/07/2011, at 1:39 AM, Monnom Monprenom accountfor...@yahoo.fr wrote:
 
 Hi,
 
 I am using get_range_slice and I get the results sorted by keys, Is it 
 possible to have the results also sorted by keys but in reverse (from the 
 biggest to the smallest)?
 
 



Re: List nodes where write was applied to

2011-07-08 Thread aaron morton
The logs will give you some idea, but it's not information that is available as 
part of a request. 

Turn the logging up to DEBUG and watch what happens. You will see the 
coordinator log where it is sending messages together with some unique 
identifiers that you will also see logged on the replicas.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 Jul 2011, at 10:01, A J wrote:

 Is there a way to find what all nodes was a write applied to ? It
 could be a successful write (i.e. w was met) or unsuccessful write
 (i.e. less than w nodes were met). In either case, I am interested in
 finding:
 Number of nodes written to (before timeout or on success)
 Name of nodes written to (before timeout or on success)
 
 Thanks.



Re: how large cassandra could scale when it need to do manual operation?

2011-07-08 Thread aaron morton
AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time ago. 
Twitter is a vocal supporter with a large Apache Cassandra install, e.g. 
Twitter currently runs a couple hundred Cassandra nodes across a half dozen 
clusters.  
http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011


If you are working with a 3 node cluster, removing/rebuilding/whatever one node 
will affect 33% of your capacity. When you scale up, the contribution from each 
individual node goes down, and the impact of one node going down is less. 
Problems that happen with a few nodes will go away at scale, to be replaced by 
a whole set of new ones.   

 1):  the load balance need to manually performed on every node, according to: 

Yes

 2): when adding new nodes, need to perform node repair and cleanup on every 
 node 
You only need to run cleanup, see 
http://wiki.apache.org/cassandra/Operations#Bootstrap

 3) when decommission a node, there is a chance that slow down the entire 
 cluster. (not sure why but I saw people ask around about it.) and the only 
 way to do is shutdown the entire the cluster, rsync the data, and start all 
 nodes without the decommission one. 

I cannot remember any specific cases where decommission requires a full cluster 
stop, do you have a link? With regard to slowing down, the decommission process 
will stream data from the node you are removing onto the other nodes; this can 
slow down the target node (I think it's more intelligent now about what is 
moved). This will be exaggerated in a 3 node cluster as you are removing 33% of 
the processing and adding some (temporary) extra load to the remaining nodes. 

 after all, I think there is alot of human work to do to maintain the cluster 
 which make it impossible to scale to thousands of nodes, 
Automation, Automation, Automation is the only way to go. 

Chef, Puppet, CF Engine for general config and deployment; Cloud Kick, munin, 
ganglia etc for monitoring. And 
Ops Centre (http://www.datastax.com/products/opscenter) for cassandra specific 
management.

 I am totally wrong about all of this, currently I am serving 1 millions pv 
 every day with Cassandra and it make me feel unsafe, I am afraid one day one 
 node crash will cause the data broken and all cluster goes wrong
With RF3 and a 3Node cluster you have room to lose one node and the cluster 
will be up for 100% of the keys. While better than having to worry about *the* 
database server, it's still entry level fault tolerance. With RF 3 in a 6 Node 
cluster you can lose up to 2 nodes and still be up for 100% of the keys. 

Is there something you are specifically concerned about with your current 
installation ? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jul 2011, at 08:50, Yan Chunlu wrote:

 hi, all:
 I am curious about how large that Cassandra can scale? 
 
 from the information I can get, the largest usage is at facebook, which is 
 about 150 nodes.  in the mean time they are using 2000+ nodes with Hadoop, 
 and yahoo even using 4000 nodes of Hadoop. 
 
 I am not understand why is the situation, I only have  little knowledge with 
 Cassandra and even no knowledge with Hadoop. 
 
 
 
 currently I am using cassandra with 3 nodes and having problem bring one back 
 after it out of sync, the problems I encountered making me worry about how 
 cassandra could scale out: 
 
 1):  the load balance need to manually performed on every node, according to: 
 
 def tokens(nodes):
     for x in xrange(nodes):
         print 2 ** 127 / nodes * x
 
 
 
 2): when adding new nodes, need to perform node repair and cleanup on every 
 node 
 
 
 
 3) when decommission a node, there is a chance that slow down the entire 
 cluster. (not sure why but I saw people ask around about it.) and the only 
 way to do is shutdown the entire the cluster, rsync the data, and start all 
 nodes without the decommission one. 
 
 
 
 
 
 after all, I think there is alot of human work to do to maintain the cluster 
 which make it impossible to scale to thousands of nodes, but I hope I am 
 totally wrong about all of this, currently I am serving 1 millions pv every 
 day with Cassandra and it make me feel unsafe, I am afraid one day one node 
 crash will cause the data broken and all cluster goes wrong 
 
 
 
 in the contrary, relational database make me feel safety but it does not 
 scale well. 
 
 
 
 thanks for any guidance here.
 



Re: Corrupted data

2011-07-08 Thread aaron morton
You may not lose data. 

- What version and whats the upgrade history?
- What RF / node count / CL  ?
- Have you been running repair consistently ?
- Is this on a single node or all nodes ?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jul 2011, at 09:38, Héctor Izquierdo Seliva wrote:

 Hi everyone,
 
 I'm having thousands of these errors:
 
 WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
 CompactionManager.java (line 737) Non-fatal error reading row
 (stacktrace follows)
 java.io.IOError: java.io.IOException: Impossible row size
 6292724931198053
   at
 org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
   at
 org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
   at org.apache.cassandra.db.compaction.CompactionManager.access
 $600(CompactionManager.java:65)
   at org.apache.cassandra.db.compaction.CompactionManager
 $3.call(CompactionManager.java:250)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor
 $Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor
 $Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Impossible row size 6292724931198053
   ... 9 more
 INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705
 CompactionManager.java (line 743) Retrying from row index; data is -8
 bytes starting at 4735525245
 WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
 CompactionManager.java (line 767) Retry failed too.  Skipping to next
 row (retry's stacktrace follows)
 java.io.IOError: java.io.EOFException: bloom filter claims to be
 863794556 bytes, longer than entire row size -8
 
 
 THis is during scrub, as I saw similar errors while in normal operation.
 Is there anything I can do? It looks like I'm going to lose a ton of
 data
 



Re: how large cassandra could scale when it need to do manual operation?

2011-07-09 Thread aaron morton
 about the decommission problem, here is the link:  
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
The key part of that post is "and since the second node was under heavy load, 
and not enough ram, it was busy GCing and worked horribly slow". 

 maybe I was misunderstanding the replication factor, doesn't it RF=3 means I 
 could lose two nodes and still have one available(with 100% of the keys), 
 once Nodes=3?
When you start losing replicas the CL you use dictates if the cluster is still 
up for 100% of the keys. See http://thelastpickle.com/2011/06/13/Down-For-Me/ 

  I have the strong willing to set RF to a very high value...
As chris said 3 is about normal, it means the QUORUM CL is only 2 nodes. 

 I am also trying to deploy cassandra across two datacenters(with 20ms 
 latency).

Lookup LOCAL_QUORUM in the wiki
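
If your clients are Python, a minimal pycassa sketch would look something like this (keyspace, CF, host, key and column names are placeholders; LOCAL_QUORUM also needs NetworkTopologyStrategy and a DC-aware snitch so the node knows which replicas are local):

    import pycassa
    from pycassa.cassandra.ttypes import ConsistencyLevel

    pool = pycassa.ConnectionPool('MyKeyspace', server_list=['10.0.0.1:9160'])
    cf = pycassa.ColumnFamily(pool, 'MyCF',
                              read_consistency_level=ConsistencyLevel.LOCAL_QUORUM,
                              write_consistency_level=ConsistencyLevel.LOCAL_QUORUM)

    cf.insert('some-key', {'col': 'value'})   # ack'd once a quorum of replicas in the local DC have it
    print cf.get('some-key')                  # read waits for a quorum of replicas in the local DC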

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jul 2011, at 02:01, Chris Goffinet wrote:

 As mentioned by Aaron, yes we run hundreds of Cassandra nodes across multiple 
 clusters. We run with RF of 2 and 3 (most common). 
 
 We use commodity hardware and see failure all the time at this scale. We've 
 never had 3 nodes that were in same replica set, fail all at once. We 
 mitigate risk by being rack diverse, using different vendors for our hard 
 drives, designed workflows to make sure machines get serviced in certain time 
 windows and have an extensive automated burn-in process of (disk, memory, 
 drives) to not roll out nodes/clusters that could fail right away.
 
 On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu springri...@gmail.com wrote:
 thank you very much for the reply. which brings me more confidence on 
 cassandra.
 I will try the automation tools, the examples you've listed seems quite 
 promising!
 
 
 about the decommission problem, here is the link:  
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-decommission-two-slow-nodes-td5078455.html
  I am also trying to deploy cassandra across two datacenters(with 20ms 
 latency). so I am worrying about the network latency will even make it worse. 
  
 
 maybe I was misunderstanding the replication factor, doesn't it RF=3 means I 
 could lose two nodes and still have one available(with 100% of the keys), 
 once Nodes=3?   besides I am not sure what's twitters setting on RF, but it 
 is possible to lose 3 nodes in the same time(facebook once encountered photo 
 loss because there RAID broken, rarely happen though). I have the strong 
 willing to set RF to a very high value...
 
 Thanks!
 
 
 On Sat, Jul 9, 2011 at 5:22 AM, aaron morton aa...@thelastpickle.com wrote:
 AFAIK Facebook Cassandra and Apache Cassandra diverged paths a long time ago. 
 Twitter is a vocal supporter with a large Apache Cassandra install, e.g. 
 Twitter currently runs a couple hundred Cassandra nodes across a half dozen 
 clusters.  
 http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassandra-sf-2011
 
 
 If you are working with a 3 node cluster removing/rebuilding/what ever one 
 node will effect 33% of your capacity. When you scale up the contribution 
 from each individual node goes down, and the impact of one node going down is 
 less. Problems that happen with a few nodes will go away at scale, to be 
 replaced by a whole set of new ones.   
 
 
 1):  the load balance need to manually performed on every node, according 
 to: 
 
 Yes
   
 2): when adding new nodes, need to perform node repair and cleanup on every 
 node 
 
 
 
 
 
 
 You only need to run cleanup, see 
 http://wiki.apache.org/cassandra/Operations#Bootstrap
 
 
 
 
 
 
 
 3) when decommission a node, there is a chance that slow down the entire 
 cluster. (not sure why but I saw people ask around about it.) and the only 
 way to do is shutdown the entire the cluster, rsync the data, and start all 
 nodes without the decommission one. 
 
 I cannot remember any specific cases where decommission requires a full 
 cluster stop, do you have a link? With regard to slowing down, the 
 decommission process will stream data from the node you are removing onto the 
 other nodes this can slow down the target node (I think it's more intelligent 
 now about what is moved). This will be exaggerated in a 3 node cluster as you 
 are removing 33% of the processing and adding some (temporary) extra load to 
 the remaining nodes. 
 
 
 
 
 
 
 
 after all, I think there is alot of human work to do to maintain the cluster 
 which make it impossible to scale to thousands of nodes, 
 
 Automation, Automation, Automation is the only way to go. 
 
 Chef, Puppet, CF Engine for general config and deployment; Cloud Kick, munin, 
 ganglia etc for monitoring. And 
 
 
 
 
 
 
 Ops Centre (http://www.datastax.com/products/opscenter) for cassandra 
 specific management.
 
 
 
 
 
 
 
 I am totally wrong about all of this, currently

Re: Cassandra Secondary index/Twissandra

2011-07-09 Thread aaron morton
 Is there a limit on the number of columns in a single column family that 
 serve as secondary indexes? 
AFAIK there is no coded limit, however every index is implemented as another 
(hidden) Column Family that inherits the settings of the parent CF. So under 
0.7 you may run out of memory, under 0.8 you may flush  a lot. Also, when an 
indexed column is updated there are potentially 3 operations that have to 
happen: read the old value, delete the old value, write the new value. More 
indexes == more index updating, just like any other database. 
 Does performance decrease (significantly) if the uniqueness of the column’s 
 values is high?
Low cardinality is recommended
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html
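
As a rough sketch of the read side with pycassa (keyspace, CF, column name and value are made up; the 'state' column would need to be declared with index_type KEYS in the CF metadata):

    import pycassa
    from pycassa.index import create_index_clause, create_index_expression

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    users = pycassa.ColumnFamily(pool, 'Users')   # 'state' declared as an indexed column

    # equality expression against the indexed column, capped at 100 rows
    clause = create_index_clause([create_index_expression('state', 'CA')], count=100)
    for key, columns in users.get_indexed_slices(clause):
        print key, columns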

 The CF for Userline/Uimeline - have comparator of LONG_TYPE and not 
 TimeUUID?
Probably just to make the demo easier. It's used to order tweets in the user 
and public timelines by the current time 
https://github.com/twissandra/twissandra/blob/master/cass.py#L204

 Does performance decrease (significantly) if the uniqueness of the column’s 
 name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of 
 columns?
Depends on what sort of operations you are doing. Some read operations have to 
pay a constant cost to decode the row level column index, this can be tuned 
though. AFAIK the comparator type has very little to do with the performance. 

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jul 2011, at 12:15, Eldad Yamin wrote:

 Hi,
 I have few questions:
 
 Secondary index
 Is there a limit on the number of columns in a single column family that 
 serve as secondary indexes? 
 Does performance decrease (significantly) if the uniqueness of the column’s 
 values is high?
 
 Twissandra
 Why in the source (or any tutorial I've read):
 The CF for Userline/Uimeline - have comparator of LONG_TYPE and not 
 TimeUUID?
 https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
 Does performance decrease (significantly) if the uniqueness of the column’s 
 name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of 
 columns?
 
 Thanks!
 Eldad



Re: Corrupted data

2011-07-09 Thread aaron morton
 Nop, only when something breaks
Unless you've been working at QUORUM life is about to get trickier.  Repair is 
an essential part of running a cassandra cluster, without it you risk data loss 
and dead data coming back to life. 

If you have been writing at QUORUM, and so have a reasonable expectation of data 
replication, the normal approach is to happily let scrub skip the rows; after 
scrub has completed, a repair will see the data repaired using one of the other 
replicas. That's probably already happened, as the scrub process skipped the 
rows when writing them out to the new files. 

Try to run repair. Try running it on a single CF to start with.


Good luck

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jul 2011, at 16:45, Héctor Izquierdo Seliva wrote:

 Hi Peter.
 
 I have a problem with repair, and it's that it always brings the node
 doing the repairs down. I've tried setting index_interval to 5000, and
 it still dies with OutOfMemory errors, or even worse, it generates
 thousands of tiny sstables before dying.
 
 I've tried like 20 repairs during this week. None of them finished. This
 is on a 16GB machine using 12GB heap so it doesn't crash (too early).
 
 
 El sáb, 09-07-2011 a las 16:16 +0200, Peter Schuller escribió:
 - Have you been running repair consistently ?
 
 Nop, only when something breaks
 
 This is unrelated to the problem you were asking about, but if you
 never run delete, make sure you are aware of:
 
 http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
 http://wiki.apache.org/cassandra/DistributedDeletes
 
 
 
 



Re: Cassandra Secondary index/Twissandra

2011-07-10 Thread aaron morton
 Can you recommend on a better way of doing that or a way to tune Cassandra to 
 support those 2 CF?
A select with no start or finish column name, a column count and not in 
reversed order is about the fastest read query. 

You will need to do a reversed query, which will be a little slower. But may 
still be plenty fast enough, depending on scale and throughput and all those 
other things. see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
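
e.g. with pycassa (keyspace, CF and row key are placeholders), the newest-first read is just a reversed slice:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    userline = pycassa.ColumnFamily(pool, 'Userline')

    # last 50 columns in comparator order, newest first
    latest = userline.get('some-user', column_count=50, column_reversed=True)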

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Jul 2011, at 00:14, Eldad Yamin wrote:

 Aaron - Thank you for the fast response!
 
 Does performance decrease (significantly) if the uniqueness of the column’s 
 name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of 
 columns?
 
 Depends on what sort of operations you are doing. Some read operations have 
 to pay a constant cost to decode the row level column index, this can be 
 tuned though. AFAIK the comparator type has very little to do with the 
 performance. 
 
 In Twissandra, the columns are used as alternative index for the 
 Userline/Timeline. therefore the operation I'm going to do is slice_range.
 I'm going to get (for example) the first 50  columns (using comparator of 
 TimeUUID/LONG).
 Can you recommend on a better way of doing that or a way to tune Cassandra to 
 support those 2 CF?
 
 
 Thanks!
 
 On Sun, Jul 10, 2011 at 3:26 AM, aaron morton aa...@thelastpickle.com wrote:
 Is there a limit on the number of columns in a single column family that 
 serve as secondary indexes? 
 
 AFAIK there is no coded limit, however every index is implemented as another 
 (hidden) Column Family that inherits the settings of the parent CF. So under 
 0.7 you may run out of memory, under 0.8 you may flush  a lot. Also, when an 
 indexed column is updated there are potentially 3 operations that have to 
 happen: read the old value, delete the old value, write the new value. More 
 indexes == more index updating, just like any other database. 
 Does performance decrease (significantly) if the uniqueness of the column’s 
 values is high?
 Low cardinality is recommended
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html
 
 The CF for Userline/Uimeline - have comparator of LONG_TYPE and not 
 TimeUUID?
 
 Probably just to make the demo easier. It's used to order tweets in the user 
 and public timelines by the current time 
 https://github.com/twissandra/twissandra/blob/master/cass.py#L204
 
 Does performance decrease (significantly) if the uniqueness of the column’s 
 name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of 
 columns?
 
 Depends on what sort of operations you are doing. Some read operations have 
 to pay a constant cost to decode the row level column index, this can be 
 tuned though. AFAIK the comparator type has very little to do with the 
 performance. 
 
 Hope that helps. 
 
 -
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 9 Jul 2011, at 12:15, Eldad Yamin wrote:
 
 Hi,
 I have few questions:
 
 Secondary index
 Is there a limit on the number of columns in a single column family that 
 serve as secondary indexes? 
 Does performance decrease (significantly) if the uniqueness of the column’s 
 values is high?
 
 Twissandra
 Why in the source (or any tutorial I've read):
 The CF for Userline/Uimeline - have comparator of LONG_TYPE and not 
 TimeUUID?
 https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
 Does performance decrease (significantly) if the uniqueness of the column’s 
 name is high when comparator is LONG_TYPE/TimeUUID and each row has lots of 
 columns?
 
 Thanks!
 Eldad
 
 



Re: Corrupted data

2011-07-10 Thread aaron morton
 1) do I need to treat every node as failure and do a rolling replacement?  
 since there might be some inconsistent in the cluster even I have no way to 
 find out.
see 
http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds

 2) is that the reason that caused the node repair hung? the log message says:
 Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
 WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
 Read timed out
I cannot find that anywhere in the code base, can you provide some more 
information ? 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Jul 2011, at 03:26, Yan Chunlu wrote:

 I am running RF=2 (I have changed it from 2 to 3 and back to 2) and 3 nodes, and 
 haven't run node repair for more than 10 days; I was not aware this is 
 critical.  I ran node repair recently and one of the nodes always hangs... from 
 the log it seems to be doing nothing related to the repair.
 
 so I got two problems:
 
 1) do I need to treat every node as failure and do a rolling replacement?  
 since there might be some inconsistent in the cluster even I have no way to 
 find out.
 2) is that the reason that caused the node repair hung? the log message says:
 Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
 WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
 Read timed out
 
 then nothing.
 
 thanks!
 
 On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller peter.schul...@infidyne.com 
 wrote:
  - Have you been running repair consistently ?
 
  Nop, only when something breaks
 
 This is unrelated to the problem you were asking about, but if you
 never run delete, make sure you are aware of:
 
 http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
 http://wiki.apache.org/cassandra/DistributedDeletes
 
 
 --
 / Peter Schuller
 
 
 
 -- 
 闫春路



Re: node stuck leaving

2011-07-10 Thread aaron morton
Thats the correct way to use remove token, it's there when the node you are 
removing from the ring cannot be started
http://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely

Dead nodes popping up and an inconsistent view of the ring is a bit nasty. 

You can *try* restarting the node which thinks the missing node is up using 
the -Dcassandra.load_ring_state=false JVM property. But you may have to take 
more drastic action. 
http://www.datastax.com/docs/0.8/troubleshooting/index#view-of-ring-differs-between-some-nodes
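
e.g. (assuming the standard cassandra-env.sh layout) add this near the bottom of conf/cassandra-env.sh on the confused node, restart it, then take the line out again once the ring looks sane:

    JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"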

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Jul 2011, at 03:52, Héctor Izquierdo Seliva wrote:

 I'm also having problems with removetoken. Maybe I'm doing it wrong, but
 I was under the impression that I just had to call once removetoken.
 When I take a look at the nodes ring, the dead node keeps popping up.
 What's even more incredible is that in some of them it says UP
 
 



Re: R: Re: Re: AntiEntropy?

2011-07-12 Thread aaron morton
 Running nodetool repair causes Cassandra to execute a major compaction
This is not what I would call factually accurate. Repair does not run a major 
compaction. Major compaction is when all SSTables for a CF are compacted down 
to one SSTable. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12 Jul 2011, at 10:09, cbert...@libero.it wrote:

 The book is wrong, at least by current versions of Cassandra (I'm
 basing that on the quote you pasted, I don't know the context).
 
 To be sure that I didn't misunderstand (English is not my mother tongue) here 
 is what the entire repair paragraph says ...
 
 Basic Maintenance
 There are a few tasks that you’ll need to perform before or after more 
 impactful tasks.
 For example, it makes sense to take a snapshot only after you’ve performed a 
 flush. So
 in this section we look at some of these basic maintenance tasks: repair, 
 snapshot, and
 cleanup.
 
 Repair
 Running nodetool repair causes Cassandra to execute a major compaction. A 
 Merkle
 tree of the data on the target node is computed, and the Merkle tree is 
 compared with
 those of other replicas. This step makes sure that any data that might be out 
 of sync
 with other nodes isn’t forgotten.
 During a major compaction (see “Compaction” in the Glossary), the server 
 initiates a
 TreeRequest/TreeReponse conversation to exchange Merkle trees with neighboring
 nodes. The Merkle tree is a hash representing the data in that column family. 
 If the
 trees from the different nodes don’t match, they have to be reconciled (or 
 “repaired”)
 in order to determine the latest data values they should all be set to. This 
 tree compar-
 ison validation is the responsibility of the org.apache.cassandra.service.
 AntiEntropy
 Service class. AntiEntropyService implements the Singleton pattern and 
 defines 
 the
 static Differencer class as well, which is used to compare two trees. If it 
 finds any
 differences, it launches a repair for the ranges that don’t agree.
 So although Cassandra takes care of such matters automatically on occasion, 
 you can
 run it yourself as well.
 
 
 
 
 nodetool repair must be scheduled by the operator to run regularly.
 The name repair is a bit unfortunate; it is not meant to imply that
 it only needs to run when something is wrong.
 
 -- 
 / Peter Schuller
 
 
 



Re: commitlog replay missing data

2011-07-13 Thread Aaron Morton
Have you verified that data you expect to see is not in the server after 
shutdown?

WRT the difference between the Memtable data size and SSTable 
live size, don't believe everything you read :)

Memtable live size is increased by the serialised byte size of every column 
inserted, and is never decremented. Deletes and overwrites will inflate this 
value. What was your workload like?

As of 0.8 we now have global memory management for cf's that tracks actual JVM 
bytes used by a CF. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12/07/2011, at 3:28 PM, Jeffrey Wang jw...@palantir.com wrote:

 Hey all,
 
  
 
 Recently upgraded to 0.8.1 and noticed what seems to be missing data after a 
 commitlog replay on a single-node cluster. I start the node, insert a bunch 
 of stuff (~600MB), stop it, and restart it. There are log messages pertaining 
 to the commitlog replay and no errors, but some of the data is missing. If I 
 flush before stopping the node, everything is fine, and running cfstats in 
 the two cases shows different amounts of data in the SSTables. Moreover, the 
 amount of data that is missing is nondeterministic. Has anyone run into this? 
 Thanks.
 
  
 
 Here is the output of a side-by-side diff between cfstats outputs for a 
 single CF before restarting (left) and after (right). Somehow a 37MB memtable 
 became a 2.9MB SSTable (note the difference in write count as well)?
 
  
 
 Column Family: Blocks        (before restart | after restart)
 
 SSTable count:                  0 | 1
 Space used (live):              0 | 2907637
 Space used (total):             0 | 2907637
 Memtable Columns Count:      8198 | 0
 Memtable Data Size:      37550510 | 0
 Memtable Switch Count:          0 | 1
 Read Count:                     0 | 0
 Read Latency:              NaN ms | NaN ms
 Write Count:                 8198 | 1526
 Write Latency:           0.018 ms | 0.011 ms
 Pending Tasks:                  0 | 0
 Key cache capacity:            20 | 20
 Key cache size:                 0 | 0
 Key cache hit rate:           NaN | NaN
 Row cache:               disabled | disabled
 Compacted row minimum size:     0 | 1110
 Compacted row maximum size:     0 | 2299
 Compacted row mean size:        0 | 1960
 
  
 
 Note that I patched https://issues.apache.org/jira/browse/CASSANDRA-2317 in 
 my version, but there are no deletions involved so I don’t think it’s 
 relevant unless I messed something up while patching.
 
  
 
 -Jeffrey
 


Re: Storing counters in the standard column families along with non-counter columns ?

2011-07-13 Thread Aaron Morton
If you can provide some more details on the use case we may be able to provide 
some data model help.

You can always use a dedicated CF for the counters, and use the same row key.
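
For example, with a recent pycassa against 0.8 (keyspace, CF, key and column names are placeholders; 'UserCounters' would be defined with default_validation_class = CounterColumnType):

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    users = pycassa.ColumnFamily(pool, 'Users')             # regular columns
    counters = pycassa.ColumnFamily(pool, 'UserCounters')   # counter columns, same row keys

    users.insert('user123', {'name': 'bob'})
    counters.add('user123', 'logins')                       # increment 'logins' by 1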

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12/07/2011, at 6:36 AM, Aditya Narayan ady...@gmail.com wrote:

 Oops that's really very much disheartening and it could seriously impact our 
 plans for going live in near future. Without this facility I guess counters 
 currently have very little usefulness.
 
 On Mon, Jul 11, 2011 at 8:16 PM, Chris Burroughs chris.burrou...@gmail.com 
 wrote:
 On 07/10/2011 01:09 PM, Aditya Narayan wrote:
  Is there any target version in near future for which this has been promised
  ?
 
 The ticket is problematic in that it would -- unless someone has a
 clever new idea -- require breaking thrift compatibility to add it to
 the api.  Since is unfortunate since it would be so useful.
 
 If it's in the 0.8.x series it will only be through CQL.
 


Re: Range query ordering with CQL JDBC

2011-07-17 Thread aaron morton
You are probably seeing this http://wiki.apache.org/cassandra/FAQ#range_rp

Row keys are not ordered by their key, they are ordered by the token created by 
the partitioner.
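
To see why, here is a rough Python sketch of the idea (the real RandomPartitioner takes the absolute value of the MD5 digest interpreted as a signed BigInteger, but the point is the same: the ordering has nothing to do with the key's own value):

    import hashlib

    def rough_token(key):
        # rows are ordered by this hash value, not by the key itself
        return int(hashlib.md5(key).hexdigest(), 16)

    for k in ['1309205000000', '1309205000001', '1309999999999']:
        print k, rough_token(k)

If you need rows to come back ordered by key you would have to use an order preserving partitioner, which brings its own load balancing headaches.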

If you still think there is a problem provide an example of the data your are 
seeing and what you expected to see. 

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 16 Jul 2011, at 06:09, Matthieu Nahoum wrote:

 Hi Eric,
 
 I am using the default partitioner, which is the RandomPartitioner I guess.
 The key type is String. Are Strings ordered by lexicographic rules?
 
 Thanks 
 
 On Fri, Jul 15, 2011 at 12:04 PM, Eric Evans eev...@rackspace.com wrote:
 On Thu, 2011-07-14 at 11:07 -0500, Matthieu Nahoum wrote:
  I am trying to range-query a column family on which the keys are
  epochs (similar to the output of System.currentTimeMillis() in Java).
  In CQL (Cassandra 0.8.1 with JDBC driver):
 
  SELECT * FROM columnFamily WHERE KEY  '130920500';
 
  I can't get to have a result that make sense, it always returns wrong
  timestamps. So I must make an error somewhere in the way I input the
  querying value. I tried in clear (like above), in hexadecimal, etc.
 
  What is the correct way of doing this? Is it possible that my key is
  too long?
 
 What partitioner are you using?  What is the key type?
 
 --
 Eric Evans
 eev...@rackspace.com
 
 
 
 
 -- 
 ---
 Engineer at NAVTEQ
 Berkeley Systems Engineer '10
 ENAC Engineer '09
 
 151 N. Michigan Ave.
 Appt. 3716
 Chicago, IL, 60601
 USA
 Cell: +1 (510) 423-1835
 
 http://www.linkedin.com/in/matthieunahoum
 



Re: Data overhead discussion in Cassandra

2011-07-17 Thread aaron morton
What RF are you using ? 

On disk each column has 15 bytes of overhead, plus the column name and the 
column value. So for an 8 byte long name and an 8 byte double value there will be 
16 bytes of data and 15 bytes of overhead per column. 

The index file also contains the row key, the MD5 token (for RP) and the 
row offset for the data file. 
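
As a back of envelope sketch using your numbers (8 byte long name, 8 byte double value, ignoring the per row key, row index, bloom filter and commit log costs):

    per column:   15 + 8 + 8 bytes        = 31 bytes
    per row:      35,040 columns * 31 B  ~= 1.04 MB
    per replica:  1,000,000 rows * 1.04 MB ~= 1 TB

Multiply by your replication factor for the cluster wide on-disk total, and remember that compacted-but-not-yet-deleted SSTables can temporarily add to it.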

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15 Jul 2011, at 07:09, Sameer Farooqui wrote:

 We just set up a demo cluster with Cassandra 0.8.1 with 12 nodes and loaded 
 1.5 TB of data into it. However, the actual space on disk being used by data 
 files in Cassandra is 3 TB. We're using a standard column family with a 
 million rows (key=string) and 35,040 columns per key. The column name is a 
 long and the column value is a double.
 
 I was just hoping to understand more about why the data overhead is so large. 
 We're not using expiring columns. Even considering indexing and bloom 
 filters, it shouldn't have bloated up the data size to 2x the original 
 amount. Or should it have?
 
 How can we better anticipate the actual data usage on disk in the future?
 
 - Sameer



Re: Thrift Java Client - Get a column family from a Keyspace

2011-07-17 Thread aaron morton
 Currently the only way for that would be iterating through the list of column 
 families returned by the getCf_defs() method.

Yes. 

BTW most people access cassandra via a higher level client; Java peeps 
tend to use either Hector or Pelops. Aside from not having to code against 
thrift they also provide connection management and retry features that are dead 
handy.  

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 14 Jul 2011, at 23:59, Chandrasekhar M wrote:

 Hi
  
 I have been playing around with Cassandra and its Java Thrift Client.
  
 From my understanding, one could get/retrieve a Keyspace, KsDef object, using 
 the describe_keyspace(String name) method on the Cassandra.Client object.
  
 Subsequently, one could get a list of all the ColumnFamily definitions in a 
 keyspace, using the getCf_defs() method on the KsDef Object.
  
 Is there a way to get a single ColumnFamily if I know the name of the 
 columnfamily (just a convenience function) ?
  
 Currently the only way for that would be iterating through the list of column 
 families returned by the getCf_defs() method.
  
 Thanks in Advance
 Chandra
 
 
 Register for Impetus Webinar on ‘Device Side Performance Optimization of 
 Mobile Apps’, July 08 (10:00 am Pacific Time). Impetus is presenting a 
 Cassandra case study on July 11 as a sponsor for Cassandra SF 2011 in San 
 Francisco. 
 
 Click http://www.impetus.com to know more. Follow us on 
 www.twitter.com/impetuscalling 
 
 
 NOTE: This message may contain information that is confidential, proprietary, 
 privileged or otherwise protected by law. The message is intended solely for 
 the named addressee. If received in error, please destroy and notify the 
 sender. Any use of this email is prohibited when received in error. Impetus 
 does not represent, warrant and/or guarantee, that the integrity of this 
 communication has been maintained nor that the communication is free of 
 errors, virus, interception or interference.



Re: What available Cassandra schema documentation is available?

2011-07-18 Thread aaron morton
Indexes are not supported on sub columns. 

Also, your definition seems to mix standard and sub columns together in the CF. 
For a super CF all top level columns contain sub columns.
Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 14 Jul 2011, at 19:39, Andreas Markauskas wrote:

 I couldn't find any schema example for the supercolumn column family
 that is strongly typed. For example,
 
 create column family Super1 with comparator=UTF8Type and
 column_type=Super and key_validation_class=UTF8Type and
 column_metadata = [
 {column_name: username, validation_class:UTF8Type},
 {column_name: email, validation_class:UTF8Type, index_type: KEYS},
 {column_name: address, validation_class:UTF8Type, subcolumn_metadata = [
 {column_name: street, validation_class:UTF8Type},
 {column_name: state, validation_class:UTF8Type, index_type: KEYS}
 ]
 }
 ];
 
 Or does someone know a better method? I like to make it as painless as
 possible for developers with a strongly typed schema so as to avoid
 orphan data.



Re: thrift install

2011-07-18 Thread aaron morton
Why are you installing thrift ?

The cassandra binary packages contain all the dependancies. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19 Jul 2011, at 07:51, Sal Lopez wrote:

 Does anyone have documentation/tips for installing thrift on a server that 
 does not have access to the internet? See error below:
 
 Buildfile: build.xml
 
 setup.init:
 [mkdir] Created dir: /tmp/thrift-0.6.1/lib/java/build
 [mkdir] Created dir: /tmp/thrift-0.6.1/lib/java/build/lib
 [mkdir] Created dir: /tmp/thrift-0.6.1/lib/java/build/tools
 [mkdir] Created dir: /tmp/thrift-0.6.1/lib/java/build/test
 
 mvn.ant.tasks.download:
   [get] Getting: 
 http://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar
   [get] To: 
 /tmp/thrift-0.6.1/lib/java/build/tools/maven-ant-tasks-2.1.3.jar
   [get] Error getting 
 http://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3.jar
  to /tmp/thrift-0.6.1/lib/java/build/tools/maven-ant-tasks-2.1.3.jar
 
 BUILD FAILED
 java.net.ConnectException: Connection timed out
 
 Thanks. Sal



Re: How to keep only exactly column of key

2011-07-18 Thread aaron morton
There is no support for a feature like that, and I doubt it would ever be 
supported. For one thing, there are no locks during a write, so it's not 
possible to definitively say there are 100 columns at a particular instant in 
time. 

You would need to read all columns and delete the ones you no longer need.
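
A rough client side sketch with pycassa (names are made up, there is no atomicity, and it assumes a row stays well under 10,000 columns and that the comparator puts the oldest columns first):

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    def insert_and_trim(key, name, value, limit=100):
        cf.insert(key, {name: value})
        columns = cf.get(key, column_count=10000)      # OrderedDict in comparator order
        names = list(columns.keys())
        if len(names) > limit:
            cf.remove(key, columns=names[:-limit])     # drop the oldest, keep the newest 'limit'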

You could also try Redis. 

Cheers

  
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19 Jul 2011, at 03:22, JKnight JKnight wrote:

 Dear all, 
 
 I want to keep only 100 column of a key: when I add a column for a key, if 
 the number column of key is 100, another column (by order) will be deleted. 
 
 Does Cassandra have setting for that?
 
 -- 
 Best regards,
 JKnight



Re: b-tree

2011-07-20 Thread aaron morton
Just throwing out a (half baked) idea, perhaps the Nested Set Model of trees 
would work  http://en.wikipedia.org/wiki/Nested_set_model

* Every row would represent a set, with its left and right values encoded into the key
* Members are inserted as columns into *every* set / row they are a member of. So 
we are de-normalising and trading space for time. 
* May need to maintain a custom secondary index of the materialised sets. e.g. 
slice a row to get the first column = the left value you are interested in, 
that is the key for the set. 

I've not thought it through much further than that; a lot would depend on your 
data. The top sets may get very big. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote:

 Im not sure if I have an answer for you, anyway, but I'm curious
 
 A b-tree and a binary tree are not the same thing.  A binary tree is a basic 
 fundamental data structure,  A b-tree is an approach to storing and indexing 
 data on disc for a database.
 
 Which do you mean?
 
 On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote:
 Hello,
 Is there any good way of storing a binary-tree in Cassandra?
 I wonder if someone already implement something like that and how 
 accomplished that without transaction supports (while the tree keep evolving)?
 
 I'm asking that becouse I want to save geospatial-data, and SimpleGeo did it 
 using b-tree:
 http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php
 
 Thanks!
 
 
 
 -- 
 It's always darkest just before you are eaten by a grue.



Re: Data Visualization Best Practices

2011-07-20 Thread aaron morton
This project may provide some inspiration 
https://github.com/driftx/chiton

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 06:36, Selcuk Bozdag wrote:

 Hi,
 
 Cassandra provides a flexible schema-less data storage facility which
 is a perfect match for one of our projects. However, regarding the
 requirements it is also necessary to list the CFs in a tabular
 fashion. I searched on the Internet for some guidelines but could not
 find a handy practice for viewing such schema-less data.
 
 Have you experienced such a case where you were required to show CFs (which
 obviously may not have the same columns) inside tables? What would be
 the most relevant way of showing such data?
 
 Regards,
 
 Selcuk



Re: Repair taking a long, long time

2011-07-20 Thread aaron morton
The first thing to do is understand what the server is doing. 

As Edward said, there are two phases to the repair: first the differences are 
calculated, and then they are shared between the neighbours. Let's add a third 
step: once the neighbour gets the data it has to rebuild the indexes and bloom 
filter, not huge but let's include it for completeness. 

So...

0. Check for ERRORS in the log.
1. check nodetool compactionstats; if the Merkle tree build is going on it will 
say Validation Compaction. Run it twice and check for progress.
2. check nodetool netstats; this will show which segments of the data are being 
streamed. Run it twice and check for progress. 
3. check nodetool compactionstats; if the data has completed streaming and indexes 
are being built it will say SSTable build

Once we know what stage of the repair your server is at it's possible to reason 
about what is going on.

If you want to dive deeper look for log messages on the machine you started 
the repair on from the AntiEntropyService. 

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 02:31, David Boxenhorn wrote:

 As I indicated below (but didn't say specifically) another option is to set 
 read repair chance to 1.0 for all your CFs and loop over all your data, since 
 read triggers a read repair. 
 
 On Wed, Jul 20, 2011 at 4:58 PM, Maxim Potekhin potek...@bnl.gov wrote:
 I can re-load all data that I have in the cluster, from a flat-file cache I 
 have
 on NFS, many times faster than the nodetool repair takes. And that's not
 even accurate because as other noted nodetool repair eats up disk space
 for breakfast and takes more than 24hrs on 200GB data load, at which point
 I have to cancel. That's not acceptable. I simply don't know what to do now.
 
 
 
 On 7/20/2011 8:47 AM, David Boxenhorn wrote:
 
 I have this problem too, and I don't understand why.
 
 I can repair my nodes very quickly by looping though all my data (when you 
 read your data it does read-repair), but nodetool repair takes forever. I 
 understand that nodetool repair builds merkle trees, etc. etc., so it's a 
 different algorithm, but why can't nodetool repair be smart enough to choose 
 the best algorithm? Also, I don't understand what's special about my data 
 that makes nodetool repair so much slower than looping through all my data.
 
 
 On Wed, Jul 20, 2011 at 12:18 AM, Maxim Potekhin potek...@bnl.gov wrote:
 Thanks Edward. I'm told by our IT that the switch connecting the nodes is 
 pretty fast.
 Seriously, in my house I copy complete DVD images from my bedroom to
 the living room downstairs via WiFi, and a dozen of GB does not seem like a
 problem, on dirt cheap hardware (Patriot Box Office).
 
 I also have just _one_ column major family but caveat emptor -- 8 indexes 
 attached to
 it (and there will be more). There is one accounting CF which is small, 
 can't possibly
 make a difference.
 
 By contrast, compaction (as in nodetool) performs quite well on this 
 cluster. I start suspecting some
 sort of malfunction.
 
 Looked at the system log during the repair, there is some compaction agent 
 doing
 work that I'm not sure makes sense (and I didn't call for it). Disk 
 utilization all of a sudden goes up to 40%
 per Ganglia, and stays there, this is pretty silly considering the cluster 
 is IDLE and we have SSDs. No external writes,
 no reads. There are occasional GC stoppages, but these I can live with.
 
 This repair debacle happens 2nd time in a row. Cr@p. I need to go to 
 production soon
 and that doesn't look good at all. If I can't manage a system that simple 
 (and/or get help
 on this list) I may have to cut losses i.e. stay with Oracle.
 
 Regards,
 
 Maxim
 
 
 
 
 On 7/19/2011 12:16 PM, Edward Capriolo wrote:
 
 Well most SSD's are pretty fast. There is one more to consider. If Cassandra 
 determines nodes are out of sync it has to transfer data across the network. 
 If that is the case you have to look at 'nodetool streams' and determine how 
 much data is being transferred between nodes. There are some open tickets 
 where with larger tables repair is streaming more then it needs to. But even 
 if the transfers are only 10% of your 200GB. Transferring 20 GB is not 
 trivial.
 
 If you have multiple keyspaces and column families repair one at a time 
 might make the process more manageable.
 
 
 
 



Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Aaron Morton
If you have never run repair also check the section on repair on this page 
http://wiki.apache.org/cassandra/Operations About how frequently it should be 
run.

There is an issue where repair can stream too much data, and this can lead to 
excessive disk use.

My non scientific approach to the "never run repair before" problem is to repair 
a single CF at a time, starting with the small ones that are less likely to 
have differences, as they will stream the smallest amount of data. 

If you really want to conserve disk IO during the repair consider disabling the 
minor compaction by setting the min and max thresholds to 0 via node tool.
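
Something like this (the exact argument order varies between versions, so check nodetool help; the host, keyspace and CF names here are placeholders):

    nodetool -h 127.0.0.1 setcompactionthreshold MyKeyspace MyCF 0 0

Remember to set them back afterwards (the defaults are min 4, max 32).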

hope that helps.


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 20/07/2011, at 11:46 PM, Yan Chunlu springri...@gmail.com wrote:

 just found this:
 https://issues.apache.org/jira/browse/CASSANDRA-2156
 
 but seems only available to 0.8 and people submitted a patch for 0.6, I am 
 using 0.7.4, do I need to dig into the code and make my own patch?
 
 does add compaction throttle solve the io problem?  thanks!
 
 On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu springri...@gmail.com wrote:
 at the beginning of using cassandra, I have no idea that I should run node 
 repair frequently, so basically, I have 3 nodes with RF=3 and have not run 
 node repair for months, the data size is 20G.
 
 the problem is when I start running node repair now, it eat up all disk io 
 and the server load became 20+ and increasing, the worst thing is, the entire 
 cluster has slowed down and can not handle request. so I have to stop it 
 immediately because it make my web service unavailable.
 
 the server has Intel Xeon-Lynnfield 3470-Quadcore [2.93GHz] and 8G memory, 
 with Western Digital WD RE3 WD1002FBYS SATA disk.
 
 I really have no idea what to do now, as currently I have already found some 
 data loss, any suggestions would be appreciated.
 
 
 
 -- 
 闫春路


Re: PHPCassa get number of rows

2011-07-20 Thread Aaron Morton
Cassandra does not provide a way to count the number of rows, the best you can 
do is a series of range calls and count them on the client side 
http://thobbs.github.com/phpcassa/tutorial.html
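
The same pattern in pycassa (phpcassa's get_range is the analogous call) looks roughly like this; keyspace and CF names are placeholders, and asking for one column per row keeps the scan cheap:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    # get_range pages through every row; this counts rows that still have
    # at least one live column
    row_count = sum(1 for _ in cf.get_range(column_count=1))
    print row_count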

If this is something you need in your app consider creating a custom secondary 
index to store the row keys and counting the columns. NOTE: counting columns 
just reads all the columns, so for a big row it can result in an OOM.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 20/07/2011, at 8:29 AM, Jean-Nicolas Boulay Desjardins 
jnbdzjn...@gmail.com wrote:

 Hi,
 
 How can I get the number of rows with PHPCassa?
 
 Thanks in advance.


Re: with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread aaron morton
Personally I would do a repair first if you need to do one, just so you are 
confident everything is where it should be. 

Then do the move as described in the wiki. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 15:14, Yan Chunlu wrote:

 sorry for the misunderstanding.  I saw many N of 2147483647 in which N=0 and 
 thought it was not doing anything.
 
 my node was very unbalanced and I intended to rebalance it with nodetool 
 move after a node repair; does that make the slices much larger?
 
 Address Status State   LoadOwnsToken  
  

 84944475733633104818662955375549269696  
 10.28.53.2  Down   Normal  71.41 GB81.09%  
 52773518586096316348543097376923124102  
 10.28.53.3 Up Normal  14.72 GB10.48%  
 70597222385644499881390884416714081360  
 10.28.53.4  Up Normal  13.5 GB 8.43%   
 84944475733633104818662955375549269696  
 
 
 should I do nodetool move according to 
 http://wiki.apache.org/cassandra/Operations#Load_balancing  before doing 
 repair?
 
 thank you for your help!
 
 
 
 On Thu, Jul 21, 2011 at 10:47 AM, Jonathan Ellis jbel...@gmail.com wrote:
 This is not an infinite loop, you can see the column objects being
 iterated over are different.
 
 Like I said last time, I do see that it's saying N of 2147483647
 which looks like you're
 doing slices with a much larger limit than is advisable.
 
 On Wed, Jul 20, 2011 at 9:00 PM, Yan Chunlu springri...@gmail.com wrote:
  this time it is another node, the node goes down during repair, and come
  back but never up, I change log level to DEBUG and found out it print out
  the following message infinitely
  DEBUG [main] 2011-07-20 20:58:16,286 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:6@1311207851757243
  DEBUG [main] 2011-07-20 20:58:16,319 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:98@1306722716288857
  DEBUG [main] 2011-07-20 20:58:16,424 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:95@1311089980134545
  DEBUG [main] 2011-07-20 20:58:16,611 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:85@1311154048866767
  DEBUG [main] 2011-07-20 20:58:16,754 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:366@1311207176880564
  DEBUG [main] 2011-07-20 20:58:16,770 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:80@1310443605930900
  DEBUG [main] 2011-07-20 20:58:16,816 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:486@1311173929610402
  DEBUG [main] 2011-07-20 20:58:16,870 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:101@1310818289021118
  DEBUG [main] 2011-07-20 20:58:17,041 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:677@1311202595772170
  DEBUG [main] 2011-07-20 20:58:17,047 SliceQueryFilter.java (line 123)
  collecting 0 of 2147483647: 76616c7565:false:374@1311147641237918
 
 
 
  On Thu, Jul 14, 2011 at 1:36 PM, Jonathan Ellis jbel...@gmail.com wrote:
 
  That says I'm collecting data to answer requests.
 
  I don't see anything here that indicates an infinite loop.
 
  I do see that it's saying N of 2147483647 which looks like you're
  doing slices with a much larger limit than is advisable (good way to
  OOM the way you already did).
 
  On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu springri...@gmail.com wrote:
   I gave cassandra 8GB heap size and somehow it run out of memory and
   crashed.
   after I start it, it just runs in to the following infinite loop, the
   last
   line:
   DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
   collecting 0 of 2147483647: 100zs:false:14@1310168625866434
   goes for ever
   I have 3 nodes and RF=2, so I am losing data. is that means I am screwed
   and
   can't get it back?
   DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
   collecting 20 of 2147483647: q74k:false:14@1308886095008943
   DEBUG [main] 2011-07-13 22:19:00,585 SliceQueryFilter.java (line 123)
   collecting 0 of 2147483647: 10fbu:false:1@1310223075340297
   DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
   collecting 0 of 2147483647: apbg:false:13@1305641597957086
   DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
   collecting 1 of 2147483647: auje:false:13@1305641597957075
   DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
   collecting 2 of 2147483647: ayj8:false:13@1305641597957060
   DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123)
   collecting 3 of 2147483647: b4fz:false:13@1305641597957096
   DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123

Re: reset keys_cached

2011-07-21 Thread aaron morton
To clear the key cache use the invalidateKeyCache() operation on the column 
family in JConsole / JMX

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 18:15, 魏金仙 wrote:

 Can any one tell how to reset keys_cached?
 Thanks.
 
 



Re: Memtables stored in which location

2011-07-21 Thread aaron morton
Try the project wiki here http://wiki.apache.org/cassandra/ArchitectureOverview 
or the my own blog here
http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

There is also a list of articles on the wiki here 
http://wiki.apache.org/cassandra/ArticlesAndPresentations

in short, writes go to the commit log first, then to the memtable in memory, 
which is later flushed to disk. A read potentially merges data from multiple 
sstables and memtables. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 21:17, CASSANDRA learner wrote:

 Hi,
 
 You are right, but I still have some concerns...
 
 Anyway, the memtable has to be stored somewhere, right? We say memtable 
 data is flushed to create an sstable on disk.
 Exactly which location or memory does it come from? Is it something like 
 object streams, or is it storing the values in the commitlog?
 My next question is: data is written to the commit log, all the data is 
 available there, and the sstables are getting created on disk, so where and 
 when do these memtables come into the picture?
 
 On Thu, Jul 21, 2011 at 1:44 PM, samal sa...@wakya.in wrote:
 SSTable is stored on disk not memtable.
 
 Memtable is memory representation of data, which is on flush to create 
 SSTable on disk.
 
 This is the location where SSTable is stored
 https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L71
 
 
 Where as Commitlog which is back up (log) for memtable replaying store in
 https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L75
 location.
 
 Once the all memtable is flushed to disk, new commit log segment is created.
 
 On Thu, Jul 21, 2011 at 1:12 PM, Abdul Haq Shaik 
 abdulsk.cassan...@gmail.com wrote:
 Hi,
 
 Can you please let me know where exactly the memtables are getting stored. I 
 wanted to know the physical location
 
 



Re: cassandra massive write perfomance problem

2011-07-21 Thread aaron morton
background http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts

Without more info my initial guess is some GC pressure and/or IO pressure from 
compaction. Check the logs for messages from the GCInspector or connect 
JConsole to the instance and take a look at the heap. Here is some info on 
looking at the IO stats 
http://spyced.blogspot.com/2010/01/linux-performance-basics.html
 
With regard to the 25+GB on disk, that all depends on how much data you are 
writing. Be aware that compacted files are not immediately deleted 
http://wiki.apache.org/cassandra/FAQ#cleaning_compacted_tables

You may also want to track things by looking at nodetool tpstats and cfstats 
(for latency).

Cheers
  
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 21:49, lebron james wrote:

 Please help me solve one problem. I have a server with 4 GB RAM and 2x 4-core 
 CPUs. When I start doing massive writes in cassandra all works fine, but after a 
 couple of hours at 10K inserts per second the database grows to 25+ GB and 
 performance goes down to 500 inserts per second. I found out this is because 
 compaction is very slow, and I don't understand why: I set 8 
 concurrent compaction threads but cassandra doesn't use 8 threads, only 2 cores 
 are loaded.



Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-21 Thread aaron morton
What are you seeing in compaction stats ? 

You may see some of  https://issues.apache.org/jira/browse/CASSANDRA-2280 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 23:17, Yan Chunlu wrote:

 after trying nodetool -h reagon repair key cf, I found that even repairing a single 
 CF involves rebuilding all sstables (according to nodetool compactionstats), is that 
 normal? 
 
 On Thu, Jul 21, 2011 at 7:56 AM, Aaron Morton aa...@thelastpickle.com wrote:
 If you have never run repair also check the section on repair on this page 
 http://wiki.apache.org/cassandra/Operations About how frequently it should be 
 run.
 
 There is an issue where repair can stream too much data, and this can lead to 
 excessive disk use.
 
 My non scientific approach to the never run repair before problem is to 
 repair a single CF at a time, starting with the small ones that are less 
 likely to have differences as they will stream the smallest amount of data. 
 
 If you really want to conserve disk IO during the repair consider disabling 
 the minor compaction by setting the min and max thresholds to 0 via node tool.
 
 hope that helps.
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 20/07/2011, at 11:46 PM, Yan Chunlu springri...@gmail.com wrote:
 
 just found this:
 https://issues.apache.org/jira/browse/CASSANDRA-2156
 
 but seems only available to 0.8 and people submitted a patch for 0.6, I am 
 using 0.7.4, do I need to dig into the code and make my own patch?
 
 does add compaction throttle solve the io problem?  thanks!
 
 On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu springri...@gmail.com wrote:
 at the beginning of using cassandra, I have no idea that I should run node 
 repair frequently, so basically, I have 3 nodes with RF=3 and have not run 
 node repair for months, the data size is 20G.
 
 the problem is when I start running node repair now, it eat up all disk io 
 and the server load became 20+ and increasing, the worst thing is, the 
 entire cluster has slowed down and can not handle request. so I have to stop 
 it immediately because it make my web service unavailable.
 
 the server has Intel Xeon-Lynnfield 3470-Quadcore [2.93GHz] and 8G memory, 
 with Western Digital WD RE3 WD1002FBYS SATA disk.
 
 I really have no idea what to do now, as currently I have already found some 
 data loss, any suggestions would be appreciated.  
 
 
 
 -- 
 闫春路
 
 
 
 -- 
 闫春路



Re: Memtables stored in which location

2011-07-21 Thread aaron morton
The -Data.db file holds the rows and columns, the -Filter.db file is the bloom 
filter for the row keys in the data file, the -Index.db file is the index of row 
positions in the data file, and the -Statistics.db file holds statistics for the 
sstable. 

Cheers

 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 23:26, Nilabja Banerjee wrote:

 One more thing I want to ask here ...in the data folder of cassandra, for 
 each columnfamily four type of .db files are generated. for example:  
 CFname-f-1-Data.db, CFname-f-1-Filter.db, CFname-f-1-Index.db, 
 CFname-f-1-Statistic.db, 
 
 What are these extensions are? 
 
 Thank you
 
 
 
 On 21 July 2011 16:11, samal sa...@wakya.in wrote:
 
 Anyway, the memtable has to be stored somewhere, right? We say memtable 
 data is flushed to create an sstable on disk.
 Exactly which location or memory is it coming from? Is it like an 
 object stream, or is it storing the values in the commitlog?
 
 A Memtable is Cassandra's in-memory representation of key/value pairs.
  
 my next question is: data is written to the commit log, all the data is 
 available there, and the sstables are getting created on disk, so where and 
 when do these memtables come into the picture?
 
 The commitlog is an append-only file which records writes sequentially (more in [2]). It can 
 be thought of as a recovery file, used to recalculate the data for memtables 
 in case of a crash.
 A write first hits the CommitLog, then Cassandra stores/writes values to 
 in-memory data structures called Memtables. The Memtables are flushed to disk 
 whenever one of the configurable thresholds is met.[3] 
 For each column family there is a corresponding memtable.
 There is generally one commitlog file for all CFs.
 
 SSTables are immutable: once written to disk they cannot be modified. An SSTable will only 
 be replaced by a new SSTable after compaction.
 
 
 [1]http://wiki.apache.org/cassandra/ArchitectureOverview
 [2]http://wiki.apache.org/cassandra/ArchitectureCommitLog
 [3]http://wiki.apache.org/cassandra/MemtableThresholds
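 As a rough illustration of changing those per-CF thresholds (the attribute names here are 
 from memory, so check the output of 'help update column family;' in your version of 
 cassandra-cli), something like:
 
 [default@MyKeyspace] update column family MyCF with memtable_throughput = 64 and memtable_operations = 0.3;
 
 MyKeyspace, MyCF and the values are placeholders only.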
 
 



Re: Need help json2sstable

2011-07-21 Thread aaron morton
mmm, there is no -f option for sstable2json /  SSTableExport. Datastax 
guys/girls ??

this works for me 

bin/sstable2json /var/lib/cassandra/data/dev/data-g-1-Data.db -k 666f6f > output.txt

NOTE: the key is binary, so that's the ascii encoding for foo

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21 Jul 2011, at 23:19, Nilabja Banerjee wrote:

 This is the full path of  SSTables:  
 /Users/nilabja/Development/Cassandra/apache-cassandra-0.7.5/data/cctest/BTP-f-1-Data.db
 cctest=  keyspace
 BTP= Columnfamily name
 json file= /Users/nilabja/Development/Cassandra/testjson.txt
 
 commands are:  
 bin/sstable2json -f output.txt 
 /Users/nilabja/Development/Cassandra/apache-cassandra-0.7.5/data/cctest1/BTP-f-1-Data.db
  -k keyname
 
 bin/json2sstable -k cctest -c BTP /Users/nilabja/Desktop/testjson.txt 
 /Users/nilabja/Development/Cassandra/apache-cassandra-0.7.5/data/json2sstable/Fetch_CCDetails-f-1-Data.db
  
 
 
 Thank You
 
 
 On 21 July 2011 16:07, aaron morton aa...@thelastpickle.com wrote:
 What is the command line you are executing ? 
 
 That error is only returned by sstable2json when an sstable path is not 
 passed on the command line. 
 
 Cheers
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 21 Jul 2011, at 18:50, Nilabja Banerjee wrote:
 
 Thank you...
 but  I have already gone through that.. but still not working... I am 
 getting .. You must supply exactly one sstable
  Can you tell me why I am getting this?
  
 
 On 21 July 2011 02:41, Tyler Hobbs ty...@datastax.com wrote:
 The sstable2json/json2sstable format is detailed here:
 http://www.datastax.com/docs/0.7/utilities/sstable2json
 
 On Wed, Jul 20, 2011 at 4:58 AM, Nilabja Banerjee
 nilabja.baner...@gmail.com wrote:
 
 
 
 
  On 20 July 2011 11:33, Nilabja Banerjee nilabja.baner...@gmail.com wrote:
 
  Hi All,
 
  Here Is my Json structure.
 
 
  {Fetch_CC :{
  cc:{ :1000,
   :ICICI,
   :,
   city:{
   name:banglore
 };
 };
  }
 
   If the structure is incorrect, please give me one small structure to use with
   the utility below.
   I am using 0.7.5 version.
   Now how can I use the Json2SStable utility? Please provide me the steps.
   What are the things I have to configure?
 
  Thank You
 
 
 
 
 
 --
 Tyler Hobbs
 Software Engineer, DataStax
 Maintainer of the pycassa Cassandra Python client library
 
 
 



Re: b-tree

2011-07-21 Thread aaron morton
 But how will you be able to maintain it while it evolves and new data is 
 added without transactions?

What is the situation you think you need transactions for ?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 00:06, Eldad Yamin wrote:

 Aaron,
 Nested set is exactly what I had in mind.
 But how will you be able to maintain it while it evolves and new data is 
 added without transactions?
 
 Thanks!
 
 On Thu, Jul 21, 2011 at 1:44 AM, aaron morton aa...@thelastpickle.com wrote:
 Just throwing out a (half baked) idea, perhaps the Nested Set Model of trees 
 would work  http://en.wikipedia.org/wiki/Nested_set_model
 
 * Every row would represent a set with a left and right encoded into the key
 * Members are inserted as columns into *every* set / row they are a member of. 
 So we are de-normalising and trading space for time. 
 * May need to maintain a custom secondary index of the materialised sets. 
 e.g. slice a row to get the first column = the left value you are interested 
 in, that is the key for the set. 
 
 I've not thought it through much further than that, a lot would depend on 
 your data. The top sets may get very big.
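 A very rough pycassa sketch of the lookup side only (all names here are made up, 
 and it glosses over how the index row is maintained):
 
 import pycassa
 
 pool = pycassa.ConnectionPool('Tree', ['localhost:9160'])
 sets = pycassa.ColumnFamily(pool, 'Sets')           # one row per set, key encodes the left/right bounds
 by_left = pycassa.ColumnFamily(pool, 'SetsByLeft')  # single index row, column name = left bound, value = set key
 
 # slice the index row around the left value of interest to find the set key
 hit = by_left.get('index', column_start='0042', column_count=1)
 set_key = hit.values()[0]
 
 # members were denormalised into every set row they belong to, so one more get fetches them
 members = sets.get(set_key)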
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote:
 
 Im not sure if I have an answer for you, anyway, but I'm curious
 
 A b-tree and a binary tree are not the same thing.  A binary tree is a basic 
 fundamental data structure,  A b-tree is an approach to storing and indexing 
 data on disc for a database.
 
 Which do you mean?
 
 On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote:
 Hello,
 Is there any good way of storing a binary-tree in Cassandra?
  I wonder if someone has already implemented something like that, and how 
  they accomplished it without transaction support (while the tree keeps 
  evolving)?
 
  I'm asking that because I want to save geospatial-data, and SimpleGeo did it 
 using b-tree:
 http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php
 
 Thanks!
 
 
 
 -- 
 It's always darkest just before you are eaten by a grue.
 
 



Re: Compacting manual managing and optimization

2011-07-21 Thread aaron morton
See the online help in cassandra-cli on CREATE / UPDATE COLUMN FAMILY for 
min_compaction_threshold and max_compaction_threshold. 

Also look in the cassandra.yaml file for information on configuring compaction. 
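
As a sketch only (keyspace / CF names are placeholders), turning automatic minor 
compaction off for a CF and kicking off a major compaction by hand would look 
something like:

[default@MyKeyspace] update column family MyCF with min_compaction_threshold = 0 and max_compaction_threshold = 0;

bin/nodetool -h localhost compact MyKeyspace MyCF

Setting the thresholds back to the defaults (4 and 32) re-enables minor compaction.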

If compaction is really hurting your system it may be a sign that you need to 
scale up or make some other changes. What does your cluster look like ? # 
nodes, load per node, throughput, # clients etc

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 00:30, lebron james wrote:

 Hi! Please tell me how I can manage the compaction process, turn it off and 
 start it manually when I need to. How can I improve the performance of the 
 compaction process? Thanks!



Re: Need help json2sstable

2011-07-21 Thread aaron morton
In my DB the keys added by the client were ascii strings like foo, but these 
are stored as binary arrays in cassandra. So I cannot use the string foo with 
sstable2json, I have to use the ascii encoding 666f6f .

This will *probably* be what you see in the output from cassandra-cli list 
(unless you have either set a key_validation_class for the CF or used the 
assume statement). 

If one way does not work try the other. 
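
If you need the hex form of an ascii key on the command line, one way (assuming 
xxd is installed) is:

echo -n foo | xxd -p

which prints 666f6f, the value to pass to -k.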

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 01:15, Nilabja Banerjee wrote:

 Thank You...
 
 But truly speaking I don't get what you mean by key is binary, so 
 that's the ascii encoding for foo 
 and another thing... this is the output of list BTP command
 
 RowKey: 0902
 = (super_column=0902,
  (column=30, value=303039303030303032, timestamp=1310471032735000)
  (column=31, value=303139303030303032, timestamp=1310471032737000)
  (column=3130, value=30313039303030303032, timestamp=131047103275)
  (column=3131, value=30313139303030303032, timestamp=1310471032752000)
  (column=3132, value=30313239303030303032, timestamp=1310471032753000)
  (column=3133, value=30313339303030303032, timestamp=1310471032755000)
  (column=3134, value=30313439303030303032, timestamp=1310471032757000)
  (column=3135, value=30313539303030303032, timestamp=1310471032758000)
  (column=3136, value=30313639303030303032, timestamp=131047103276)
  (column=3137, value=30313739303030303032, timestamp=1310471032761000)
  (column=3138, value=30313839303030303032, timestamp=1310471032763000)
  (column=3139, value=30313939303030303032, timestamp=1310471032764000)
  (column=32, value=303239303030303032, timestamp=1310471032738000)
  (column=3230, value=30323039303030303032, timestamp=1310471032766000)
  (column=3231, value=30323139303030303032, timestamp=1310471032767000)
  (column=3232, value=30323239303030303032, timestamp=1310471032769000)
  (column=3233, value=30323339303030303032, timestamp=1310471032771000)
  (column=3234, value=30323439303030303032, timestamp=1310471032772000)
  (column=3235, value=30323539303030303032, timestamp=1310471032774000)
  (column=3236, value=30323639303030303032, timestamp=1310471032775000)
  (column=3237, value=30323739303030303032, timestamp=1310471032776000)
  (column=3238, value=30323839303030303032, timestamp=1310471032778000)
  (column=3239, value=30323939303030303032, timestamp=131047103278)
  (column=33, value=303339303030303032, timestamp=131047103274)
 
 How can I Use this facility sstable2json ? 
 Thank you for keeping your patience.. ;) 
 
 On 21 July 2011 17:33, aaron morton aa...@thelastpickle.com wrote:
 mmm, there is no -f option for sstable2json /  SSTableExport. Datastax 
 guys/girls ??
 
 this works for me 
 
 bin/sstable2json /var/lib/cassandra/data/dev/data-g-1-Data.db -k 666f6f > output.txt
 
 NOTE: the key is binary, so that's the ascii encoding for foo
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 21 Jul 2011, at 23:19, Nilabja Banerjee wrote:
 
 This is the full path of  SSTables:  
 /Users/nilabja/Development/Cassandra/apache-cassandra-0.7.5/data/cctest/BTP-f-1-Data.db
 cctest=  keyspace
 BTP= Columnfamily name
 json file= /Users/nilabja/Development/Cassandra/testjson.txt
 
 commands are:  
 bin/sstable2json -f output.txt 
 /Users/nilabja/Development/Cassandra/apache-cassandra-0.7.5/data/cctest1/BTP-f-1-Data.db
  -k keyname
 
 bin/json2sstable -k cctest -c BTP /Users/nilabja/Desktop/testjson.txt 
 /Users/nilabja/Development/Cassandra/apache-cassandra-0.7.5/data/json2sstable/Fetch_CCDetails-f-1-Data.db
  
 
 
 
 Thank You
 
 
 On 21 July 2011 16:07, aaron morton aa...@thelastpickle.com wrote:
 What is the command line you are executing ? 
 
 That error is only returned by sstable2json when an sstable path is not 
 passed on the command line. 
 
 Cheers
  
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 21 Jul 2011, at 18:50, Nilabja Banerjee wrote:
 
 Thank you...
 but  I have already gone through that.. but still not working... I am 
 getting .. You must supply exactly one sstable
  Can you tell me why I am getting this?
  
 
 On 21 July 2011 02:41, Tyler Hobbs ty...@datastax.com wrote:
 The sstable2json/json2sstable format is detailed here:
 http://www.datastax.com/docs/0.7/utilities/sstable2json
 
 On Wed, Jul 20, 2011 at 4:58 AM, Nilabja Banerjee
 nilabja.baner...@gmail.com wrote:
 
 
 
 
  On 20 July 2011 11:33, Nilabja Banerjee nilabja.baner...@gmail.com 
  wrote:
 
  Hi All,
 
  Here Is my Json structure.
 
 
  {Fetch_CC :{
  cc:{ :1000,
   :ICICI,
   :,
   city

Re: Modeling troubles

2011-07-21 Thread aaron morton
I've no idea about the game or how long you will have to live to compute all 
the combinations but how about:

- row key is byte array describing the position of white/black pieces and the 
move indicator. You would need to have both rows keyed from black's perspective 
and rows keyed from white's perspective.
- each column name is the byte array for the possible positions of the other 
colour
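
A minimal sketch of packing those keys in Python (the struct layout is just one 
possible choice, and the bitboard values are arbitrary examples):

import struct

def row_key(own_pieces, own_to_move):
    # one 64 bit bitboard plus the move indicator, packed into a 9 byte row key
    return struct.pack('>QB', own_pieces, 1 if own_to_move else 0)

def column_name(other_pieces):
    # the other colour's bitboard as an 8 byte column name
    return struct.pack('>Q', other_pieces)

key = row_key(0x0000000810000000, True)
col = column_name(0x0000001008000000)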

Good luck. 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 01:18, Stephen Pope wrote:

 For a side project I’m working on I want to store the entire set of possible 
 Reversi boards. There are an estimated 10^28 possible boards. Each board 
 (from the best way I could think of to implement it) is made up of 2, 64-bit 
 numbers (black pieces, white pieces…pieces in neither of those are empty 
 spaces) and a bit to indicate who’s turn it is. I’ve thought of a few 
 possible ways to do it:
  
 -  Entire board as row key, in an array of bytes. I’m not sure how 
 well Cassandra can handle 10^28 rows. I could also break this up into 
 separate cfs for each depth of move (initially there are 4 pieces on the 
 board in total. I could make a cf for 5 piece, 6, etc to 64). I’m not sure if 
 there’s any advantage to doing that.
 -  64-bit number for the black pieces as row key, with 65-bit column 
 names (white pieces + turn). I’ve read somewhere that there’s a rough limit 
 of 2-billion columns, so this will be problematic for certain. This can also 
 be broken into separate cfs, but I’m still going to hit the column limit
  
 Is there a better way to achieve what I’m trying to do, or will either of 
 these approaches surprise me and work properly?



Re: Is it safe to stop a read repair and any suggestion on speeding up repairs

2011-07-21 Thread aaron morton
nit pick: nodetool repair is just called repair (or the Anti Entropy Service). 
Read Repair is something that happens during a read request. 

Short answer, yes it's safe to kill cassandra during a repair. It's one of the 
nice things about never mutating data. 

Longer answer: If nodetool compactionstats says there are no Validation 
compactions running (and the compaction queue is empty)  and netstats says 
there is nothing streaming there is a good chance the repair is finished or 
dead. If a neighbour dies during a repair the node it was started on will wait 
for 48 hours(?) until it times out. Check the logs on the machines for errors, 
particularly from the AntiEntropyService. And see what compactionstats is 
saying on all the nodes involved in the repair.
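
For reference, the commands are simply:

bin/nodetool -h <host> compactionstats
bin/nodetool -h <host> netstats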

Even Longer: um, 3 TB of data is *way* too much data per node, generally happy 
people have up to about 200 to 300GB per node. The reason for this 
recommendation is so that things like repair, compaction, node moves, etc are 
manageable and because the loss of a single node has less of an impact. I would 
not recommend running a live system with that much data per node. 

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 03:51, Adi wrote:

 We have a 4 node 0.7.6 cluster. RF=2 , 3 TB data per node. 
 A read repair was kicked off on node 4 last week and is still in progress. 
 Later I kicked of read repair on node 2 a few days back.
 We were writing(read/write/updates/NO deletes) data while the repair was in 
 progress but no data has been written for the past 3-4 days. 
 I was hoping the repair should get done in that time-frame before proceeding 
 with further writes/deletes.
 
 Would it be safe to stop it and kick it off per column family or do a full 
 scan of all keys as suggested in an earlier discussion? Any other suggestion 
 on hastening this repair.
 
 On both nodes the repair Thread is waiting at this stage for a long time(~60+ 
 hours)
  java.lang.Thread.State: WAITING
   at java.lang.Object.wait(Native Method)
   - waiting on 580857f3 (a org.apache.cassandra.utils.SimpleCondition)
   at java.lang.Object.wait(Object.java:485)
   at 
 org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:38)
   at 
 org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:791)
Locked ownable synchronizers:
   - None
 A CPU sampling for few minutes shows these methods as hot spots(mostly the 
 top two)
 org.apache.cassandra.db.ColumnFamilyStore.isKeyInRemainingSSTables( )
 org.apache.cassandra.utils.BloomFilter.getHashBuckets( ) 
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.echoData()
 
 netstats does not show anything streaming to/from any of the nodes.
 
 -Adi Pandit
 



Re: cassandra fatal error when compaction

2011-07-21 Thread aaron morton
Looks like nodetool drain has been run. 

Anything else in the logs ?

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 05:48, lebron james wrote:

 Why does cassandra fall over when I start compaction with nodetool on a 35+gb database? 
 All parameters are default.
 
 ERROR [pool-2-thread-1] 2011-07-21 15:25:36,622 Cassandra.java (line 3294) 
 Internal error processing insert
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down



Re: Cassandra 0.8.1: request for a sub-column still deserializes all sub-columns for that super column?

2011-07-21 Thread aaron morton
Yes

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 10:06, Oleg Tsvinev wrote:

 Hi All,
 
 Cassandra documentation here:
 
 http://www.datastax.com/docs/0.8/data_model/supercolumns
 
 states that:
 
 Any request for a sub-column deserializes all sub-columns for that super 
 column, so you should avoid data models that rely on large numbers of 
 sub-columns.
 
 Is this still true?
 
 Thank you,
   Oleg
 



Re: Repair fails with java.io.IOError: java.io.EOFException

2011-07-21 Thread aaron morton
Check /var/log/cassandra/output.log (assuming the default init scripts)

A
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:

 Hmm. Just looked at the log more closely.
 
 So, what actually happened is while Repair was running on this specific node, 
 the Cassandra java process terminated itself automatically. The last entries 
 in the log are:
 
  INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128) 
 GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line 128) 
 GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line 128) 
 GC for ParNew: 251 ms, 148861328 reclaimed leaving 193120 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line 128) 
 GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line 128) 
 GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) 
 GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 
 4030726144
  
 When we came in this morning, nodetool ring from another node showed the 1st 
 node as down and OpsCenter also reported it as down.
 
 Next we ran sudo netstat -anp | grep 7199 from the 1st node to see the 
 status of the Cassandra PID and it was not running.
 
 We then started Cassandra:
 
 INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) 
 Logging initialized
  INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line 96) 
 Heap size: 3894411264/3894411264
  INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA mlockall 
 successful
  INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121) 
 Loading settings from 
 file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml
 
 
 It was during this start process that the java.io.EOFException was seen, but 
 yes, like you said Jonathan, the Cassandra process started back up and joined 
 the ring. 
 
 We're now wondering why the Repair failed and why Cassandra crashed in the 
 first place. We only had default level logging enabled. Is there something 
 else I can check or that you suspect?
 
 Should we turn the logging up to debug and retry the Repair?
 
 
 - Sameer
 
 
 On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis jbel...@gmail.com wrote:
 Looks harmless to me.
 
 On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
 cassandral...@gmail.com wrote:
  While running Repair on a 0.8.1 node, we got this error in the system.log:
 
  ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java (line
  113) Fatal exception in thread Thread[Thread-23,5,main]
  java.io.IOError: java.io.EOFException
  at
  org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
  Caused by: java.io.EOFException
  at java.io.DataInputStream.readInt(DataInputStream.java:375)
  at
  org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
 
  There's just a bunch of informational messages about Gossip before this.
 
  Looks like the file or stream unexpectedly ended?
  http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html
 
  Is this a bug or something wrong in our environment?
 
 
  - Sameer
 
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 



Re: Repair fails with java.io.IOError: java.io.EOFException

2011-07-21 Thread aaron morton
The default init.d script will direct std out/err to that file, how are you 
starting brisk / cassandra ?

Check the syslog and other logs in /var/log to see if the OS killed cassandra. 

Also, what was the last thing in the cassandra log before INFO [main] 2011-07-21 
15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging initialized ?


Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 10:50, Sameer Farooqui wrote:

 Hey Aaron,
 
 I don't have any output.log files in that folder:
 
 ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
 ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
 system.log system.log.11  system.log.4  system.log.7
 system.log.1   system.log.2   system.log.5  system.log.8
 system.log.10  system.log.3   system.log.6  system.log.9
 
 
 
 On Thu, Jul 21, 2011 at 3:40 PM, aaron morton aa...@thelastpickle.com wrote:
 Check /var/log/cassandra/output.log (assuming the default init scripts)
 
 A
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 22 Jul 2011, at 10:13, Sameer Farooqui wrote:
 
 Hmm. Just looked at the log more closely.
 
 So, what actually happened is while Repair was running on this specific 
 node, the Cassandra java process terminated itself automatically. The last 
 entries in the log are:
 
  INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128) 
 GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line 128) 
 GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line 128) 
 GC for ParNew: 251 ms, 148861328 reclaimed leaving 193120 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line 128) 
 GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line 128) 
 GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max is 
 4030726144
  INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) 
 GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 
 4030726144
  
 When we came in this morning, nodetool ring from another node showed the 1st 
 node as down and OpsCenter also reported it as down.
 
 Next we ran sudo netstat -anp | grep 7199 from the 1st node to see the 
 status of the Cassandra PID and it was not running.
 
 We then started Cassandra:
 
 INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) 
 Logging initialized
  INFO [main] 2011-07-21 15:48:07,266 AbstractCassandraDaemon.java (line 96) 
 Heap size: 3894411264/3894411264
  INFO [main] 2011-07-21 15:48:11,678 CLibrary.java (line 106) JNA mlockall 
 successful
  INFO [main] 2011-07-21 15:48:11,702 DatabaseDescriptor.java (line 121) 
 Loading settings from 
 file:/home/ubuntu/brisk/resources/cassandra/conf/cassandra.yaml
 
 
 It was during this start process that the java.io.EOFException was seen, but 
 yes, like you said Jonathan, the Cassandra process started back up and 
 joined the ring. 
 
 We're now wondering why the Repair failed and why Cassandra crashed in the 
 first place. We only had default level logging enabled. Is there something 
 else I can check or that you suspect?
 
 Should we turn the logging up to debug and retry the Repair?
 
 
 - Sameer
 
 
 On Thu, Jul 21, 2011 at 12:37 PM, Jonathan Ellis jbel...@gmail.com wrote:
 Looks harmless to me.
 
 On Thu, Jul 21, 2011 at 1:41 PM, Sameer Farooqui
 cassandral...@gmail.com wrote:
  While running Repair on a 0.8.1 node, we got this error in the system.log:
 
  ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java 
  (line
  113) Fatal exception in thread Thread[Thread-23,5,main]
  java.io.IOError: java.io.EOFException
  at
  org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
  Caused by: java.io.EOFException
  at java.io.DataInputStream.readInt(DataInputStream.java:375)
  at
  org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
 
  There's just a bunch of informational messages about Gossip before this.
 
  Looks like the file or stream unexpectedly ended?
  http://download.oracle.com/javase/1.4.2/docs/api/java/io/EOFException.html
 
  Is this a bug or something wrong in our environment?
 
 
  - Sameer
 
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 
 
 



Re: Stress test using Java-based stress utility

2011-07-22 Thread aaron morton
UnavailableException is raised server side when there are fewer than CL nodes UP 
when the request starts. 

It seems odd to get it in this case because the default replication factor used 
by stress test is 1. How many nodes do you have and have you made any changes 
to the RF ?

Also check the server side logs as Kirk says. 
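
A quick way to see how many nodes the cluster currently considers up before 
running stress is:

bin/nodetool -h xxx.xxx.xxx.xx ring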

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 18:37, Kirk True wrote:

 Have you checked the logs on the nodes to see if there are any errors?
 
 On 7/21/11 10:43 PM, Nilabja Banerjee wrote:
 
 Hi All,
 
 I am following this link  
 http://www.datastax.com/docs/0.7/utilities/stress_java  for a stress test. 
 I am getting this notification after running this command 
 
 xxx.xxx.xxx.xx= my ip
 contrib/stress/bin/stress -d xxx.xxx.xxx.xx
 
 Created keyspaces. Sleeping 1s for propagation.
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 Operation [44] retried 10 times - error inserting key 044 
 ((UnavailableException))
 
 Operation [49] retried 10 times - error inserting key 049 
 ((UnavailableException))
 
 Operation [7] retried 10 times - error inserting key 007 
 ((UnavailableException))
 
 Operation [6] retried 10 times - error inserting key 006 
 ((UnavailableException))
 
 
 Any idea why I am getting these things?
 
 
 Thank You
 
 
 
 
 
 -- 
 Kirk True 
 Founder, Principal Engineer 
 
 
 Expert Engineering Firepower 
 



Re: b-tree

2011-07-22 Thread aaron morton
You can use something like ZooKeeper to coordinate processes doing page splits.
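
As an illustration only (using the kazoo Python client, with made up paths and 
names), one lock per tree page so that only a single process splits it could 
look like:

from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

# one lock znode per tree page; only the holder is allowed to split that page
lock = zk.Lock('/geo-tree/locks/page-42', 'worker-1')
with lock:
    split_page()   # your application's split logic (hypothetical)

zk.stop()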

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 19:05, Eldad Yamin wrote:

 In order to split the nodes.
 SimpleGeo has a max of 1,000 records (i.e. places) on each node in the tree; if the 
 number reaches 1,000 they split the node.
 In order to avoid more than one process editing/splitting the node at the same time - 
 a transaction is needed.
 
 On Jul 22, 2011 1:01 AM, aaron morton aa...@thelastpickle.com wrote:
  But how will you be able to maintain it while it evolves and new data is 
  added without transactions?
  
  What is the situation you think you need transactions for ?
  
  Cheers
  
  -
  Aaron Morton
  Freelance Cassandra Developer
  @aaronmorton
  http://www.thelastpickle.com
  
  On 22 Jul 2011, at 00:06, Eldad Yamin wrote:
  
  Aaron,
  Nested set is exactly what I had in mind.
  But how will you be able to maintain it while it evolves and new data is 
  added without transactions?
  
  Thanks!
  
  On Thu, Jul 21, 2011 at 1:44 AM, aaron morton aa...@thelastpickle.com 
  wrote:
  Just throwing out a (half baked) idea, perhaps the Nested Set Model of 
  trees would work http://en.wikipedia.org/wiki/Nested_set_model
  
   * Every row would represent a set with a left and right encoded into the key
   * Members are inserted as columns into *every* set / row they are a 
   member of. So we are de-normalising and trading space for time. 
   * May need to maintain a custom secondary index of the materialised sets. 
   e.g. slice a row to get the first column = the left value you are 
   interested in, that is the key for the set. 
   
   I've not thought it through much further than that, a lot would depend on 
   your data. The top sets may get very big.
  
  Cheers
  
  -
  Aaron Morton
  Freelance Cassandra Developer
  @aaronmorton
  http://www.thelastpickle.com
  
  On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote:
  
  Im not sure if I have an answer for you, anyway, but I'm curious
  
  A b-tree and a binary tree are not the same thing. A binary tree is a 
  basic fundamental data structure, A b-tree is an approach to storing and 
  indexing data on disc for a database.
  
  Which do you mean?
  
  On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote:
  Hello,
  Is there any good way of storing a binary-tree in Cassandra?
   I wonder if someone has already implemented something like that, and how 
   they accomplished it without transaction support (while the tree keeps 
   evolving)?
  
   I'm asking that because I want to save geospatial-data, and SimpleGeo did 
  it using b-tree:
  http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php
  
  Thanks!
  
  
  
  -- 
  It's always darkest just before you are eaten by a grue.
  
  
  



Re: cassandra fatal error when compaction

2011-07-22 Thread aaron morton
Something has shut down the mutation stage thread pool. This happens during 
drain or decommission / move. 

Restart the service and it should be ok.

if it happens again without anyone running something like drain, decommission 
or move let us know. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 19:41, lebron james wrote:

 ERROR [pool-2-thread-3] 2011-07-22 10:34:59,102 Cassandra.java (line 3294) 
 Internal error processing insert
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
 at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
 at 
 org.apache.cassandra.service.StorageProxy.insertLocal(StorageProxy.java:360)
 at 
 org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:241)
 at org.apache.cassandra.service.StorageProxy.access$000(StorageProxy.java:62)
 at org.apache.cassandra.service.StorageProxy$1.apply(StorageProxy.java:99)
 at 
 org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:210)
 at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:154)
 at 
 org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:560)
 at 
 org.apache.cassandra.thrift.CassandraServer.internal_insert(CassandraServer.java:436)
 at 
 org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:444)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:3286)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 ERROR [pool-2-thread-6] 2011-07-22 10:34:59,102 Cassandra.java (line 3294) 
 Internal error processing insert
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
 at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
 at 
 org.apache.cassandra.service.StorageProxy.insertLocal(StorageProxy.java:360)
 at 
 org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:241)
 at org.apache.cassandra.service.StorageProxy.access$000(StorageProxy.java:62)
 at org.apache.cassandra.service.StorageProxy$1.apply(StorageProxy.java:99)
 at 
 org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:210)
 at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:154)
 at 
 org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:560)
 at 
 org.apache.cassandra.thrift.CassandraServer.internal_insert(CassandraServer.java:436)
 at 
 org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:444)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:3286)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 ERROR [pool-2-thread-3] 2011-07-22 10:34:59,102 Cassandra.java (line 3294) 
 Internal error processing insert
 java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
 down
 at 
 org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
 at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
 at 
 org.apache.cassandra.service.StorageProxy.insertLocal(StorageProxy.java:360)
 at 
 org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:241)
 at org.apache.cassandra.service.StorageProxy.access$000(StorageProxy.java:62)
 at org.apache.cassandra.service.StorageProxy$1.apply(StorageProxy.java:99)
 at 
 org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:210)
 at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:154)
 at 
 org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:560)
 at 
 org.apache.cassandra.thrift.CassandraServer.internal_insert(CassandraServer.java:436)
 at 
 org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:444

Re: eliminate need to repair by using column TTL??

2011-07-22 Thread aaron morton
Read repair will only repair data that is read on the nodes that are up at that 
time, and does not guarantee that any changes it detects will be written back 
to the nodes. The diff mutations are async fire and forget messages which may 
go missing or be dropped or ignored by the recipient just like any other 
message. 

Also getting hit with a bunch of read repair operations is pretty painful. The 
normal read runs, the coordinator detects the digest mis-match, the read runs 
again from all nodes and they all have to return their full data (no digests 
this time), the coordinator detects the diffs, mutations are sent back to each 
node that needs them. All this happens sync to the read request when the CL > 
ONE. That's 2 reads with more network IO and up to RF mutations. 

The delete thing is important but repair also reduces the chance of reads 
getting hit with RR and gives me confidence when it's necessary to nuke a bad 
node. 

Your plan may work but it feels risky to me. You may end up with worse read 
performance and unpleasant emotions if you ever have to nuke a node. Others may 
disagree. 

Not ignoring the fact the repair can take a long time, fail, hurt performance 
etc. There are plans to improve it though. 

Cheers
  
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22 Jul 2011, at 19:55, jonathan.co...@gmail.com wrote:

 One of the main reasons for regularly running repair is to make sure deletes 
 are propagated in the cluster, i.e., data is not resurrected if a node never 
 received the delete call.
 
 And repair-on-read takes care of repairing inconsistencies on-the-fly.
 
 So if I were to set a universal TTL on all columns - so everything would only 
 live for a certain age, would I be able to get away without having to do 
 regular repairs with nodetool?
 
 I realize this scenario would not be applicable for everyone, but our data 
 model would allow us to do this. 
 
 So could this be an alternative to running the (resource-intensive, 
 long-running) repairs with nodetool?
 
 Thanks.



Re: Predictable low RW latency, SLABS and STW GC

2011-07-24 Thread aaron morton
Restarting the service will drop all the memory-mapped caches; cassandra's caches are 
saved / persistent, and you can also use memcached if you want. 

Are you experiencing stop the world pauses? There are some things that can be 
done to reduce the chance of them happening. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jul 2011, at 05:34, Milind Parikh wrote:

 In order to be predictable @ big data scale, the intensity and periodicity of 
 STW Garbage Collection has to be brought down. Assume that SLABS (Cass 2252) 
 will be available in the main line at some time and assume that this will 
 have the impact that other projects (hbase etc) are reporting. I wonder 
 whether avoiding GC by restarting the servers before GC will be a feasible 
 approach (of course while knowing the workload)
 
 Regards
 Milind
 



Re: question on setup for writes into 2 datacenters

2011-07-24 Thread aaron morton
Quick reminder, with RF == 2 the QUORUM is 2 as well. So when using 
LOCAL_QUORUM with RF 2+2 you will effectively be using LOCAL_ALL which may not 
be what you want. As De La Soul sang, 3 is the magic number for minimum fault 
tolerance (QUORUM is then 2). 
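
For example (0.8 cassandra-cli syntax; the keyspace and data centre names are 
placeholders), RF 3 in each data centre would be defined as:

create keyspace MyKS
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{DC1:3, DC2:3}];

With that layout LOCAL_QUORUM needs 2 of the 3 local replicas, so a single node 
per DC can be down.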

Cheers
  
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jul 2011, at 10:04, Sameer Farooqui wrote:

 It sounds like what you're looking for is write consistency of local_quorum:
 http://www.datastax.com/docs/0.8/consistency/index#write-consistency
 
 local_quorum would mean the write has to be successful on a majority of nodes 
 in DC1 (so 2) before it is considered successful.
 
 If you use just quorum write, it'll have to be committed to 3 replicas out of 
 the 4 before it's considered successful.
 
 
 
 
 On Fri, Jul 22, 2011 at 1:57 PM, Dean Hiller d...@alvazan.com wrote:
 Ideally, we would want to have a replication factor of 4, and a minimum write 
 consistency of 2 (which looking at the default in cassandra.yaml is to memory 
 first with asynch to disk...perfect so far!!!)
 
 Now, obviously, I can get the partitioner setup to make sure I get 2 replicas 
 in each data center.  The next thing I would want to guarantee however is 
 that if a write came into datacenter 1, it would write to the two nodes in 
 datacenter 1 and asynchronously replicate to datacenter 2.  Is this possible? 
  Does cassandra already handle that or is there something I could do to get 
 cassandra to do that?
 
 In this mode, I believe I can have both datacenters be live as well as be 
 backup for the other not wasting resources.
 
 thanks,
 Dean
 



Re: select * from A join B using(common_id) where A.id == a and B.id == b

2011-07-24 Thread aaron morton
 my fall-back approach is, since A and B do not change a lot, I'll
 pre-generate the join of A and B (not very large) keyed on A.id +
 B.id,
 then do the get(a+b)

+1 materialise views / joins you know you want ahead of time. Trade space for 
time. 
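
A minimal pycassa sketch of that pre-generated join (keyspace, CF and key layout 
are placeholders):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
ab_join = pycassa.ColumnFamily(pool, 'AB_Join')

a_id, b_id = 'a1', 'b7'                                # hypothetical ids
joined_columns = {'a_name': 'foo', 'b_name': 'bar'}    # hypothetical joined values

# written whenever A or B changes
ab_join.insert('%s:%s' % (a_id, b_id), joined_columns)

# read back with a single lookup instead of two large gets plus a merge
row = ab_join.get('%s:%s' % (a_id, b_id))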

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jul 2011, at 10:41, Yang wrote:

 this is a common pattern used in RDBMS,
 is there some existing idiom to do it in cassandra ?
 
 
 if the size of select * from A where id == a  is very large, and
 similarly for B, while the join of A.id == a and B.id==b is small,
 then doing a get() for both and then merging seems excessively slow.
 
 
 my fall-back approach is, since A and B do not change a lot, I'll
 pre-generate the join of A and B (not very large) keyed on A.id +
 B.id,
 then do the get(a+b)
 
 
 thanks
 Yang


