Re: Rolling upgrade from 1.1.12 to 1.2.5 visibility issue

2013-06-21 Thread Polytron Feng
Hi Aaron, Thank you for your reply. We tried to increase the PHI threshold but still hit the same issue. We used Ec2Snitch and PropertyFileSnitch instead and they work without this problem. It seems to happen only with the Ec2MultiRegionSnitch config. Although we can work around this problem by

Re: Heap is not released and streaming hangs at 0%

2013-06-21 Thread aaron morton
nodetool -h localhost flush didn't do much good. Do you have 100's of millions of rows? If so, see recent discussions about reducing the bloom_filter_fp_chance and index_sampling. If this is an old schema you may be using the very old setting of 0.000744, which creates a lot of bloom filters.
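As a rough guide to why that old fp_chance matters, the standard bloom filter sizing formula relates the target false-positive chance to bits of memory per key. A minimal sketch, using the textbook optimal-filter approximation rather than Cassandra's exact implementation:

```python
import math

def bloom_bits_per_key(fp_chance):
    """Approximate bits of bloom filter space per key for a target
    false-positive probability: m/n = -ln(p) / (ln 2)^2."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

# The very old default of 0.000744 costs roughly 3x the memory per key
# of a 0.1 setting, which adds up over hundreds of millions of rows.
for p in (0.000744, 0.01, 0.1):
    print(p, round(bloom_bits_per_key(p), 1))
```

Raising fp_chance trades a little extra disk I/O on misses for a large reduction in heap held by bloom filters.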

Re: Joining distinct clusters with the same schema together

2013-06-21 Thread aaron morton
Question 2: is this a sane strategy? On its face my answer is not... really? I'd go with a solid no. Just because the three independent clusters have a schema that looks the same does not make them the same. The schema is a versioned document; you will not be able to merge them by

Re: error on startup: unable to find sufficient sources for streaming range

2013-06-21 Thread aaron morton
On some of my nodes, I'm getting the following exception when cassandra starts. How many nodes? Is this a new node, or an old one where this problem just started? What version are you on? Do you have this error from system.log? It includes the thread name, which is handy for debugging. Also

Re: Performance Difference between Cassandra version

2013-06-21 Thread aaron morton
I am trying to see whether there will be any performance difference between Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly? 1.0 has key and row caches defined per CF, 1.1 has global ones which are better utilised and easier to manage. 1.2 moves bloom filters and compression

Re: Unit Testing Cassandra

2013-06-21 Thread aaron morton
2) Second (in which I am more interested in) is for performance (stress/load) testing. Sometimes you can get cassandra-stress (shipped in the bin distro) to approximate the expected work load. It's then pretty easy to benchmark and test your configuration changes. Cheers

Re: Get fragments of big files (videos)

2013-06-21 Thread aaron morton
You should split the large blobs into multiple rows; I would use 10MB per row as a good rule of thumb. See http://www.datastax.com/dev/blog/cassandra-file-system-design for a description of a blob store in Cassandra. Cheers - Aaron Morton Freelance Cassandra Consultant New
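A minimal sketch of that chunking idea, assuming a hypothetical row-key scheme of blob id plus chunk index (the real column layout would follow the linked design):

```python
CHUNK_SIZE = 10 * 1024 * 1024  # ~10MB per row, per the rule of thumb above

def split_blob(blob_id, data, chunk_size=CHUNK_SIZE):
    """Yield (row_key, chunk) pairs; each chunk becomes its own row so
    no single row grows beyond ~chunk_size bytes."""
    for i in range(0, len(data), chunk_size):
        yield ("%s:%06d" % (blob_id, i // chunk_size), data[i:i + chunk_size])

def join_blob(rows):
    """Reassemble chunks fetched back, ordering by row key."""
    return b"".join(chunk for _, chunk in sorted(rows))
```

The zero-padded chunk index keeps keys sortable, so reassembly is just a sort and concatenate.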

Re: Compaction not running

2013-06-21 Thread aaron morton
Do you think it's worth posting an issue, or not enough traceable evidence ? If you can reproduce it then certainly file a bug. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 20/06/2013, at 9:41 PM, Franc Carter

Re: Confirm with cqlsh of Cassandra-1.2.5, the behavior of the export/import

2013-06-21 Thread aaron morton
That looks like it may be a bug, can you raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 21/06/2013, at 1:56 AM, hiroshi.kise...@hitachi.com wrote:

Re: block size

2013-06-21 Thread aaron morton
If I have data in a column of size 500KB, Also some information here http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ The data files are memory mapped so it's sort of OS dependent. A - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton

Re: Compaction not running

2013-06-21 Thread Franc Carter
On Fri, Jun 21, 2013 at 6:16 PM, aaron morton aa...@thelastpickle.com wrote: Do you think it's worth posting an issue, or not enough traceable evidence? If you can reproduce it then certainly file a bug. I'll keep my eye on it to see if it happens again and there is a pattern cheers

Re: nodetool ring showing different 'Load' size

2013-06-21 Thread Rodrigo Felix
Ok. Thank you all you guys. Att. *Rodrigo Felix de Almeida* LSBD - Universidade Federal do Ceará Project Manager MBA, CSM, CSPO, SCJP On Wed, Jun 19, 2013 at 2:26 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jun 19, 2013 at 5:47 AM, Michal Michalski mich...@opera.com wrote: You can

Re: [Cassandra] Replacing a cassandra node

2013-06-21 Thread Eric Stevens
Is there a way to replace a failed server using vnodes? I only had occasion to do this once, on a relatively small cluster. At the time I just needed to get the new server online and wasn't concerned about the performance implications, so I just removed the failed server from the cluster and

Re: timeuuid and cql3 query

2013-06-21 Thread Eric Stevens
It's my understanding that if the first part of the primary key has low cardinality, you will struggle with cluster balance, as (unless you use WITH COMPACT STORAGE) the first entry of the primary key equates to the row key from the traditional interface, thus all entries related to

Re: Heap is not released and streaming hangs at 0%

2013-06-21 Thread srmore
On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.com wrote: nodetool -h localhost flush didn't do much good. Do you have 100's of millions of rows? If so see recent discussions about reducing the bloom_filter_fp_chance and index_sampling. Yes, I have 100's of millions of

Re: timeuuid and cql3 query

2013-06-21 Thread Ryan, Brent
Yes. The problem is that I can't use the counter as the partition key, otherwise I'd wind up with hot spots in my cluster where the majority of the data is being written to a single node. The only real way around this problem with Cassandra is to follow along with what this blog does:
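The usual workaround such posts describe can be sketched as sharding the hot logical partition across a fixed number of buckets, at the cost of fan-out reads. The key format and bucket count here are illustrative assumptions, not from the thread:

```python
import hashlib

N_BUCKETS = 16  # assumption: tune to cluster size and write rate

def bucketed_key(natural_key, value_id, n_buckets=N_BUCKETS):
    """Spread writes for one logical key over n_buckets partitions by
    deriving a stable bucket from the value being written."""
    h = int(hashlib.md5(value_id.encode()).hexdigest(), 16)
    return "%s:%d" % (natural_key, h % n_buckets)

def all_buckets(natural_key, n_buckets=N_BUCKETS):
    """Reads must fan out over every bucket of the logical key."""
    return ["%s:%d" % (natural_key, b) for b in range(n_buckets)]
```

Writes now land on n_buckets distinct row keys, so the token ring spreads them across nodes instead of hammering one replica set.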

NREL has released open source Databus on github for time series data

2013-06-21 Thread Hiller, Dean
NREL has released their open source databus. They spin it as energy data (and a system for campus energy/building energy) but it is very general right now and probably will stay pretty general. More information can be found here http://www.nrel.gov/analysis/databus/ The source code can be

Cassandra terminates with OutOfMemory (OOM) error

2013-06-21 Thread Mohammed Guller
We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 1.2.2 and have 8GB memory. We didn't change any of the default heap or GC settings. So each node is allocating 1.8GB of heap space. The rows are wide; each row stores around 260,000 columns. We are reading the data
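For reference, the ~1.8GB figure is consistent with the default heap heuristic in cassandra-env.sh. A sketch of that calculation (an approximation of the script's arithmetic, not the script itself):

```python
def default_max_heap_mb(system_memory_mb):
    """Approximation of the default heap sizing in cassandra-env.sh
    (Cassandra 1.2): max(min(ram/2, 1024MB), min(ram/4, 8192MB))."""
    return max(min(system_memory_mb // 2, 1024),
               min(system_memory_mb // 4, 8192))

# An 8GB node gets roughly a 2GB heap by default; slightly less RAM
# visible to the JVM yields the ~1.8GB observed above.
print(default_max_heap_mb(8192))
```

So with default settings the heap never exceeds 8GB no matter how much RAM the node has, and small nodes get much less.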

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-21 Thread Jabbar Azam
Hello Mohammed, You should increase the heap space. You should also tune the garbage collection so young generation objects are collected faster, relieving pressure on the heap. We have been using JDK 7 and it uses G1 as the default collector. It does a better job than me trying to optimise the JDK 6

Cassandra driver performance question...

2013-06-21 Thread Tony Anecito
Hi All, I am using the jdbc driver and noticed that if I run the same query twice, the second time it is much faster. I set up the row cache and column family cache and it does not seem to make a difference. I am wondering how to set up cassandra such that the first query is always as fast as the second

Re: Cassandra driver performance question...

2013-06-21 Thread Jabbar Azam
Hello Tony, I would guess that the first query's data is put into the row cache and the filesystem cache. The second query gets the data from the row cache and/or the filesystem cache, so it'll be faster. If you want to make it consistently faster, having a key cache will definitely help. The

Re: [Cassandra] Replacing a cassandra node with one of the same IP

2013-06-21 Thread Mahony, Robin
Please note that I am currently using version 1.2.2 of Cassandra. Also we are using virtual nodes. My question mainly stems from the fact that the nodes appear to be aware that the node uuid changes for the IP (from reading the logs), so I am just wondering if this means the hinted handoffs

crashed while running repair

2013-06-21 Thread Franc Carter
Hi, I am experimenting with Cassandra-1.2.4, and got a crash while running repair. The nodes have 24GB of RAM with an 8GB heap. Any ideas on what I may have missed in the config? Log is below ERROR [Thread-136019] 2013-06-22 06:30:05,861 CassandraDaemon.java (line 174) Exception in thread

Re: Heap is not released and streaming hangs at 0%

2013-06-21 Thread Bryan Talbot
bloom_filter_fp_chance = 0.7 is probably way too large to be effective; you'll probably have issues compacting deleted rows and get poor read performance with a value that high. I'd guess that anything larger than 0.1 might as well be 1.0. -Bryan On Fri, Jun 21, 2013 at 5:58 AM, srmore

Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-21 Thread Andrew Bialecki
We're considering increasing the size of our sstables for some column families from 10MB to something larger. In test, we've been trying to verify that the sstable file sizes change and then doing a bit of benchmarking. However, when we alter the column family and then run nodetool

Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-21 Thread Robert Coli
On Fri, Jun 21, 2013 at 4:40 PM, Andrew Bialecki andrew.biale...@gmail.com wrote: However, when we alter the column family and then run nodetool upgradesstables -a keyspace columnfamily, the files in the data directory have been re-written, but the file sizes are the same. Is this the

Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-21 Thread Wei Zhu
I think the new SSTables will be the new size. In order to get that, you need to trigger a compaction so that new SSTables are generated. For LCS there is no major compaction, though. You can run a nodetool repair and hopefully that will bring in some new SSTables and compactions will kick

Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-21 Thread sankalp kohli
I think you can remove the json file which stores the mapping of which sstable is in which level. This will be treated by cassandra as all sstables being in level 0, which will trigger a compaction. But if you have a lot of data, it will be very slow as you will keep compacting data between L1 and L0. This

Re: Heap is not released and streaming hangs at 0%

2013-06-21 Thread sankalp kohli
I would take a heap dump and see what's in there rather than guessing. On Fri, Jun 21, 2013 at 4:12 PM, Bryan Talbot btal...@aeriagames.com wrote: bloom_filter_fp_chance = 0.7 is probably way too large to be effective and you'll probably have issues compacting deleted rows and get poor read

Re: crashed while running repair

2013-06-21 Thread sankalp kohli
Looks like a memory map failed. On a 64-bit system you should have effectively unlimited virtual memory, but Linux has a limit on the number of maps. Look at these two places. http://stackoverflow.com/questions/8892143/error-when-opening-a-lucene-index-map-failed
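A quick check and the commonly recommended fix, assuming Linux; the exact value to use is deployment-dependent:

```shell
# Check the current per-process mmap limit; Cassandra's memory-mapped
# SSTables can exhaust it on nodes with many data files.
cat /proc/sys/vm/max_map_count    # stock kernels often default to 65530

# Raise it for the running system, and persist the setting by adding
# "vm.max_map_count = 1048575" to /etc/sysctl.conf.
sudo sysctl -w vm.max_map_count=1048575
```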

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-21 Thread sankalp kohli
Looks like you are putting a lot of pressure on the heap by doing a slice query on a large row. Do you have a lot of deletes/tombstones on the rows? That might be causing a problem. Also, why are you returning so many columns at once? You can use the auto-paginate feature in Astyanax. Also do you see a lot
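The auto-pagination idea can be sketched generically: pull the wide row in bounded slices instead of one 260,000-column slice. fetch_slice here is a stand-in for whatever client call you use, not an Astyanax API:

```python
def page_columns(fetch_slice, row_key, page_size=1000):
    """Yield every column of a wide row in pages of page_size.
    fetch_slice(row_key, start, limit) is an assumed client call that
    returns an ordered list of (column_name, value) pairs with
    column_name > start (start=None means from the beginning)."""
    start = None
    while True:
        page = fetch_slice(row_key, start, page_size)
        if not page:
            return
        for col in page:
            yield col
        if len(page) < page_size:
            return          # short page means we reached the end
        start = page[-1][0]  # resume just after the last column seen
```

Each round trip then holds at most page_size columns in memory on both the client and the coordinator, instead of materialising the whole row.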

Re: Cassandra driver performance question...

2013-06-21 Thread Tony Anecito
Thanks Jabbar, I ran nodetool as suggested and it reported 0 latency for the row count I have. I also ran the cli list command for the table hit by my JDBC preparedStatement and it was slow, around 121msecs the first time I ran it; the second time it was 40msecs, versus the jdbc call of 38msecs to start

Re: Cassandra driver performance question...

2013-06-21 Thread Tony Anecito
Hi Jabbar, I think I know what is going on. I happened across a change mentioned by the jdbc driver developers regarding metadata caching. Seems the metadata caching was moved from the connection object to the preparedStatement object. So I am wondering if the time difference I am seeing on