Re: problem with sliceQuery with composite column

2012-02-16 Thread Deno Vichas
so i flipped the composite around to; create column family StockHistory with comparator = 'CompositeType(UTF8Type,LongType)' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type'; and now i'm getting what i expected the first time. i can get a range of typ
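
The effect of flipping the composite can be sketched in a few lines of Python. This is only a model of the ordering, not Cassandra code: it assumes that CompositeType(UTF8Type,LongType) compares column names component by component, which for ASCII strings and integers matches Python's tuple ordering. The `quotes` type and the helper name `slice_by_type` are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical column names for one row, as (type, timestamp) composites.
columns = [
    ("ticks", 3), ("quotes", 2), ("ticks", 1), ("quotes", 1), ("ticks", 2),
]

# Assumption: component-by-component comparison, modelled here by the
# built-in tuple ordering (string first, then integer).
ordered = sorted(columns)


def slice_by_type(ordered_columns, type_name):
    """Return the contiguous run of columns whose first component is type_name."""
    # (type_name,) sorts before every (type_name, n) because it is a prefix.
    start = bisect_left(ordered_columns, (type_name,))
    # (type_name, inf) sorts after every (type_name, n) for finite n.
    end = bisect_right(ordered_columns, (type_name, float("inf")))
    return ordered_columns[start:end]


print(slice_by_type(ordered, "ticks"))
```

With the type as the first component, all columns of one type sit next to each other, so a range by type is a single contiguous slice rather than a scan of the whole row.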

about the compaction and read performance

2012-02-16 Thread zhangcheng
Cassandra has no way of knowing that all the data is in the most recent sstable, and will have to check the others too, and this brings a lot of difficulty to data compaction. My question is: if I want high-performance data compaction, how can I implement that all the columns are all

Re: Key cache hit rate issue

2012-02-16 Thread Eran Chinthaka Withana
Hi Jonathan, > > > For some reason 16637958 (the keys cached) has become a golden number > and I > > don't see key cache increasing beyond that. > > 16637958 is your configured cache capacity according to the cfstats you > pasted. this is another weird part. If you look at the schema[1] (pasted

Re: Re: Key cache hit rate issue

2012-02-16 Thread zhangcheng
Yes, Cassandra has no way of knowing that all the data is in the most recent sstable, and will have to check the others too, and this brings a lot of difficulty to data compaction. My question is: if I want high-performance data compaction, how can I implement that all the columns are

Re: Re: Key cache hit rate issue

2012-02-16 Thread zhangcheng
Thanks, Jonathan. I got it. 2012-02-17 zhangcheng From: Jonathan Ellis Sent: 2012-02-17 10:15:05 To: user Cc: Subject: Re: Key cache hit rate issue Look for this code in SSTableReader.getPosition: Pair unifiedKey = new Pair(descriptor, decoratedKey); Long cached

Re: Re: Key cache hit rate issue

2012-02-16 Thread zhangcheng
according to the read process, the key of the keycache should be the row key. 2012-02-17 zhangcheng From: Todd Burruss Sent: 2012-02-17 06:23:47 To: user@cassandra.apache.org Cc: Subject: Re: Key cache hit rate issue jonathan, you said the key to the cache is key + sstable? looking at

Re: 1.0.2 - nodetool ring and info reports wrong load after compact

2012-02-16 Thread Jonathan Ellis
CASSANDRA-3496, fixed in 1.0.4+ On Thu, Feb 16, 2012 at 8:27 AM, Bill Au wrote: > I am running 1.0.2 with the default tiered compaction.  After running a > "nodetool compact", I noticed that on about half of the machines in my > cluster, both "nodetool ring" and "nodetool info" report that the lo

Re: Key cache hit rate issue

2012-02-16 Thread Jonathan Ellis
Look for this code in SSTableReader.getPosition: Pair unifiedKey = new Pair(descriptor, decoratedKey); Long cachedPosition = getCachedPosition(unifiedKey, true); On Thu, Feb 16, 2012 at 4:23 PM, Todd Burruss wrote: > jonathan, you said the key to the cache is key + sstabl

Re: problem with sliceQuery with composite column

2012-02-16 Thread Deno Vichas
On 2/16/2012 12:46 AM, aaron morton wrote: split this CF into two? Or change the order of the column components as suggested. as suggested - where? are you saying if i flip the composite i'll be able to ask for a range by type? and cassandra is going to order by columns like; ticks:1 t

Re: Key cache hit rate issue

2012-02-16 Thread Jonathan Ellis
On Thu, Feb 16, 2012 at 3:52 PM, Eran Chinthaka Withana wrote: > Thanks for the reply. Yes there is a possibility that the keys can be > distributed in multiple SSTables, but my data access patterns are such that > I always read/write the whole row. So I expect all the data to be in the > same SST

Re: Wide row column slicing - row size shard limit

2012-02-16 Thread Data Craftsman
Hi Aaron Morton and R. Verlangen, Thanks for the quick answer. It's good to know Thrift's limit on the amount of data it will accept / send. I know the hard limit is 2 billion columns per row. My question is at what size it will slow down read/write performance and maintenance. The blog I refer

Re: 1.0.2 - nodetool ring and info reports wrong load after compact

2012-02-16 Thread Bill Au
No, I am not using compression. Bill On Thu, Feb 16, 2012 at 2:05 PM, aaron morton wrote: > Are you using compression ? > > I remember some issues with compression and reported load, cannot remember > the details. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmort

Re: Key cache hit rate issue

2012-02-16 Thread Eran Chinthaka Withana
Hi Todd, Thanks for the reply. But I don't think the settings that you mentioned are playing any role, as those are set to 0.85 and 0.6 in my cassandra.yaml and the proportion between the number I see as space used and the amount I set is much less than those numbers. Thanks, Eran Chinthaka Wit

Re: Key cache hit rate issue

2012-02-16 Thread Todd Burruss
jonathan, you said the key to the cache is key + sstable? looking at the code it looks like a DecoratedKey is the "row key". how does sstable come into play? On 2/16/12 1:20 PM, "Jonathan Ellis" wrote: >So, you have roughly 1/6 of your (physical) row keys cached and about >1/4 cache hit rate,

Re: Key cache hit rate issue

2012-02-16 Thread Todd Burruss
there is a setting in the yaml file that helps relieve memory pressure by reducing the row cache. it is based on the percent of memory used by the JVM. the settings are reduce_cache_sizes_at and reduce_cache_capacity_to. see how much free memory you have and if the numbers suggest that you have
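
A rough model of how these two settings interact, written as a hedged Python sketch rather than Cassandra's actual logic. It assumes that once heap usage exceeds the `reduce_cache_sizes_at` fraction of the maximum heap, the cache capacity is cut to the `reduce_cache_capacity_to` fraction of the current cache size. The function name `reduced_capacity` and its arguments are hypothetical; the defaults 0.85 and 0.6 are the values quoted elsewhere in this thread.

```python
def reduced_capacity(heap_used, heap_max, cache_size,
                     reduce_cache_sizes_at=0.85,
                     reduce_cache_capacity_to=0.6):
    """Return the new cache capacity, or None if the threshold is not crossed.

    Simplified assumption: the check is a plain comparison of the heap usage
    fraction against reduce_cache_sizes_at.
    """
    if heap_used / heap_max <= reduce_cache_sizes_at:
        return None  # no memory pressure, capacity is left alone
    return round(cache_size * reduce_cache_capacity_to)


# Heap at 90% of max with 1000 cached entries: capacity shrinks to 600.
print(reduced_capacity(heap_used=9, heap_max=10, cache_size=1000))
# Heap at 50% of max: nothing happens.
print(reduced_capacity(heap_used=5, heap_max=10, cache_size=1000))
```

If the observed cache size is far below the configured capacity but not near 60% of a previous size, this mechanism is probably not the explanation.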

Re: Key cache hit rate issue

2012-02-16 Thread Franc Carter
On 17/02/2012 8:53 AM, "Eran Chinthaka Withana" wrote: > > Hi Jonathan, > > Thanks for the reply. Yes there is a possibility that the keys can be distributed in multiple SSTables, but my data access patterns are such that I always read/write the whole row. So I expect all the data to be in the sam

Re: Key cache hit rate issue

2012-02-16 Thread Eran Chinthaka Withana
Hi Jonathan, Thanks for the reply. Yes there is a possibility that the keys can be distributed in multiple SSTables, but my data access patterns are such that I always read/write the whole row. So I expect all the data to be in the same SSTable (please correct me if I'm wrong). For some reason 16

Re: Key cache hit rate issue

2012-02-16 Thread Jonathan Ellis
So, you have roughly 1/6 of your (physical) row keys cached and about 1/4 cache hit rate, which doesn't sound unreasonable to me. Remember, each logical key may be spread across multiple physical sstables -- each (key, sstable) pair is one entry in the key cache. On Thu, Feb 16, 2012 at 1:48 PM,
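
The point about (key, sstable) pairs can be illustrated with a small Python sketch. The sstable names and row keys below are invented; the only thing taken from the thread is that the key cache holds one entry per (key, sstable) pair rather than one per logical row key.

```python
# Hypothetical layout: which row keys appear in which sstables.
sstables = {
    "sstable-1": {"row-a", "row-b"},
    "sstable-2": {"row-a"},
    "sstable-3": {"row-a", "row-c"},
}

# One potential key cache entry per (sstable, key) pair.
cache_entries = {(name, key) for name, keys in sstables.items() for key in keys}

# Logical row keys, counted once no matter how many sstables hold them.
logical_keys = set().union(*sstables.values())

print(len(logical_keys), "logical keys need", len(cache_entries), "cache entries")
```

Here three logical keys need five cache entries, so a cache sized to the number of logical keys covers only part of the physical keys.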

Re: Key cache hit rate issue

2012-02-16 Thread Eran Chinthaka Withana
Hi Aaron, Here it is. Keyspace: Read Count: 1123637972 Read Latency: 5.757938114343114 ms. Write Count: 128201833 Write Latency: 0.0682576607387509 ms. Pending Tasks: 0 Column Family: YY SSTable count: 18 Space used (live): 103318720685 Space used (total): 103318720685 Number of Keys (est

Re: Replication factor per column family

2012-02-16 Thread aaron morton
yes. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/02/2012, at 10:15 PM, R. Verlangen wrote: > Hmm ok. This means if I want to have a CF with RF = 3 and another CF with RF > = 1 (e.g. some debug logging) I will have to create 2 keyspaces? >

Best way to store and index time series items with multiple other dimensions?

2012-02-16 Thread Nate Sammons
I'm trying to figure out the best way to store items for query based on multiple dimensions. I've got a large volume (many 100s of millions per day) of time-ordered objects with 10+ properties each that I need to support arbitrary query expressions on. So I may need to support a query based on

Re: ops center - join cluster operation

2012-02-16 Thread aaron morton
Try here http://www.datastax.com/support-forums/forum/opscenter Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/02/2012, at 7:01 AM, Radim Kolar wrote: > Is there way in ops center gui to make node with agent to join specified > cluste

Re: 1.0.2 - nodetool ring and info reports wrong load after compact

2012-02-16 Thread aaron morton
Are you using compression ? I remember some issues with compression and reported load, cannot remember the details. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/02/2012, at 3:27 AM, Bill Au wrote: > I am running 1.0.2 with the def

Getting HexToBytes Error while reading from Cassandra

2012-02-16 Thread PJunk
Hello, We are trying to read data from cassandra via pig. The version of cassandra is 1.0.7 and pig is 0.9.0. We get the following error when we try to load the data from the cassandra keyspace and columnfamily. [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate

ops center - join cluster operation

2012-02-16 Thread Radim Kolar
Is there a way in the ops center gui to make a node with an agent join a specified cluster? I am thinking about: click on node, select join cluster, type the IP address of an existing cluster member, and data will be replicated onto the new node.

Re: problem about cassandra columns

2012-02-16 Thread Jonathan Ellis
[moving to users list] See http://wiki.apache.org/cassandra/CassandraLimitations 2012/2/15 晓峰 : > I want to insert more and more columns into the super column,is there any > problem? > > > > > 晓峰 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for profess

1.0.2 - nodetool ring and info reports wrong load after compact

2012-02-16 Thread Bill Au
I am running 1.0.2 with the default tiered compaction. After running a "nodetool compact", I noticed that on about half of the machines in my cluster, both "nodetool ring" and "nodetool info" report that the load is actually higher than before when I expect it to be lower. It is almost twice as m

Cassandra Europe

2012-02-16 Thread Konrad Kennedy
Hi, We're organising Europe's first Apache Cassandra conference - Cassandra Europe. It takes place on March 28th in London. We'll be having some of the top PMCs, committers and users attending. I'm trying to get people's input and was wondering if you had any suggestions for what you'd like to see

Re: Replication factor per column family

2012-02-16 Thread R. Verlangen
Hmm ok. This means if I want to have a CF with RF = 3 and another CF with RF = 1 (e.g. some debug logging) I will have to create 2 keyspaces? 2012/2/16 aaron morton > Multiple CF mutations for a row are treated atomically in the commit log, > and they are sent together to the replicas. Replicati

Re: Replication factor per column family

2012-02-16 Thread aaron morton
Multiple CF mutations for a row are treated atomically in the commit log, and they are sent together to the replicas. Replication occurs at the row level, not the row+cf level. If each CF had its own RF, odd things may happen. Like sending a batch mutation for one row and two CF's that fails

Re: CQL query issue when fetching data from Cassandra

2012-02-16 Thread aaron morton
> 1). The "IN" operator is not working > SELECT * FROM TestCF WHERE status IN ('Failed', 'Success') IN is only valid for filtering on the row KEY http://www.datastax.com/docs/1.0/references/cql/SELECT e.g. it generates this error using cqlsh cqlsh:dev> SELECT * FROM TestCF WHERE status IN ('Fa

Re: Anybody using Cassandra/DataStax Distribution with Java Service Wrapper?

2012-02-16 Thread Oleg Anastasyev
> Is anyone using it with Cassandra? Yes, we use it with cassandra 0.6. Had to implement a tanuki-style service wrapper for cassandra myself to make it shut down correctly.

Replication factor per column family

2012-02-16 Thread R. Verlangen
Hi there, As the subject states: "Is it possible to set a replication factor per column family?" Could not find anything of recent releases. I'm running Cassandra 1.0.7 and I think it should be possible on a per CF basis instead of the whole keyspace. With kind regards, Robin

Re: problem with sliceQuery with composite column

2012-02-16 Thread aaron morton
> but it still seem a bit strange coming from years and years of sql. Think of the composite column name as a composite key. You want to write an efficient query that uses a seek and partial scan of the index b-tree, rather than a full scan. > split this CF into two? Or change the order of t

Re: Wide row column slicing - row size shard limit

2012-02-16 Thread aaron morton
> Based on this blog of Basic Time Series with Cassandra data modeling, > http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/ I've not read that one but it sounds right. Matt Dennis knows his stuff http://www.slideshare.net/mattdennis/cassandra-nyc-2011-data-modeling > There

Re: Wide row column slicing - row size shard limit

2012-02-16 Thread R. Verlangen
Things you should know: - Thrift has a limit on the amount of data it will accept / send, you can configure this in Cassandra: 64 MB should still work fine (1) - Rows should not become huge: this will make "perfect" load balancing impossible in your cluster - A single row should fit on a disk - T

Re: Key cache hit rate issue

2012-02-16 Thread aaron morton
> Its in the order of 261 to 8000 and the ratio is 0.00. But i guess 8000 is > bit high. Is there a way to fix/improve it? Sorry I don't understand what you mean. But if the ratio is 0.0 all is good. Could you include the full output from cfstats for the CF you are looking at ? Cheers

Re: CQL query issue when fetching data from Cassandra

2012-02-16 Thread R. Verlangen
I'm not sure about your first 2 questions. The third might be an exception: check your Cassandra logs. About the "like"-thing: there's no such query possibility in Cassandra / CQL. You can take a look at Hadoop / Hive to tackle those problems. 2012/2/16 Roshan > Hi > > I am using Cassandra 1.0.

Re: problem with sliceQuery with composite column

2012-02-16 Thread Deno Vichas
thanks for the reply. i understand why, but it still seems a bit strange coming from years and years of sql. so if i want to avoid the extra load from fetching way more than i needed would i be best off splitting this CF into two? thanks, deno On 2/13/2012 10:41 AM, aaron morton wrote: My unde

Re: Analysis of performance benchmarking - unexpected results

2012-02-16 Thread Peter Schuller
> 1. Changing consistency level configurations from Write.ALL + Read.ONE > to Write.ALL + Read.ALL increases write latency (expected) and > decrease read latency (unexpected). When you tested at CL.ONE, was read repair turned on? The two ways I can think of right now, by which read latency might