Re: Why data is not even distributed.
Hi Andrey,

While the data values you generated might follow a true random distribution, your row key, a UUID, does not (because it is created on the same machines, by the same software, within a certain window of time). For example, if you are using the UUID class in Java, the values are composed from several components (related to dimensions such as time and version), so you cannot expect a random distribution over the whole key space.

Cheers,
Tom

On Wed, Oct 3, 2012 at 5:39 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
> Hello, everybody!
>
> I'm observing very strange behavior. I have a 3 node cluster with ByteOrderedPartitioner (I run 1.1.5). I created a keyspace with a replication factor of 1. Then I created one column family and populated it with random data. I use a UUID as the row key and an Integer as the column name. Row keys were generated as:
>
> UUID uuid = UUID.randomUUID();
>
> I populated about 10 rows with 100 columns each. I would expect an equal load on each node, but the result is totally different. This is what nodetool gives me:
>
> Address    DC           Rack   Status  State   Load       Effective-Ownership  Token
>                                                                                Token(bytes[56713727820156410577229101238628035242])
> 127.0.0.1  datacenter1  rack1  Up      Normal  27.61 MB   33.33%               Token(bytes[00])
> 127.0.0.3  datacenter1  rack1  Up      Normal  206.47 KB  33.33%               Token(bytes[0113427455640312821154458202477256070485])
> 127.0.0.2  datacenter1  rack1  Up      Normal  13.86 MB   33.33%               Token(bytes[56713727820156410577229101238628035242])
>
> One node (127.0.0.3) is almost empty. Any ideas what is wrong?
>
> Thank you,
> Andrey
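Tom's point about java.util.UUID can be checked directly with the standard library: every UUID.randomUUID() value has its version nibble fixed to 4 and its variant bits fixed to the IETF variant, so some bit positions of the raw key bytes are constant rather than random. A quick sketch (illustration only, not from the thread):

```java
import java.util.UUID;

public class UuidBits {
    public static void main(String[] args) {
        boolean fixedBits = true;
        for (int i = 0; i < 1000; i++) {
            UUID u = UUID.randomUUID();
            // randomUUID() always sets the version nibble to 4 and the
            // variant bits to the IETF variant (reported as 2), so these
            // positions in the serialized key never vary.
            fixedBits &= (u.version() == 4) && (u.variant() == 2);
        }
        System.out.println(fixedBits); // true
    }
}
```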
Re: Simple data model for 1 simple range query?
Hi Dean,

Thank you for your reply, I appreciate the help. I managed to get my data model into Cassandra, already inserted data, and ran the query, but I don't yet have enough data to do proper benchmarking. I'm now trying to load a huge amount of data using SSTableSimpleUnsortedWriter, because doing it with insert queries takes quite a while, but it is quite challenging to get this one working.

Kind regards,

2012/10/3 Hiller, Dean <dean.hil...@nrel.gov>
> Is timeframe/date your composite key? Where timeframe is the first time of a partition of time (i.e. if you partition by month, it is the very first time of that month). If so, then yes, it will be very fast. The smaller your partitions are, the smaller your indexes are as well (i.e. B-trees, which can grow pretty big). Realize you always have to query timeframe with equals (=), NOT <, >, <=, >=, but on the other columns you can use the other operators.
>
> Also, if you ever find a need to partition the same data twice, you can always look into PlayOrm with multi-partitioning and its Scalable SQL, which can do joins when necessary.
>
> Later,
> Dean
>
> From: T Akhayo <t.akh...@gmail.com>
> Reply-To: user@cassandra.apache.org
> Date: Wednesday, October 3, 2012 1:00 PM
> To: user@cassandra.apache.org
> Subject: Simple data model for 1 simple range query?
>
> Good evening,
>
> I have a quite simple data model. Pseudo CQL code:
>
> create table bars(
>     timeframe int,
>     date Date,
>     info1 double,
>     info2 double,
>     ...
>     primary key( timeframe, date )
> )
>
> My most important query (which might actually be the only one) is:
>
> select * from bars where timeframe = X and date > Y and date < Z
>
> I came to this model because I read in the past (when 0.7 came out) that Cassandra was very fast at range queries (using a slice method) when the fields were keys. And now with CQL all the nasty details are hidden (I have not tested this yet ;-) ).
>
> Is it correct that the above model is a good and fast solution for my query?
>
> Kind regards.
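Dean's month-partitioning scheme can be sketched in plain Java: the timeframe stored with each row is the very first instant of that row's month, so all rows in a month share one partition key. The helper below is a hypothetical illustration of that computation (epoch millis in UTC), not Cassandra API code:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class MonthPartition {
    /**
     * Returns the partition key ("timeframe") for a given instant:
     * the epoch millis of the very first moment of that instant's month (UTC).
     */
    static long timeframeFor(long epochMillis) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return t.toLocalDate().withDayOfMonth(1)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // Two dates in October 2012 land in the same partition...
        long a = ZonedDateTime.of(2012, 10, 3, 17, 39, 0, 0, ZoneOffset.UTC).toInstant().toEpochMilli();
        long b = ZonedDateTime.of(2012, 10, 25, 9, 0, 0, 0, ZoneOffset.UTC).toInstant().toEpochMilli();
        System.out.println(timeframeFor(a) == timeframeFor(b)); // true
        // ...and the partition key is the first instant of the month.
        long oct1 = ZonedDateTime.of(2012, 10, 1, 0, 0, 0, 0, ZoneOffset.UTC).toInstant().toEpochMilli();
        System.out.println(timeframeFor(a) == oct1); // true
    }
}
```

A query then always pins `timeframe = timeframeFor(date)` with equality and ranges only over the clustering column, as Dean describes.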
RE: Remove node from cluster and have it run as a single node cluster by itself
Thanks, Aaron and Tim.

Yes, I am trying to decommission a seed node. It looks like the only way to prevent a seed node from automatically rejoining its previous cluster on startup is to change its cluster_name.

Zaili Xu

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, October 01, 2012 5:34 PM
To: user@cassandra.apache.org
Subject: Re: Remove node from cluster and have it run as a single node cluster by itself

The other nodes may be trying to connect to it - it may be listed as a seed node on the other machines? The other nodes will be looking for it. Change the Cluster Name in the yaml file.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/09/2012, at 12:04 AM, Tim Wintle <timwin...@gmail.com> wrote:

On Fri, 2012-09-28 at 18:53, Xu, Zaili wrote:
> Hi,
>
> I have an existing Cassandra cluster. I removed a node from the cluster, then decommissioned the removed node, stopped it, and updated its config so that it only has itself as a seed and in the cassandra-topology.properties file. I even deleted the data, commitlog, and saved_caches directories. But as soon as I start it back up it is able to rejoin the cluster. How does this node know about the existing cluster, and how is it able to join it?

The other nodes may be trying to connect to it - it may be listed as a seed node on the other machines?

Tim
[RELEASE] Apache Cassandra 1.0.12 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 1.0.12.

Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here:

http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1]. As always, please pay attention to the release notes[2] and let us know[3] if you encounter any problems.

Have fun!

[1]: http://goo.gl/XtyBQ (CHANGES.txt)
[2]: http://goo.gl/lzhEv (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA
Importing sstable with Composite key? (without is working)
Good evening,

Today I managed to get a small cluster of 2 computers running. I also managed to get my data model working and am able to import sstables created with SSTableSimpleUnsortedWriter using sstableloader. The only problem is when I try to use the composite key in my data model: after I import my sstables and issue a simple select, Cassandra crashes:

===
java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Unknown Source)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:76)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
    at java.util.TreeMap.put(Unknown Source)
    at org.apache.cassandra.db.TreeMapBackedSortedColumns.addColumn(TreeMapBackedSortedColumns.java:95)
    at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:109)
    ...
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:121)
    at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1237)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
===

Now I can get everything running again by removing the data directories on both nodes. I suspect Cassandra crashes because the sstable that is being imported has a different schema when it comes to the composite key (without the composite key the import works fine).
My schema with composite key is:

===
create table bars2(
    id uuid,
    timeframe int,
    datum timestamp,
    open double,
    high double,
    low double,
    close double,
    bartype int,
    PRIMARY KEY (timeframe, datum)
);
===

===
create column family bars2
  with column_type = 'Standard'
  and comparator = 'CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.UTF8Type)'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'Int32Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
===

My code to create the sstable is (only the interesting parts):

===
sstWriter = new SSTableSimpleUnsortedWriter(
        new File("c:\\cassandra\\newtables\\"),
        new RandomPartitioner(),
        "readtick", "bars2",
        UTF8Type.instance, null, 64);

CompositeType.Builder cb = new CompositeType.Builder(CompositeType.getInstance(compositeList));
cb.add(bytes(curMinuteBar.getDatum().getTime()));
cb.add(bytes(1));
sstWriter.newRow(cb.build());
(... add columns ...)
===

I highly suspect that the problem is in one of 2 places:
- In the SSTableSimpleUnsortedWriter I use UTF8Type.instance as the comparator; I'm not sure if that is right with a composite key?
- When calling sstWriter.newRow I use CompositeType.Builder to build the composite key; I'm not sure if I'm doing this the right way? (I did try different combinations.)

Does somebody know how I can continue on my journey?
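For reference when debugging sstables like this: a CompositeType value is laid out as a sequence of components, each encoded as a 2-byte big-endian length, the raw component bytes, and one end-of-component byte (0 for a plain value). The dependency-free sketch below is my own illustration of that layout, not Cassandra's CompositeType.Builder, but it can help when inspecting what the writer actually produced:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CompositeEncoding {
    /**
     * Encodes components the way a CompositeType value is laid out:
     * for each component, a 2-byte big-endian length, the raw bytes,
     * and one end-of-component byte (0 for a plain value).
     */
    static byte[] encode(byte[]... components) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : components) {
            out.write((c.length >> 8) & 0xFF); // length, high byte
            out.write(c.length & 0xFF);        // length, low byte
            out.write(c, 0, c.length);         // component bytes
            out.write(0);                      // end-of-component byte
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] date = new byte[8]; // e.g. a DateType value: 8-byte epoch millis
        byte[] tag  = "1".getBytes(StandardCharsets.UTF_8);
        byte[] composite = encode(date, tag);
        // 2 + 8 + 1 bytes for the first component, 2 + 1 + 1 for the second
        System.out.println(composite.length); // 15
    }
}
```

Note also that the cassandra-cli output above shows key_validation_class = 'Int32Type': the value passed to newRow() is the row key (here the int timeframe), while the CompositeType(DateType, UTF8Type) comparator applies to the column names. That mismatch between what is built with CompositeType.Builder and where it is passed may be worth checking.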
Re: unsubscribe
Mike

Mike Li
Lead Database Engineer
Thomson Reuters
Phone: 314-468-8128
mike...@thomsonreuters.com
www.thomsonreuters.com
Re: Why data is not even distributed.
It was my first thought. Then I MD5-hashed the uuid and used the digest as the key:

MessageDigest md = MessageDigest.getInstance("MD5");
// in the loop
UUID uuid = UUID.randomUUID();
byte[] bytes = md.digest(asByteArray(uuid));

The result is exactly the same: the first node takes 66%, the second 33%, and the third one is empty. For some reason, rows which should be placed on the third node moved to the first one.

Address    DC           Rack   Status  State   Load      Effective-Ownership  Token
                                                                              Token(bytes[56713727820156410577229101238628035242])
127.0.0.1  datacenter1  rack1  Up      Normal  7.68 MB   33.33%               Token(bytes[00])
127.0.0.3  datacenter1  rack1  Up      Normal  79.17 KB  33.33%               Token(bytes[0113427455640312821154458202477256070485])
127.0.0.2  datacenter1  rack1  Up      Normal  3.81 MB   33.33%               Token(bytes[56713727820156410577229101238628035242])

On Thu, Oct 4, 2012 at 12:33 AM, Tom <fivemile...@gmail.com> wrote:
> Hi Andrey,
>
> While the data values you generated might follow a true random distribution, your row key, a UUID, does not (because it is created on the same machines, by the same software, within a certain window of time). For example, if you are using the UUID class in Java, the values are composed from several components (related to dimensions such as time and version), so you cannot expect a random distribution over the whole key space.
>
> Cheers,
> Tom
>
> On Wed, Oct 3, 2012 at 5:39 PM, Andrey Ilinykh <ailin...@gmail.com> wrote:
>> Hello, everybody!
>>
>> I'm observing very strange behavior. I have a 3 node cluster with ByteOrderedPartitioner (I run 1.1.5). I created a keyspace with a replication factor of 1. Then I created one column family and populated it with random data. I use a UUID as the row key and an Integer as the column name. Row keys were generated as:
>>
>> UUID uuid = UUID.randomUUID();
>>
>> I populated about 10 rows with 100 columns each. I would expect an equal load on each node, but the result is totally different. This is what nodetool gives me:
>>
>> Address    DC           Rack   Status  State   Load       Effective-Ownership  Token
>>                                                                                Token(bytes[56713727820156410577229101238628035242])
>> 127.0.0.1  datacenter1  rack1  Up      Normal  27.61 MB   33.33%               Token(bytes[00])
>> 127.0.0.3  datacenter1  rack1  Up      Normal  206.47 KB  33.33%               Token(bytes[0113427455640312821154458202477256070485])
>> 127.0.0.2  datacenter1  rack1  Up      Normal  13.86 MB   33.33%               Token(bytes[56713727820156410577229101238628035242])
>>
>> One node (127.0.0.3) is almost empty. Any ideas what is wrong?
>>
>> Thank you,
>> Andrey
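Andrey's hashing snippet can be made self-contained. The asByteArray helper was not shown in the thread, so the version below is an assumed implementation that serializes the UUID's two longs big-endian; the sketch only demonstrates that the MD5 digest is a 16-byte value with no fixed bit positions, unlike the raw UUID:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class UuidDigest {
    // Assumed implementation of the asByteArray() helper from the thread:
    // the UUID's most/least significant longs, big-endian, 16 bytes total.
    static byte[] asByteArray(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        UUID uuid = UUID.randomUUID();
        // digest() also resets md, so reusing one instance in a loop is fine.
        byte[] key = md.digest(asByteArray(uuid));
        System.out.println(key.length); // 16
    }
}
```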
schema change management tools
I have been looking to see if there are any schema change management tools for Cassandra. I have not come across any so far. I figured I would check to see if anyone can point me to something before I start trying to implement something on my own. I have used liquibase ( http://www.liquibase.org) for relational databases. Earlier today I tried using it with the cassandra-jdbc driver, but ran into some exceptions due to the SQL generated. I am not looking specifically for something CQL-based. Something that uses the Thrift API via CLI scripts for example would work as well. Thanks - John
Re: schema change management tools
Not that I know of. I've always been really strict about dumping my schemas (to start) and keeping my changes in migration files. I don't do a ton of schema changes, so I haven't had a need to really automate it. Even with MySQL I never bothered.

Jon

On Thu, Oct 4, 2012 at 6:27 PM, John Sanda <john.sa...@gmail.com> wrote:
> I have been looking to see if there are any schema change management tools for Cassandra. I have not come across any so far. I figured I would check to see if anyone can point me to something before I start trying to implement something on my own. I have used Liquibase (http://www.liquibase.org) for relational databases. Earlier today I tried using it with the cassandra-jdbc driver, but ran into some exceptions due to the SQL generated. I am not looking specifically for something CQL-based. Something that uses the Thrift API via CLI scripts, for example, would work as well.
>
> Thanks
>
> - John

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
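The migration-files practice described above is straightforward to automate by hand. A minimal, hypothetical sketch (the file-naming scheme and helper are my own invention, not an existing tool): keep numbered schema scripts in a directory, record the highest version already applied, and run anything newer in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MigrationPicker {
    /**
     * Given migration file names like "001_create_bars.cql" and the last
     * applied version number, returns the names still to run, in order.
     * Zero-padded numeric prefixes make lexicographic order numeric order.
     */
    static List<String> pending(String[] fileNames, int lastApplied) {
        String[] sorted = fileNames.clone();
        Arrays.sort(sorted);
        List<String> out = new ArrayList<>();
        for (String name : sorted) {
            int version = Integer.parseInt(name.substring(0, name.indexOf('_')));
            if (version > lastApplied) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] files = {"002_add_index.cql", "001_create_bars.cql", "003_drop_old_cf.cql"};
        // With version 1 already applied, only 002 and 003 remain.
        System.out.println(pending(files, 1)); // [002_add_index.cql, 003_drop_old_cf.cql]
    }
}
```

Feeding each pending script to cassandra-cli or cqlsh, then recording the new version, is the part a real tool would add on top.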
Re: schema change management tools
For the project I work on, and for previous projects as well that support multiple upgrade paths, this kind of tooling is a necessity. And I would prefer to avoid duplicating effort if there is already something out there. If not, though, I will be sure to post back to the list with whatever I wind up doing.

On Thu, Oct 4, 2012 at 9:34 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> Not that I know of. I've always been really strict about dumping my schemas (to start) and keeping my changes in migration files. I don't do a ton of schema changes, so I haven't had a need to really automate it. Even with MySQL I never bothered.
>
> Jon
>
> On Thu, Oct 4, 2012 at 6:27 PM, John Sanda <john.sa...@gmail.com> wrote:
>> I have been looking to see if there are any schema change management tools for Cassandra. I have not come across any so far. I figured I would check to see if anyone can point me to something before I start trying to implement something on my own. I have used Liquibase (http://www.liquibase.org) for relational databases. Earlier today I tried using it with the cassandra-jdbc driver, but ran into some exceptions due to the SQL generated. I am not looking specifically for something CQL-based. Something that uses the Thrift API via CLI scripts, for example, would work as well.
>>
>> Thanks
>>
>> - John
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade
Re: schema change management tools
Awesome - keep me posted.

Jon

On Thu, Oct 4, 2012 at 6:42 PM, John Sanda <john.sa...@gmail.com> wrote:
> For the project I work on, and for previous projects as well that support multiple upgrade paths, this kind of tooling is a necessity. And I would prefer to avoid duplicating effort if there is already something out there. If not, though, I will be sure to post back to the list with whatever I wind up doing.
>
> On Thu, Oct 4, 2012 at 9:34 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>> Not that I know of. I've always been really strict about dumping my schemas (to start) and keeping my changes in migration files. I don't do a ton of schema changes, so I haven't had a need to really automate it. Even with MySQL I never bothered.
>>
>> Jon
>>
>> On Thu, Oct 4, 2012 at 6:27 PM, John Sanda <john.sa...@gmail.com> wrote:
>>> I have been looking to see if there are any schema change management tools for Cassandra. I have not come across any so far. I figured I would check to see if anyone can point me to something before I start trying to implement something on my own. I have used Liquibase (http://www.liquibase.org) for relational databases. Earlier today I tried using it with the cassandra-jdbc driver, but ran into some exceptions due to the SQL generated. I am not looking specifically for something CQL-based. Something that uses the Thrift API via CLI scripts, for example, would work as well.
>>>
>>> Thanks
>>>
>>> - John
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> skype: rustyrazorblade

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade