Re: Why data is not even distributed.

2012-10-04 Thread Tom
Hi Andrey,

while the data values you generated might follow a true random
distribution, your row key, a UUID, does not (because it is created on the
same machines, by the same software, within a certain window of time).

For example, if you are using the UUID class in Java, the keys are composed
from several components (related to dimensions such as time and version), so
you cannot expect a random distribution over the whole key space.
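
A quick way to see that structure (a sketch, not from the original thread):
Java's UUID.randomUUID() always produces version-4 UUIDs, so the version and
variant fields are identical in every generated key.

import java.util.UUID;

public class UuidBitsDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            UUID u = UUID.randomUUID();
            // version() is always 4 and variant() always 2 for randomUUID(),
            // so those bit positions are constant across all generated keys.
            System.out.println(u + "  version=" + u.version()
                    + "  variant=" + u.variant());
        }
    }
}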


Cheers
Tom



On Wed, Oct 3, 2012 at 5:39 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 Hello, everybody!

 I'm observing very strange behavior. I have a 3 node cluster with
 ByteOrderedPartitioner. (I run 1.1.5.)
 I created a keyspace with a replication factor of 1.
 Then I created one column family and populated it with random data.
 I use a UUID as the row key, and an Integer as the column name.
 Row keys were generated as

 UUID uuid = UUID.randomUUID();

 I populated about 10 rows with 100 columns each.

 I would expect an equal load on each node, but the result is totally
 different. This is what nodetool gives me:

 Address     DC          Rack   Status  State   Load       Effective-Ownership  Token
                                                                                 Token(bytes[56713727820156410577229101238628035242])
 127.0.0.1   datacenter1 rack1  Up      Normal  27.61 MB   33.33%               Token(bytes[00])
 127.0.0.3   datacenter1 rack1  Up      Normal  206.47 KB  33.33%               Token(bytes[0113427455640312821154458202477256070485])
 127.0.0.2   datacenter1 rack1  Up      Normal  13.86 MB   33.33%               Token(bytes[56713727820156410577229101238628035242])


 One node (127.0.0.3) is almost empty.
 Any ideas what is wrong?


 Thank you,
   Andrey



Re: Simple data model for 1 simple range query?

2012-10-04 Thread T Akhayo
Hi Dean,

Thank you for your reply, I appreciate the help. I managed to get my data
model into cassandra and have already inserted data and run the query, but I
don't yet have enough data to do correct benchmarking. I'm now trying to load
a huge amount of data using SSTableSimpleUnsortedWriter because doing it with
insert queries takes quite a while, but it is quite challenging to get this
one working.

Kind regards,

2012/10/3 Hiller, Dean dean.hil...@nrel.gov

 Is timeframe/date your composite key? Where timeframe is the first time of
 a partition of time (i.e. if you partition by month, it is the very first
 time of that month). If so, then, yes, it will be very fast. The smaller
 your partitions are, the smaller your indexes are as well (i.e. B-trees,
 which can grow pretty big). Realize you always have to restrict timeframe
 with equals (=), NOT <, >, <=, >=, but on the other columns you can use the
 other operators.
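
 As a quick sketch of that rule (assuming the cassandra-jdbc driver and the
 bars table from the question below; the connection URL, keyspace and
 literals are made-up illustrations): equality on the partition key
 (timeframe), a range only on the clustering column (date).

 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.Statement;

 public class BarsRangeQuery {
     public static void main(String[] args) throws Exception {
         Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
         Connection conn = DriverManager.getConnection(
                 "jdbc:cassandra://localhost:9160/mykeyspace"); // hypothetical keyspace
         Statement stmt = conn.createStatement();
         // '=' on the partition key; <, >, <=, >= only on the clustering column
         ResultSet rs = stmt.executeQuery(
                 "SELECT * FROM bars WHERE timeframe = 5"
                 + " AND date > '2012-10-01' AND date < '2012-10-04'");
         while (rs.next()) {
             System.out.println(rs.getDouble("info1"));
         }
         conn.close();
     }
 }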

 Also, if you ever find a need to partition the same data twice, you can
 always look into PlayOrm with multi-partitioning and its Scalable SQL,
 which can do joins when necessary.

 Later,
 Dean

 From: T Akhayo <t.akh...@gmail.com>
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, October 3, 2012 1:00 PM
 To: user@cassandra.apache.org
 Subject: Simple data model for 1 simple range query?

 Good evening,

 I have a quite simple data model. Pseudo CQL code:

 create table bars(
 timeframe int,
 date Date,
 info1 double,
 info2 double,
 ..
 primary key( timeframe, date )
 )

 My most important query (which might be the only one, actually) is:
 select * from bars where timeframe=X and date>Y and date<Z

 I came to this model because I read in the past (when 0.7 came out) that
 Cassandra was very fast at range queries (using a slice method) when the
 fields were keys. And now with CQL all the nasty details are hidden (I have
 not tested this yet ;-) )

 Is it correct that the above model is a good and fast solution for my
 query?

 Kind regards.




RE: Remove node from cluster and have it run as a single node cluster by itself

2012-10-04 Thread Xu, Zaili
Thanks, Aaron and Tim

Yes, I am trying to decommission a seed node. It looks like the only way to
prevent a seed node from automatically rejoining its previous cluster when it
is brought back up is to change its cluster_name.

Zaili Xu

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, October 01, 2012 5:34 PM
To: user@cassandra.apache.org
Subject: Re: Remove node from cluster and have it run as a single node cluster 
by itself

The other nodes may be trying to connect to it - it may be listed as a
seed node on the other machines?
The other nodes will be looking for it.

Change the Cluster Name in the yaml file.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/09/2012, at 12:04 AM, Tim Wintle <timwin...@gmail.com> wrote:


On Fri, 2012-09-28 at 18:53 +, Xu, Zaili wrote:

Hi,

I have an existing Cassandra Cluster. I removed a node from the cluster. Then
I decommissioned the removed node, stopped it, updated its config so that it
only has itself as the seed and in the cassandra-topology.properties file, and
even deleted the data, commitlog, and saved_caches directories. But as soon as
I start it back up, it is able to join back to the cluster. How does this node
know about the existing cluster, and how was it able to join it?

The other nodes may be trying to connect to it - it may be listed as a
seed node on the other machines?

Tim



**
IMPORTANT: Any information contained in this communication is intended for the 
use of the named individual or entity. All information contained in this 
communication is not intended or construed as an offer, solicitation, or a 
recommendation to purchase any security. Advice, suggestions or views presented 
in this communication are not necessarily those of Pershing LLC nor do they 
warrant a complete or accurate statement. 

If you are not an intended party to this communication, please notify the 
sender and delete/destroy any and all copies of this communication. Unintended 
recipients shall not review, reproduce, disseminate nor disclose any 
information contained in this communication. Pershing LLC reserves the right to 
monitor and retain all incoming and outgoing communications as permitted by 
applicable law.

Email communications may contain viruses or other defects. Pershing LLC does 
not accept liability nor does it warrant that email communications are virus or 
defect free.
**

[RELEASE] Apache Cassandra 1.0.12 released

2012-10-04 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.0.12.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1]. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Have fun!

[1]: http://goo.gl/XtyBQ (CHANGES.txt)
[2]: http://goo.gl/lzhEv (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Importing sstable with Composite key? (without is working)

2012-10-04 Thread T Akhayo
Good evening,

Today I managed to get a small cluster of 2 computers running. I also
managed to get my data model working, and I am able to import sstables
created with SSTableSimpleUnsortedWriter using sstableloader.

The only problem is when I try to use the composite key in my data model:
after I import my sstables and issue a simple select, Cassandra crashes:
===
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Unknown Source)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:51)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:76)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31)
at java.util.TreeMap.put(Unknown Source)
at
org.apache.cassandra.db.TreeMapBackedSortedColumns.addColumn(TreeMapBackedSortedColumns.java:95)
at
org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:109)
...
at
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108)
at
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:121)
at
org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1237)
at
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
at
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
at
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.lang.Thread.run(Unknown Source)
===

Now I can get everything running again by removing the data directories on
both nodes.

I suspect Cassandra crashes because the sstable that is being imported has
a different schema when it comes to the composite key (without the composite
key the import works fine).

My schema with composite key is:
===
create table bars2(
id uuid,
timeframe int,
datum timestamp,
open double,
high double,
low double,
close double,
bartype int,
PRIMARY KEY (timeframe, datum)
);
===
create column family bars2
  with column_type = 'Standard'
  and comparator =
'CompositeType(org.apache.cassandra.db.marshal.DateType,org.apache.cassandra.db.marshal.UTF8Type)'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'Int32Type'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};
===

My code to create the sstable is (only the interesting parts):
===
sstWriter = new SSTableSimpleUnsortedWriter(new
File("c:\\cassandra\\newtables\\"), new RandomPartitioner(), "readtick",
"bars2", UTF8Type.instance, null, 64);


CompositeType.Builder cb = new
CompositeType.Builder(CompositeType.getInstance(compositeList));
cb.add(bytes(curMinuteBar.getDatum().getTime()));
cb.add(bytes(1));
sstWriter.newRow(cb.build());

(... add columns...)
===

I highly suspect that the problem is in one of 2 places:
- In the SSTableSimpleUnsortedWriter I use UTF8Type.instance as the
comparator; I'm not sure if that is right with a composite key?
- When calling sstWriter.newRow I use CompositeType.Builder to build the
composite key; I'm not sure if I'm doing this the right way? (I did try
different combinations.)

Does somebody know how I can continue on my journey?
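
As an illustration of the first suspicion above (a hedged sketch, assuming the
Cassandra 1.1 SSTableSimpleUnsortedWriter constructor used in the post; not a
confirmed fix): the writer could be given the same comparator the CLI
definition declares, CompositeType(DateType, UTF8Type), instead of
UTF8Type.instance.

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.db.marshal.CompositeType;
import org.apache.cassandra.db.marshal.DateType;
import org.apache.cassandra.db.marshal.UTF8Type;
import org.apache.cassandra.dht.RandomPartitioner;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

public class CompositeWriterSketch {
    public static void main(String[] args) throws Exception {
        // mirror the CF's declared comparator: CompositeType(DateType, UTF8Type)
        List<AbstractType<?>> components = new ArrayList<AbstractType<?>>();
        components.add(DateType.instance);
        components.add(UTF8Type.instance);

        SSTableSimpleUnsortedWriter sstWriter = new SSTableSimpleUnsortedWriter(
                new File("c:\\cassandra\\newtables\\"),
                new RandomPartitioner(),
                "readtick", "bars2",
                CompositeType.getInstance(components), // instead of UTF8Type.instance
                null, 64);
        sstWriter.close();
    }
}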


Re: unsubscribe

2012-10-04 Thread mike.li
Mike
Mike Li
Lead Database Engineer
Thomson Reuters
Phone: 314-468-8128
mike...@thomsonreuters.com
www.thomsonreuters.com




Re: Why data is not even distributed.

2012-10-04 Thread Andrey Ilinykh
It was my first thought.
Then I took the MD5 of the UUID and used the digest as a key:

MessageDigest md = MessageDigest.getInstance("MD5");

//in the loop
UUID uuid = UUID.randomUUID();
byte[] bytes = md.digest(asByteArray(uuid));
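
(For reference, asByteArray is presumably a helper along these lines; a
hypothetical reconstruction, not from the original post:)

import java.nio.ByteBuffer;
import java.util.UUID;

static byte[] asByteArray(UUID uuid) {
    // pack the UUID's two 64-bit halves into a 16-byte big-endian array
    ByteBuffer bb = ByteBuffer.allocate(16);
    bb.putLong(uuid.getMostSignificantBits());
    bb.putLong(uuid.getLeastSignificantBits());
    return bb.array();
}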

The result is exactly the same: the first node takes 66%, the second 33%, and
the third one is empty. For some reason rows which should be placed on the
third node moved to the first one.

Address     DC          Rack   Status  State   Load      Effective-Ownership  Token
                                                                              Token(bytes[56713727820156410577229101238628035242])
127.0.0.1   datacenter1 rack1  Up      Normal  7.68 MB   33.33%               Token(bytes[00])
127.0.0.3   datacenter1 rack1  Up      Normal  79.17 KB  33.33%               Token(bytes[0113427455640312821154458202477256070485])
127.0.0.2   datacenter1 rack1  Up      Normal  3.81 MB   33.33%               Token(bytes[56713727820156410577229101238628035242])



On Thu, Oct 4, 2012 at 12:33 AM, Tom fivemile...@gmail.com wrote:
 Hi Andrey,

 while the data values you generated might follow a true random
 distribution, your row key, a UUID, does not (because it is created on the
 same machines, by the same software, within a certain window of time).

 For example, if you are using the UUID class in Java, the keys are composed
 from several components (related to dimensions such as time and version), so
 you cannot expect a random distribution over the whole key space.


 Cheers
 Tom




 On Wed, Oct 3, 2012 at 5:39 PM, Andrey Ilinykh ailin...@gmail.com wrote:

 Hello, everybody!

 I'm observing very strange behavior. I have a 3 node cluster with
 ByteOrderedPartitioner. (I run 1.1.5.)
 I created a keyspace with a replication factor of 1.
 Then I created one column family and populated it with random data.
 I use a UUID as the row key, and an Integer as the column name.
 Row keys were generated as

 UUID uuid = UUID.randomUUID();

 I populated about 10 rows with 100 columns each.

 I would expect an equal load on each node, but the result is totally
 different. This is what nodetool gives me:

 Address     DC          Rack   Status  State   Load       Effective-Ownership  Token
                                                                                 Token(bytes[56713727820156410577229101238628035242])
 127.0.0.1   datacenter1 rack1  Up      Normal  27.61 MB   33.33%               Token(bytes[00])
 127.0.0.3   datacenter1 rack1  Up      Normal  206.47 KB  33.33%               Token(bytes[0113427455640312821154458202477256070485])
 127.0.0.2   datacenter1 rack1  Up      Normal  13.86 MB   33.33%               Token(bytes[56713727820156410577229101238628035242])


 One node (127.0.0.3) is almost empty.
 Any ideas what is wrong?


 Thank you,
   Andrey




schema change management tools

2012-10-04 Thread John Sanda
I have been looking to see if there are any schema change management tools
for Cassandra. I have not come across any so far. I figured I would check
to see if anyone can point me to something before I start trying to
implement something on my own. I have used liquibase (
http://www.liquibase.org) for relational databases. Earlier today I tried
using it with the cassandra-jdbc driver, but ran into some exceptions due
to the SQL generated. I am not looking specifically for something
CQL-based. Something that uses the Thrift API via CLI scripts for example
would work as well.

Thanks

- John


Re: schema change management tools

2012-10-04 Thread Jonathan Haddad
Not that I know of.  I've always been really strict about dumping my
schemas (to start) and keeping my changes in migration files.  I don't do a
ton of schema changes so I haven't had a need to really automate it.

Even with MySQL I never bothered.
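
As a hedged sketch of that practice (a minimal runner that applies *.cql
migration files in name order through the cassandra-jdbc driver mentioned
earlier in the thread; the URL, keyspace and directory names are illustrative
assumptions, and real tooling would also record which migrations have already
been applied, the part Liquibase gives you for free):

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Arrays;

public class ApplyMigrations {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:cassandra://localhost:9160/myks"); // hypothetical keyspace
        Statement stmt = conn.createStatement();
        // apply migrations/001-xxx.cql, 002-xxx.cql, ... in name order,
        // assuming one CQL statement per file
        File[] migrations = new File("migrations").listFiles();
        Arrays.sort(migrations);
        for (File f : migrations) {
            byte[] raw = java.nio.file.Files.readAllBytes(f.toPath());
            stmt.execute(new String(raw, "UTF-8"));
        }
        conn.close();
    }
}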

Jon

On Thu, Oct 4, 2012 at 6:27 PM, John Sanda john.sa...@gmail.com wrote:

 I have been looking to see if there are any schema change management tools
 for Cassandra. I have not come across any so far. I figured I would check
 to see if anyone can point me to something before I start trying to
 implement something on my own. I have used liquibase (
 http://www.liquibase.org) for relational databases. Earlier today I tried
 using it with the cassandra-jdbc driver, but ran into some exceptions due
 to the SQL generated. I am not looking specifically for something
 CQL-based. Something that uses the Thrift API via CLI scripts for example
 would work as well.

 Thanks

 - John




-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: schema change management tools

2012-10-04 Thread John Sanda
For the project I work on and for previous projects as well that support
multiple upgrade paths, this kind of tooling is a necessity. And I would
prefer to avoid duplicating effort if there is already something out there.
If not though, I will be sure to post back to the list with whatever I wind
up doing.

On Thu, Oct 4, 2012 at 9:34 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Not that I know of.  I've always been really strict about dumping my
 schemas (to start) and keeping my changes in migration files.  I don't do a
 ton of schema changes so I haven't had a need to really automate it.

 Even with MySQL I never bothered.

 Jon


 On Thu, Oct 4, 2012 at 6:27 PM, John Sanda john.sa...@gmail.com wrote:

 I have been looking to see if there are any schema change management
 tools for Cassandra. I have not come across any so far. I figured I would
 check to see if anyone can point me to something before I start trying to
 implement something on my own. I have used liquibase (
 http://www.liquibase.org) for relational databases. Earlier today I
 tried using it with the cassandra-jdbc driver, but ran into some exceptions
 due to the SQL generated. I am not looking specifically for something
 CQL-based. Something that uses the Thrift API via CLI scripts for example
 would work as well.

 Thanks

 - John




 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade




Re: schema change management tools

2012-10-04 Thread Jonathan Haddad
Awesome - keep me posted.

Jon

On Thu, Oct 4, 2012 at 6:42 PM, John Sanda john.sa...@gmail.com wrote:

 For the project I work on and for previous projects as well that support
 multiple upgrade paths, this kind of tooling is a necessity. And I would
 prefer to avoid duplicating effort if there is already something out there.
 If not though, I will be sure to post back to the list with whatever I wind
 up doing.


 On Thu, Oct 4, 2012 at 9:34 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 Not that I know of.  I've always been really strict about dumping my
 schemas (to start) and keeping my changes in migration files.  I don't do a
 ton of schema changes so I haven't had a need to really automate it.

 Even with MySQL I never bothered.

 Jon


 On Thu, Oct 4, 2012 at 6:27 PM, John Sanda john.sa...@gmail.com wrote:

 I have been looking to see if there are any schema change management
 tools for Cassandra. I have not come across any so far. I figured I would
 check to see if anyone can point me to something before I start trying to
 implement something on my own. I have used liquibase (
 http://www.liquibase.org) for relational databases. Earlier today I
 tried using it with the cassandra-jdbc driver, but ran into some exceptions
 due to the SQL generated. I am not looking specifically for something
 CQL-based. Something that uses the Thrift API via CLI scripts for example
 would work as well.

 Thanks

 - John




 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade





-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade