Re: 答复: two problems about opscenter 3.2

2013-08-01 Thread Alain RODRIGUEZ
You have to log in, and then you can create topics only under certain sections
(OpsCenter, Feedback, ...), if I remember correctly.

Alain


2013/8/1 yue.zhang yue.zh...@chinacache.com

 thanks Alain


 I don't know why I am not permitted to create a topic on the DataStax forum.


 *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
 *Sent:* July 31, 2013 18:11
 *To:* user@cassandra.apache.org
 *Subject:* Re: two problems about opscenter 3.2


 Here is the pointer to the topic on the DS support forum.



 http://www.datastax.com/support-forums/topic/some-32-bugs-reported-in-the-c-user-ml
 


 2013/7/31 aaron morton aa...@thelastpickle.com

 You'll get better Ops Centre support on the DS site
 http://www.datastax.com/support-forums/

 cheers

 -
 Aaron Morton
 Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com


 On 30/07/2013, at 9:23 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:

  I also see Waiting for agent information... at the top of the
 dashboard page, but again nothing in the logs.
 
 
  2013/7/30 Alain RODRIGUEZ arodr...@gmail.com
  I also have the following message in the dashboard :
 
  Error loading events: Cannot call method 'slice' of null
 
  With events and alerts not showing up. No error in the logs (opscenterd
 and agents)
 
 
  2013/7/30 yue.zhang yue.zh...@chinacache.com
 
  problem-1:
 
  -
 
  when I edit any “schema → keyspace → cf”, it reports “Error saving
 column family: required argument is not a float”
 
 
 
  problem-2:
 
  -
 
  from the OpsCenter 3.2 release notes (
 http://www.datastax.com/docs/opscenter/release_notes#opscenter-3-2), it
 tells me “Added support for CQL3 column families (limited Data Explorer
 support).”, but it does not work.
 
 
 
  The message below:
 
 
 
  Viewing data in column families created with CQL3 is not currently
 supported. We recommend using one of the DataStax drivers or cqlsh as an
 alternative.
 
 
 
 
 
  thx
 
  -heipark
 
 
 




Re: CQL and undefined columns

2013-08-01 Thread Jon Ribbens
On Wed, Jul 31, 2013 at 03:10:54PM -0700, Jonathan Haddad wrote:
It's advised you do not use compact storage, as it's primarily for
backwards compatibility.

Yes indeed, I understand what it does and why now, but only because
I was pointed to the thrift-to-cql document. The CQL documentation
itself doesn't make it at all clear, I was originally under the
impression that the way 'COMPACT STORAGE' works was the way CQL
works by default, because that's the natural assumption until it's
explained why it doesn't work that way.

I was pointing out that either the thrift-to-cql document must be
wrong, or the CQL document must be wrong, because they contradict
each other.


Re: CQL and undefined columns

2013-08-01 Thread Alain RODRIGUEZ
I am glad this document helped you.

I like to point people to this 'thrift-to-cql' document, since it was really
useful to me when I found it, even if I had to read it entirely at least 3
times, and I still need to refer back to pieces of it sometimes because of
the complexity of what it explains.

@Sylvain, you did a really good job with this blog post. Thanks a lot; be
sure I will keep sharing it.

Alain


2013/8/1 Jon Ribbens jon-cassan...@unequivocal.co.uk

 On Wed, Jul 31, 2013 at 03:10:54PM -0700, Jonathan Haddad wrote:
 It's advised you do not use compact storage, as it's primarily for
 backwards compatibility.

 Yes indeed, I understand what it does and why now, but only because
 I was pointed to the thrift-to-cql document. The CQL documentation
 itself doesn't make it at all clear, I was originally under the
 impression that the way 'COMPACT STORAGE' works was the way CQL
 works by default, because that's the natural assumption until it's
 explained why it doesn't work that way.

 I was pointing out that either the thrift-to-cql document must be
 wrong, or the CQL document must be wrong, because they contradict
 each other.



Adding my first node to another one...

2013-08-01 Thread Morgan Segalis
Hi everyone,

I'm trying to wrap my head around Cassandra's great ability to expand…

I have set up my first Cassandra node a while ago… it was working great, and 
data wasn't so important back then.
Since I had a great experience with Cassandra I decided to migrate step by step 
my MySQL data to Cassandra.

Now the data starts to be important, so I would like to create another node and
add it.
Since I had some issues with my DataCenter, I wanted to have a copy (of
sensitive data only) in another DataCenter.

Quite frankly I'm still a newbie with Cassandra and need your help, guys.

First things first… 
Already up and Running Cassandra (Called A): 
- Do I need to change anything to the cassandra.yaml to make sure that 
another node can connect ? if yes, should I restart the node (because I would 
have to warn users about downtime) ?
- Since this node should be a seed, the seed list is already set to 
localhost, is that good enough ?

The new node I want to add (Called B): 
- I know that before starting this node, I should modify the seed list 
in cassandra.yaml… Is that the only thing I need to do ?

It is my first time doing this, so please be gentle ;-)

Thank you all,

Morgan.
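The two configuration questions above come down to a handful of cassandra.yaml keys. A minimal sketch for both nodes (the addresses are placeholders, and the exact layout may differ slightly between the 1.1 and 1.2 sample files):

```yaml
# Applies to node A and node B alike; substitute each node's own address.
cluster_name: 'MyCluster'            # must be identical on every node
listen_address: 10.0.0.1             # this node's address for inter-node
                                     # traffic; 'localhost' only works for
                                     # a single-node cluster
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1"            # node A's address, listed on BOTH nodes
```

So a seed list of localhost is not good enough once a second node exists: both nodes need node A's real address there, and changing listen_address or the seed list on node A does require a restart.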

How often to run `nodetool repair`

2013-08-01 Thread Carl Lerche
Hello,

I read in the docs that `nodetool repair` should be regularly run unless no
delete is ever performed. In my app, I never delete, but I heavily use the
ttl feature. Should repair still be run regularly? Also, does repair take
less time if it is run regularly? If not, is there a way to incrementally
run it? It seems that when I do run repair, it takes a long time and causes
high amounts CPU usage and iowait.

Thoughts?

Thanks,
Carl


Cassandra Counter Family

2013-08-01 Thread Kanwar Sangha
Hi - We are struggling to understand how the counter family maintains 
consistency in Cassandra.

Say Counter1's value is 1 and it is read by 2 clients at the same time, both of
whom want to update it. After both writes, will it become 3?
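A toy model (plain Python, not driver code) of why this works: counters are stored as per-replica deltas that are summed on read, rather than a value that a client reads and writes back.

```python
# Two clients each apply "+1" to a counter whose value starts at 1.

def read_modify_write(initial, clients):
    # Naive approach with a plain column: each client reads the current
    # value, then writes value + 1. If both read before either writes,
    # one increment is lost.
    reads = [initial] * clients       # both clients read 1
    return max(r + 1 for r in reads)  # both write back 2

def counter_deltas(initial, clients):
    # Counter approach: each client submits a +1 delta; the read value
    # is the sum of all deltas, so concurrent increments both count.
    return initial + sum(1 for _ in range(clients))

assert read_modify_write(1, 2) == 2  # lost update: ends at 2
assert counter_deltas(1, 2) == 3     # both increments applied: ends at 3
```

So yes: two concurrent counter increments starting from 1 end at 3, because each `UPDATE ... SET c = c + 1` ships a delta rather than an absolute value. A client-side read-then-write of a plain column, by contrast, can lose one of the updates.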


Re: How often to run `nodetool repair`

2013-08-01 Thread rash aroskar
We observed the same behavior. During the last repair, the data distribution
across nodes was imbalanced as well, resulting in one node bloating.
On Aug 1, 2013 12:36 PM, Carl Lerche m...@carllerche.com wrote:

 Hello,

 I read in the docs that `nodetool repair` should be regularly run unless
 no delete is ever performed. In my app, I never delete, but I heavily use
 the ttl feature. Should repair still be run regularly? Also, does repair
 take less time if it is run regularly? If not, is there a way to
 incrementally run it? It seems that when I do run repair, it takes a long
 time and causes high amounts CPU usage and iowait.

 Thoughts?

 Thanks,
 Carl



Re: How often to run `nodetool repair`

2013-08-01 Thread Arthur Zubarev
Hi Carl,

The ‘repair’ is for data reads. Compaction will take care of the expired data.

The fact a repair runs long makes me think the nodes receive unbalanced amounts 
of writes rather.

Regards,

Arthur

From: Carl Lerche 
Sent: Thursday, August 01, 2013 12:35 PM
To: user@cassandra.apache.org 
Subject: How often to run `nodetool repair`

Hello, 

I read in the docs that `nodetool repair` should be regularly run unless no 
delete is ever performed. In my app, I never delete, but I heavily use the ttl 
feature. Should repair still be run regularly? Also, does repair take less time 
if it is run regularly? If not, is there a way to incrementally run it? It 
seems that when I do run repair, it takes a long time and causes high amounts 
CPU usage and iowait.

Thoughts?

Thanks,
Carl

Re: Adding my first node to another one...

2013-08-01 Thread Arthur Zubarev

Hi Morgan,

The scaling out depends on several factors. The most intricate is perhaps 
calculating the tokens.


Also the Cassandra version is important.

At this point in time I suggest you read section Adding Capacity to an 
Existing Cluster at 
http://www.datastax.com/docs/1.0/operations/cluster_management

and come back here with questions and more details.

Regards,

Arthur

-Original Message- 
From: Morgan Segalis

Sent: Thursday, August 01, 2013 11:24 AM
To: user@cassandra.apache.org
Subject: Adding my first node to another one...

Hi everyone,

I'm trying to wrap my head around Cassandra great ability to expand…

I have set up my first Cassandra node a while ago… it was working great, and 
data wasn't so important back then.
Since I had a great experience with Cassandra I decided to migrate step by 
step my MySQL data to Cassandra.


Now data start to be important, so I would like to create another node, and 
add it.
Since I had some issue with my DataCenter, I wanted to have a copy (of 
sensible data only) on another DataCenter.


Quite frankly I'm still a newbie on Cassandra and need your guys help.

First things first…
Already up and Running Cassandra (Called A):
- Do I need to change anything to the cassandra.yaml to make sure that 
another node can connect ? if yes, should I restart the node (because I 
would have to warn users about downtime) ?
- Since this node should be a seed, the seed list is already set to 
localhost, is that good enough ?


The new node I want to add (Called B):
- I know that before starting this node, I should modify the seed list in 
cassandra.yaml… Is that the only thing I need to do ?


It is my first time doing this, so please be gentle ;-)

Thank you all,

Morgan. 



Re: How often to run `nodetool repair`

2013-08-01 Thread Carl Lerche
Arthur,

Yes, my use case for this Cassandra cluster is analytics. I am building a
google dapper (application tracing) like system. I collect application
traces and write them to Cassandra. Then, I have periodic rollup tasks that
read the data, do some summarization and write it back.

Thoughts on how to manage a write heavy cluster?

Thanks,
Carl


On Thu, Aug 1, 2013 at 11:28 AM, Arthur Zubarev arthur.zuba...@aol.comwrote:

   Hi Carl,

 The ‘repair’ is for data reads. Compaction will take care of the expired
 data.

 The fact a repair runs long makes me think the nodes receive unbalanced
 amounts of writes rather.

 Regards,

 Arthur

  *From:* Carl Lerche m...@carllerche.com
 *Sent:* Thursday, August 01, 2013 12:35 PM
 *To:* user@cassandra.apache.org
 *Subject:* How often to run `nodetool repair`

  Hello,

 I read in the docs that `nodetool repair` should be regularly run unless
 no delete is ever performed. In my app, I never delete, but I heavily use
 the ttl feature. Should repair still be run regularly? Also, does repair
 take less time if it is run regularly? If not, is there a way to
 incrementally run it? It seems that when I do run repair, it takes a long
 time and causes high amounts CPU usage and iowait.

 Thoughts?

 Thanks,
 Carl



deb packages (and older versions)

2013-08-01 Thread David McNelis
Hey folks,

Because 1.2.8 hasn't been pushed to the repo yet, I see that I can pick
up the package at http://people.apache.org/~eevans/ and install it
manually.  This is great.  I'm wondering though, is there a place where
I can pick up Debian packages for older releases?  I definitely prefer
the package install, but sometimes when we add new nodes to our ring the
only package available is the newest one, which doesn't match the rest
of our nodes.


Re: deb packages (and older versions)

2013-08-01 Thread Blair Zajac

On 08/01/2013 12:27 PM, David McNelis wrote:

Hey folks,

Because 1.2.8 hasn't been pushed to the repo yet, I see that I can pick
up the package at http://people.apache.org/~eevans/ and install it
manually.  This is great.  I'm wondering though, is there a place where
I can pick up Debian packages for older releases?  I definitely prefer
the package install, but sometimes when we add new nodes to our ring the
only package available is the newest one, which doesn't match the rest
of our nodes.


DataStax maintains a repo that has old versions:

http://debian.datastax.com/community/pool/

Blair




Re: deb packages (and older versions)

2013-08-01 Thread David McNelis
Thanks. FWIW, did I just blatantly miss some documentation saying those
existed there?


On Thu, Aug 1, 2013 at 3:32 PM, Blair Zajac bl...@orcaware.com wrote:

 On 08/01/2013 12:27 PM, David McNelis wrote:

 Hey folks,

 Because 1.2.8 hasn't been pushed to the repo yet, I see that I can pick
 up the package at http://people.apache.org/~eevans/ and install it
 manually.  This is great.  I'm wondering though, is there a place where
 I can pick up Debian packages for older releases?  I definitely prefer
 the package install, but sometimes when we add new nodes to our ring the
 only package available is the newest one, which doesn't match the rest
 of our nodes.


 DataStax maintains a repo that has old versions:

 http://debian.datastax.com/community/pool/

 Blair





Secondary Indexes On Partitioned Time Series Data Question

2013-08-01 Thread Gareth Collins
Hello,

Say I have time series data for a table like this:

CREATE TABLE mytimeseries (
    pk_part1  text,
    partition bigint,   -- e.g. partition per day or per hour
    pk_part2  text,     -- part of the partition key so I can
                        -- split the write load
    message_id  timeuuid,
    secondary_key1  text,
    secondary_key2  text,
    -- ... more columns ...
    PRIMARY KEY ((pk_part1, partition, pk_part2), message_id));

Most of the time I will need to do queries with
pk_part1/partition/pk_part2/message_id range. So this is what I
optimize for.

Sometimes, however, I will need to do queries with
pk_part1/partition/message_id range and some combination of
secondary_key1 (95% of the time there is a one-to-one relationship
with pk_part1) or secondary_key2 (for each secondary_key2 there will
be many pk_part2 values).

In this time series scenario, to efficiently make use of
secondary_key1/secondary_key2 as Cassandra secondary indexes for these
queries I assume that secondary_key1/secondary_key_2 would really need
to be composites combined into one column (in SQL I would create
multi-column indexes)? i.e.:

secondary_key_1 -> pk_part1 + partition_key + real_secondary_key_1
secondary_key_2 -> pk_part2 + partition_key + real_secondary_key_2

Would this be correct? Just making sure I understand how to best use
secondary indexes in Cassandra with time series data.

thanks in advance,
Gareth
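The proposed composite layout can be sanity-checked in miniature with plain dicts (hypothetical names throughout; this is a model, not driver code):

```python
from collections import defaultdict

# Pseudo secondary-index "CF": one composite key per index entry,
# mirroring secondary_key_1 = pk_part1 + partition + real_secondary_key_1.
index_cf = defaultdict(list)

def index_insert(pk_part1, partition, real_key, message_id):
    index_cf[(pk_part1, partition, real_key)].append(message_id)

def index_query(pk_part1, partition, real_key):
    # A single partition read answers the "multi-column" lookup, which is
    # exactly why pk_part1 and partition are baked into the index key.
    return index_cf[(pk_part1, partition, real_key)]

index_insert("acct1", 20130801, "k1", "msg-001")
index_insert("acct1", 20130801, "k1", "msg-002")
index_insert("acct1", 20130802, "k1", "msg-003")  # different partition

assert index_query("acct1", 20130801, "k1") == ["msg-001", "msg-002"]
assert index_query("acct1", 20130802, "k1") == ["msg-003"]
```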


Re: Secondary Indexes On Partitioned Time Series Data Question

2013-08-01 Thread Robert Coli
On Thu, Aug 1, 2013 at 12:49 PM, Gareth Collins
gareth.o.coll...@gmail.comwrote:

 Would this be correct? Just making sure I understand how to best use
 secondary indexes in Cassandra with time series data.


In general unless you ABSOLUTELY NEED the one unique feature of built-in
Secondary Indexes (atomic update of base row and index) you should just use
a normal column family for secondary index cases.

=Rob


Re: deb packages (and older versions)

2013-08-01 Thread Blair Zajac
I don't think they are listed on cassandra.apache.org.  BTW, installation
instructions are here:


http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/install/installDeb_t.html

Blair

On 08/01/2013 12:36 PM, David McNelis wrote:

Thanks, fwiw, did I just blatantly miss some documentation saying those
existed there?


On Thu, Aug 1, 2013 at 3:32 PM, Blair Zajac bl...@orcaware.com wrote:

On 08/01/2013 12:27 PM, David McNelis wrote:

Hey folks,

Because 1.2.8 hasn't been pushed to the repo yet, I see that I
can pick
up the package at http://people.apache.org/~eevans/ and install it
manually.  This is great.  I'm wondering though, is there a
place where
I can pick up Debian packages for older releases?  I definitely
prefer
the package install, but sometimes when we add new nodes to our
ring the
only package available is the newest one, which doesn't match
the rest
of our nodes.


DataStax maintains a repo that has old versions:

http://debian.datastax.com/community/pool/

Blair







Re: How often to run `nodetool repair`

2013-08-01 Thread Andrey Ilinykh
On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 1, 2013 at 9:35 AM, Carl Lerche m...@carllerche.com wrote:

 I read in the docs that `nodetool repair` should be regularly run unless
 no delete is ever performed. In my app, I never delete, but I heavily use
 the ttl feature. Should repair still be run regularly? Also, does repair
 take less time if it is run regularly? If not, is there a way to
 incrementally run it? It seems that when I do run repair, it takes a long
 time and causes high amounts CPU usage and iowait.


 TTL is effectively DELETE; you need to run a repair once every
 gc_grace_seconds. If you don't, data might un-delete itself.


How is that possible? Every replica has the TTL, so when it expires, every
replica has a tombstone. I don't see how you can get data with no tombstone.
What am I missing?

Andrey


Re: How often to run `nodetool repair`

2013-08-01 Thread Arthur Zubarev
Cassandra is an excellent choice for write heavy applications.

Reading large sets of data is not as fast and not as easy; you may need to have
your client page through it, and you may need slice queries and proper PK +
indexes thought out in advance.

Regards,

Arthur

From: Carl Lerche 
Sent: Thursday, August 01, 2013 3:03 PM
To: user@cassandra.apache.org ; Arthur Zubarev 
Subject: Re: How often to run `nodetool repair`

Arthur, 

Yes, my use case for this Cassandra cluster is analytics. I am building a 
google dapper (application tracing) like system. I collect application traces 
and write them to Cassandra. Then, I have periodic rollup tasks that read the 
data, do some summarization and write it back.

Thoughts on how to manage a write heavy cluster?

Thanks,
Carl



On Thu, Aug 1, 2013 at 11:28 AM, Arthur Zubarev arthur.zuba...@aol.com wrote:

  Hi Carl,

  The ‘repair’ is for data reads. Compaction will take care of the expired data.

  The fact a repair runs long makes me think the nodes receive unbalanced 
amounts of writes rather.

  Regards,

  Arthur

  From: Carl Lerche 
  Sent: Thursday, August 01, 2013 12:35 PM
  To: user@cassandra.apache.org 
  Subject: How often to run `nodetool repair`

  Hello, 

  I read in the docs that `nodetool repair` should be regularly run unless no 
delete is ever performed. In my app, I never delete, but I heavily use the ttl 
feature. Should repair still be run regularly? Also, does repair take less time 
if it is run regularly? If not, is there a way to incrementally run it? It 
seems that when I do run repair, it takes a long time and causes high amounts 
CPU usage and iowait.

  Thoughts?

  Thanks,
  Carl


Re: How often to run `nodetool repair`

2013-08-01 Thread horschi
 TTL is effectively DELETE; you need to run a repair once every
 gc_grace_seconds. If you don't, data might un-delete itself.


The undelete part is not true. BTW: with CASSANDRA-4917, TTLed columns will
not even create a tombstone (assuming ttl > gc_grace).

The rest of your mail I agree with :-)


Re: Adding my first node to another one...

2013-08-01 Thread Morgan Segalis
Hi Arthur,

Thank you for your answer.
I have read the section Adding Capacity to an Existing Cluster prior to 
posting my question.

Actually I was thinking I would let Cassandra choose the token by itself.

Since I want only some column families to be replicated across the whole
cluster, and other column families to stay where they are, no matter the
balancing…

I cannot find anything in the configuration that I should set on the very
first (and so far only) node to start the replication. (The configuration of my
Node A is pretty basic, almost out of the box; I might have changed the name.)
How do I make this node know that it will be a seed?

My current Node A is using Cassandra 1.1.0

Is it compatible if I install a new node with Cassandra 1.2.8 ? or should I 
fetch 1.1.0 for Node B ?

Thank you.

Morgan.
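One wrinkle with "only some column families on the other DataCenter": replication in Cassandra is configured per keyspace, not per column family, so the families that must exist in both data centers would need to live in their own keyspace. A sketch in 1.2-era CQL 3 (keyspace and data-center names are placeholders; 1.1's CREATE KEYSPACE syntax differs):

```cql
-- Keyspace whose data is replicated to both data centers.
CREATE KEYSPACE important_data
  WITH replication = {'class': 'NetworkTopologyStrategy',
                      'DC1': 1, 'DC2': 1};

-- Keyspace whose data stays in the original data center only.
CREATE KEYSPACE local_only
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1};
```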


On August 1, 2013 at 20:32, Arthur Zubarev arthur.zuba...@aol.com wrote:

 Hi Morgan,
 
 The scaling out depends on several factors. The most intricate is perhaps 
 calculating the tokens.
 
 Also the Cassandra version is important.
 
 At this point in time I suggest you read section Adding Capacity to an 
 Existing Cluster at 
 http://www.datastax.com/docs/1.0/operations/cluster_management
 and come back here with questions and more details.
 
 Regards,
 
 Arthur
 
 -Original Message- From: Morgan Segalis
 Sent: Thursday, August 01, 2013 11:24 AM
 To: user@cassandra.apache.org
 Subject: Adding my first node to another one...
 
 Hi everyone,
 
 I'm trying to wrap my head around Cassandra great ability to expand…
 
 I have set up my first Cassandra node a while ago… it was working great, and 
 data wasn't so important back then.
 Since I had a great experience with Cassandra I decided to migrate step by 
 step my MySQL data to Cassandra.
 
 Now data start to be important, so I would like to create another node, and 
 add it.
 Since I had some issue with my DataCenter, I wanted to have a copy (of 
 sensible data only) on another DataCenter.
 
 Quite frankly I'm still a newbie on Cassandra and need your guys help.
 
 First things first…
 Already up and Running Cassandra (Called A):
 - Do I need to change anything to the cassandra.yaml to make sure that 
 another node can connect ? if yes, should I restart the node (because I would 
 have to warn users about downtime) ?
 - Since this node should be a seed, the seed list is already set to 
 localhost, is that good enough ?
 
 The new node I want to add (Called B):
 - I know that before starting this node, I should modify the seed list in 
 cassandra.yaml… Is that the only thing I need to do ?
 
 It is my first time doing this, so please be gentle ;-)
 
 Thank you all,
 
 Morgan. 



Re: How often to run `nodetool repair`

2013-08-01 Thread Robert Coli
On Thu, Aug 1, 2013 at 1:16 PM, Andrey Ilinykh ailin...@gmail.com wrote:


 On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli rc...@eventbrite.com wrote:

 TTL is effectively DELETE; you need to run a repair once every
 gc_grace_seconds. If you don't, data might un-delete itself.


 How is it possible? Every replica has TTL, so it when it expires every
 replica has tombstone. I don't see how you can get data with no tombstone.
 What do I miss?


I knew I had heard of cases where repair is required despite TTL, but
didn't recall the specifics. Thanks for the opportunity to go look it up...

http://comments.gmane.org/gmane.comp.db.cassandra.user/21008

quoting Sylvain Lebresne :

The initial question was about can I use inserting with ttl=1 instead of
issuing deletes, ***so that would be a case where you do shadow a previous
version with a very small ttl and so repair is important.*** (EMPHASIS
rcoli)

But you're right that if you only issue data with expiration (no deletes)
and
that you
  * either do not overwrite columns
  * or are sure that when you do overwrite, the value you're overwriting has
 a ttl that is lesser or equal than the ttl of the value you're
overwriting with
 (+gc_grace to be precise)
then yes, ***repair is not necessary because you can't have shadowed value
resurfacing.*** (EMPHASIS rcoli)


So, to be more precise with my initial statement :

TTL is like DELETE in some cases, so unless you are certain that you are
not (and will not be) in those cases, you should run repair when using TTL.

Also, you will be unable to repair entire keyspaces; you will have to repair
on a per-column-family basis, manually excluding CFs matching these
criteria, which increases management complexity.

=Rob
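Sylvain's condition can be written down as a small predicate. A sketch under the quoted assumptions (expirations only, no explicit deletes); the function and argument names are mine, and the inequality is my reading of the "(+gc_grace to be precise)" parenthetical:

```python
GC_GRACE_DEFAULT = 864000  # default gc_grace_seconds: 10 days

def repair_optional(overwritten_ttl, overwriting_ttl,
                    gc_grace=GC_GRACE_DEFAULT):
    """True if the overwrite can never resurface the shadowed value:
    the old value expires no later than the moment the new value's
    tombstone becomes eligible for purge."""
    return overwritten_ttl <= overwriting_ttl + gc_grace

# The "delete via ttl=1" trick: a tiny ttl shadowing a 30-day value.
assert repair_optional(30 * 86400, 1) is False  # repair still required
# Overwriting with an equal or longer ttl is safe.
assert repair_optional(86400, 86400) is True
```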


Re: How often to run `nodetool repair`

2013-08-01 Thread Erik Forkalsud

On 08/01/2013 01:16 PM, Andrey Ilinykh wrote:


TTL is effectively DELETE; you need to run a repair once every
gc_grace_seconds. If you don't, data might un-delete itself.


How is it possible? Every replica has TTL, so it when it expires every 
replica has tombstone. I don't see how you can get data with no 
tombstone. What do I miss?




The only way I can think of is this scenario:

   - value A for some key is written with ttl=30days to all
replicas (i.e. a long ttl, or no ttl at all)
   - value B for the same key is written with ttl=1day, but doesn't
reach all replicas

   - one day passes and the ttl=1day values turn into deletes
   - gc_grace passes and the tombstones are purged

at this point, the replica that didn't get the ttl=1day value will think 
the older value A is live.


I'm no expert on this so I may be mistaken, but in any case it's a
corner case, as overwriting columns with shorter ttls would be unusual.



- Erik -
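Erik's timeline can be checked with a few lines of arithmetic. A deliberately crude model (last-write-wins per replica; an expired newest write reads as a delete, and older shadowed versions are assumed compacted away):

```python
DAY = 86400
GC_GRACE = 10 * DAY  # default gc_grace_seconds

def live_value(writes, now):
    """writes: list of (timestamp, value, ttl) on one replica.
    The newest write wins; if it has expired, the cell reads as deleted."""
    ts, value, ttl = max(writes)  # newest write by timestamp
    return value if now < ts + ttl else None

replica_full = [(0, "A", 30 * DAY), (10, "B", 1 * DAY)]  # saw both writes
replica_lagging = [(0, "A", 30 * DAY)]                   # missed write B

# Advance past B's expiry AND past gc_grace, so B's tombstone is purged.
now = 10 + 1 * DAY + GC_GRACE + 1
assert live_value(replica_full, now) is None      # correctly deleted here
assert live_value(replica_lagging, now) == "A"    # stale A looks live again

# A repair run before gc_grace expired would have propagated write B (and
# later its tombstone) to the lagging replica, preventing the resurrection.
```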



Re: Secondary Indexes On Partitioned Time Series Data Question

2013-08-01 Thread Shahab Yunus
Hi Robert,

Can you shed some more light (or point me toward some other resource) on
why you think built-in secondary indexes should not be used easily or
without much consideration? Thanks.

Regards,
Shahab


On Thu, Aug 1, 2013 at 3:53 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 1, 2013 at 12:49 PM, Gareth Collins 
 gareth.o.coll...@gmail.com wrote:

 Would this be correct? Just making sure I understand how to best use
 secondary indexes in Cassandra with time series data.


 In general unless you ABSOLUTELY NEED the one unique feature of built-in
 Secondary Indexes (atomic update of base row and index) you should just use
 a normal column family for secondary index cases.

 =Rob



Re: Adding my first node to another one...

2013-08-01 Thread Jonathan Haddad
I recommend you do not add 1.2 nodes to a 1.1 cluster.   We tried this, and
ran into many issues.  Specifically, the data will not correctly stream
from the 1.1 nodes to the 1.2, and it will never bootstrap correctly.


On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis msega...@gmail.com wrote:

 Hi Arthur,

 Thank you for your answer.
 I have read the section Adding Capacity to an Existing Cluster prior to
 posting my question.

 Actually I was thinking I would like Cassandra choose by itself the token.

 Since I want only some column family to be an ALL cluster, and other
 column family to be where they are, no matter balancing…

 I do not find anything on the configuration that I should make on the very
 first (and only node so far) to start the replication. (The configuration
 of my Node A is pretty basic, almost out of the box, I might changed the
 name)
 How to make this node know that it will be a Seed.

 My current Node A is using Cassandra 1.1.0

 Is it compatible if I install a new node with Cassandra 1.2.8 ? or should
 I fetch 1.1.0 for Node B ?

 Thank you.

 Morgan.


 On August 1, 2013 at 20:32, Arthur Zubarev arthur.zuba...@aol.com wrote:

  Hi Morgan,
 
  The scaling out depends on several factors. The most intricate is
 perhaps calculating the tokens.
 
  Also the Cassandra version is important.
 
  At this point in time I suggest you read section Adding Capacity to an
 Existing Cluster at
 http://www.datastax.com/docs/1.0/operations/cluster_management
  and come back here with questions and more details.
 
  Regards,
 
  Arthur
 
  -Original Message- From: Morgan Segalis
  Sent: Thursday, August 01, 2013 11:24 AM
  To: user@cassandra.apache.org
  Subject: Adding my first node to another one...
 
  Hi everyone,
 
  I'm trying to wrap my head around Cassandra great ability to expand…
 
  I have set up my first Cassandra node a while ago… it was working great,
 and data wasn't so important back then.
  Since I had a great experience with Cassandra I decided to migrate step
 by step my MySQL data to Cassandra.
 
  Now data start to be important, so I would like to create another node,
 and add it.
  Since I had some issue with my DataCenter, I wanted to have a copy (of
 sensible data only) on another DataCenter.
 
  Quite frankly I'm still a newbie on Cassandra and need your guys help.
 
  First things first…
  Already up and Running Cassandra (Called A):
  - Do I need to change anything to the cassandra.yaml to make sure that
 another node can connect ? if yes, should I restart the node (because I
 would have to warn users about downtime) ?
  - Since this node should be a seed, the seed list is already set to
 localhost, is that good enough ?
 
  The new node I want to add (Called B):
  - I know that before starting this node, I should modify the seed list
 in cassandra.yaml… Is that the only thing I need to do ?
 
  It is my first time doing this, so please be gentle ;-)
 
  Thank you all,
 
  Morgan.




-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: Adding my first node to another one...

2013-08-01 Thread Robert Coli
On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis msega...@gmail.com wrote:

 Actually I was thinking I would like Cassandra choose by itself the token.


You NEVER want Cassandra to choose its own token in production. There is no
advantage to doing so, and significant risk when it is used as a matter of
course. The conf file even says you should manually specify tokens in
production.
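For the record, "manually specify" is just arithmetic: evenly spaced tokens for RandomPartitioner (the default partitioner in a stock 1.1 install, with a token range of 0 to 2**127) are i * 2**127 / N:

```python
# Evenly spaced initial_token values for N nodes under RandomPartitioner.
def balanced_tokens(node_count):
    return [i * (2 ** 127) // node_count for i in range(node_count)]

# For the two-node case in this thread: node A keeps token 0 and the
# new node B gets 2**126 (move node A to 0 first if it isn't there).
assert balanced_tokens(2) == [0, 2 ** 126]
```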


 How to make this node know that it will be a Seed.


The only thing that makes a node a Seed is that any other node has it in
its seed list.

My current Node A is using Cassandra 1.1.0


You should not run 1.1.0, it contains significant and serious bugs. You
should upgrade to the top of 1.1 series ASAP.


 Is it compatible if I install a new node with Cassandra 1.2.8 ? or should
 I fetch 1.1.0 for Node B ?


It is not compatible, use 1.1.x with 1.1.x.

=Rob


Re: Adding my first node to another one...

2013-08-01 Thread Morgan Segalis
Hi Rob,

On August 2, 2013 at 00:15, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis msega...@gmail.com wrote:
 Actually I was thinking I would like Cassandra choose by itself the token.
 
 You NEVER want Cassandra to choose its own token in production. There is no 
 advantage to doing so and significant risk when used as a matter of course. 
 The conf file even says you should manually specify tokens in production..

Ok, then I'll try to understand this token thing.

  
 How to make this node know that it will be a Seed.
 
 The only thing that makes a node a Seed is that any other node has it in its 
 seed list. 

Good to know, thanks !

 
 My current Node A is using Cassandra 1.1.0
 
 You should not run 1.1.0, it contains significant and serious bugs. You 
 should upgrade to the top of 1.1 series ASAP.

Of course I need to upgrade Cassandra, but I won't do that until I have another
node that can take over while I'm upgrading.

  
 Is it compatible if I install a new node with Cassandra 1.2.8 ? or should I 
 fetch 1.1.0 for Node B ?
 
 It is not compatible, use 1.1.x with 1.1.x. 

Yeah, that's what I thought!

 
 =Rob


Thank you for your tips.

Re: Secondary Indexes On Partitioned Time Series Data Question

2013-08-01 Thread Robert Coli
On Thu, Aug 1, 2013 at 2:34 PM, Shahab Yunus shahab.yu...@gmail.com wrote:

 Can you shed some more light (or point towards some other resource) that
 why you think built-in Secondary Indexes should not be used easily or
 without much consideration? Thanks.


1) Secondary indexes are more or less modeled like a manual pseudo
Secondary Index CF would be.
2) Except they are more opaque than doing it yourself. For example you
cannot see information on them in nodetool cfstats.
3) And there have been a steady trickle of bugs which relate to their
implementation, in many cases resulting in them not returning the data they
should. [1]
4) These bugs would not apply to a manual pseudo Secondary Index CF.
5) And the only benefits you get are the marginal convenience of querying
the secondary index instead of a second CF, and atomic synchronized update.
6) Which most people do not actually need.

tl;dr : unless you need the atomic update property, just use a manual
pseudo secondary index CF

=Rob

[1] https://issues.apache.org/jira/browse/CASSANDRA-4785 ,
https://issues.apache.org/jira/browse/CASSANDRA-5540 ,
https://issues.apache.org/jira/browse/CASSANDRA-2897 , etc.
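The "manual pseudo Secondary Index CF" pattern above amounts to two writes per mutation, one to the base CF and one to the index CF. A dict-based sketch (invented names), which also makes point 5's atomicity gap visible:

```python
base_cf = {}    # row_key -> row (the base column family)
index_cf = {}   # indexed value -> set of row keys (the index CF)

def insert(row_key, row):
    # Two separate writes. If the client dies between them, base and
    # index disagree -- the atomic update that built-in secondary
    # indexes provide, and that most applications can live without.
    base_cf[row_key] = row
    index_cf.setdefault(row["city"], set()).add(row_key)

def query_by_city(city):
    # Read the index partition, then fetch the base rows.
    return [base_cf[k] for k in index_cf.get(city, set())]

insert("u1", {"name": "ada", "city": "wellington"})
insert("u2", {"name": "alan", "city": "wellington"})
assert len(query_by_city("wellington")) == 2
assert query_by_city("paris") == []
```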


Re: Secondary Indexes On Partitioned Time Series Data Question

2013-08-01 Thread Shahab Yunus
Thanks a lot.

Regards,
Shahab


On Thu, Aug 1, 2013 at 8:32 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Aug 1, 2013 at 2:34 PM, Shahab Yunus shahab.yu...@gmail.comwrote:

 Can you shed some more light (or point towards some other resource) that
 why you think built-in Secondary Indexes should not be used easily or
 without much consideration? Thanks.


 1) Secondary indexes are more or less modeled like a manual pseudo
 Secondary Index CF would be.
 2) Except they are more opaque than doing it yourself. For example you
 cannot see information on them in nodetool cfstats.
 3) And there have been a steady trickle of bugs which relate to their
 implementation, in many cases resulting in them not returning the data they
 should. [1]
 4) These bugs would not apply to a manual pseudo Secondary Index CF.
 5) And the only benefits you get are the marginal convenience of querying
 the secondary index instead of a second CF, and atomic synchronized update.
 6) Which most people do not actually need.

 tl;dr : unless you need the atomic update property, just use a manual
 pseudo secondary index CF

 =Rob

 [1] https://issues.apache.org/jira/browse/CASSANDRA-4785 ,
 https://issues.apache.org/jira/browse/CASSANDRA-5540 ,
 https://issues.apache.org/jira/browse/CASSANDRA-2897 , etc.