Re: [DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread Dikang Gu
We are using 8 or 16 tokens internally, with the token allocation algorithm
enabled. The range distribution is good for us.

Dikang.


Re: [DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread Dinesh Joshi
Jon, thanks for starting this thread!

I have created CASSANDRA-14784 to track this. 

Dinesh




Re: [DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread Sankalp Kohli
Putting it on JIRA is to make sure someone is assigned to it and it is
tracked. Changes should be discussed on the ML, as you are saying.




Re: [DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread Jonathan Haddad
> We should create a JIRA to find what other defaults we need to revisit.

Changing a default is a pretty big deal; I think we should discuss any
changes to defaults here on the ML before moving them into JIRA.  It's nice
to get a bit more discussion around the change than what happens in JIRA.

We (TLP) did some testing on 4 tokens and found it to work surprisingly
well.  It wasn't particularly formal, but we verified the load stays
pretty even with only 4 tokens as we added nodes to the cluster.  A higher
token count hurts availability by increasing the number of nodes any given
node is a neighbor with, meaning any 2 nodes that fail have an increased
chance of causing downtime when using QUORUM.  In addition, with the recent
streaming optimization it seems lower token counts will give a greater chance
of a node streaming entire sstables (with LCS), meaning we'll do a better
job with node density out of the box.
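
As a rough illustration of that availability argument (this is not the TLP
test, just a simplified model: random token placement, SimpleStrategy-style
replica selection, RF=3, and two nodes failing at once), here is a sketch that
estimates how often at least one token range loses QUORUM at different token
counts. The function names, cluster size, and trial count are made up for
illustration only.

```python
import random

def replica_sets(num_nodes, vnodes, rf=3):
    # Give every node `vnodes` random tokens, then for each token range walk
    # the ring clockwise collecting the first `rf` distinct nodes
    # (simplified single-DC, SimpleStrategy-style placement).
    ring = sorted((random.random(), node)
                  for node in range(num_nodes)
                  for _ in range(vnodes))
    owners = [node for _, node in ring]
    sets = set()
    for i in range(len(owners)):
        replicas, j = [], i
        while len(replicas) < rf:
            node = owners[j % len(owners)]
            if node not in replicas:
                replicas.append(node)
            j += 1
        sets.add(frozenset(replicas))
    return sets

def p_quorum_loss(num_nodes, vnodes, rf=3, trials=100):
    # Probability that two simultaneous node failures leave at least one
    # token range unable to satisfy QUORUM.
    quorum = rf // 2 + 1
    hits = 0
    for _ in range(trials):
        sets = replica_sets(num_nodes, vnodes, rf)
        down = set(random.sample(range(num_nodes), 2))
        if any(len(rs & down) > rf - quorum for rs in sets):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for v in (4, 16, 256):
        print(f"num_tokens={v:>3}: P(some range loses QUORUM) ~ "
              f"{p_quorum_loss(num_nodes=60, vnodes=v):.2f}")
```

At 256 tokens virtually every pair of nodes in a cluster of that size shares
at least one replica set, so any two simultaneous failures lose QUORUM for
some range; at 4 tokens that is much less likely.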

Next week I can try to put together something a little more convincing.
Weekend time.

Jon


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: [DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread sankalp kohli
+1 to lowering it.
Thanks Jon for starting this. We should create a JIRA to find what other
defaults we need to revisit. (Please keep this discussion to the "default token"
only.)

>


Re: [DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread Jeff Jirsa
Also agree it should be lowered, but definitely not to 1, and probably 
something closer to 32 than 4.

-- 
Jeff Jirsa





Re: [DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread Jeremy Hanna
I agree that it should be lowered. What I’ve seen debated a bit in the past is
the number, but I don’t think anyone thinks that it should remain 256.




Re: [DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread dinesh.jo...@yahoo.com.INVALID
Logistics aside, I think it is a good idea to default to 1 token (or a low
number). Let the user understand what it means to go beyond 1 and tune things
based on their needs.
Dinesh 


[DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread Jonathan Haddad
One thing that's really, really bothered me for a while is how we default
to 256 tokens still.  There's no experienced operator that leaves it as is
at this point, meaning the only people using 256 are the poor folks that
just got started using C*.  I've worked with over a hundred clusters in the
last couple years, and I think I only worked with one that had lowered it
to something else.

I think it's time we changed the default to 4 (or 8, up for debate).

To improve the behavior, we need to change a couple other things.  The
allocate_tokens_for_keyspace setting is... odd.  It requires you have a
keyspace already created, which doesn't help on new clusters.  What I'd
like to do is add a new setting, allocate_tokens_for_rf, and set it to 3 by
default.
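
As a back-of-the-envelope illustration of why the allocation path matters at
low token counts, here is a sketch of how uneven ownership gets when tokens
are picked purely at random. This models random placement only (not the
allocation algorithm or any proposed setting), and the function name, cluster
size, and trial count are arbitrary.

```python
import random

def avg_worst_ownership(num_nodes, vnodes, trials=200):
    # Average, over `trials` random rings, of how much the most-loaded node
    # owns relative to a perfectly even 1/num_nodes share (1.0 == even).
    # Simplified: primary ranges only, one DC, no racks, no replication.
    total = 0.0
    for _ in range(trials):
        ring = sorted((random.random(), node)
                      for node in range(num_nodes)
                      for _ in range(vnodes))
        owned = [0.0] * num_nodes
        for i, (token, node) in enumerate(ring):
            next_token = ring[(i + 1) % len(ring)][0]
            owned[node] += (next_token - token) % 1.0  # size of this range
        total += max(owned) * num_nodes
    return total / trials

if __name__ == "__main__":
    for v in (4, 16, 256):
        print(f"num_tokens={v:>3}: most-loaded node owns "
              f"{avg_worst_ownership(num_nodes=16, vnodes=v):.2f}x its fair share")
```

Random placement at 4 tokens leaves the spread dramatically worse than at 256,
which is why lowering the default only really works if allocation by RF (or by
keyspace) is on by default as well.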

To handle clusters that are already using 256 tokens, we could prevent the
new node from joining unless a -D flag is set to explicitly allow
imbalanced tokens.

We've agreed to a trunk freeze, but I feel like this is important enough
(and pretty trivial) to do now.  I'd also personally characterize this as a
bug fix since 256 is horribly broken when the cluster gets to any
reasonable size, but maybe I'm alone there.

I honestly can't think of a use case where random tokens is a good choice
anymore, so I'd be fine / ecstatic with removing it completely and
requiring either allocate_tokens_for_keyspace (for existing clusters)
or allocate_tokens_for_rf
to be set.

Thoughts?  Objections?
-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Proposing an Apache Cassandra Management process

2018-09-21 Thread dinesh.jo...@yahoo.com.INVALID
I have created a sub-task - CASSANDRA-14783. Could we get some feedback before 
we begin implementing anything?

Dinesh 

Re: Measuring Release Quality

2018-09-21 Thread Scott Andreas
Josh, thanks for reading and sharing feedback. Agreed with starting simple and 
measuring inputs that are high-signal; that’s a good place to begin.

To the challenge of building consensus, point taken + agreed. Perhaps the 
distinction is between producing something that’s “useful” vs. something that’s 
“authoritative” for decisionmaking purposes. My motivation is to work toward 
something “useful” (as measured by the value contributors find). I’d be happy 
to start putting some of these together as part of an experiment – and agreed 
on evaluating “value relative to cost” after we see how things play out.

To Benedict’s point on JIRA, agreed that plotting a value from messy input 
wouldn’t produce useful output. Some questions a small working group might take 
on toward better categorization:

–––
– Revisiting the list of components: e.g., “Core” captures a lot right now.
– Revisiting which fields should be required when filing a ticket – and if 
there are any that should be removed from the form.
– Reviewing active labels: understanding what people have been trying to 
capture, and how they could be organized + documented better.
– Documenting “priority”: (e.g., a common standard we can point to, even if 
we’re pretty good now).
– Considering adding a "severity” field to capture the distinction between 
priority and severity.
–––

If there’s appetite for spending a little time on this, I’d put effort toward 
it if others are interested; is anyone?

Otherwise, I’m equally fine with an experiment to measure basics via the 
current structure as Josh mentioned, too.

– Scott


On September 20, 2018 at 8:22:55 AM, Benedict Elliott Smith 
(bened...@apache.org) wrote:

I think it would be great to start getting some high quality info out of JIRA, 
but I think we need to clean up and standardise how we use it to facilitate 
this.

Take the Component field as an example. This is the current list of options:

4.0
Auth
Build
Compaction
Configuration
Core
CQL
Distributed Metadata
Documentation and Website
Hints
Libraries
Lifecycle
Local Write-Read Paths
Materialized Views
Metrics
Observability
Packaging
Repair
SASI
Secondary Indexes
Streaming and Messaging
Stress
Testing
Tools

In some cases there's duplication (Metrics + Observability, Coordination 
(=“Storage Proxy, Hints, Batchlog, Counters…") + Hints, Local Write-Read Paths 
+ Core)
In others, there’s a lack of granularity (Streaming + Messaging, Core, 
Coordination, Distributed Metadata)
In others, there’s a lack of clarity (Core, Lifecycle, Coordination)
Others are probably missing entirely (Transient Replication, …?)

Labels are also used fairly haphazardly, and there’s no clear definition of 
“priority”

Perhaps we should form a working group to suggest a methodology for filling out 
JIRA, standardise the necessary components, labels etc, and put together a wiki 
page with step-by-step instructions on how to do it?


> On 20 Sep 2018, at 15:29, Joshua McKenzie  wrote:
>
> I've spent a good bit of time thinking about the above and bounced off both
> different ways to measure quality and progress as well as trying to
> influence community behavior on this topic. My advice: start small and
> simple (KISS, YAGNI, all that). Get metrics for pass/fail on
> utest/dtest/flakiness over time, perhaps also aggregate bug count by
> component over time. After spending a predetermined time doing that (a
> couple months?) as an experiment, we retrospect as a project and see if
> these efforts are adding value commensurate with the time investment
> required to perform the measurement and analysis.
>
> There's a lot of really good ideas in that linked wiki article / this email
> thread. The biggest challenge, and risk of failure, is in translating good
> ideas into action and selling project participants on the value of changing
> their behavior. The latter is where we've fallen short over the years;
> building consensus (especially regarding process /shudder) is Very Hard.
>
> Also - thanks for spearheading this discussion Scott. It's one we come back
> to with some regularity so there's real pain and opportunity here for the
> project imo.
>
> On Wed, Sep 19, 2018 at 9:32 PM Scott Andreas  wrote:
>
>> Hi everyone,
>>
>> Now that many teams have begun testing and validating Apache Cassandra
>> 4.0, it’s useful to think about what “progress” looks like. While metrics
>> alone may not tell us what “done” means, they do help us answer the
>> question, “are we getting better or worse — and how quickly”?
>>
>> A friend described to me a few attributes of metrics he considered useful,
>> suggesting that good metrics are actionable, visible, predictive, and
>> consequent:
>>
>> – Actionable: We know what to do based on them – where to invest, what to
>> fix, what’s fine, etc.
>> – Visible: Everyone who has a stake in a metric has full visibility into
>> it and participates in its definition.
>> – Predictive: Good metrics enable 

Re: QA signup

2018-09-21 Thread Dinesh Joshi
I favor versioned nightlies for testing so everyone is using the exact same 
binary distribution.

As far as actually building the packages go, I would prefer a Docker based 
solution like Jon mentioned. It provides a controlled, reproducible, clean room 
environment. Ideally the build script should ensure that the git branch is 
clean and that there aren't any local changes if the packages are being 
published to maven.

Does anyone see a need to publish the git branch metadata in the build, such as 
the git SHA, branch, and repo URL? I am not sure if this is already captured 
somewhere. It's useful to trace a build's provenance.
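
If it isn't captured anywhere yet, here is a minimal sketch of that kind of
check (an assumed helper a nightly build could invoke, not anything that
exists in-tree today): refuse to build from a dirty tree, then write the sha,
branch, and remote into a properties file that ships alongside the artifact.

```python
import subprocess
import sys

def git(*args):
    # Run a git command in the current checkout and return its trimmed output.
    return subprocess.check_output(("git",) + args, text=True).strip()

def main():
    # Refuse to build if there are local, uncommitted changes.
    if git("status", "--porcelain"):
        sys.exit("refusing to build: working tree has local changes")

    # Capture the provenance of this build: sha, branch, and origin URL.
    provenance = {
        "git.sha": git("rev-parse", "HEAD"),
        "git.branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        "git.remote": git("config", "--get", "remote.origin.url"),
    }

    # Write it out so the packaging step can bundle it with the artifact.
    with open("build-provenance.properties", "w") as out:
        for key, value in provenance.items():
            out.write(f"{key}={value}\n")

if __name__ == "__main__":
    main()
```

The same values could just as easily be injected as ant properties; the point
is only that recording provenance at build time is cheap.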

Dinesh

> On Sep 20, 2018, at 2:26 PM, Jonathan Haddad  wrote:
> 
> Sure - I'm not disagreeing with you that pre-built packages would be nice
> to have.  That said, if someone's gone through the trouble of building an
> entire testing infrastructure and has hundreds of machines available,
> running `docker-compose up build-deb` is likely not a major issue.  If I'm
> trying to decide between solving the 2 problems I'd prefer to make builds
> easier as very few people actually know how to do it.  I'm also biased
> because I'm working on a tool that does _exactly_ that (build arbitrary C*
> debs and deploy them to AWS for perf testing with tlp-stress which we've
> already open sourced https://github.com/thelastpickle/tlp-stress).
> 
> I'll building it for internal TLP use but there's not much TLP specific
> stuff, we'll be open sourcing it as soon as we can.
> 
> TL;DR: we need both things
> 
> On Thu, Sep 20, 2018 at 2:12 PM Scott Andreas  wrote:
> 
>> Mick – Got it, thanks and sorry to have misunderstood. No fault in your
>> writing at all; that was my misreading.
>> 
>> Agreed with you and Kurt; I can’t think of a pressing need or immediate
>> use for the Maven artifacts. As you mentioned, all of the examples I’d
>> listed require binary artifacts only.
>> 
>> Re: Jon’s question:
>>> It seems to me that improving / simplifying the process of building the
>> packages might solve this problem better.
>> 
>> Agreed that making builds easy is important, and that manually-applied
>> patches were involved in a couple cases I’d cited. My main motivation is
>> toward making it easier for developers who’d like to produce
>> fully-automated test pipelines to do so using common artifacts, rather than
>> each replicating the build/packaging step for tarball artifacts themselves.
>> 
>> Publishing binary artifacts in a common location would enable developers
>> to configure testing and benchmarking pipelines to pick up those artifacts
>> on a daily basis without intervention. In the case of a build landing DOA
>> due to an issue with a commit, it’d be enough for zero-touch automation to
>> pick up a new build with the fix the following day and run an extended
>> suite across a large number of machines and publish results, for example.
>> 
>> 
>> On September 19, 2018 at 8:17:05 PM, kurt greaves (k...@instaclustr.com
>> ) wrote:
>> 
>> It's pretty much only third party plugins. I need it for the LDAP
>> authenticator, and StratIO's lucene plugin will also need it. I know there
>> are users out there with their own custom plugins that would benefit from
>> it as well (and various other open source projects). It would make it
>> easier, however it certainly is feasible for these devs to just build the
>> jars themselves (and I've done this so far). If it's going to be easy I
>> think there's value in generating and hosting nightly jars, but if it's
>> difficult I can just write some docs for DIY.
>> 
>> On Thu, 20 Sep 2018 at 12:20, Mick Semb Wever  wrote:
>> 
>>> Sorry about the terrible english in my last email.
>>> 
>>> 
 On the target audience:
 
 [snip]
 For developers building automation around testing and
 validation, it’d be great to have a common build to work from rather
 than each developer producing these builds themselves.
>>> 
>>> 
>>> Sure. My question was only in context of maven artefacts.
>>> It seems to me all the use-cases you highlight would be for the binary
>>> artefacts.
>>> 
>>> If that's the case we don't need to worry about publishing snapshots
>> maven
>>> artefacts, and can just focus on uploading nightly builds to
>>> https://dist.apache.org/repos/dist/dev/cassandra/
>>> 
>>> Or is there a use-case I'm missing that needs the maven artefacts?
>>> 
>>> 
>>> 
>> 
> 
> 
> -- 
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade





Re: Proposing an Apache Cassandra Management process

2018-09-21 Thread Dinesh Joshi
I have updated the doc with a short paragraph providing the clarification. 
Sankalp's suggestion is already part of the doc. If there aren't further 
objections, could we move this discussion over to the JIRA (CASSANDRA-14395)?

Dinesh

> On Sep 18, 2018, at 10:31 AM, sankalp kohli  wrote:
> 
> How about we start with a few basic features in the sidecar, starting 
> with this:
> 1. Bulk nodetool commands: users can curl any sidecar and run a 
> nodetool command in bulk across the cluster. 
> <host>:<port>/bulk/nodetool/tablestats?arg0=keyspace_name.table_name=<additional arguments if required>
> 
> And later 
> 2: Health checks. 
> 
> On Thu, Sep 13, 2018 at 11:34 AM dinesh.jo...@yahoo.com.INVALID 
>  wrote:
> I will update the document to add that point. The document did not mean to 
> serve as a design or architectural document but rather something that would 
> spark a discussion on the idea.
> Dinesh 
> 
> On Thursday, September 13, 2018, 10:59:34 AM PDT, Jonathan Haddad 
> <j...@jonhaddad.com> wrote:  
> 
>  Most of the discussion and work was done off the mailing list - there's a
> big risk involved when folks disappear for months at a time and resurface
> with big pile of code plus an agenda that you failed to loop everyone in
> on. In addition, by your own words the design document didn't accurately
> describe what was being built.  I don't write this to try to argue about
> it, I just want to put some perspective for those of us that weren't part
> of this discussion on a weekly basis over the last several months.  Going
> forward let's keep things on the ML so we can avoid confusion and
> frustration for all parties.
> 
> With that said - I think Blake made a really good point here and it's
> helped me understand the scope of what's being built better.  Looking at it
> from a different perspective it doesn't seem like there's as much overlap
> as I had initially thought.  There's the machinery that runs certain tasks
> (what Joey has been working on) and the user facing side of exposing that
> information in management tool.
> 
> I do appreciate (and like) the idea of not trying to boil the ocean, and
> working on things incrementally.  Putting a thin layer on top of Cassandra
> that can perform cluster wide tasks does give us an opportunity to move in
> the direction of a general purpose user-facing admin tool without
> committing to trying to write the full stack all at once (or even make
> decisions on it now).  We do need a sensible way of doing rolling restarts
> / scrubs / scheduling and Reaper wasn't built for that, and even though we
> can add it I'm not sure if it's the best mechanism for the long term.
> 
> So if your goal is to add maturity to the project by making cluster wide
> tasks easier by providing a framework to build on top of, I'm in favor of
> that and I don't see it as antithetical to what I had in mind with Reaper.
> Rather, the two are more complementary than I had originally realized.
> 
> Jon
> 
> 
> 
> 
> On Thu, Sep 13, 2018 at 10:39 AM dinesh.jo...@yahoo.com.INVALID
> <dinesh.jo...@yahoo.com.invalid> wrote:
> 
> > I have a few clarifications -
> > The scope of the management process is not to simply run repair
> > scheduling. Repair scheduling is one of the many features we could
> > implement or adopt from existing sources. So could we please split the
> > Management Process discussion and the repair scheduling?
> > After re-reading the management process proposal, I see we missed to
> > communicate a basic idea in the document. We wanted to take a pluggable
> > approach to various activities that the management process could perform.
> > This could accommodate different implementations of common activities such
> > as repair. The management process would provide the basic framework and it
> > would have default implementations for some of the basic activities. This
> > would allow for speedier iteration cycles and keep things extensible.
> > Turning to some questions that Jon and others have raised, when I +1, my
> > intention is to fully contribute and stay with this community. That said,
> > things feel rushed for some but for me it feels like analysis paralysis.
> > We're looking for actionable feedback and to discuss the management process
> > _not_ repair scheduling solutions.
> > Thanks,
> > Dinesh
> >
> >
> >
> > On Sep 12, 2018, at 6:24 PM, sankalp kohli wrote:
> > Here is a list of open discussion points from the voting thread. I think
> > some are already answered but I will still gather these questions here.
> >
> > From several people:
> > 1. Vote is rushed and we need more time for discussion.
> >
> > From Sylvain
> > 2. About the voting process...I think that was addressed by Jeff Jirsa and
> > deserves a separate thread as it is not directly related to this thread.
> > 3. Does the project need a side car.
> >
> > From Jonathan Haddad
> > 4. Are people doing +1 willing to contribute
> >
> > From Jonathan Ellis
> > 5.