Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-12-15 Thread Mick Semb Wever
>
> Now that we have CASSANDRA-16079 created and work is proceeding, can we
> please commit CASSANDRA-13701?
>


Closing the loop on this.
CASSANDRA-16079 is ready to be committed. It involves a patch to ccm and to
cassandra-dtest, and takes advantage of CASSANDRA-16205. DTest run times
are now on par with and without CASSANDRA-13701. A spun-off ticket
will be created for some additional dtest performance improvements that
were identified by making use of `ring_delay_ms`, but it will not
block 13701.

Speak up if you spot anything you think we may have missed.


Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-09-01 Thread Jeremy Hanna
Thanks Adam.

Now that we have CASSANDRA-16079 created and work is proceeding, can we please
commit CASSANDRA-13701? There are many clusters out there that use the default
of 256 num_tokens, and migrating to something more sane is a lot of work. It
would also be very helpful to have reasonable defaults tested for
the release. It's been a bad default for a long, long time, and we need to get
this in before we do more testing for the release.






Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-27 Thread Adam Holmberg
After discussing with a few stakeholders, it sounds like folks agree that
optimizing dtest speed is a worthy endeavor. What is less clear is which
concrete things should be done. Since a brainstorming session failed
to materialize on CASSANDRA-13701, we thought it would make sense to start
with an open analysis.

A ticket[1] has been created for analysis and further follow-up. We welcome
any concrete ideas people already have in mind.

Kind regards,
Adam Holmberg

[1] https://issues.apache.org/jira/browse/CASSANDRA-16079



-- 
Adam Holmberg
e. adam.holmb...@datastax.com
w. www.datastax.com


Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-24 Thread Mick Semb Wever
> I believe the speed of our CI runs
> is of common interest. What do you think? Does this sound feasible? Who is
> in?
>


I agree. I'm in. I have some ideas to add to the mix. Thanks Ekaterina.


Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-24 Thread Ekaterina Dimitrova
Hello everyone,
I was thinking about the latest discussions on the topic. During our last
Cassandra Contributor meeting we agreed it would be good to include
discussions/brainstorming on different topics in our agenda. So I would like
to suggest organizing a brainstorming session on CASSANDRA-13701 among all
people who are interested in the topic. I believe the speed of our CI runs
is of common interest. What do you think? Does this sound feasible? Who is
in?
Best regards,
Ekaterina



Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-22 Thread Joshua McKenzie
I think what Jordan is exploring, and what I agree with, is that we need clear
next steps to help reduce the roughly 75% increase in dtest runtime. For
sponsored contributors using circle to run the entire suites, throwing more
money at the problem through parallelization isn't a long-term solution.



Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-22 Thread Jeremy Hanna
I know the dtests take a long time and this will make them longer. As a counterpoint,
most people run individual dtests locally and the full set on dedicated
test infrastructure. For the dedicated test infrastructure, Mick also improved
the wall-clock runtime of parallelized dtests in
https://issues.apache.org/jira/browse/CASSANDRA-16006.

Even with the longer full dtest runtime, I firmly believe that for the sake of
new users, and given how hard it is to change num_tokens once data is written,
this change to the default of num_tokens is long overdue. Another hidden benefit
of this change is that the dtests will now run bootstraps the way operators should
run them in practice with the new defaults. That will make the more common
default case much better tested and hopefully catch regressions in that
execution path faster.

So while it is not a trivial change in full dtest runtime, the benefits to the 
community and project are also not trivial. I’m really grateful to all who have 
put in effort to make this a reality and know that new users in 4.0 will 
benefit from these improved defaults.

In other words, my non-binding vote is to merge this and look to improve
execution time separately, with that effort not being as urgent for the reasons
stated above.

Jeremy



Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-20 Thread Jordan West
What sort of commitment is there to the follow-up tickets? Are the
follow-ups "make this faster", or are there specific tasks we know will
help? I'm concerned by the increase in testing run times on circle but
don't think that should prevent a good, agreed-upon default from merging.

Jordan



Re: Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-19 Thread Anthony Grasso
Hi Mick,

No objections from me. It will be good to get this change into the 4.0
release.

Whilst the slowdown of the dtests is annoying, I am happy to see this
change committed. As long as there are no regressions in the number of
tests that pass, it should be fine. The proposal to raise a follow-up
ticket to accompany the commit is a good idea.

Regards,
Anthony



Committing `CASSANDRA-13701 Lower default num_tokens` and the dtest slowdown…

2020-08-19 Thread Mick Semb Wever
It was agreed¹ that 4.0 should have the new configuration defaults of
  num_tokens: 16
  allocate_tokens_for_local_replication_factor: 3
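For readers wondering why the lower num_tokens default comes paired with a token allocation setting, here is a minimal sketch under a toy model. It assumes purely random token selection on a unit ring, so it is not Cassandra's partitioner or its allocator. The model shows that with random tokens, each node's share of the ring varies more when the node holds fewer tokens. `ownership` and `imbalance` are hypothetical helpers written only for this illustration.

```python
import random
from typing import List


def ownership(num_nodes: int, tokens_per_node: int, rng: random.Random) -> List[float]:
    """Fraction of a [0, 1) ring owned by each node when all tokens are random.

    Toy model (assumption): each token owns the arc from the previous token
    (exclusive) up to itself, i.e. the range (previous_token, token].
    """
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(tokens_per_node)
    )
    if len(ring) == 1:
        return [1.0]  # a single token owns the whole ring
    owned = [0.0] * num_nodes
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # for i == 0 this wraps to the highest token
        owned[node] += (token - prev_token) % 1.0
    return owned


def imbalance(num_nodes: int, tokens_per_node: int, trials: int = 50, seed: int = 42) -> float:
    """Average max/min ownership ratio over several independently generated rings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        owned = ownership(num_nodes, tokens_per_node, rng)
        total += max(owned) / min(owned)
    return total / trials


if __name__ == "__main__":
    # Hypothetical 6-node cluster: compare random tokens at the old and new defaults.
    for num_tokens in (256, 16):
        print(f"num_tokens={num_tokens:>3}: mean max/min ownership = {imbalance(6, num_tokens):.2f}")
```

In this toy model, the relative spread of a node's ownership shrinks roughly as 1/sqrt(num_tokens). As a result, 16 random tokens leave noticeably more imbalance than 256. The token allocation algorithm avoids this by choosing a joining node's tokens based on the ring it observes, rather than picking them at random.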

13701's patches (against cassandra, cassandra-builds, cassandra-dtest, and ccm)
are reviewed, tested, and ready to commit. But the ccm and dtest patches
required ccm to now start nodes sequentially, and required some longer
timeout values in the dtests.

The consequence of this is CI runs now take longer. ci-cassandra.a.o's
dtests take ~30% longer, and circleci's dtests (with vnodes) have gone from
~22 to ~43 minutes. The general opinion (on Slack²) is to commit, and work
on improving ccm and dtest startup times in a subsequent ticket.

13701 was intended to be committed before the first beta release because of
its user-facing changes. But these numbers are significant enough that it makes
sense to touch base with dev@.

Does anyone (strongly) object to the "commit + follow up ticket" approach?

regards,
Mick


¹ –
https://lists.apache.org/thread.html/ra829084fcf344e9e96fa5c61cb31e909c8629091993471594b65ea89%40%3Cdev.cassandra.apache.org%3E
² – https://the-asf.slack.com/archives/CK23JSY2K/p1597747395032600 and
https://the-asf.slack.com/archives/CK23JSY2K/p1597849774078200?thread_ts=1597762085.048300=CK23JSY2K