Re: [DISCUSSION] CEP-38: CQL Management API

2024-01-08 Thread Benedict Elliott Smith
Syntactically, if we’re updating settings like compaction throughput, I would 
prefer to simply update a virtual settings table

e.g. UPDATE system.settings SET compaction_throughput = 128

Some operations will no doubt require a stored procedure syntax, but perhaps it 
would be a good idea to split the work into two: one part to address settings 
like those above, and another for maintenance operations such as triggering 
major compactions, repair and the like?

I would like to see us move to decentralised structured settings management at 
the same time, so that we can set properties for the whole cluster, or data 
centres, or individual nodes via the same mechanism - all from any node in the 
cluster. I would be happy to help out with this work, if time permits.
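
A minimal sketch of the scoping idea above (whole cluster, data centre, individual node), assuming the most specific scope wins. The precedence order, the `ScopedSettings` class and its method names are hypothetical; nothing in this thread specifies them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ScopedSettings:
    """Hypothetical store: settings keyed by scope, most specific scope wins."""
    cluster: Dict[str, int] = field(default_factory=dict)
    per_dc: Dict[Tuple[str, str], int] = field(default_factory=dict)    # (dc, name)
    per_node: Dict[Tuple[str, str], int] = field(default_factory=dict)  # (node, name)

    def put(self, name: str, value: int, dc: Optional[str] = None,
            node: Optional[str] = None) -> None:
        # Record the value at the narrowest scope the caller named.
        if node is not None:
            self.per_node[(node, name)] = value
        elif dc is not None:
            self.per_dc[(dc, name)] = value
        else:
            self.cluster[name] = value

    def resolve(self, name: str, dc: str, node: str) -> Optional[int]:
        # Assumed precedence: node override, then data centre, then cluster.
        if (node, name) in self.per_node:
            return self.per_node[(node, name)]
        if (dc, name) in self.per_dc:
            return self.per_dc[(dc, name)]
        return self.cluster.get(name)


settings = ScopedSettings()
settings.put("compaction_throughput", 128)                   # whole cluster
settings.put("compaction_throughput", 64, dc="dc2")          # one data centre
settings.put("compaction_throughput", 32, node="10.0.0.7")   # one node
```

With these three writes, a node in dc1 would resolve 128, another node in dc2 would resolve 64, and 10.0.0.7 would resolve 32.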


> On 8 Jan 2024, at 11:42, Josh McKenzie  wrote:
> 
>> Fundamentally, I think it's better for the project if administration is 
>> fully done over CQL and we have a consistent, single way of doing things. 
> Strongly agree here. With 2 caveats:
> 1. Supporting backwards compat, especially for automated ops (i.e. nodetool,
>    JMX, etc), is crucial. Painful, but crucial.
> 2. We need something that's available for use before the node comes fully
>    online; the point Jeff always brings up when we discuss moving away from
>    JMX. So long as we have some kind of "out-of-band" access to nodes or
>    accommodation for that, we should be good.
> For context on point 2, see slack: 
> https://the-asf.slack.com/archives/CK23JSY2K/p1688745128122749?thread_ts=1688662169.018449=CK23JSY2K
> 
>> I point out that JMX works before and after the native protocol is running 
>> (startup, shutdown, joining, leaving), and also it's semi-common for us to 
>> disable the native protocol in certain circumstances, so at the very least, 
>> we'd then need to implement a totally different cql protocol interface just 
>> for administration, which nobody has committed to building yet.
> 
> I think this is a solvable problem, and I think the benefits of having a 
> single, elegant way of interacting with a cluster and configuring it 
> justifies the investment for us as a project. Assuming someone has the cycles 
> to, you know, actually do the work. :D
> 
> On Sun, Jan 7, 2024, at 10:41 PM, Jon Haddad wrote:
>> I like the idea of the ability to execute certain commands via CQL, but I 
>> think it only makes sense for the nodetool commands that cause an action to 
>> take place, such as compact or repair.  We already have virtual tables, I 
>> don't think we need another layer to run informational queries.  I see 
>> little value in having the following (I'm using exec here for simplicity):
>> 
>> cqlsh> exec tpstats
>> 
>> which returns a string in addition to:
>> 
>> cqlsh> select * from system_views.thread_pools
>> 
>> which returns structured data.  
>> 
>> I'd also rather see updatable configuration virtual tables instead of
>> 
>> cqlsh> exec setcompactionthroughput 128
>> 
>> Fundamentally, I think it's better for the project if administration is 
>> fully done over CQL and we have a consistent, single way of doing things.  
>> I'm not dead set on it, I just think less is more in a lot of situations, 
>> this being one of them.  
>> 
>> Jon
>> 
>> 
>> On Wed, Jan 3, 2024 at 2:56 PM Maxim Muzafarov wrote:
>> Happy New Year to everyone! I'd like to thank everyone for their
>> questions, because answering them forces us to move towards the right
>> solution, and I also like the ML discussions for the time they give to
>> investigate the code :-)
>> 
>> I'm deliberately trying to limit the scope of the initial solution
>> (e.g. exclude the agent part) to keep the discussion short and clear,
>> but it's also important to have a glimpse of what we can do next once
>> we've finished with the topic.
>> 
>> My view of the Command<> is that it is an abstraction in the broader
>> sense of an operation that can be performed on the local node,
>> involving one of a few internal components. This means that updating a
>> property in the settings virtual table via an update statement, or
>> executing e.g. the setconcurrentcompactors command are just aliases of
>> the same internal command via different APIs. Another example is the
>> netstats command, which simply aggregates the MessageService metrics
>> and returns them in a human-readable format (just another way of
>> looking at key-value metric pairs). More broadly, the command takes a
>> Map as input and returns a String (or List) as the result.
>> 
>> As Abe mentioned, Command and CommandRegistry should be largely based
>> on the nodetool command set at the beginning. We have a few options
>> for how we can initially construct command metadata during the
>> registry implementation (when moving command metadata from the
>> nodetool to the core part), so I'm planning to consult with the
>> command representations of the k8cassandra project so that any
>> further registry adoptions have zero problems (by writing a test
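
A minimal sketch of the Command idea described in the message above: one internal operation, taking a map of arguments and returning a string, reachable under more than one name. `CommandRegistry`, its methods and the alias name are hypothetical stand-ins, not the CEP-38 API.

```python
from typing import Callable, Dict

# A command takes a map of named arguments and returns a human-readable string.
Command = Callable[[Dict[str, str]], str]


class CommandRegistry:
    """Hypothetical registry: several names may point at one implementation."""

    def __init__(self) -> None:
        self._commands: Dict[str, Command] = {}

    def register(self, name: str, command: Command) -> None:
        self._commands[name] = command

    def alias(self, alias: str, target: str) -> None:
        # Both names now dispatch to the same callable.
        self._commands[alias] = self._commands[target]

    def execute(self, name: str, args: Dict[str, str]) -> str:
        return self._commands[name](args)


node_state = {"concurrent_compactors": 2}  # stand-in for local node state


def set_concurrent_compactors(args: Dict[str, str]) -> str:
    node_state["concurrent_compactors"] = int(args["value"])
    return f"concurrent_compactors={node_state['concurrent_compactors']}"


registry = CommandRegistry()
registry.register("setconcurrentcompactors", set_concurrent_compactors)
# The settings-table update path is modelled as an alias of the same command.
registry.alias("settings.concurrent_compactors", "setconcurrentcompactors")
result = registry.execute("settings.concurrent_compactors", {"value": "4"})
```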

Re: [DISCUSS] New data type for vector search

2023-04-26 Thread Benedict Elliott Smith
I think we need to briefly step back and think about what the syntax means and
how it fits into existing syntax.

It seems that the dimensionality verbiage assumes we’re logically introducing N
vector fields, so that each row adopts a value for all of the vector fields or
none. But in practice we are actually introducing a fixed-length frozen list in
Cassandra terms, and our API treats this as a per-row array/vector rather than
a number of column vectors.

My inclination then would be to say you declare an ARRAY (which is semantic
sugar for FROZEN>). This is very consistent with our existing style. We then
simply permit such columns to define ANN indexes.

Otherwise, I think we should lean into the idea that this is a set of N
vectors, as “dimensions” makes limited sense when discussing an array length.
In this case I would lean towards declaring e.g. 1500 FLOAT VECTORS, maybe. But
then I think we should reconsider our presentation a little, and perhaps the
result set should treat each vector as a separate field (or something like
this).

> On 26 Apr 2023, at 15:31, Jonathan Ellis wrote:
>
> Hi all,
>
> Splitting this out per the suggestion in the initial VS thread so we can work
> on driver support in parallel with the server-side changes.
>
> I propose adding a new data type for vector search indexes:
>
> FLOAT VECTOR[N_DIMENSIONS]
>
> In the initial commits and thread, this was DENSE FLOAT32. Nobody really
> loved that, so we considered a bunch of alternatives, including
>
> - `FLOAT[N]`: This minimal option resembles C and Java array syntax, which
>   would make it familiar for many users. However, this syntax raises the
>   question of why arrays cannot be created for other types. Additionally, the
>   expectation for an array is to provide random access to its contents, which
>   is not supported for vectors.
> - `DENSE FLOAT[N]`: This option clarifies that we are supporting dense
>   vectors, not sparse ones. However, since Lucene had sparse vector support
>   in the past but removed it for lack of compelling use cases, it is unlikely
>   that it will be added back, making the "DENSE" qualifier less relevant.
> - `DENSE FLOAT VECTOR[N]`: This is the most verbose option and aligns with
>   the CQL/SQL spirit. However, the "DENSE" qualifier is unnecessary for the
>   reasons mentioned above.
> - `VECTOR FLOAT[N]`: This option omits the "DENSE" qualifier, but has a less
>   natural word order.
> - `VECTOR`: This follows the syntax of our Collections, but again this would
>   imply that random access is supported, which we want to avoid doing.
> - `VECTOR[N]`: This syntax is not very clear about the vector's contents and
>   could make it difficult to add other vector types, such as byte vectors
>   (already supported by Lucene), in the future.
>
> Finally, the original qualifier of 32 in `FLOAT32` was intended to allow
> consistency if we add other float types like FLOAT16 or FLOAT64, both of
> which are sometimes used in ML. However, we already have a CQL data type for
> a 64-bit float (`DOUBLE`), so it would make more sense to add future variants
> (which remain hypothetical at this point) along that line instead.
>
> Thus, we believe that `FLOAT VECTOR[N_DIMENSIONS]` provides the best balance
> of clarity, conciseness, and extensibility. It is more natural in its word
> order than the original proposal and avoids unnecessary qualifiers, while
> still being clear about the data type it represents. Finally, this syntax is
> straightforwardly extensible should we choose to support other vector types
> in the future.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
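
A minimal sketch of the "fixed-length frozen list" reading discussed above: a per-row value that must have exactly N float elements and is handled as one immutable unit. `FloatVectorType` and `validate` are hypothetical names used only for illustration, not part of any proposed API.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FloatVectorType:
    """Hypothetical model of a fixed-length, immutable list of floats."""
    dimensions: int

    def validate(self, value: Sequence[float]) -> Tuple[float, ...]:
        if len(value) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} elements, got {len(value)}")
        # The whole value is stored as one unit, like a frozen list.
        return tuple(float(x) for x in value)


embedding = FloatVectorType(dimensions=3)
stored = embedding.validate([0.1, 0.2, 0.3])
```

A value of any other length is rejected rather than padded or truncated, which is the property that distinguishes this from an ordinary list column.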


Re: [DISCUSS] API modifications and when to raise a thread on the dev ML

2023-02-02 Thread Benedict Elliott Smith
Closing the loop on seeking consensus for UX/UI/API changes, I see a few 
options. Can we rank choice vote please?

A - Jira suffices
B - Post a DISCUSS API thread prior to making changes
C - Periodically publish a list of API changes for retrospective consideration 
by the community

Points raised in the discussion included: lowering the bar for participation 
and avoiding unnecessary burden to developers.

I vote B (only) because I think broader participation is most important for 
these topics.


> On 7 Dec 2022, at 15:43, Mick Semb Wever  wrote:
> 
>> I think it makes sense to look into improving visibility of API changes, so 
>> people can more easily review a summary of API changes versus reading 
>> through the whole changelog (perhaps we need a summarized API change log?).
> 
> 
> Agree Paulo.
> 
> Observers should be able to see all API changes early. We can do better than 
> telling downstream users/devs "you have to listen to all jira tickets" or 
> "you have to watch the code and pick up changes". Watching CHANGES.txt or 
> NEWS.txt or CEPs doesn't solve the need either. 
> 
> Observing such changes as early as possible can save a significant amount of 
> effort and headache later on, and should be encouraged. If done correctly I 
> can imagine it will help welcome more contributors.
> 
> I can also see that we can improve at, and have a better shared understanding 
> of, categorising the types of API changes: 
> addition/change/deprecation/removal, signature/output/behavioural, API/SPI. 
> So I can see value here for both observers and for ourselves.
> 



Re: CASSANDRA-14227 removing the 2038 limit

2022-09-29 Thread Benedict Elliott Smith
My only slight concern with this approach is the additional memory pressure. 
Since 64yrs should be plenty at any moment in time, I wonder if it wouldn’t be 
better to represent these times as deltas from the nowInSec being used to 
process the query. So, long math would only be used to normalise the times to 
this nowInSec (from whatever is stored in the sstable) within a method, and 
ints would be stored in memtables and any objects used for processing.

This might admittedly be more work, but I don’t believe it should be too 
challenging - we can introduce a method deletionTime(int nowInSec) that returns 
a long value by adding nowInSec to the deletionTime, and make the underlying 
value private, refactoring call sites?
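
A minimal sketch of the delta idea above, assuming the delta is taken against the same nowInSec that is later passed back in. `DeletionInfo` and its members are hypothetical; Python integers are unbounded, so the 32-bit limit is enforced with an explicit range check.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


class DeletionInfo:
    """Hypothetical holder: stores a 32-bit delta instead of a 64-bit timestamp."""

    def __init__(self, absolute_deletion_sec: int, now_in_sec: int) -> None:
        delta = absolute_deletion_sec - now_in_sec  # wide math at the boundary
        if not INT32_MIN <= delta <= INT32_MAX:
            raise OverflowError("deletion time is more than ~68 years from now")
        self._delta = delta  # small enough to fit in a Java int

    def deletion_time(self, now_in_sec: int) -> int:
        # Widen back to an absolute time; the result may exceed the int32 range.
        return now_in_sec + self._delta


now = 2_200_000_000  # a nowInSec after January 2038, beyond the int32 range
info = DeletionInfo(absolute_deletion_sec=now + 86_400, now_in_sec=now)
absolute = info.deletion_time(now)
```

The stored value stays within 32 bits while the absolute time returned to callers does not have to.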

> On 29 Sep 2022, at 09:37, Berenguer Blasi  wrote:
> 
> Hi all,
> 
> I have taken a stab in a PR you can find attached in the ticket. Mainly:
> 
> - I have moved deletion times, gc and nowInSec timestamps to long. That 
> should get us past the 2038 limit.
> 
> - TTL is maxed now to 68y. Think CQL API compatibility and a sort of a 'free' 
> guardrail.
> 
> - A new NONE overflow policy is the default but everything is backwards 
> compatible by keeping the previous ones in place. Think upgrade scenarios or 
> apps relying on the previous behavior.
> 
> - The new limit is around year 292,471,208,677 which sounds ok given the Sun 
> will start collapsing in 3 to 5 billion years :-)
> 
> - Please feel free to drop by the ticket and take a look at the PR even if 
> it's cursory
> 
> Thx in advance.
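
The figures in this thread can be checked with a few lines of arithmetic. The sketch below assumes 365-day years; under that assumption the 292,471,208,677 figure appears to be the number of years spanned by a signed 64-bit second counter, measured from the 1970 epoch, rather than a calendar year.

```python
from datetime import datetime, timezone

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 365-day year, ignoring leap days

int32_max = 2**31 - 1
int64_max = 2**63 - 1

# Last instant representable as signed 32-bit seconds since the Unix epoch.
int32_limit = datetime.fromtimestamp(int32_max, tz=timezone.utc)

# Span of a signed 32-bit second counter, in whole years (the TTL cap).
int32_years = int32_max // SECONDS_PER_YEAR

# Span of a signed 64-bit second counter, in whole years.
int64_years = int64_max // SECONDS_PER_YEAR

print(int32_limit.isoformat(), int32_years, int64_years)
```

This prints `2038-01-19T03:14:07+00:00 68 292471208677`, matching the 2038 limit, the 68y TTL cap and the quoted upper bound.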



Re: CEP-15 multi key transaction syntax

2022-08-15 Thread Benedict Elliott Smith

> I like Benedict's tuple deconstruction idea

For posterity, this was Avi’s idea!

> On 15 Aug 2022, at 18:59, Caleb Rackliffe  wrote:
> 
> Monday Morning Caleb has digested, and here's where I am...
> 
> 1.) I have no problem w/ having SELECT on the RHS of a LET assignment, and to 
> be honest, this may make some implementation things easier for me (i.e. the 
> encapsulation of SELECT within LET)
> 2.) I'm in favor of LET without a select, although I have no strong feeling 
> that it needs to be in v1.
> 3.) I like Benedict's tuple deconstruction idea, as it restores some of the 
> notational convenience of the previous proposal. Again, though, I don't have 
> a strong feeling this needs to be in v1.
> 3.b.) When we do implement tuple deconstruction, I'd be in favor of 
> supporting a single level of deconstruction to begin with.
> 
> Having said all that, on Friday I finished a prototype (based on some of 
> Blake's previous work) of the syntax/grammar we've more or less agreed upon 
> here, including an implementation of what I described as option #5 above: 
> https://github.com/maedhroz/cassandra/commits/CASSANDRA-17719-prototype
> 
> To look at specific examples, see these tests: 
> https://github.com/maedhroz/cassandra/blob/CASSANDRA-17719-prototype/test/distributed/org/apache/cassandra/distributed/test/accord/AccordIntegrationTest.java
> 
> There are only two things that aren't yet congruent w/ our discussion above, 
> but they should both be trivial to fix:
> 
> 1.) I'm still using EXISTS/NOT EXISTS instead of IS NOT NULL/IS NULL.
> 2.) I don't require SELECT on the RHS of LET yet.
> 
> If I were to just fix those two items, would we be in agreement on this being 
> both the core of the syntax we want and compatible w/ the wish list for 
> future items?
> 
> 
> On Sun, Aug 14, 2022 at 12:25 PM Benedict Elliott Smith  
> wrote:
>> 
>> 
>>> 
>>> Verbose version:
>>> LET (a) = SELECT val FROM table
>>> IF a > 1 THEN...
>>> 
>>> Less verbose version:
>>> LET a = SELECT val FROM table
>>> IF a.val > 1 THEN...
>> 
>> 
>> My intention is that these are actually two different ways of expressing the 
>> same thing, both supported and neither intended to be more or less verbose 
>> than the other. The advantage of permitting both is that you can also write
>> 
>> LET a = SELECT val FROM table
>> IF a IS NOT NULL AND a.val IS NULL THEN …
>> 
>>> Alternatively, for non-queries:
>>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>>> or less verbose:
>>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>>> LET (v1, v2) = (someFunc(), someOtherFunc())
>> 
>> I personally prefer clarity over any arbitrary verbosity/succinct 
>> distinction, but we’re in general “taste” territory here. Since this syntax 
>> includes the SELECT on the RHS, it makes sense to only require this for 
>> situations where a query is being performed. Though I think if SELECT 
>> without a FROM is supported then we will likely end up supporting all of the 
>> above.
>> 
>>> Weighing in on the "SELECT without a FROM," I think that is fine and, as 
>>> Avi stated
>> 
>> Yep, definitely fine. Question is just whether we bother to offer it. Also, 
>> evidently, whether we support LET without a SELECT on the RHS. I am strongly 
>> in favour of this, as requiring a SELECT even when there’s no table involved 
>> is counter-intuitive to me, as LET is now a distinct concept that looks like 
>> variable declaration in other languages.
>> 
>>> Nested:
>>> LET (x, y) = SELECT x, y FROM…
>> 
>> Deconstruction here refers to the above, i.e. extracting variables x and y 
>> from the tuple on the RHS
>> 
>> Nesting is just a question of whether we support either nested tuple 
>> declarations, or nested deconstruction, which might include any of the 
>> following:
>> 
>> LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
>> LET (x, (y, z)) = SELECT x, someTuple FROM…
>> LET (x, (y, z)) = (SELECT x FROM.., (SELECT y, x FROM…))
>> LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
>> LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)
>> 
>> IMO, once you start supporting features they need to be sort of intuitively 
>> discoverable by users, so that a concept can be used in all places you might 
>> expect.
>> 
>> But I would be fine with an arbitrary restriction of at most one SELECT on 
>> the RHS, or even ONLY a SELECT or some other tuple, and at most one level of 
>> deconstruction of the RHS.
>

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Benedict Elliott Smith


> 
> Verbose version:
> LET (a) = SELECT val FROM table
> IF a > 1 THEN...
> 
> Less verbose version:
> LET a = SELECT val FROM table
> IF a.val > 1 THEN...


My intention is that these are actually two different ways of expressing the 
same thing, both supported and neither intended to be more or less verbose than 
the other. The advantage of permitting both is that you can also write

LET a = SELECT val FROM table
IF a IS NOT NULL AND a.val IS NULL THEN …
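
A minimal sketch of why binding the whole row is useful: it keeps "no row" and "row with a null column" distinguishable. The row is modelled here as an optional dictionary, which is an assumption for illustration only.

```python
from typing import Dict, Optional

Row = Optional[Dict[str, Optional[int]]]


def describe(a: Row) -> str:
    # LET a = SELECT val FROM table binds the whole row, so both checks work.
    if a is None:
        return "no row"                 # a IS NULL
    if a["val"] is None:
        return "row exists, val unset"  # a IS NOT NULL AND a.val IS NULL
    return f"val={a['val']}"
```

Calling `describe(None)`, `describe({"val": None})` and `describe({"val": 2})` exercises the three cases.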

> Alternatively, for non-queries:
> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
> or less verbose:
> LET x = (someFunc() AS v1, someOtherFunc() as v2)
> LET (v1, v2) = (someFunc(), someOtherFunc())

I personally prefer clarity over any arbitrary verbosity/succinct distinction, 
but we’re in general “taste” territory here. Since this syntax includes the 
SELECT on the RHS, it makes sense to only require this for situations where a 
query is being performed. Though I think if SELECT without a FROM is supported 
then we will likely end up supporting all of the above.

> Weighing in on the "SELECT without a FROM," I think that is fine and, as Avi 
> stated

Yep, definitely fine. Question is just whether we bother to offer it. Also, 
evidently, whether we support LET without a SELECT on the RHS. I am strongly in 
favour of this, as requiring a SELECT even when there’s no table involved is 
counter-intuitive to me, as LET is now a distinct concept that looks like 
variable declaration in other languages.

> Nested:
> LET (x, y) = SELECT x, y FROM…

Deconstruction here refers to the above, i.e. extracting variables x and y from 
the tuple on the RHS

Nesting is just a question of whether we support either nested tuple 
declarations, or nested deconstruction, which might include any of the 
following:

LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
LET (x, (y, z)) = SELECT x, someTuple FROM…
LET (x, (y, z)) = (SELECT x FROM.., (SELECT y, x FROM…))
LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)

IMO, once you start supporting features they need to be sort of intuitively 
discoverable by users, so that a concept can be used in all places you might 
expect.

But I would be fine with an arbitrary restriction of at most one SELECT on the 
RHS, or even ONLY a SELECT or some other tuple, and at most one level of 
deconstruction of the RHS.
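
A minimal sketch of deconstruction with a depth limit, as in the "at most one level" restriction above. Patterns are modelled as strings (variable names) or tuples of patterns; `bind` and `max_depth` are hypothetical names, not proposed syntax.

```python
from typing import Any, Dict


def bind(pattern: Any, value: Any, max_depth: int = 1) -> Dict[str, Any]:
    """Bind names in `pattern` (a str, or a tuple of patterns) to `value`."""
    scope: Dict[str, Any] = {}

    def walk(pat: Any, val: Any, depth: int) -> None:
        if isinstance(pat, str):
            scope[pat] = val  # a plain name takes the value whole
            return
        if depth >= max_depth:
            raise ValueError("nested deconstruction exceeds the allowed depth")
        if not isinstance(val, tuple) or len(val) != len(pat):
            raise ValueError("tuple shape does not match the pattern")
        for sub_pat, sub_val in zip(pat, val):
            walk(sub_pat, sub_val, depth + 1)

    walk(pattern, value, 0)
    return scope


flat = bind(("x", "y"), (1, 2))                  # like LET (x, y) = ...
kept = bind(("x", "yAndZ"), (1, (2, 3)))         # inner tuple bound whole
```

With the default limit, a pattern such as `("x", ("y", "z"))` is rejected; passing `max_depth=2` would allow it.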





> On 14 Aug 2022, at 18:04, Patrick McFadin  wrote:
> 
> Let me just state my bias right up front. For any kind of QL I lean heavily 
> toward verbose and explicit based on their lifecycle. A CQL query will 
> probably need to be understood by the next person looking at it, and a few 
> seconds saved typing isn't worth the potential misunderstanding later.  My 
> opinion is formed by having to be the second person many times.  :D 
> 
> I just want to make sure I have the syntax you are proposing. 
> 
> Verbose version:
> LET (a) = SELECT val FROM table
> IF a > 1 THEN...
> 
> Less verbose version:
> LET a = SELECT val FROM table
> IF a.val > 1 THEN...
> 
> Alternatively, for non-queries:
> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
> or less verbose:
> LET x = (someFunc() AS v1, someOtherFunc() as v2)
> LET (v1, v2) = (someFunc(), someOtherFunc())
> 
> Weighing in on the "SELECT without a FROM," I think that is fine and, as Avi 
> stated, already present in the SQL world. I would prefer that over 'SELECT  
> func() FROM dual;' (Looking at you, Oracle)
> 
> Finally, on the topic of deconstructing SELECT statements instead of nesting. 
> If I understand the argument here, I would favor deconstructing over nesting 
> if there is a choice. I think this is what that choice would look like.
> 
> Deconstructed:
> LET x = SELECT x FROM ...
> LET y = SELECT y FROM ...
> 
> Nested:
> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))
> 
> I'm trying to summate but let me know if I missed something. I apologize in 
> advance to Monday morning Caleb, who will have to digest this thread. 
> 
> Patrick
> 
> On Sun, Aug 14, 2022 at 9:00 AM Benedict Elliott Smith  
> wrote:
>> 
>>> 
>>> I think SQL dialects require subqueries to be parenthesized (not sure). If 
>>> that's the case I think we should keep the tradition.
>>> 
>> 
>> This isn’t a sub-query though, since LET is not a query. If we permit at 
>> most one SELECT, and do not permit mixing SELECT with constant assignments, 
>> I don’t see why we would require parentheses.
>> 
>>> I see no harm in making FROM optional, as it's recognized by other SQL 
>>> dialects.
>>> 
>>> Absolutely, this just flows naturally from having tuples. There's no 
>>> difference between "

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Benedict Elliott Smith

> 
> I think SQL dialects require subqueries to be parenthesized (not sure). If 
> that's the case I think we should keep the tradition.
> 

This isn’t a sub-query though, since LET is not a query. If we permit at most 
one SELECT, and do not permit mixing SELECT with constant assignments, I don’t 
see why we would require parentheses.

> I see no harm in making FROM optional, as it's recognized by other SQL 
> dialects.
> 
> Absolutely, this just flows naturally from having tuples. There's no 
> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple”.

Neither of these things are supported today, and they’re no longer necessary 
with this syntax proposal. The downside of splitting SELECT and LET is that 
there’s no impetus to improve the former. So the question was really whether we 
bother to improve it anyway, not whether or not they would be good improvements 
(I think they obviously are).

> I think this can be safely deferred. Most people would again separate it into 
> separate LETs.
> 
That implies we’ll permit deconstructing a tuple variable in a LET. This makes 
sense to me, but is roughly equivalent to nested deconstruction. It might be 
that v1 we only support deconstructing SELECT statements, but I guess all of 
this is probably up to the implementor.

> I'd add (to the specification) that LETs cannot override a previously defined 
> variable, just to reduce ambiguity.
> 

Yep, this was already agreed way back with the earlier proposal.
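
A minimal sketch of the no-override rule: a scope in which each name may be bound only once. `TransactionScope` is a hypothetical name, and raising an error on rebinding is an assumption about how the rule would surface.

```python
from typing import Any, Dict


class TransactionScope:
    """Hypothetical variable scope in which each name may be bound only once."""

    def __init__(self) -> None:
        self._vars: Dict[str, Any] = {}

    def let(self, name: str, value: Any) -> None:
        if name in self._vars:
            raise NameError(f"variable {name!r} is already defined")
        self._vars[name] = value

    def get(self, name: str) -> Any:
        return self._vars[name]


scope = TransactionScope()
scope.let("a", 1)
```

A second `scope.let("a", ...)` would raise instead of silently replacing the earlier binding.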


> On 14 Aug 2022, at 16:30, Avi Kivity  wrote:
> 
> 
> 
> On 14/08/2022 17.50, Benedict Elliott Smith wrote:
>> 
>> > SELECT and LET incompatible once comparisons become valid selectors
>> 
>> I don’t think this would be ambiguous, as = is required in the LET syntax as 
>> we have to bind the result to a variable name.
>> 
>> But, I like the deconstructed tuple syntax improvement over 
>> “Option 6”. This would also seem to easily support assigning from non-query 
>> statements, such as LET (a, b) = (someFunc(), someOtherFunc(?))
>> 
>> I don’t think it is ideal to depend on relative position in the tuple for 
>> assigning results to a variable name, as it leaves more scope for errors. It 
>> would be nice to have a simple way to deconstruct safely. But, I think this 
>> proposal is good, and I’d be fine with it as an alternative if others 
>> concur. I agree that seeing the SELECT independently may be more easily 
>> recognisable to users.
>> 
>> With this approach there remains the question of how we handle single column 
>> results. I’d be inclined to treat them in the following way:
>> 
>> LET (a) = SELECT val FROM table
>> IF a > 1 THEN...
>> 
>> LET a = SELECT val FROM table
>> IF a.val > 1 THEN...
>> 
> 
> I think SQL dialects require subqueries to be parenthesized (not sure). If 
> that's the case I think we should keep the tradition.
> 
> 
> 
>> 
>> There is also the question of whether we support SELECT without a FROM 
>> clause, e.g.
>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>> 
>> Or just LET (since they are no longer equivalent)
>> e.g.
>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>> LET (v1, v2) = (someFunc(), someOtherFunc())
>> 
> 
> I see no harm in making FROM optional, as it's recognized by other SQL 
> dialects.
> 
> 
> 
>> 
>> Also since LET is only binding variables, is there any reason we shouldn’t 
>> support multiple SELECT assignments in a single LET?, e.g.
>> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))
>> 
> 
> What if an inner select returns a tuple? Would y be a tuple?
> 
> 
> 
> I think this is redundant and atypical enough to not be worth 
> supporting. Most people would use separate LETs.
> 
> 
> 
>> 
>> Also whether we support tuples in SELECT statements anyway, e.g.
>> LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
>> IF tuple1.a > 1 AND tuple2.d > 1…
> 
> Absolutely, this just flows naturally from having tuples. There's no 
> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".
> 
> 
> 
>> 
>> 
>> and whether we support nested deconstruction, e.g.
>> LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
>> IF a > 1 AND d > 1…
>> 
> 
> I think this can be safely deferred. Most people would again separate it into 
> separate LETs.
> 
> 
> 
> I'd add (to the specification) that LETs cannot override a previously defined 
> variable, just to reduce ambiguity.
> 
> 
> 
>> 
>> 
>> 
>> 
>&

Re: CEP-15 multi key transaction syntax

2022-08-14 Thread Benedict Elliott Smith

> SELECT and LET incompatible once comparisons become valid selectors

I don’t think this would be ambiguous, as = is required in the LET syntax as we 
have to bind the result to a variable name.

But, I like the deconstructed tuple syntax improvement over “Option 6”. This 
would also seem to easily support assigning from non-query statements, such as 
LET (a, b) = (someFunc(), someOtherFunc(?))

I don’t think it is ideal to depend on relative position in the tuple for 
assigning results to a variable name, as it leaves more scope for errors. It 
would be nice to have a simple way to deconstruct safely. But, I think this 
proposal is good, and I’d be fine with it as an alternative if others concur. I 
agree that seeing the SELECT independently may be more easily recognisable to 
users.

With this approach there remains the question of how we handle single column 
results. I’d be inclined to treat them in the following way:

LET (a) = SELECT val FROM table
IF a > 1 THEN...

LET a = SELECT val FROM table
IF a.val > 1 THEN...


There is also the question of whether we support SELECT without a FROM clause, 
e.g.
LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2

Or just LET (since they are no longer equivalent)
e.g.
LET x = (someFunc() AS v1, someOtherFunc() as v2)
LET (v1, v2) = (someFunc(), someOtherFunc())


Also since LET is only binding variables, is there any reason we shouldn’t 
support multiple SELECT assignments in a single LET?, e.g.
LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))


Also whether we support tuples in SELECT statements anyway, e.g.
LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
IF tuple1.a > 1 AND tuple2.d > 1…


and whether we support nested deconstruction, e.g.
LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
IF a > 1 AND d > 1…







> On 14 Aug 2022, at 13:55, Avi Kivity via dev  wrote:
> 
> 
> 
> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> 
>> I’ll do my best to express my thinking, as well as how I would explain 
>> the feature to a user.
>> 
>> My mental model for LET statements is that they are simply SELECT statements 
>> where the columns that are selected become variables accessible anywhere in 
>> the scope of the transaction. That is to say, you should be able to run 
>> something like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the 
>> columns of a LET statement and produce a valid SELECT statement, and vice 
>> versa. Both should perform identically.
>> 
>> e.g. 
>> SELECT pk AS key, v AS value FROM table 
>> 
>> => 
>> LET key = pk, value = v FROM table
> 
> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL supports 
> selecting comparisons:
> 
> 
> 
> $ psql
> psql (14.3)
> Type "help" for help.
> 
> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>  ?column? | ?column? | ?column? 
> ----------+----------+----------
>  f        | t        | 
> (1 row)
> 
> 
> 
> Using "=" as a syntactic element in LET would make SELECT and LET 
> incompatible once comparisons become valid selectors. Unless they become 
> mandatory (and then you'd write "LET q = a = b" if you wanted to select a 
> comparison).
> 
> 
> 
> I personally prefer the nested query syntax:
> 
> 
> 
> LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
> 
> 
> 
> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is 
> immediately recognizable by everyone as a query, LET is not.
> 
> 
> 
>> 
>> Identical form, identical behaviour. Every statement should be directly 
>> translatable with some simple text manipulation.
>> 
>> We can then make this more powerful for users by simply expanding SELECT 
>> statements, e.g. by permitting them to declare constants and tuples in the 
>> column results. In this scheme LET x = * is simply syntactic sugar for LET x 
>> = (pk, ck, field1, …) This scheme then supports options 2, 4 and 5 all at 
>> once, consistently alongside each other.
>> 
>> Option 6 is in fact very similar, but is strictly less flexible for the user 
>> as they have no way to declare multiple scalar variables without scoping 
>> them inside a tuple.
>> 
>> e.g.
>> LET key = pk, value = v FROM table
>> IF key > 1 AND value > 1 THEN...
>> 
>> =>
>> LET row = SELECT pk AS key, v AS value FROM table
>> IF row.key > 1 AND row.value > 1 THEN…
>> 
>> However, both are expressible in the existing proposal, as if you prefer 
>> this naming scheme you can simply write
>> 
>> LET row = (pk AS key, v AS value) FROM table
>> IF row.key > 1 AND row.value > 1 THEN…

Re: CEP-15 multi key transaction syntax

2022-08-13 Thread Benedict Elliott Smith

I’ll do my best to express my thinking, as well as how I would explain the 
feature to a user.

My mental model for LET statements is that they are simply SELECT statements 
where the columns that are selected become variables accessible anywhere in the 
scope of the transaction. That is to say, you should be able to run something 
like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a 
LET statement and produce a valid SELECT statement, and vice versa. Both should 
perform identically.

e.g. 
SELECT pk AS key, v AS value FROM table 

=> 
LET key = pk, value = v FROM table

Identical form, identical behaviour. Every statement should be directly 
translatable with some simple text manipulation.
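
A minimal sketch of that text manipulation, for the LET-to-SELECT direction only. The regular expression is adapted from the substitution above, with whitespace trimming added; `let_to_select` is a hypothetical helper, and, like the original pattern, it does not handle commas inside function arguments.

```python
import re

# One "name = expression" binding, terminated by a comma or the end of input.
BINDING = re.compile(r"\s*([^=,]+?)\s*=\s*([^,]+?)\s*(?:,|$)")


def let_to_select(statement: str) -> str:
    """Rewrite 'LET a = x, b = y FROM t' as 'SELECT x AS a, y AS b FROM t'."""
    match = re.fullmatch(r"LET\s+(.*?)(\s+FROM\s+.*)?", statement.strip(), re.S)
    if match is None:
        raise ValueError("not a LET statement")
    columns, tail = match.group(1), match.group(2) or ""
    # Swap each 'name = expr' into 'expr AS name'.
    bindings = [f"{m.group(2)} AS {m.group(1)}" for m in BINDING.finditer(columns)]
    return "SELECT " + ", ".join(bindings) + tail


converted = let_to_select("LET key = pk, value = v FROM table")
```

Here `converted` is `SELECT pk AS key, v AS value FROM table`, the same pair of statements shown in the example above.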

We can then make this more powerful for users by simply expanding SELECT 
statements, e.g. by permitting them to declare constants and tuples in the 
column results. In this scheme LET x = * is simply syntactic sugar for LET x = 
(pk, ck, field1, …) This scheme then supports options 2, 4 and 5 all at once, 
consistently alongside each other.

Option 6 is in fact very similar, but is strictly less flexible for the user as 
they have no way to declare multiple scalar variables without scoping them 
inside a tuple.

e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if you prefer this 
naming scheme you can simply write

LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto converting single column results to a scalar, we do need a 
way for the user to say they care whether the row was null or the column. I 
think an implicit conversion here could be surprising. However we could 
implement tuple expressions anyway and let the user explicitly declare v as a 
tuple as Caleb has suggested for the existing proposal as well.

Assigning constants or other values not selected from a table would also be a 
little clunky:

LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…

=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are close to identical; Option 6 is just slightly 
more verbose and slightly less flexible.

Which one would be most intuitive to users is hard to predict. It might be that 
Option 6 would be slightly easier, but I’m unsure if there would be a huge 
difference.


> On 13 Aug 2022, at 16:59, Patrick McFadin  wrote:
> 
> I'm really happy to see CEP-15 getting closer to a final implementation. I'm 
> going to walk through my reasoning for your proposals wrt trying to explain 
> this to somebody new. 
> 
> Looking at all the options, the first thing that comes up for me is the 
> Cassandra project's complicated relationship with NULL.  We have prior art 
> with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS NOT NULL is used 
> in materialized views similarly to proposals 2,4 and 5. 
> 
> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
>   AS SELECT [ (column_list) ]
>   FROM [keyspace_name.]table_name
>   [ WHERE column_name IS NOT NULL
>   [ AND column_name IS NOT NULL ... ] ]
>   [ AND relation [ AND ... ] ] 
>   PRIMARY KEY ( column_list )
>   [ WITH [ table_properties ]
>   [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;
> 
>  Based on that, I believe 1 and 3 would just confuse users, so -1 on those. 
> 
> Trying to explain the difference between row and column operations with LET, 
> I can't see the difference between a row and column in #2. 
> 
> #4 introduces a boolean instead of column names and just adds more syntax.
> 
> #5 is verbose and, in my opinion, easier to reason about when writing a query. 
> Thinking top down, I need to know if these exact rows and/or column values 
> exist before changing them, so I'll define them first. Then I'll iterate over 
> the state I created in my actual changes so I know I'm changing precisely 
> what I want. 
> 
> #5 could use a bit more to be clearer to somebody who doesn't write CQL 
> queries daily and wouldn't require memorizing subtle differences. It should 
> be similar to all the other syntax, so learning a little about CQL will let 
> you move into more without completely re-learning the new syntax.  
> 
> So I propose #6)
> BEGIN TRANSACTION
>   LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all columns
>   LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
>   SELECT row1, row2
>   IF row1 IS NULL AND row2.v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
> 
> I added the SELECT in the LET just so it's straightforward, you are reading, 
> and it's just like doing a regular select, but you are assigning it to a 
> variable. 
> 
> I removed the confusing 'row1.v' and replaced it with 'row1' I can't see why 
> 

Re: [DISCUSS] Remove Dead Pull Requests

2022-08-11 Thread Benedict Elliott Smith

Perhaps we just restrict “trivial” patches to trunk? If it requires several 
PRs/branches then a Jira is perhaps warranted, and perhaps if it is trivial and 
unimportant it’s better not to waste the project’s time managing the overhead.

This would also be simplified with a modified merge strategy, as it would be 
fine to simply open separate “trivial” PRs that could independently be closed 
in the UX with low overhead by maintainers.

> On 11 Aug 2022, at 11:10, Claude Warren, Jr via dev 
>  wrote:
> 
> I agree the amount of work is somewhat overwhelming for the proposed change, 
> but I was referring to the lack of a Jira ticket blocking the pull request.  
> At least that is how it looks to the new observer.  Perhaps we should add a 
> "trivial change" label for requests that do not have a ticket and are 
> trivial. 
> 
> How many branches do the changes currently need to be applied to?  I assume 
> this goes up by 1 after the next release.
> 
> On Thu, Aug 11, 2022 at 9:36 AM Benjamin Lerer  wrote:
>>> Is there an objection to accepting "typo" corrections without a ticket?
>> 
>> One problem to be aware of is that those pull requests need to be converted 
>> into patches and merged manually up to trunk if they were done on older 
>> branches. So it might not look like it at first but it can be quite time 
>> consuming.
>> 
>> On Thu, 11 Aug 2022 at 10:07, Benedict wrote:
>>> Those all seem like good suggestions to me
>>> 
>>>> On 11 Aug 2022, at 08:44, Claude Warren, Jr via dev 
>>>> wrote:
>>>> 
>>>> My original goal was to reduce the number of pull requests in the backlog 
>>>> as it appears, from the outside, that the project does not really care for 
>>>> outside contributions when there are over 200 pull requests pending and 
>>>> many of them multiple years old.  I guess that is an optics issue.  Upon 
>>>> looking at the older backlog, there were a few that I felt could be closed 
>>>> because they didn't have tickets, or were trivial (i.e. typo correction), 
>>>> or for which the original repository no longer exists.  However, from the 
>>>> conversation here, it seems like the older pull requests are being used as 
>>>> long-term storage for ideas that have not come to fruition and for which 
>>>> the original developer may no longer be active.
>>>> 
>>>> Looking through the pull request backlog there are a number of requests 
>>>> that are not associated with a ticket.  Perhaps we should add a pull request 
>>>> template to GitHub to request the associated ticket number when the pull 
>>>> request is made.  The template can also request any other information we 
>>>> think appropriate to speed acceptance of the request.  I would add a 
>>>> "This is a trivial change" checkbox for things like typo changes.  Is 
>>>> there any documentation on the pull request process?  I think I saw 
>>>> something that said patches were requested, but I can't find it now.  We 
>>>> could add a link to any such documentation in the template as well.
>>>> 
>>>> Is there an objection to accepting "typo" corrections without a ticket?
>>>> 
>>>> 
>>>> 
>>>> Claude
>>>> 
>>>> On Wed, Aug 10, 2022 at 5:08 PM Josh McKenzie wrote:
>>>>> I think of this from a discoverability and workflow perspective at least 
>>>>> on the JIRA side, though many of the same traits apply to PR's. Some 
>>>>> questions that come to mind:
>>>>> 
>>>>> 1. Are people grepping through the backlog of open items for things to 
>>>>> work on they'd otherwise miss if they were closed out?
>>>>> 2. Are people grepping via specific text phrases in the summary, with or 
>>>>> without "resolution = unresolved",  to try and find things on a 
>>>>> particular topic to work on?
>>>>> 3. Relying on labels? Components? Something else?
>>>>> 
>>>>> My .02: folks that are new to the project probably need more guidance on 
>>>>> what to look for to get engaged with which is served by the LHF + 
>>>>> unresolved + status emails + @cassandra_mentors. Mid to long-timers are 
>>>>> probably more likely to search for specific topics, but may search for 
>>>>> open tickets with patches attached or Patch Available things (seems 
>>>>> unlikely as most of us have areas we're focused on but is possible?)
>>>>> 
>>>>> The status quo today (leave things open if work has been done on it 
>>>>> and/or it's an idea that clearly still has some relevance) seems to 
>>>>> satisfy the most use-cases and retain the most flexibility, so I'd 
>>>>> advocate for us not making a change just to make a change. While we could 
>>>>> add a tag or resolution that indicates something closed out due to it 
>>>>> being stale, my intuition is that people will just tack on "resolution = 
>>>>> unresolved OR labels = closed_stale" in the JIRA case or sift through all 
>>>>> things not merged in the PR case to effectively end up with the same body 
>>>>> of results they're getting today.
>>>>> 
>>>>> Given the ability of JQL to sort and slice based on updated times as 
>>>>> well, 

Re: Cassandra project status update 2022-08-03

2022-08-10 Thread Benedict Elliott Smith

> We can start by putting the bar at a lower level and raise the level over time

+1

> One simple approach that has been mentioned several times is to run the new 
> tests added by a given patch in a loop using one of the CircleCI tasks

I think if we want to do this, it should be extremely easy - by which I mean 
automatic, really. This shouldn’t be too tricky I think? We just need to 
produce a diff of new test classes and methods within existing classes. If 
there doesn’t already exist tooling to do this, I can probably help out by 
putting together something to output @Test annotated methods within a source 
tree, if others are able to turn this into a part of the CircleCI pre-commit 
task (i.e. to pick the common ancestor with trunk, 4.1 etc, and run this task 
for each of the outputs). We might want to start standardising branch naming 
structures to support picking the upstream branch.
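
As a rough illustration of the tooling described above, the following Python 
sketch lists @Test annotated methods in Java sources and diffs two source 
trees. It is a sketch under stated assumptions, not existing project tooling: 
the function names (find_test_methods, scan_tree, new_tests) are hypothetical, 
the scan is line-based rather than a real Java parser (a comment containing a 
call between the annotation and the declaration would confuse it), and the two 
trees are assumed to be checkouts of the patch branch and its upstream branch.

```python
import re
from pathlib import Path

# An @Test annotation at the start of a stripped line, with optional arguments.
TEST_ANNOTATION = re.compile(r"^@Test\b(\s*\([^)]*\))?")
# An identifier directly followed by "(", and not part of another annotation.
METHOD_NAME = re.compile(r"(?<![@\w])(\w+)\s*\(")


def find_test_methods(source):
    """Return the names of methods annotated with @Test, in source order."""
    names = []
    pending = False  # True once @Test was seen and the declaration is still to come
    for line in source.splitlines():
        stripped = line.strip()
        if TEST_ANNOTATION.match(stripped):
            pending = True
            # Drop the annotation so a declaration on the same line is still found.
            stripped = TEST_ANNOTATION.sub("", stripped).strip()
        if pending:
            match = METHOD_NAME.search(stripped)
            if match:
                names.append(match.group(1))
                pending = False
    return names


def scan_tree(root):
    """Collect 'ClassName#method' identifiers for every *.java file under root."""
    found = set()
    for path in Path(root).rglob("*.java"):
        for name in find_test_methods(path.read_text(encoding="utf-8")):
            found.add(f"{path.stem}#{name}")
    return found


def new_tests(upstream_root, patched_root):
    """Tests present in the patched tree but not in the upstream tree."""
    return sorted(scan_tree(patched_root) - scan_tree(upstream_root))
```

The output of new_tests could then be fed to whichever CI job repeats 
individual tests; how that job is parameterised is left open here.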

> We should also probably revert newly committed patches if we detect that they 
> introduced flakies.

There should be a strict time limit for reverting a patch for this reason, as 
environments change and a test that is flaky now was not necessarily flaky 
before.

> On 9 Aug 2022, at 12:57, Benjamin Lerer  wrote:
> 
> At this point it is clear that we will probably never be able to remove some 
> level of flakiness from our tests. For me the questions are: 1) Where do we 
> draw the line for a release ? and 2) How do we maintain that line over time?
> 
> In my opinion, not all flakies are equal. Some fail every 10 runs, some 
> fail 1 in 1000 runs. I would personally draw the line based on that 
> metric. With the circleci tasks that Andres has added we can easily get that 
> information for a given test.
> We can start by putting the bar at a lower level and raise the level over 
> time when most of the flakies that we hit are above that level.
> 
> At the same time we should make sure that we do not introduce new flakies. 
> One simple approach that has been mentioned several times is to run the new 
> tests added by a given patch in a loop using one of the CircleCI tasks. That 
> would allow us to minimize the risk of introducing flaky tests. We should 
> also probably revert newly committed patches if we detect that they introduced 
> flakies.
> 
> What do you think?
> 
> 
> 
> 
> 
> On Sun, 7 Aug 2022 at 12:24, Mick Semb Wever wrote:
>> 
>> 
>>> With that said, I guess we can just revise on a regular basis what exactly 
>>> are the last flakes and not numbers which also change quickly up and down 
>>> with the first change in the Infra. 
>> 
>> 
>> 
>> +1, I am in favour of taking a pragmatic approach.
>> 
>> If flakies are identified and triaged enough that, with correlation from 
>> both CI systems, we are confident that no legit bugs are behind them, I'm in 
>> favour of going beta.
>> 
>> I still remain in favour of somehow incentivising reducing other flakies as 
>> well. Flakies that expose poor/limited CI infra, and/or tests that are not 
>> as resilient as they could be, are still noise that indirectly reduce our QA 
>> (and increase efforts to find and tackle those legit runtime problems). 
>> Interested in hearing input from others here that have been spending a lot 
>> of time on this front. 
>> 
>> Could it work if we say: all flakies must be ticketed, and test/infra 
>> related flakies do not block a beta release so long as there are fewer than 
>> the previous release? The intent here being pragmatic, but keeping us on a 
>> "keep the campground cleaner" trajectory… 



Re: Inclusive/exclusive endpoints when compacting token ranges

2022-07-26 Thread Benedict Elliott Smith

I think a change like this could be dangerous for a lot of existing automation 
built atop nodetool.

I’m not sure this change is worthwhile. I think it would be better to introduce 
e.g. -ste and -ete for “start token exclusive” and “end token exclusive” so 
that users can opt-in to whichever scheme they prefer for their tooling, 
without breaking existing users.
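
To make the difference between the schemes concrete, here is a small Python 
sketch of range membership with opt-in exclusive bounds. It is illustrative 
only: the function contains and its start_exclusive / end_exclusive 
parameters are hypothetical stand-ins for the suggested -ste and -ete flags, 
and the wrap-around handling (including treating (x, x] as the whole ring) is 
an assumption modelled on the half-open behaviour described in this thread, 
not a copy of the nodetool implementation.

```python
def contains(token, start, end, start_exclusive=False, end_exclusive=False):
    """Return True if token falls in the range from start to end.

    By default both bounds are inclusive, i.e. the closed range [start, end].
    """
    above = token > start if start_exclusive else token >= start
    below = token < end if end_exclusive else token <= end
    if start < end:
        return above and below
    if start == end and not (start_exclusive or end_exclusive):
        return token == start  # closed [x, x] is exactly one token
    # start >= end with an exclusive bound: assumed to wrap around the ring,
    # so (x, x] covers every token.
    return above or below
```

With the defaults, contains(10, 10, 50) is true, matching the current closed 
interpretation of "-st 10 -et 50"; with start_exclusive=True the same call is 
false, matching the half-open (10, 50] interpretation.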

> On 26 Jul 2022, at 14:22, Brandon Williams  wrote:
> 
> +1, I think that makes the most sense.
> 
> Kind Regards,
> Brandon
> 
> On Tue, Jul 26, 2022 at 8:19 AM J. D. Jordan  
> wrote:
>> 
>> I like the third option, especially if it makes it consistent with repair, 
>> which has supported ranges longer and I would guess most people would think 
>> the compact ranges work the same as the repair ranges.
>> 
>> -Jeremiah Jordan
>> 
>>> On Jul 26, 2022, at 6:49 AM, Andrés de la Peña  wrote:
>>> 
>>> 
>>> Hi all,
>>> 
>>> CASSANDRA-17575 has detected that token ranges in nodetool compact are 
>>> interpreted as closed on both sides. For example, the command "nodetool 
>>> compact -st 10 -et 50" will compact the tokens in [10, 50]. This way of 
>>> interpreting token ranges is unusual since token ranges are usually 
>>> half-open, and I think that in the previous example one would expect that 
>>> the compacted tokens would be in (10, 50]. That's for example the way 
>>> nodetool repair works, and indeed the class org.apache.cassandra.dht.Range 
>>> is always half-open.
>>> 
>>> It's worth mentioning that, unlike for nodetool repair, the help and 
>>> docs for nodetool compact don't specify whether the supplied start/end 
>>> tokens are inclusive or exclusive.
>>> 
>>> I think that ideally nodetool compact should interpret the provided token 
>>> ranges as half-open, to be consistent with how token ranges are usually 
>>> interpreted. However, this would change the way the tool has worked until 
>>> now. This change might be problematic for existing users relying on the old 
>>> behaviour. That would be especially severe for the case where the begin and 
>>> end token are the same, because interpreting [x, x] we would compact a 
>>> single token, whereas I think that interpreting (x, x] would compact all 
>>> the tokens. As for compacting ranges including multiple tokens, I think the 
>>> change wouldn't be so bad, since probably the supplied token ranges come 
>>> from tools that are already presenting the ranges as half-open. Also, if we 
>>> are splitting the full ring into smaller ranges, half-open intervals would 
>>> still work and would save us some repetitions.
>>> 
>>> So my question is: Should we change the behaviour of nodetool compact to 
>>> interpret the token ranges as half-open, aligning it with the usual 
>>> interpretation of ranges? Or should we just document the current odd 
>>> behaviour to prevent compatibility issues?
>>> 
>>> A third option would be changing to half-open ranges and also forbidding 
>>> ranges where the begin and end token are the same, to prevent the 
>>> accidental compaction of the entire ring. Note that nodetool repair also 
>>> forbids this type of token ranges.
>>> 
>>> What do you think?




Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Benedict Elliott Smith
I like Postgres' approach of letting you declare an exceptional condition and 
failing if there is not precisely one result (though I would prefer to 
differentiate between 0 row->Null and 2 rows->first row), but once you permit 
coercing to NULL I think you have to then treat it like NULL and permit 
arithmetic (that itself yields NULL)

This is explicitly stipulated in ANSI SQL 92, in 6.12 <numeric value expression>:

General Rules

 1) If the value of any <numeric primary> simply contained in a
    <numeric value expression> is the null value, then the result of
    the <numeric value expression> is the null value.
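
A minimal Python sketch of these semantics, using None to stand in for NULL. 
The helper names (add, greater_than, branch) are hypothetical and this is not 
how any CQL implementation is written; it only illustrates arithmetic that 
yields NULL when an operand is NULL, together with the IF handling discussed 
further down the thread, where an UNKNOWN condition does not run the THEN 
branch but does run ELSE.

```python
def add(a, b):
    """Arithmetic on NULL yields NULL."""
    if a is None or b is None:
        return None
    return a + b


def greater_than(a, b):
    """Comparison with NULL yields UNKNOWN, represented here as None."""
    if a is None or b is None:
        return None
    return a > b


def branch(condition, then_value, else_value=None):
    """Take the THEN branch only when the condition is definitely TRUE."""
    return then_value if condition is True else else_value
```

For example, branch(greater_than(add(1, None), 0), "Y", "Z") evaluates to 
"Z": the NULL propagates through the addition and the comparison, and the 
resulting UNKNOWN falls through to the ELSE branch.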


On 2022/06/16 16:02:33 Blake Eggleston wrote:
> Yeah I'd say NULL is fine for condition evaluation. Reference assignment is a 
> little trickier. Assigning null to a column seems ok, but we should raise an 
> exception if they're doing math or something that expects a non-null value
> 
> > On Jun 16, 2022, at 8:46 AM, Benedict Elliott Smith  
> > wrote:
> > 
> > AFAICT that standard addresses server-side cursors, not the assignment of a 
> > query result to a variable. Could you point to where it addresses variable 
> > assignment?
> > 
> > Postgres has a similar concept, SELECT INTO[1], and it explicitly returns 
> > NULL if there are no result rows, unless STRICT is specified in which case 
> > an error is returned. My recollection is that T-SQL is also fine with 
> > coercing no results to NULL when assigning to a variable or using it in a 
> > sub-expression.
> > 
> > I'm in favour of expanding our functionality here, but I do not see 
> > anything fundamentally problematic about the proposal as it stands.
> > 
> > [1] 
> > https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
> > 
> > 
> > 
> > On 2022/06/13 14:52:41 Konstantin Osipov wrote:
> >> * bened...@apache.org  [22/06/13 17:37]:
> >>> I believe that is a MySQL specific concept. This is one problem with 
> >>> mimicking SQL – it’s not one thing!
> >>> 
> >>> In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a NULL 
> >>> value submitted to a Boolean operator yields UNKNOWN.
> >>> 
> >>> IF (X) THEN Y does not run Y if X is UNKNOWN;
> >>> IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.
> >>> 
> >>> So, I think we have evidence that it is fine to interpret NULL
> >>> as “false” for the evaluation of IF conditions.
> >> 
> >> NOT FOUND handler is in ISO/IEC 9075-4:2003 13.2 
> >> 
> >> In Cassandra results, there is no way to distinguish null values
> >> from absence of a row. Branching, thus, without being able to
> >> branch based on the absence of a row, whatever specific syntax
> >> is used for such branching, is incomplete. 
> >> 
> >> More broadly, SQL/PSM has exception and condition statements, not
> >> just IF statements.
> >> 
> >> -- 
> >> Konstantin Osipov, Moscow, Russia
> >> 
> 
> 


Re: CEP-15 multi key transaction syntax

2022-06-16 Thread Benedict Elliott Smith
AFAICT that standard addresses server-side cursors, not the assignment of a 
query result to a variable. Could you point to where it addresses variable 
assignment?

Postgres has a similar concept, SELECT INTO[1], and it explicitly returns NULL 
if there are no result rows, unless STRICT is specified in which case an error 
is returned. My recollection is that T-SQL is also fine with coercing no 
results to NULL when assigning to a variable or using it in a sub-expression.

I'm in favour of expanding our functionality here, but I do not see anything 
fundamentally problematic about the proposal as it stands.

[1] 
https://www.postgresql.org/docs/current/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
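
The two behaviours can be sketched in Python as follows. This is an 
illustration of the semantics described above, not Postgres or Cassandra 
code: select_into is a hypothetical helper, rows is assumed to be a list of 
already-fetched result rows, and LookupError merely stands in for the errors 
Postgres reports under STRICT.

```python
def select_into(rows, strict=False):
    """Assign a query result to a single variable.

    Without strict: the first row, or None (NULL) when there are no rows.
    With strict: exactly one row is required, otherwise an error is raised.
    """
    if strict:
        if len(rows) == 0:
            raise LookupError("query returned no rows")
        if len(rows) > 1:
            raise LookupError("query returned more than one row")
        return rows[0]
    return rows[0] if rows else None
```

The non-strict form is the one that makes "no row" and NULL indistinguishable 
to the caller, which is the trade-off under discussion.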



On 2022/06/13 14:52:41 Konstantin Osipov wrote:
> * bened...@apache.org  [22/06/13 17:37]:
> > I believe that is a MySQL specific concept. This is one problem with 
> > mimicking SQL – it’s not one thing!
> > 
> > In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a NULL 
> > value submitted to a Boolean operator yields UNKNOWN.
> > 
> > IF (X) THEN Y does not run Y if X is UNKNOWN;
> > IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.
> > 
> > So, I think we have evidence that it is fine to interpret NULL
> > as “false” for the evaluation of IF conditions.
> 
> NOT FOUND handler is in ISO/IEC 9075-4:2003 13.2 
> 
> In Cassandra results, there is no way to distinguish null values
> from absence of a row. Branching, thus, without being able to
> branch based on the absence of a row, whatever specific syntax
> is used for such branching, is incomplete. 
> 
> More broadly, SQL/PSM has exception and condition statements, not
> just IF statements.
> 
> -- 
> Konstantin Osipov, Moscow, Russia
> 


Re: [DISCUSS] Semantic versioning after 4.0

2021-05-03 Thread Benedict Elliott Smith
For a minor/major? I can imagine doing this for a patch version, but this is of 
much less importance to downstream users.

Do you have any examples of projects that do this for major/minor development 
branches, as you propose?

I'm just a bit confused about the proposition to decouple from releases, when 
the whole point of semver is that it's a public API that we honour around 
compatibility, so it is sort of intrinsically coupled to releases. It just 
sounds like a way to make our lives more complicated too, as it's less clear 
what releases are actually extant.


On 03/05/2021, 11:31, "Mick Semb Wever"  wrote:

> > Vendors are also free to support and provide hot-fixes and back ports
> > on these unreleased versions, outside of the community's efforts or concerns
>
> This seems to me like we're endorsing the release of these versions by
> downstream maintainers? Even if we decide to modify this proposal and say "no,
> we don't endorse that," how do we prevent it?
>


We have to be explicit that semantic versioning != releases. That only
some version numbers get formal releases attached to them. And only
some of those releases get release branches to them. That is, we are
separating the concerns of versioning and releases for dev community
benefit.

I know some Apache projects don't do takeX re-votes on repeated
release attempts. Instead they fix the problem (that caused the vote
to fail), increment the version, and start a new vote on a new release
with a new version. These projects then have versions that are never
released, though this example is not to do with semver.

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org







Re: [DISCUSS] Semantic versioning after 4.0

2021-05-03 Thread Benedict Elliott Smith
> Vendors are also free to support and provide hot-fixes and back ports on 
> these unreleased versions, outside of the community's efforts or concerns

This seems to me like we're endorsing the release of these versions by 
downstream maintainers? Even if we decide to modify this proposal and say "no, 
we don't endorse that," how do we prevent it?

What's the benefit of this approach over using snapshot tags, if the goal is 
just making downstream maintainer's lives easier wrt merging a year's worth of 
work? 

On 03/05/2021, 10:09, "Mick Semb Wever"  wrote:

> Well the other problem I see is that this could create a lot of confusion
> for our users, if more versions start popping up (and/or versions are skipped).
> It's hard to row back from unwanted versions in the wild, and we may end up
> having to either support them or disappoint our users.


This is not made explicit to users. Not through announcements or
formal releases. This is strictly within dev.

Any bugs reported back upstream to the community in the dev cycle will
be fixed in trunk, that's the stated dev cycle and we don't deviate.








Re: [DISCUSS] Semantic versioning after 4.0

2021-05-03 Thread Benedict Elliott Smith
Well the other problem I see is that this could create a lot of confusion for 
our users, if more versions start popping up (and/or versions are skipped). 
It's hard to row back from unwanted versions in the wild, and we may end up 
having to either support them or disappoint our users.

Do we have any examples of this approach being used elsewhere?

It just seems to me if the goal is to make more manageable downstream updates, 
a quarterly (or monthly) snapshot of latest might suffice, which can be done 
without potentially messing with our release cycle? Perhaps I'm missing 
something though.


On 03/05/2021, 09:44, "Mick Semb Wever"  wrote:

> Hmm, ok. I see some possible issues with this. You mention one
> possibility, i.e. that downstream may end up releasing these versions for us?
> Which potentially complicates our lives, whether we want it or not.
>
> Would this apply to only trunk, or to all existing major/minor releases?


Only trunk, only in the annual dev cycle.

Yeah, I can see different problems popping up, and alternative approaches.

I'm thinking let's try this to begin with, focusing on making it
easier to bump the version for our own sake (there's too much that's
hard-coded) and better documenting everything (all that's mentioned in
this thread). Off that, and seeing what happens in the ecosystem (and
what they ask for), we can evolve. Sound ok?








Re: [DISCUSS] Semantic versioning after 4.0

2021-05-03 Thread Benedict Elliott Smith
Hmm, ok. I see some possible issues with this. You mention one possibility, 
i.e. that downstream may end up releasing these versions for us? Which 
potentially complicates our lives, whether we want it or not. 

Would this apply to only trunk, or to all existing major/minor releases?

I wonder if there's a better way to implement this, perhaps simply with tags 
that are cut periodically and designed specifically for downstream to work with?

On 03/05/2021, 09:27, "Mick Semb Wever"  wrote:

> Sorry, I may be being dense, but it's not that I didn't parse your
> justification for it, but that I literally don't understand what the proposal
> is.


Ah, nothing more than every quarter we bump the minor version in build.xml
(the frequency is up for discussion)








Re: [DISCUSS] Semantic versioning after 4.0

2021-05-03 Thread Benedict Elliott Smith
Sorry, I may be being dense, but it's not that I didn't parse your 
justification for it, but that I literally don't understand what the proposal 
is.

On 03/05/2021, 08:30, "Mick Semb Wever"  wrote:

> I didn't really understand the unreleased versions proposal though.


Benedict, two brief example perspectives on it. This is all under the
"let's try, learn, evaluate" umbrella.

1)
Unreleased versions can give downstream more choices through the
annual development cycle than the binary choice of "latest snapshot"
or "a specific timestamped snapshot". An example, Reaper's tests
against trunk may find it too much overhead to keep up dev changes as
they land, but will benefit keeping up with quarterly increments.

2)
Being better at smooth version increments. Letting the version be
accurate to semver, and independent from the release cycle. This
should in turn get us better at knowing (and even automating) what
upgrades paths and compatibilities need to be tested through the dev
cycle. (Our handling of versions through the tests is not ideal atm,
take for example the dtest upgrade manifest, or the jvm dtests which
can hardcode the upgrade paths within tests…)








Re: [DISCUSSION] Attracting new contributors

2021-04-29 Thread Benedict Elliott Smith
Thanks Aleksei,

Some of these are great points, but to respond specifically to the checkstyle 
suggestion: I hope to kick off some (minor) discussion around codestyle soon to 
modernise our guide, however I would personally prefer that code style 
enforcement remains relatively light touch. Some obvious things could be 
enforced by checkstyle (such as the braces), and we should investigate that*, 
but I would hate for the project to get _too_ mechanical about the way code is 
structured.

I've been fairly opposed to the upheaval caused by changing build tooling, but 
you're right that the barrier to booting up your IDE is a big part of the 
contribution overhead for newbies, so perhaps we should take another look at it.

* I hope to utilise checkstyle soon to prohibit certain specific code patterns 
too, but that’s for a much later discussion


On 29/04/2021, 12:05, "Aleksei Zotov"  wrote:

Hi Benjamin,

I'd like to put in my two cents as well.

There were many great suggestions related to the communication and 
process. They make sense to me, however, I'd like to look at the problem 
from another perspective.

First of all, let me share my perception on the opensource activities. 
There are two main reasons why people may want to contribute: 1) they 
experience a problem on the current project 2) any kind of volunteering. 
The first reason is clear, those contributors are not going to stick 
around because they just need to solve their particular problem.

The second group of people is our target. In fact, there could be 
numerous reasons why people want to contribute (feel bored, get new 
experience, improve resume, etc), but despite the particular motivation 
point, people should feel positive of what they are doing. For that we 
should make sure they feel a part of the team/process and their work 
appreciated.

The first point is related to many suggestions that have been already 
brought. I feel the most important here is timely replies (even "sorry, 
we're busy these days, I'll review/respond in two weeks / after xxx 
version is released" is much better than silence). Such "follow up" 
responses do not address the original queries, but they help the 
external contributors to keep courage and remove uncertainty related to 
the lack of transparency (it might not be clear: a) whether the request 
is still on someone's radar b) when to expect a response c) when it is a 
good time to follow up). And obviously a "real" (or at least another 
"follow up") response needs to be provided within the ETA. This still 
does not resolve the problem of committers bandwidth, but allows to 
handle spikes in requests from the contributors.

Appreciation is the second point. Generally people like making 
achievements, we just need to make every contribution a kind of 
achievement that a person somehow may boast of. A good way of doing that 
is having some rewards. It might be smth material like a T-Shirt (I 
remember getting a T-Shirt on C* v2 release was nice; obviously not for 
a single commit, but for multiple - depends on the budget; or top 10 the 
most active external contributors) or smth free like a virtual badge, 
being posted in an annual list of contributors or similar. Even though 
it may sound a bit naive, I believe people like making and counting 
achievements and it might help to attract/retain the contributors.

On a separate note, there is a technical part of the problem of 
attracting (not retaining) the contributors. It is really important to 
make sure that the entry level is low enough and people do not spend 
much time on additional configuration, learning styling guidelines, etc 
for making their first contribution. No-one likes boring stuff :)

Based on my experience among many opensource projects, it is really 
frustrating to spend hours of personal time on getting the build working 
locally, configuring IDE or similar problems that should not ever exist 
(or at least be well documented). I believe that many people lose their 
courage and give up at this stage (it is kind of uncomfortable to ask 
for help in running tests in a group chat with 600+ people). For 
example, IntelliJ configuration (ant generate-idea-files) did not work 
for me (test classpath was missing, imports and formatter configs were 
not picked up properly) - I fixed it myself. NetBeans configuration was 
also broken. All such minor issues are really major if they can 
potentially scare away the new contributors.

Even though it might sound too 

Re: [DISCUSSION] Attracting new contributors

2021-04-28 Thread Benedict Elliott Smith
> I believe that it can be a virtuous circle where we produce new committers 
> that help mentoring newcomers.

That's the dream, and kudos for keeping it alive! I have become jaded about 
this possibility, after years of trying.


On 28/04/2021, 10:18, "Benjamin Lerer"  wrote:

>
> I think there are two main hurdles, one is restoring contributor interest
> in mentoring, and the other is finding newcomers that actually want to
> stick around.


I am interested in mentoring new committers to help the project grow and
some of the new committers expressed the same interest to me. I believe
that it can be a virtuous circle where we produce new committers that help
mentoring newcomers.

What we need is to be well organized and make sure that we have a
reasonable response time to newcomers.

Berenguer created this board to help to track newcomers contributions:

https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=463=2088
Apparently Brandon is cheating to appear as a newcomer but we will solve
that. He should be at the Nightmare level  ;-)

On Wed, 28 Apr 2021 at 10:54, Benedict Elliott Smith wrote:

> I think there are two main hurdles, one is restoring contributor interest
> in mentoring, and the other is finding newcomers that actually want to
> stick around. These are perhaps two sides of the same coin, though. An ugly
> truth is that it isn't very enjoyable or rewarding to help newcomers when
> they mostly don't stick around - often even to complete their first patch!
> The patches are mostly uninteresting, the work often done to a low
> standard, and it is easy to underestimate the amount of time involved in
> every such failed interaction.
>
> I think making it easier to contribute and demonstrate a lasting interest
> in the project without the hand holding of long term contributors may
> benefit both sides of the equation, as it is more rewarding to help
> somebody who's demonstrated a persistent interest in the community.
>
>
> On 28/04/2021, 03:24, "Paulo Motta"  wrote:
>
> > There is no great hurdle in finding something to work on, it's solely
> > finding someone with the knowledge that can help you work on something and
> > progress it to commit.
>
> I agree the primary challenge is to engage existing contributors to mentor
> newcomers, but this doesn’t preclude having good documentation and a well
> maintained task pool to allow newcomers to self-serve as much as possible
> and reduce the mentoring burden, so these efforts are complementary.
>
> For instance, a few students were interested in picking random tasks to
> work on in preparation for Google Summer of Code, but it was not
> straightforward for them to find a task to work on because we don’t
> consistently label tickets as “Low Hanging Fruit” and the ones that are
> tagged sometimes don’t have meaningful descriptions, making it hard for
> these students to get started on tasks without unnecessarily taking some
> time from the mentor (which could have been saved if the tasks were
> properly described and labeled in the first place).
>
> On Tue, 27 Apr 2021 at 22:24 Kane Wilson  wrote:
>
> > The main problem, as has always been, is that the big players have a
> > stranglehold on all the committer resources, and bringing in new
> > contributors is not high on their priorities. All that's really required
> > here is that existing committers are directed to spend some non-negligible
> > portion of their time assisting non-committers (especially those not
> > already employed in their own organisation). That should really be a
> > starting point, as any other measures you take will not help until the time
> > is allocated so people can actually receive feedback and help from the
> > small pool of knowledge available.
> >
> > There is no great hurdle in finding something to work on, it's solely
> > finding someone with the knowledge that can help you work on something and
> > progress it to commit.
> >
> >
> > > Run a committer incubator program: Take applications for a small number
> > > of spots (5-10) and mentor these new engineers through learning the code
> > > base, understanding the contribution process, and eventually making

Re: [DISCUSSION] Attracting new contributors

2021-04-28 Thread Benedict Elliott Smith
I think there are two main hurdles, one is restoring contributor interest in 
mentoring, and the other is finding newcomers that actually want to stick 
around. These are perhaps two sides of the same coin, though. An ugly truth is 
that it isn't very enjoyable or rewarding to help newcomers when they mostly 
don't stick around - often even to complete their first patch! The patches are 
mostly uninteresting, the work often done to a low standard, and it is easy to 
underestimate the amount of time involved in every such failed interaction. 

I think making it easier to contribute and demonstrate a lasting interest in 
the project without the hand holding of long term contributors may benefit both 
sides of the equation, as it is more rewarding to help somebody who's 
demonstrated a persistent interest in the community.


On 28/04/2021, 03:24, "Paulo Motta"  wrote:

> There is no great hurdle in finding something to work on, it's solely
> finding someone with the knowledge that can help you work on something and
> progress it to commit.

I agree the primary challenge is to engage existing contributors to mentor
newcomers, but this doesn’t preclude having good documentation and a well
maintained task pool to allow newcomers to self-serve as much as possible
and reduce the mentoring burden, so these efforts are complementary.

For instance, a few students were interested in picking random tasks to
work on in preparation for Google Summer of Code, but it was not
straightforward for them to find a task to work on because we don’t
consistently label tickets as “Low Hanging Fruit” and the ones that are
tagged sometimes don’t have meaningful descriptions making it hard for
these students to get started on tasks without unnecessarily taking some
time from the mentor (which could have been saved if the tasks were
properly described and labeled in the first place).

On Tue, 27 Apr 2021 at 22:24 Kane Wilson  wrote:

> The main problem, as has always been, is that the big players have a
> stranglehold on all the committer resources, and bringing in new
> contributors is not high on their priorities. All that's really required
> here is that existing committers are directed to spend some non-negligible
> portion of their time assisting non-committers (especially those not
> already employed in their own organisation). That should really be a
> starting point, as any other measures you take will not help until the time
> is allocated so people can actually receive feedback and help from the
> small pool of knowledge available.
>
> There is no great hurdle in finding something to work on, it's solely
> finding someone with the knowledge that can help you work on something and
> progress it to commit.
>
>
> > Run a committer incubator program: Take applications for a small number
> > of spots(5-10) and mentor these new engineers through learning the code
> > base, understanding the contribution process, and eventually making
> > substantive code contributions to the project. The eventual goal is that
> > those who finish will be added as a committer to the project. This could be
> > as big or small as we want but I can see all sorts of great things that
> > could come of this.
>
>
> This is a great idea as a follow up (i.e, after there is evidence that
> contributions are being progressed), as it would give a more concrete
> process and confidence for existing contributors that they can eventually
> become committers, and insight into what work is required.
>



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [DISCUSSION] Update complexity levels

2021-04-27 Thread Benedict Elliott Smith
FWIW, the goal of this field was to help project planning (both going forwards, 
and in looking at how we've fared to help project going forwards) more than 
contributor assignment.  There wasn't any expectation that the correct 
complexity would be provided on triage. 

I'm not sure how much Jira is likely to be used for this purpose in the coming 
year anyway, and the boss level difficulty has yet to be used anywhere in the 
project (even on tickets that deserved it), so sure, no objection from me if 
we're simplifying.


On 27/04/2021, 19:49, "Paulo Motta"  wrote:

Branching out the discussion on the complexity levels from the "Attracting
new contributors" thread so we don't mix up the topics in the same thread.

I personally think that the "complexity" field is more an indicator/hint
for inexperienced contributors on whether they will be able to work on a
particular task. Veteran contributors will very likely just ignore this
field and work on whatever they like/need.

For instance, a person that never contributed to the project will look only
into "entry level" tasks. A person who worked on a few "entry level" tasks
will maybe start looking into "intermediate" tasks, but not into "advanced"
tasks. A person who worked on several "intermediate" tasks will become
confident to work on "advanced" tasks.

On the other end of the spectrum, the veteran contributor will not extract
any value from this field since they are able to gauge the complexity of the
task without it and decide which tasks to work on.

So, with that in mind I'm +1 on having the 3-tier system proposed by
Patrick since it's very simple and unambiguous, while providing a lot of
value to new contributors looking for suitable tasks to work on.

On Tue, 27 Apr 2021 at 15:23, Patrick McFadin wrote:

> I have to admit, I like those Duke Nukem levels way more than I should. I
> guess when you choose "Damn I'm Good" you get the boss fight to end all
> boss fights. "Benedict has been assigned as a reviewer..." o.O
>
> But seriously folks. :D
>
> I would advocate for a simple tiering system.
>
> Entry Level
> Intermediate
> Advanced
>
> Clearly defined buckets which not only make it easier for the person
> looking at the Jiras, it also makes it easier for whoever is creating or
> triaging the issue. Also, 3 is a magic number.
>
> Patrick
>
> On Tue, Apr 27, 2021 at 10:16 AM Stefan Miklosovic <
> stefan.mikloso...@instaclustr.com> wrote:
>
> > Quake has it like
> >
> > - I Can Win
> > - Bring It On
> > - Hurt Me Plenty
> > - Hardcore
> > - Nightmare!
> >
> > On Tue, 27 Apr 2021 at 19:02, Benedict Elliott Smith
> >  wrote:
> > >
> > > I think Duke Nukem would be more apt
> > >
> > > - Piece of Cake
> > > - Let's Rock
> > > - Come Get Some
> > > - Damn I'm Good
> > >
> > > On 27/04/2021, 17:57, "Patrick McFadin"  wrote:
> > >
> > > Could always go with Doom difficulty levels:
> > >
> > >
> > >    - I'm Too Young to Die - Easy.
> > >    - Hurt Me Plenty - Normal.
> > >    - Ultra-Violence - Hard.
> > >    - Nightmare - Very Hard.
> > >
> > >
> > > On Tue, Apr 27, 2021 at 9:50 AM Benedict Elliott Smith <bened...@apache.org>
> > > wrote:
> > >
> > > > > Perhaps we could replace both Complexity and Difficulty with e.g.
> > > > > Experience?
> > > >
> > > > Newcomer
> > > > Learner
> > > > Contributor
> > > > Experienced
> > > > Veteran
> > > >
> > > > > I'm not sure I like it. I don't really like segregating the community
> > > > > into buckets like this. But it is perhaps more intuitive than
> > > > > complexity, while encoding a more objective concept of difficulty.
> > > >
> > > >
> > > > > On 27/04/2021, 17:33, "Paulo Motta" wrote:
> > > > >
> > > > > I (wrongly) assumed this proposal would be fairly uncontroversial so I

Re: [DISCUSSION] Attracting new contributors

2021-04-27 Thread Benedict Elliott Smith
I think Duke Nukem would be more apt

- Piece of Cake
- Let's Rock
- Come Get Some
- Damn I'm Good

On 27/04/2021, 17:57, "Patrick McFadin"  wrote:

Could always go with Doom difficulty levels:


   - I'm Too Young to Die - Easy.
   - Hurt Me Plenty - Normal.
   - Ultra-Violence - Hard.
   - Nightmare - Very Hard.


On Tue, Apr 27, 2021 at 9:50 AM Benedict Elliott Smith 
wrote:

> Perhaps we could replace both Complexity and Difficulty with e.g.
> Experience?
>
> Newcomer
> Learner
> Contributor
> Experienced
> Veteran
>
> I'm not sure I like it. I don't really like segregating the community into
> buckets like this. But it is perhaps more intuitive than complexity, while
> encoding a more objective concept of difficulty.
>
>
> On 27/04/2021, 17:33, "Paulo Motta"  wrote:
>
> I (wrongly) assumed this proposal would be fairly uncontroversial so I
> brought it up within this related thread, but given there is some
> divergence, I retract the suggestion for now and will bring it up in its own
> thread later so we don't go too far away from the original, and more
> important, topic which is how to attract and retain new contributors to the
> project.
>
> On Tue, 27 Apr 2021 at 13:08, Benedict Elliott Smith <bened...@apache.org> wrote:
>
> > What you are describing to me are difficulty levels, whereas this field
> > tries to measure complexity. The difference is that while both are
> > subjective, difficulty is relatively more so. This may lead people to
> > assign difficulty based on their own perception (which is very subjective),
> > rather than the scope of the problem (which is still subjective, but less
> > so).
> >
> > We can bike-shed the names or the definitions all we like, but we need
> > some separate text to elaborate the intended meaning, else we'll all mean
> > and encode different things.
> >
> > I also don't personally think Hard or Very Hard are descriptive. By
> > comparison, Byzantine is a word that not only crops up in distributed
> > systems to mean involving many parties (i.e. in this case many subsystems),
> > but is widely used in English to mean "intricately involved" with
> > connotations of labyrinthine, i.e. easy to get lost doing, or easy to
> > misunderstand.
> >
> > I'm definitely open to improving the terminology, but we did bike shed
> > this all only a year or so ago I think?
> >
> >
> >
> > On 27/04/2021, 16:20, "Paulo Motta" wrote:
> >
> > Thanks for bringing the definitions and historical context Benedict. Agreed
> > to not attach difficulties to time to complete a task.
> >
> > The fact that the complexity types need explanation or reading
> > documentation is precisely the issue I’m trying to solve by using more
> > straightforward and unambiguous terms (as much as possible).
> >
> > So I propose the following levels instead.
> > - Beginner (current LHF for people who have never submitted a patch
> > (e.g. trivial doc changes or minor test fixes))
> > - Easy (current LHF for people who have submitted at least a couple of
> > patches (e.g. add parameter to existing tool))
> > - Intermediate (current normal)
> > - Hard (current Challenging)
> > - Very Hard (current Byzantine)
> >
> > Please let me know what you think.
> >
> > On Tue, 27 Apr 2021 at 11:44, Benedict Elliott Smith <bened...@apache.org> wrote:
> >
> > > If you're wondering, they're documented:
> > > https://cwiki.apache.org/confluence/display/CASSANDRA/JIRA+Workflow+Proposals
> > >
> > > Impossible was introduced to take the place of "pony" - which was
> > > genuinely deployed on occasion, but I agree it's redundant as nobody
> > > proposes things like 

Re: [DISCUSSION] Attracting new contributors

2021-04-27 Thread Benedict Elliott Smith
Perhaps we could replace both Complexity and Difficulty with e.g. Experience?

Newcomer
Learner
Contributor
Experienced
Veteran

I'm not sure I like it. I don't really like segregating the community into 
buckets like this. But it is perhaps more intuitive than complexity, while 
encoding a more objective concept of difficulty.


On 27/04/2021, 17:33, "Paulo Motta"  wrote:

I (wrongly) assumed this proposal would be fairly uncontroversial so I
brought it up within this related thread, but given there is some divergence, I
retract the suggestion for now and will bring it up in its own thread later so
we don't go too far away from the original, and more important, topic which
is how to attract and retain new contributors to the project.

On Tue, 27 Apr 2021 at 13:08, Benedict Elliott Smith <bened...@apache.org> wrote:

> What you are describing to me are difficulty levels, whereas this field
> tries to measure complexity. The difference is that while both are
> subjective, difficulty is relatively more so. This may lead people to
> assign difficulty based on their own perception (which is very subjective),
> rather than the scope of the problem (which is still subjective, but less
> so).
>
> We can bike-shed the names or the definitions all we like, but we need
> some separate text to elaborate the intended meaning, else we'll all mean
> and encode different things.
>
> I also don't personally think Hard or Very Hard are descriptive. By
> comparison, Byzantine is a word that not only crops up in distributed
> systems to mean involving many parties (i.e. in this case many subsystems),
> but is widely used in English to mean "intricately involved" with
> connotations of labyrinthine, i.e. easy to get lost doing, or easy to
> misunderstand.
>
> I'm definitely open to improving the terminology, but we did bike shed
> this all only a year or so ago I think?
>
>
>
> On 27/04/2021, 16:20, "Paulo Motta" wrote:
>
> Thanks for bringing the definitions and historical context Benedict. Agreed
> to not attach difficulties to time to complete a task.
>
> The fact that the complexity types need explanation or reading
> documentation is precisely the issue I’m trying to solve by using more
> straightforward and unambiguous terms (as much as possible).
>
> So I propose the following levels instead.
> - Beginner (current LHF for people who have never submitted a patch
> (e.g. trivial doc changes or minor test fixes))
> - Easy (current LHF for people who have submitted at least a couple of
> patches (e.g. add parameter to existing tool))
> - Intermediate (current normal)
> - Hard (current Challenging)
> - Very Hard (current Byzantine)
>
> Please let me know what you think.
>
> On Tue, 27 Apr 2021 at 11:44, Benedict Elliott Smith <bened...@apache.org> wrote:
>
> > If you're wondering, they're documented:
> > https://cwiki.apache.org/confluence/display/CASSANDRA/JIRA+Workflow+Proposals
> >
> > Impossible was introduced to take the place of "pony" - which was
> > genuinely deployed on occasion, but I agree it's redundant as nobody
> > proposes things like that anymore.
> >
> > Challenging and Byzantine are useful distinctions IMO, but I'm open to
> > relabelling them. Levels of difficulty do not cleanly map to time involved,
> > however.
> >
> > The project literally never used Easy in the past, but perhaps you can
> > bring about the necessary change to do so.
> >
> >
> > On 27/04/2021, 15:32, "Paulo Motta" wrote:
> >
> > Since this is a related topic, I'd like to open a small parenthesis to
> > throw out a proposal for improving the semantics of our JIRA "complexity"
> > field, which currently has the following levels:
> > * Low Hanging Fruit (overall easy tasks for new or existing contributors)
> > * Normal (? this is the most misleading one since it currently ranges from
> > very simple tasks to nearly complex tasks)
> > * Challenging
> > * Byzantine (the difference between challenging, byzantin

Re: [DISCUSSION] Attracting new contributors

2021-04-27 Thread Benedict Elliott Smith
What you are describing to me are difficulty levels, whereas this field tries 
to measure complexity. The difference is that while both are subjective, 
difficulty is relatively more so. This may lead people to assign difficulty 
based on their own perception (which is very subjective), rather than the scope 
of the problem (which is still subjective, but less so).

We can bike-shed the names or the definitions all we like, but we need some 
separate text to elaborate the intended meaning, else we'll all mean and encode 
different things. 

I also don't personally think Hard or Very Hard are descriptive. By comparison, 
Byzantine is a word that not only crops up in distributed systems to mean 
involving many parties (i.e. in this case many subsystems), but is widely used 
in English to mean "intricately involved" with connotations of labyrinthine, 
i.e. easy to get lost doing, or easy to misunderstand.

I'm definitely open to improving the terminology, but we did bike shed this all 
only a year or so ago I think?



On 27/04/2021, 16:20, "Paulo Motta"  wrote:

Thanks for bringing the definitions and historical context Benedict. Agreed
to not attach difficulties to time to complete a task.

The fact that the complexity types need explanation or reading
documentation is precisely the issue I’m trying to solve by using more
straightforward and unambiguous terms (as much as possible).

So I propose the following levels instead.
- Beginner (current LHF for people who have never submitted a patch (e.g.
trivial doc changes or minor test fixes))
- Easy (current LHF for people who have submitted at least a couple of
patches (e.g. add parameter to existing tool))
- Intermediate (current normal)
- Hard (current Challenging)
- Very Hard (current Byzantine)

Please let me know what you think.

On Tue, 27 Apr 2021 at 11:44, Benedict Elliott Smith <bened...@apache.org> wrote:

> If you're wondering, they're documented:
> https://cwiki.apache.org/confluence/display/CASSANDRA/JIRA+Workflow+Proposals
>
> Impossible was introduced to take the place of "pony" - which was
> genuinely deployed on occasion, but I agree it's redundant as nobody
> proposes things like that anymore.
>
> Challenging and Byzantine are useful distinctions IMO, but I'm open to
> relabelling them. Levels of difficulty do not cleanly map to time involved,
> however.
>
> The project literally never used Easy in the past, but perhaps you can
> bring about the necessary change to do so.
>
>
> On 27/04/2021, 15:32, "Paulo Motta"  wrote:
>
> Since this is a related topic, I'd like to open a small parenthesis to
> throw out a proposal for improving the semantics of our JIRA "complexity"
> field, which currently has the following levels:
> * Low Hanging Fruit (overall easy tasks for new or existing contributors)
> * Normal (? this is the most misleading one since it currently ranges from
> very simple tasks to nearly complex tasks)
> * Challenging
> * Byzantine (the difference between challenging, byzantine and impossible
> tasks is blurry/unclear to me)
> * Impossible (not clear to me what's the purpose of filing a task that is
> impossible to do? I think we can just close the ticket as invalid during
> triage without setting complexity.)
>
> I propose the following levels instead:
> * Low Hanging Fruit (I think we should even rename this to "Beginner",
> since the LHF term is not very well known by outsiders and non-native
> English speakers) : easy tasks for those who have never contributed to the
> project.
> * Easy : easy tasks for those who have some basic familiarity with the
> project (contributed at least 2-5 LHF).
> * Intermediate : tasks with intermediate complexity, can be done in under a
> month.
> * Challenging : multi-month effort task.
> (no need for byzantine and impossible complexity levels since they don't
> add any value)
>
> If you prefer I can open a new thread with this proposal so we can focus on
> initiatives to attract contributors - but I think having clear guidelines
> on the meaning of task complexities will help to better delineate what
> tasks are suitable for new contributors.
>
> On Tue, 27 Apr 2021 at 11:25, Joshua McKenzie <jmcken...@apache.org> wrote:
>
> > Updating the boot camp materia

Re: [DISCUSSION] Attracting new contributors

2021-04-27 Thread Benedict Elliott Smith
> > > > +1, I had a few minor patches before but the bootcamp definitely helped
> > > > me ramp up on the project faster and I found the recorded material very
> > > > useful during project onboarding (some of it is still available on
> > > > YouTube).
> > > >
> > >
> > > People have different levels of experience and they will probably approach
> > > the project in a different way but if a bootcamp can help to have another
> > > Paulo, I am willing to do it. ;-)
> > > Of course in this pandemic world the best we can probably offer for the
> > > moment is some virtual bootcamp.
> > >
> > > On Tue, 27 Apr 2021 at 15:34, Paulo Motta wrote:
> > >
> > > > +1, I had a few minor patches before but the bootcamp definitely helped
> > > > me ramp up on the project faster and I found the recorded material very
> > > > useful during project onboarding (some of it is still available on
> > > > YouTube).
> > > >
> > > > I think it would be beneficial to collocate a bootcamp for new
> > > > contributors together with an annual event such as NGCC or
> > > > ApacheCon/Cassandra Summit and also record some of the sessions so
> > > > they're available for a wider audience after the fact.
> > > >
> > > > On Tue, 27 Apr 2021 at 10:20, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
> > > >
> > > > > I believe Paulo started with the project through a contributor boot
> > > > > camp. Also if I remember correctly some of the ones that were done were
> > > > > internal at DataStax and it helped some people get familiar with the
> > > > > project who still contribute today.
> > > > >
> > > > > Also this would be short recorded introductions so they could be around
> > > > > for viewing and with auto translate on Google for different languages
> > > > > such as Japanese and Mandarin.
> > > > >
> > > > > I do like the idea of a periodic chat. I just thought some recorded
> > > > > introductions would help with some of the more common things like “this
> > > > > is how the read path works from end to end”.
> > > > >
> > > > > > On Apr 27, 2021, at 10:14 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
> > > > > > >
> > > > > > > I think that all of the bootcamps we ran in the past produced precisely
> > > > > > > zero new contributors.
> > > > > > >
> > > > > > > I wonder if it would be more impactful to produce slightly more
> > > > > > > permanent content, such as step-by-step guides to producing a simple
> > > > > > > patch for some subsystem. Perhaps if people want to, a recording could
> > > > > > > be created of going through that guide as well.
> > > > > > >
> > > > > > > That said, if there are new contributors actively trying to
> > > > > > > participate, organising a periodic group chat to talk through one of the
> > > > > > > issues that they may be working on together as a group with an active
> > > > > > > contributor might make sense, and be more targeted in focus?
> > > > > >
> > > > > >
> > > > > > > On 27/04/2021, 12:45, "Manish G" wrote:
> > > > > > >
> > > > > > >    Contributor bootcamps can really help new people like me.
> > > > > > >
> > > > > > >> On Tue, Apr 27, 2021, 5:08 PM Jeremy Hanna <jeremy.hanna1...@gmail.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> One thing we've done in the past is contributor bootcamps along
Re: [DISCUSSION] Attracting new contributors

2021-04-27 Thread Benedict Elliott Smith
I agree, and have said as much in the past. We have limited options for 
improving this, though. I've proposed in the past a rotating role for 
contributors to respond to Jira comments, but even once a committer is involved 
their other commitments may make feedback rounds take a long time.

However, even this is likely to have at most a modest impact. Most contributors 
don't stick around after making a patch, even if given tight feedback loops 
(which does happen). They just want their bug fixed - which is great, but we 
should set ourselves realistic expectations.

The community needs to do better specifically with new active contributors who 
stick around for a few tickets, and to produce better (passive) incentives for 
people to stick around for a few tickets.

On 27/04/2021, 13:22, "Stefan Miklosovic" wrote:

It really boils down to a simple "problem": having enough
committers to look at it over a (preferably) shorter period of time
and making that feedback loop shorter. That's it. You might have the
best guides and whatever, but if dust settles on it, no guide will
make it happen.

On Tue, 27 Apr 2021 at 14:14, Benedict Elliott Smith
 wrote:
>
> I think that all of the bootcamps we ran in the past produced precisely
> zero new contributors.
>
> I wonder if it would be more impactful to produce slightly more permanent
> content, such as step-by-step guides to producing a simple patch for some
> subsystem. Perhaps if people want to, a recording could be created of going
> through that guide as well.
>
> That said, if there are new contributors actively trying to participate,
> organising a periodic group chat to talk through one of the issues that they
> may be working on together as a group with an active contributor might make
> sense, and be more targeted in focus?
>
>
> On 27/04/2021, 12:45, "Manish G"  wrote:
>
> Contributor bootcamps can really help new people like me.
>
> On Tue, Apr 27, 2021, 5:08 PM Jeremy Hanna wrote:
>
> > One thing we've done in the past is contributor bootcamps along with the
> > new contributor guide and the LHF complexity tickets.  Unfortunately, I
> > don't know that the contributor bootcamps were ever recorded.
> > Presentations were done to introduce people to the codebase generally (I
> > think Gary did this at one point) as well as specific parts of the
> > codebase, such as compaction.  What if we broke up the codebase into
> > categories and people could volunteer to do a short introduction to that
> > part of the codebase in the form of a video screenshare?  I don't think
> > this would take the place of mentoring someone, but if we had introductions
> > to different parts of the codebase, I think it would lower the bar for
> > interested contributors and scale the existing group more easily.  Besides
> > the codebase itself, we could also introduce things like CI practices or
> > testing or documentation.
> >
> > > On Apr 24, 2021, at 12:49 AM, Benjamin Lerer wrote:
> > >
> > > Hi Everybody,
> > >
> > > The Apache Cassandra project always had some issues to attract and
> > > retain new contributors. I think it would be great to change this.
> > >
> > > According to the "How to Attract New Contributors" blog post
> > > (https://www.redhat.com/en/blog/how-attract-new-contributors) having a
> > > good onboarding process is a critical part. How to contribute should be
> > > obvious and contributing should be as easy as possible for all the
> > > different types of contributions: code, documentation, web-site or help
> > > with our CI infrastructure.
> > >
> > > I would love to hear about your ideas on how we can improve things.
> > >
> > > If you are new in the community, do not hesitate to share your
> > > experience and your suggestions on what we can do to make it easier for
> > > you to contribute.
> >
> >

Re: [DISCUSSION] Attracting new contributors

2021-04-27 Thread Benedict Elliott Smith
I think that all of the bootcamps we ran in the past produced precisely zero 
new contributors.

I wonder if it would be more impactful to produce slightly more permanent 
content, such as step-by-step guides to producing a simple patch for some 
subsystem. Perhaps if people want to, a recording could be created of going 
through that guide as well.

That said, if there are new contributors actively trying to participate, 
organising a periodic group chat to talk through one of the issues that they 
may be working on together as a group with an active contributor might make 
sense, and be more targeted in focus?


On 27/04/2021, 12:45, "Manish G"  wrote:

Contributor bootcamps can really help new people like me.

On Tue, Apr 27, 2021, 5:08 PM Jeremy Hanna wrote:

> One thing we've done in the past is contributor bootcamps along with the
> new contributor guide and the LHF complexity tickets.  Unfortunately, I
> don't know that the contributor bootcamps were ever recorded.
> Presentations were done to introduce people to the codebase generally (I
> think Gary did this at one point) as well as specific parts of the
> codebase, such as compaction.  What if we broke up the codebase into
> categories and people could volunteer to do a short introduction to that
> part of the codebase in the form of a video screenshare.  I don't think
> this would take the place of mentoring someone, but if we had introductions
> to different parts of the codebase, I think it would lower the bar for
> interested contributors and scale the existing group more easily.  Besides
> the codebase itself, we could also introduce things like CI practices or
> testing or documentation.
>
> > On Apr 24, 2021, at 12:49 AM, Benjamin Lerer wrote:
> >
> > Hi Everybody, The Apache Cassandra project always had some issues to
> > attract and retain new contributors. I think it would be great to change
> > this. According to the "How to Attract New Contributors" blog post (
> > https://www.redhat.com/en/blog/how-attract-new-contributors) having a good
> > onboarding process is a critical part. How to contribute should be obvious
> > and contributing should be as easy as possible for all the different types
> > of contributions: code, documentation, web-site or help with our CI
> > infrastructure. I would love to hear about your ideas on how we can improve
> > things. If you are new in the community, do not hesitate to share your
> > experience and your suggestions on what we can do to make it easier for you
> > to contribute.
>
>



Re: [DISCUSSION] Next release roadmap

2021-04-26 Thread Benedict Elliott Smith
I think my earlier response vanished into the moderator queue. Just a few 
comments:

1) The Paxos latency (and correctness) improvements I think should land in 
4.0.x, as we have introduced a fairly significant regression and this work 
mostly resolves outstanding issues with LWTs today.
2) If we aim to deliver multi-partition LWTs in 4.x/5.0, we may likely want to 
pair this with work to further reduce latency beyond the above work, as 
contention will become a more significant problem. Should I be involved in 
delivering multi-partition LWTs I will also be aiming to deliver even lower 
latencies for the release they land in.
3) To support all of the above work, I also aim to deliver a Simulator facility 
for deterministically executing cluster workloads under adversarial scheduling 
(i.e. that intercepts all message and thread events and evaluates them 
sequentially, in pseudorandom order), alongside linearizability verification 
built upon this. This work will include (or have as a prerequisite) significant 
clean-ups to internal functionality like executors, use of futures and other 
concurrency primitives, and mocking out of time and the filesystem.
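
To make the idea in point 3 concrete, here is a minimal Python sketch of deterministic execution under pseudorandom scheduling: every pending event is held in one queue and a seeded random generator picks which one runs next, so the same seed always replays the same interleaving. The class and function names (DeterministicScheduler, run_workload) are hypothetical and chosen only for illustration; this is not the Simulator described above, which also intercepts threads, time and the filesystem.

```python
import random


class DeterministicScheduler:
    """Runs queued events one at a time, in an order fixed by the seed."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # private, seeded generator
        self.pending = []               # events waiting to be executed
        self.trace = []                 # names of events in execution order

    def submit(self, name, action):
        self.pending.append((name, action))

    def run(self):
        # Evaluate events sequentially, choosing the next one pseudorandomly.
        while self.pending:
            index = self.rng.randrange(len(self.pending))
            name, action = self.pending.pop(index)
            self.trace.append(name)
            action(self)
        return self.trace


def run_workload(seed):
    """Toy workload (an assumption for this example): three writes, each acked."""
    scheduler = DeterministicScheduler(seed)
    log = []

    def make_write(node):
        def write(sched):
            log.append((node, "write"))
            # Handling a write enqueues a follow-up event, like a reply message.
            sched.submit(f"ack-{node}", lambda _s: log.append((node, "ack")))
        return write

    for node in ("n1", "n2", "n3"):
        scheduler.submit(f"write-{node}", make_write(node))
    return scheduler.run(), log


if __name__ == "__main__":
    trace, _ = run_workload(seed=42)
    print(trace)
```

Because the only source of nondeterminism is the seeded generator, a failing interleaving can be reproduced by rerunning with the same seed, and different seeds explore different schedules.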


On 23/04/2021, 14:46, "Benjamin Lerer"  wrote:

Hi everybody,

Thanks for all the responses. I went through the emails and aggregated the
proposals to give us an idea on where we stand at this point.

I only included the improvements in the list and left on the side the bug
fixes.
Regarding bug fixes, I wonder if we should not have discussions every month
to discuss what are the important issues that should be fixed in priority.
I feel that we sometimes tend to forget old issues even if they are more
important than some new ones.

Do not hesitate to tell me if I missed something or misinterpreted some
proposal.

*Query side improvements:*

  * Storage Attached Index or SAI. The CEP can be found at

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index
  * Add support for OR predicates in the CQL where clause
  * Allow to aggregate by time intervals (CASSANDRA-11871) and allow UDFs
in GROUP BY clause
  * Ability to read the TTL and WRITE TIME of an element in a collection
(CASSANDRA-8877)
  * Multi-Partition LWTs
  * Materialized views hardening: Addressing the different Materialized
Views issues (see CASSANDRA-15921 and [1] for some of the work involved)

*Security improvements:*

  * SSTables encryption (CASSANDRA-9633)
  * Add support for Dynamic Data Masking (CEP pending)
  * Allow the creation of roles that have the ability to assign arbitrary
privileges, or scoped privileges without also granting those roles access
to database objects.
  * Filter rows from system and system_schema based on users permissions
(CASSANDRA-15871)

*Performance improvements:*

  * Trie-based index format (CEP pending)
  * Trie-based memtables (CEP pending)
  * Paxos improvements: Paxos / LWT implementation that would enable the
database to serve serial writes with two round-trips and serial reads with
one round-trip in the uncontended case

*Safety/Usability improvements:*

  * Guardrails. The CEP can be found at

https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails
  * Add ability to track state in repair (CASSANDRA-15399)
  * Repair coordinator improvements (CASSANDRA-15399)
  * Make incremental backup configurable per keyspace and table
(CASSANDRA-15402)
  * Add ability to blacklist a CQL partition so all requests are ignored
(CASSANDRA-12106)
  * Add default and required keyspace replication options (CASSANDRA-14557)
  * Transactional Cluster Metadata: Use of transactions to propagate
cluster metadata
  * Downgrade-ability: Ability to downgrade in the event that
a serious issue has been identified

*Pluggability improvements:*

  * Pluggable schema manager (CEP pending)
  * Pluggable filesystem (CEP pending)
  * Pluggable authenticator for CQLSH (CASSANDRA-16456). A CEP draft can be
found at

https://docs.google.com/document/d/1_G-OZCAEmDyuQuAN2wQUYUtZBEJpMkHWnkYELLhqvKc/edit
  * Memtable API (CEP pending). The goal being to allow improvements such
as CASSANDRA-13981 to be easily plugged into Cassandra

*Memtable pluggable implementation:*

  * Enable Cassandra for Persistent Memory (CASSANDRA-13981)

*Other tools:*

  * CQL compatibility test suite (CEP pending)

On Thu, 22 Apr 2021 at 16:11, Benjamin Lerer wrote:

>> Finally, I think it's important we work to maintain trunk in a shippable
>> state.
>
>
> I am +100 on this. Bringing Cassandra to such a state was a huge effort
> and keeping it that way will help us to ensure the quality of the
> releases.
>
> Le jeu. 15 avr. 2021 à 17:30, Scott 

Re: [DISCUSS] Releases after 4.0

2021-04-01 Thread Benedict Elliott Smith
> it would make sense to put that information on a *Roadmap* page

That makes sense to me, and I'm looking forward to agreeing a roadmap. I think 
it will be nice for the project to start properly looking to the future again.

On 01/04/2021, 14:06, "Benjamin Lerer"  wrote:

Thanks everybody.

I opened CASSANDRA-16556 to update the end of support dates for the
different versions. I assumed that we will manage to release 4.0-GA in
April (otherwise I will re-update them ;-) )

Concerning the release cadence, it seems that we do not have a proper place
to put that information on our website. In an offline discussion Mick
raised the point that it would make sense to put that information on a
*Roadmap* page. That makes sense to me. I will trigger the roadmap discussion next
week and once we agree on some roadmap, I propose to create a new page for
it where I will include the information on the release cadence.

I am fully open to another proposal.


On Tue, Mar 30, 2021 at 11:24 AM Sam Tunnicliffe  wrote:

> +1
>
> > On 29 Mar 2021, at 15:41, Joseph Lynch  wrote:
> >
> > I am slightly concerned about removing support for critical bug fixes
> > in 3.0 on a short time-frame (<1 year). I know of at least a few major
> > installations, including ours, who are just now able to finish
> > upgrades to 3.0 in production due to the number of correctness and
> > performance bugs introduced in that release which have only been
> > debugged and fixed in the past ~2 years.
> >
> > I like the idea of the 3-year support cycles, but I think since
> > 3.0/3.11/4.0 took so long to stabilize to a point folks could upgrade
> > to, we should reset the clock somewhat. What about the following
> > assuming an April 2021 4.0 cut:
> >
> > 4.0: Fully supported until April 2023 and high severity bugs until
> > April 2024 (2 year full, 1 year bugfix)
> > 3.11: Fully supported until April 2022 and high severity bugs until
> > April 2023 (1 year full, 1 year bugfix).
> > 3.0: Supported for high severity correctness/performance bugs until
> > April 2022 (1 year bugfix)
> > 2.2+2.1: EOL immediately.
> >
> > Then going forward we could have this nice pattern when we cut the
> > yearly release:
> > Y(n-0): Support for 3 years from now (2 full, 1 bugfix)
> > Y(n-1): Fully supported for 1 more year and supported for high
> > severity correctness/perf bugs 1 year after that (1 full, 1 bugfix)
> > Y(n-2): Supported for high severity correctness/bugs for 1 more year (1 bugfix)
> >
> > What do you think?
> > -Joey
> >
> > On Mon, Mar 29, 2021 at 9:39 AM Benjamin Lerer
> >  wrote:
> >>
> >> Thanks to everybody and sorry for not finalizing that email thread sooner.
> >>
> >> For the release cadence the agreement is: *one release every year +
> >> periodic trunk snapshot*
> >> For the number of releases being supported the agreement is 3.  *Every
> >> incoming release should be supported for 3 years.*
> >>
> >> We did not reach a clear agreement on several points:
> >> * The naming of versions: semver versus another approach and the name of
> >> snapshot versions
> >> * How long will we support 3.11. Taking into account that it has been
> >> released 4 years ago does it make sense to support it for the next 3 years?
> >>
> >> I am planning to open some follow up discussions for those points in the
> >> coming weeks.
> >>
> >>> When there is an agreement we should document the changes on the webpage
> >>> and also highlight it as part of the 4.0 release material as it's an
> >>> important change to the release cycle and LTS support.
> >>>
> >>
> >> It is a valid point. Do you mind if I update the documentation when we have
> >> clarified the version names and that we have a more precise idea of when
> >> 4.0 GA will be released? That will allow us to make a clear message on when
> >> to expect the next supported version.
> >>
> >> On Mon, Feb 8, 2021 at 10:05 PM Paulo Motta 
> >> wrote:
> >>
> >>> +1 to the yearly release cadence + periodic trunk snapshots + support to 3
> >>> previous release branches. I think this will give some nice predictability
> >>> to the project.
> >>>
> >>> When there is an agreement we should document the changes on the webpage
> >>> and also highlight it as part of the 4.0 release material as it's an
> >>> important change to the release cycle and LTS support.
> >>>
> >>> On Fri, 5 Feb 2021 at 18:08, Brandon Williams <dri...@gmail.com>
> >>> wrote:
> >>>
>  Perhaps on my third try...  keep three branches total, including 
3.11:
>  

Re: Download source release / binary files in source release

2021-03-30 Thread Benedict Elliott Smith
There is no legal reason; this was disavowed on LEGAL-288. The ostensible 
reason is that Roy Fielding, who filed the papers of incorporation, interprets 
the charter to require this. I don't think, however, anybody has challenged 
this interpretation of the charter. I certainly do not interpret it to require 
this, even if you take a very narrow view of open source software.

On 30/03/2021, 16:47, "Jordan West"  wrote:

I have yet to see a legal reason why including binaries in packages is a
bad thing. I’ve read the thread and the documents linked. In fact, it looks
like it’s done specifically to avoid legal issues with copyleft licenses.
It’s very common for Apache to hold on to past policies at the expense of
its projects’ users (see the slow transition to Git) all while claiming to
do it for their benefit. It’s a decade later, the landscape has changed. We
should absolutely protect the project legally but trying to guess the
spirit of open source at the cost of users is of little benefit to all
stakeholders.

In the end this discussion has moved to a list most of us don’t have access
to and when asked to contribute the original reporter basically said “Your
problem. You fix it” despite having a significant amount of experience in
making builds “comply”.  It’s also causing the delay of the project’s first
major release in 5 years, that many on this list have contributed large
portions of their life to. That’s not very in the spirit of open source
and I am disappointed again by the ASF’s role in this, which continues to
be ambiguous and at the cost of its users and developers.

All that said, if we fix this great. If we don’t, eh. As long as we are
legally compliant with the licenses of the dependencies we use we should
value convenience for users over pedantry and statements that are a
decade old. If there is a legal reason to change this it’s been explained
poorly by the ASF and needs clarification. It also can only be so important
if we are only catching it now after so many releases with the project.

Jordan

On Tue, Mar 30, 2021 at 7:19 AM Joshua McKenzie 
wrote:

> FWIW I don't have access to what's being raised with the board so
> effectively can't participate in this discussion beyond +1'ing Jirsa:
>
> > Based on this point, I personally won't vote to approve a future release
> > with binary packages, but I also strongly disagree with the assertion in
> > that same past thread that it's worth nuking a 10+year history of releases.
> > That's the type of action that would severely diminish trust in the
> > foundation.
>
>
> We SHOULD look at what's required to rebuild PAST releases.
>
>
> We should keep in mind what's best for our users. While avoiding including
> compiled binaries that can't be verified as open source makes complete
> sense from a "maximize safety to our users" perspective and can be done on
> forward-going releases with minimal lift, we also have to consider how we
> get There from Here on past releases. Pulling the rug out from our entire
> user-base and releases after over a decade based on a conversation that
> happened off-list (i.e. not on the C* dev list) 9 years ago is, hopefully
> we can agree, not in our users' best interests nor the best interests of
> this project's longevity.
>
> ~Josh
>
> On Tue, Mar 30, 2021 at 9:38 AM Mick Semb Wever  wrote:
>
> > >
> > > It's good to see you are taking action, but I think the situation is a
> > > little more serious than you may realise. I suggest you look at what
> > > actions the board has taken in similar situations in the past. I'll update
> > > the board agenda item to reflect the current situation.
> > >
> >
> >
> > The current board agenda item is still not accurate. The PMC members and
> > the project are not ignoring the issue.
> >
> > Also, it would be nice if you could reference this thread, in both the
> > board's agenda item and ML post, to allow people to have a complete view of
> > the discussion.
> >
> > I am happy to add information to the agenda item if you agree to it.
> > Better yet, I suggest that we work together in public to word it. Most
> > people on this list do not have access to the message. There is a community
> > here, and the way we work together to solve problems matters.
> >
>






Re: Download source release / binary files in source release

2021-03-30 Thread Benedict Elliott Smith
As I'm sure you're aware, only a couple of people in the community are able to 
follow or participate in board discussions without being expressly included.

On 30/03/2021, 09:51, "Justin Mclean"  wrote:

Hi,

JFYI I've started a discussion about this on the board list [1]. Note that 
that list is for the board to conduct business on, so please take care in what 
you post there.

Thanks,
Justin

1. 
https://lists.apache.org/thread.html/rda27b6bc832d7e36eb12cc93343a358f5848bd10198e0165110ed4fc%40%3Cboard.apache.org%3E




Re: Download source release / binary files in source release

2021-03-29 Thread Benedict Elliott Smith
> I think the situation is a little more serious than you may realise. I
> suggest you look at what actions the board has taken in similar situations in
> the past

 

I thought you had already indicated the likely remedy: the removal of 
non-compliant releases?

 

I’m puzzled by your desire for immediate action. The enquiries you initiated 
are still ongoing. Surely, once they conclude, they will afford some time for 
the community to decide how to accommodate the ruling?

 

 

From: Justin Mclean 
Sent: Monday, March 29, 2021 11:45:29 PM
To: dev@cassandra.apache.org 
Subject: Re: Download source release / binary files in source release 

 

Hi,

> To the PMC: the next board meeting is on 21st April, so we have time to
> get this release out and probably more as well (hopefully with the fix
> for CASSANDRA-16391) before that date.

If I was a PMC member here, I would reconsider making that release without 
fixing this issue. I would also discuss on list how previous releases can be 
corrected and how much effort that might be. This is just a friendly suggestion.

> Would you mind switching hats here and also representing the project by
> including the following items:
>  - the project is actively working on addressing the fault,
>  - the project has taken this approach since incubation²,
>  - no one has mentioned it until now, and in the middle of a release vote
> are expecting an impromptu change,
>  - there is agreement, in the project and on legal³, that the ASF policy
> docs are not clear on the matter and need to be improved,
>  - the project is very keen to see those docs improved asap so to be
> confident of changes they invest in.

It's good to see you are taking action, but I think the situation is a little 
more serious than you may realise. I suggest you look at what actions the 
board has taken in similar situations in the past. I'll update the board agenda 
item to reflect the current situation.

Thanks,
Justin




Re: [DISCUSS] Releases after 4.0

2021-03-29 Thread Benedict Elliott Smith
+1

On 29/03/2021, 21:16, "Ben Bromhead"  wrote:

+1 good sensible suggestion.

On Tue, Mar 30, 2021 at 7:37 AM Ekaterina Dimitrova 
wrote:

> I also like the latest suggestion, +1, thank you
>
> On Mon, 29 Mar 2021 at 14:16, Yifan Cai  wrote:
>
> > +1
> >
> > On Mon, Mar 29, 2021 at 8:42 AM J. D. Jordan 
> > wrote:
> >
> > > +1 that deprecation schedule seems reasonable and a good thing to move to.
> > >
> > > > On Mar 29, 2021, at 10:23 AM, Benjamin Lerer wrote:
> > > >
> > > > The proposal sounds good to me too.
> > > >
> > > >> On Mon, 29 Mar 2021 at 16:48, Brandon Williams wrote:
> > > >>
> > > >>> On Mon, Mar 29, 2021 at 9:41 AM Joseph Lynch <joe.e.ly...@gmail.com>
> > > >>> wrote:
> > > >>> I like the idea of the 3-year support cycles, but I think since
> > > >>> 3.0/3.11/4.0 took so long to stabilize to a point folks could upgrade
> > > >>> to, we should reset the clock somewhat.
> > > >>
> > > >> I agree, the length of time to release 4.0 and the initialization of a
> > > >> new release cycle requires some special consideration for current
> > > >> releases.
> > > >>
> > > >>> 4.0: Fully supported until April 2023 and high severity bugs until
> > > >>> April 2024 (2 year full, 1 year bugfix)
> > > >>> 3.11: Fully supported until April 2022 and high severity bugs until
> > > >>> April 2023 (1 year full, 1 year bugfix).
> > > >>> 3.0: Supported for high severity correctness/performance bugs until
> > > >>> April 2022 (1 year bugfix)
> > > >>> 2.2+2.1: EOL immediately.
> > > >>>
> > > >>> Then going forward we could have this nice pattern when we cut the
> > > >>> yearly release:
> > > >>> Y(n-0): Support for 3 years from now (2 full, 1 bugfix)
> > > >>> Y(n-1): Fully supported for 1 more year and supported for high
> > > >>> severity correctness/perf bugs 1 year after that (1 full, 1 bugfix)
> > > >>> Y(n-2): Supported for high severity correctness/bugs for 1 more year
> > > >>> (1 bugfix)
> > > >>
> > > >> This sounds excellent to me, +1.
> > > >>
> > > >>
> > >
> > >
> >
>


-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
 | +64 27 383 8975
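
As a rough illustration of the going-forward pattern quoted above (each yearly release gets 2 years of full support followed by 1 year of high-severity bugfix support), here is a minimal Python sketch. The function names and the use of the 1st of the month as a stand-in release date are assumptions made for the example, not anything the project has defined.

```python
from datetime import date

FULL_SUPPORT_YEARS = 2    # "2 full"
BUGFIX_SUPPORT_YEARS = 1  # "1 bugfix"


def support_windows(release_date):
    """Return (end of full support, end of bugfix-only support)."""
    # Assumes release_date is not Feb 29, since replace() would then fail.
    full_until = release_date.replace(
        year=release_date.year + FULL_SUPPORT_YEARS)
    bugfix_until = full_until.replace(
        year=full_until.year + BUGFIX_SUPPORT_YEARS)
    return full_until, bugfix_until


def support_status(release_date, today):
    """Classify a release as 'full', 'bugfix' or 'eol' on a given day."""
    full_until, bugfix_until = support_windows(release_date)
    if today < full_until:
        return "full"
    if today < bugfix_until:
        return "bugfix"
    return "eol"


if __name__ == "__main__":
    # Assumed April 2021 cut for 4.0, as in the proposal.
    release_4_0 = date(2021, 4, 1)
    print(support_windows(release_4_0))
    print(support_status(release_4_0, date(2023, 6, 1)))
```

With an April 2021 cut this gives full support until April 2023 and bugfix support until April 2024, matching the 4.0 line of the proposal; the transitional dates for 3.11 and 3.0 are special cases and are not modelled here.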






Re: Download source release / binary files in source release

2021-03-28 Thread Benedict Elliott Smith
I thought you had indicated you were anyway raising this with the board?

Either way, I don't personally see any issue with delaying the vote by a week 
or so if it will bring some official clarity to this issue, now it has been 
raised. How quickly can we expect to see changes reflected in the official 
policy documents?


On 28/03/2021, 11:48, "Justin Mclean"  wrote:

Hi,

> I recommend that the PMC continues its vote on 4.0-rc1.

In that case I'll need to raise this issue with the ASF board.

Justin




Re: Download source release / binary files in source release

2021-03-28 Thread Benedict Elliott Smith
> I guess you are asking for something official from VP Legal Affairs or the 
> ASF board? If so I can make that happen.

I would prefer the official policy pages to be updated to have a clear 
statement on this, so this problem can be solved in perpetuity.

> IMO it does, the project can choose to ignore that if they want. I suggest 
> you read what Roy wrote on this subject

I have read all of the above, and none of these rise to official statements of 
policy that clearly accord with your view. I'm not ignoring anything; surely 
you can at least agree that official statements on this are unclear?

> Saying we are all volunteers, while true, doesn't remove the responsibility the 
> PMC has to make sure its releases are open source / comply with ASF policy.

The PMC has acted in good faith in this matter, and has followed the official 
policy documents as it interprets them. How else is the PMC meant to discern 
ASF policy? I am happy to accept there has been a failure to communicate the 
intended policy, but asking the PMC to bear a significant cost correcting 
actions taken in good faith on reasonable readings of the policy documents 
would not in my opinion be reasonable.






Re: Download source release / binary files in source release

2021-03-28 Thread Benedict Elliott Smith
Hi Justin,

You are probably right, but as far as I am aware you are not an official source 
of ASF policy on this matter. The official policy pages do not stipulate this, 
so I would appreciate it if you could get them updated to accord more clearly 
with your beliefs before the project makes the necessary changes.

> If the board was to get involved then I think it would be likely, going on 
> previous similar situations, it would ask for the project to remove the 
> non-compliant releases.

I can only speak for myself, but I am happy to ensure future releases follow 
this policy once it is clearly stated as an official ASF policy by the official 
policy documentation.

As for prior releases, since 1) the official guidance has not required this to 
date; and 2) there is no _legal_ reason to require this (per the LEGAL thread 
you linked), I have no personal intention of going back to modify prior 
releases.

The board is of course free to wield whichever tools it likes, but please 
remember that this is a volunteer endeavour. Expecting project members to 
volunteer days of their time to retroactively meet a policy they had not been 
informed of, was in no official guidance, and has no legal reasoning behind it, 
is a tough sell.



On 28/03/2021, 05:15, "Justin Mclean"  wrote:

Hi,

I can say with 100% certainty that:
- ASF source releases cannot contain compiled code (jars, dlls or the like)
- ASF source releases cannot include Category B code compiled or not 
compiled
- ASF convenience binaries can contain Category B compiled code

In various roles at the ASF including PMC member, mentor, VP Incubator I 
have reviewed somewhere between 600 and 700 releases (and possibly more) over 
the last decade. Every single time I've seen a source release candidate 
containing compiled code I have voted -1 on it. I've seen many others do the 
same and I cannot recall any source release containing compiled code being made 
an ASF release. At ApacheCon I often give talks on how to make releases. Every 
release checklist I've seen includes a check for compiled code. I could go on 
but I think that's probably enough context.

On a couple of occasions there's been a little confusion around this so 
I'll make sure the above is clearly stated in our legal FAQ/policy, after some 
discussion on the legal discuss list.

This is not my project and I hope the PMC looks into this, decides what to 
do, and does what they think is needed to correct this. It would be best not to 
have to get the ASF board involved (of which I'm a current member). If the 
board was to get involved then I think it would be likely, going on previous 
similar situations, it would ask for the project to remove the non-compliant 
releases.

Thanks,
Justin




Re: Download source release / binary files in source release

2021-03-27 Thread Benedict Elliott Smith
> Including Category B binaries in a source release is mentioned in ASF policy 
> here [1].

Sorry to keep banging the same drum, but I read this before our earlier emails, 
and if this is the intended meaning it needs to be rewritten. I also doubt this 
was the intended meaning of the original author. The subject of the bracketed 
clause is grammatically ambiguous, so other contextual cues must be used to 
interpret it:

> BINARY-ONLY INCLUSION CONDITION
> Unless otherwise specified, all Category B licensed works should be included 
> in binary-only form in Apache Software Foundation convenience binaries (and 
> not source code).

1. Source _code_ cannot (pretty much by definition) contain any binaries, so 
the only semantically plausible interpretation is prohibiting the inclusion of 
the dependency's source code in the release
2. The heading "binary-only inclusion" provides the motivating context for the 
sentence, strongly suggesting the bracket is likely a clarifying reiteration 
that the dependency's source code should not be included
3. Elsewhere in this document source code appears to refer to the normal 
meaning, i.e. source code, not source packages/releases

Since we as a community use these websites to determine our correct actions, 
I'm not sure we can really help fix this with PRs, since we have no independent 
primary knowledge of the desired content. I still don't really know for sure 
the source of your expectation.

I'm sure we can update our build scripts to fetch these libraries instead of 
bundling them, if we can get some kind of official confirmation that this is 
indeed necessary. Finding people willing to volunteer to go back and modify 
earlier releases may be a tall order, however.
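
As a sketch of what "fetch these libraries instead of bundling them" could look like, the following Python script downloads each dependency into a lib directory and verifies it against a pinned SHA-256 digest. Everything specific here is an assumption for illustration: the DEPENDENCIES manifest, the example URL and the placeholder digest are hypothetical, and the project's real build may well solve this differently (for example through its existing build tooling).

```python
import hashlib
import urllib.request
from pathlib import Path

# Hypothetical manifest: file name -> (download URL, expected SHA-256 digest).
DEPENDENCIES = {
    "example-lib-1.0.jar": (
        "https://repo.example.org/example-lib-1.0.jar",
        "0" * 64,  # placeholder digest, not a real checksum
    ),
}


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(name, url, expected_sha256, lib_dir):
    """Download one dependency unless a verified copy is already present."""
    lib_dir = Path(lib_dir)
    lib_dir.mkdir(parents=True, exist_ok=True)
    target = lib_dir / name
    if target.exists() and sha256_of(target) == expected_sha256:
        return target  # already fetched and verified
    urllib.request.urlretrieve(url, str(target))
    actual = sha256_of(target)
    if actual != expected_sha256:
        target.unlink()  # do not leave an unverified binary behind
        raise ValueError(f"checksum mismatch for {name}: got {actual}")
    return target


if __name__ == "__main__":
    for jar_name, (jar_url, jar_digest) in DEPENDENCIES.items():
        fetch(jar_name, jar_url, jar_digest, "lib")
```

Pinning a digest per file keeps the source release free of compiled code while still letting a user confirm they received exactly the binaries the release was built against.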


On 27/03/2021, 23:24, "Justin Mclean"  wrote:

Hi,

> This is a known problem. Please help out.

What is the reason for having those jars in the source release? Could it 
just be replaced by a series of curl commands in a shell script?

I can help fix up the LICENSE and NOTICE files, but the inclusion of 
compiled code in a source release is the bigger issue here. 

> It has been made mention of on most of the recent vote release threads, and
> we have a ticket CASSANDRA-16391 open to deal with it (eta is immediately
> after 4.0).

I'm not sure that fixing this after a 4.0 release would be an option and 
the question may come up of what needs to be done with the source releases that 
are already public.

Including Category B binaries in a source release is mentioned in ASF 
policy here [1].

Thanks,
Justin

1. https://apache.org/legal/resolved.html#binary-only-inclusion-condition




Re: Download source release / binary files in source release

2021-03-27 Thread Benedict Elliott Smith
> Because a source release could not contain compiled code

Again, I don't see this stated explicitly. Perhaps the guidance should be 
clarified if this is the intention?

On 27/03/2021, 01:59, "Justin Mclean"  wrote:

Hi,

> Could you clarify why you think this is incompatible with ASF policy?

Because a source release could not contain compiled code (category A or 
otherwise), if it does then it is not open source. See for instance [1]. This is 
why tools like Apache Rat look for certain types of binary files in release 
artefacts.

This has also come up before a number of times e.g. including the gradle 
jar in a source release e.g [2]

Thanks,
Justin

1. http://www.apache.org/legal/release-policy.html#artifacts
2. https://issues.apache.org/jira/browse/LEGAL-288




Re: Download source release / binary files in source release

2021-03-27 Thread Benedict Elliott Smith
> I suggest you read the whole thread. The outcome was that it's OK to put jars 
> in version control but not in a source release.

There was no outcome AFAICT? There was a suggestion that was explicitly 
caveated as only a suggestion that required formal approval by VP Legal, which 
does not seem to have been provided in the thread?

I don't mind particularly what the outcome is here, but I'm surprised that the 
official release policy does not state this explicitly, if it is such an 
important point. It would be very easy for it to do so.


On 27/03/2021, 03:13, "Justin Mclean"  wrote:

Hi,

> The notion that these jars are "not open source" and must therefore not be 
> used in the way they are intended is a preposterous stance

I suggest you read the whole thread. The outcome was that it's OK to put 
jars in version control but not in a source release.

This has been discussed several times on the incubator list and elsewhere. 
What most projects do is provide a script that a user can run to download the 
needed bits.

I recall Roy having something to say about it a while ago:

https://lists.apache.org/thread.html/0d6952401efbd72efbeafc86ebabbf7267f6c9d2d59f32dd93b7d4db%401332868587%40%3Cgeneral.incubator.apache.org%3E

Thanks,
Justin








Re: Download source release / binary files in source release

2021-03-26 Thread Benedict Elliott Smith
> When I did download the 3.11.10 release [2], I can see that it contained 
> compiled binary files (jars), which I don't think is in line with ASF release 
> policy.

Could you clarify why you think this is incompatible with ASF policy? AFAICT 
the policy only stipulates that binary releases _will_ contain dependencies in 
binary form, and that releases _will not_ contain (Category B) dependencies in 
source code form. I cannot find any stipulation that source releases must not 
contain binary forms of dependencies, but I may be missing it.


On 26/03/2021, 23:48, "Justin Mclean"  wrote:

Hi,

I noticed the download page [1] contains links to convenience binaries but 
not to the actual release. I can see that the source is in place on the 
mirrors but there's not an obvious link to it.

When I did download the 3.11.10 release [2], I can see that it 
contained compiled binary files (jars), which I don't think is in line with ASF 
release policy.

Thanks,
Justin

1. https://cassandra.apache.org/download/
2. https://www.apache.org/dyn/closer.lua/cassandra/3.11.10/apache-cassandra-3.11.10-src.tar.gz








Re: Project Roadmap

2021-03-02 Thread Benedict Elliott Smith
Yep, I'm not proposing we start discussions right now. Just wanted to float the 
idea, see how people felt about it and how people might like it to look 
procedurally.

My only goal is that we have a rough roadmap agreed before GA, to publish 
alongside any announcement.

On 02/03/2021, 09:57, "Benjamin Lerer"  wrote:

>
> I completely agree we should consider any roadmap a living document that
> we expect to revise, but my hope is that we will formalise an agreed
> roadmap by vote.


I believe that everybody will be in favor of discussing the plans for the
next release. We do not really need to commit to anything at this point.
My proposal would be to get 4.0-RC out of the door and allow a couple of
weeks for people to think about the next release. Then we can trigger a
discussion for everybody on what they are willing to focus on first.
What do you think?

On Tue, 2 Mar 2021 at 06:29, Berenguer Blasi wrote:

> +1000 on some form of roadmap for visibility and planning
>
> On 1/3/21 18:35, Benedict Elliott Smith wrote:
> > I completely agree we should consider any roadmap a living document that
> > we expect to revise, but my hope is that we will formalise an agreed
> > roadmap by vote.  My view is that we should aim to regularly revisit the
> > roadmap, and anticipate that it will be revised based on contributors'
> > shifting priorities and pressures.
> >
> > I think the important thing is that in revising the roadmap we'll again
> > make explicit trade-offs as a community about what we want to invest in
> > before the next release.
> >
> >
> > On 01/03/2021, 13:26, "Benjamin Lerer"  wrote:
> >
> > Having an open discussion about what we want to release as a community
> > in the next version makes total sense to me. I also agree that the
> > roadmap should not be written in stone and that we should be flexible
> > if we believe that we need to.
> > We should also take this discussion as an opportunity to discuss how we
> > plan to use CEPs moving forward.
> >
> > On Mon, 1 Mar 2021 at 13:21, Benedict Elliott Smith
> > <bened...@apache.org> wrote:
> >
> > > I guess I meant that I don't foresee roadmap discussions having a hard
> > > requirement of CEP for all goals we might discuss, though it would
> > > probably be expected that many of the biggest proposals would already at
> > > least have a minimal CEP to be filed, you're right.
> > >
> > > Certainly if an advanced CEP exists I hadn't meant to exclude it, I more
> > > meant that the CEP process is quite involved and spans the lifetime of the
> > > work, and a roadmap helps the project decide on goals irrespective of a
> > > CEP, and helps resource a CEP early in its lifecycle.
> > >
> > > On 01/03/2021, 11:15, "Mick Semb Wever"  wrote:
> > >
> > > >
> > > > I think of a roadmap as a pre-CEP activity for upcoming
> releases,
> > > items
> > > > thereon beginning the CEP process, …
> > > >
> > >
> > >
> > > What about having it the other way around? That the roadmap is
> a
> > > visualisation of the CEPs, i.e. those past initial triage that
> have
> > > initial
> > > commitment and momentum. A reflective approach of the roadmap,
> just a
> > > visualisation of existing processes, prevents the adding of a
> new
> > > process
> > > to the community. It will also incentivise the thoroughness of
> new
> > > CEPs.
> > >
> > > The benefit of having the roadmap as a separate manual process
> pre-CEP
> > > might save us the cost of creating CEPs that get rejected, but
> I can't
> > > see
> > > that actually being a problem for us.
> > >
> > > +1 to having the roadmap, in any form.
> > >
> > >
> > >
> > >

Re: Project Roadmap

2021-03-01 Thread Benedict Elliott Smith
I completely agree we should consider any roadmap a living document that we 
expect to revise, but my hope is that we will formalise an agreed roadmap by 
vote.  My view is that we should aim to regularly revisit the roadmap, and 
anticipate that it will be revised based on contributors' shifting priorities 
and pressures.

I think the important thing is that in revising the roadmap we'll again make 
explicit trade-offs as a community about what we want to invest in before the 
next release.


On 01/03/2021, 13:26, "Benjamin Lerer"  wrote:

Having an open discussion about what we want to release as a community in
the next version makes total sense to me. I also agree that the roadmap
should not be written in stone and that we should be flexible if we believe
that we need to.
We should also take this discussion as an opportunity to discuss how we
plan to use CEPs moving forward.

On Mon, 1 Mar 2021 at 13:21, Benedict Elliott Smith wrote:

> I guess I meant that I don't foresee roadmap discussions having a hard
> requirement of CEP for all goals we might discuss, though it would
> probably be expected that many of the biggest proposals would already at
> least have a minimal CEP to be filed, you're right.
>
> Certainly if an advanced CEP exists I hadn't meant to exclude it, I more
> meant that the CEP process is quite involved and spans the lifetime of the
> work, and a roadmap helps the project decide on goals irrespective of a
> CEP, and helps resource a CEP early in its lifecycle.
>
> On 01/03/2021, 11:15, "Mick Semb Wever"  wrote:
>
> >
> > I think of a roadmap as a pre-CEP activity for upcoming releases,
> > items thereon beginning the CEP process, …
> >
>
>
> What about having it the other way around? That the roadmap is a
> visualisation of the CEPs, i.e. those past initial triage that have
> initial commitment and momentum. A reflective approach of the roadmap,
> just a visualisation of existing processes, prevents the adding of a new
> process to the community. It will also incentivise the thoroughness of
> new CEPs.
>
> The benefit of having the roadmap as a separate manual process pre-CEP
> might save us the cost of creating CEPs that get rejected, but I can't
> see that actually being a problem for us.
>
> +1 to having the roadmap, in any form.
>
>
>
>
>






Re: Project Roadmap

2021-03-01 Thread Benedict Elliott Smith
To say it another way, I would expect that once a roadmap is agreed we would 
ensure there was a CEP for every item. I don't know whether every CEP would 
necessarily be voted into a roadmap, nor whether every item voted on would have 
a CEP before the vote is conducted, and don't have a strongly held position on 
either (but think it's probably better they're not intrinsically tied in either 
direction).


On 01/03/2021, 12:13, "Benedict Elliott Smith"  wrote:

I guess I meant that I don't foresee roadmap discussions having a hard 
requirement of CEP for all goals we might discuss, though it would probably be 
expected that many of the biggest proposals would already at least have a 
minimal CEP to be filed, you're right. 

Certainly if an advanced CEP exists I hadn't meant to exclude it, I more 
meant that the CEP process is quite involved and spans the lifetime of the 
work, and a roadmap helps the project decide on goals irrespective of a CEP, 
and helps resource a CEP early in its lifecycle.

On 01/03/2021, 11:15, "Mick Semb Wever"  wrote:

>
> I think of a roadmap as a pre-CEP activity for upcoming releases, items
> thereon beginning the CEP process, …
>


What about having it the other way around? That the roadmap is a
visualisation of the CEPs, i.e. those past initial triage that have initial
commitment and momentum. A reflective approach of the roadmap, just a
visualisation of existing processes, prevents the adding of a new process
to the community. It will also incentivise the thoroughness of new CEPs.

The benefit of having the roadmap as a separate manual process pre-CEP
might save us the cost of creating CEPs that get rejected, but I can't see
that actually being a problem for us.

+1 to having the roadmap, in any form.






Re: Project Roadmap

2021-03-01 Thread Benedict Elliott Smith
I guess I meant that I don't foresee roadmap discussions having a hard 
requirement of CEP for all goals we might discuss, though it would probably be 
expected that many of the biggest proposals would already at least have a 
minimal CEP to be filed, you're right. 

Certainly if an advanced CEP exists I hadn't meant to exclude it, I more meant 
that the CEP process is quite involved and spans the lifetime of the work, and 
a roadmap helps the project decide on goals irrespective of a CEP, and helps 
resource a CEP early in its lifecycle.

On 01/03/2021, 11:15, "Mick Semb Wever"  wrote:

>
> I think of a roadmap as a pre-CEP activity for upcoming releases, items
> thereon beginning the CEP process, …
>


What about having it the other way around? That the roadmap is a
visualisation of the CEPs, i.e. those past initial triage that have initial
commitment and momentum. A reflective approach of the roadmap, just a
visualisation of existing processes, prevents the adding of a new process
to the community. It will also incentivise the thoroughness of new CEPs.

The benefit of having the roadmap as a separate manual process pre-CEP
might save us the cost of creating CEPs that get rejected, but I can't see
that actually being a problem for us.

+1 to having the roadmap, in any form.






Re: Project Roadmap

2021-03-01 Thread Benedict Elliott Smith
Yes, absolutely my goal isn't to prohibit work outside of the roadmap.

For really large, complex items of work that potentially require wide input 
from the community, e.g. because of semantic or stability implications (i.e. the 
kind we only deliver a handful of per release), I think it would be legitimate 
(and helpful) for the community to pause integration of work until either the 
roadmap can be adjusted (to deprioritise other items taking its focus) or until 
the roadmap catches up. The community has only so much capacity for those kinds 
of contributions each release, and I think it is beneficial to the project to 
manage that capacity, and also to ensure such major contributions get due 
attention. But only the biggest organisations are going to be even remotely 
constrained by this, and they're able to re-shape the roadmap, so it's less a 
restriction and more a mechanism to ensure collaboration and communication on 
the riskiest contributions.

This is of course all up for debate, but I think this would be both a benefit 
of a roadmap, and also strengthen its other utilities by helping keep the 
roadmap accurate and honest.


On 01/03/2021, 10:16, "Sumanth Pasupuleti" wrote:

+1 to the idea of the project roadmap and the said benefits for planning.
In my opinion, it certainly does a world of good for visibility on what is
in the works/ what to look forward to for both the developers as well as
users. So long as "allowed work" is not restricted to items in the project
roadmap and developers can still make contributions to work unlisted in the
project roadmap, I think having a project roadmap is certainly a step in
the right direction.

Thanks,
Sumanth

On Mon, Mar 1, 2021 at 1:18 AM Benedict Elliott Smith wrote:

> A while back somebody privately raised the idea of a project roadmap to
> me, and I’d like to propose we formally consider it as a project now that
> 4.0 is approaching completion.
>
>
>
> I think there are two major benefits to agreeing a roadmap:
>
>
>
> 1) It helps us to coordinate finite project resources between multiple
> entities, as we can signal to each other what our priorities are, agree to
> prioritise items on the roadmap, and plan cross-organisation capacity
> necessary for each roadmap item.
>
> 2) It signals to the wider user community what to expect, facilitating
> confidence in project health and direction. I think this will be
> particularly helpful as 4.0 is announced, given the extraordinary amount
> of time that passed between 3.11 and 4.0.
>
>
>
> I think of a roadmap as a pre-CEP activity for upcoming releases, items
> thereon beginning the CEP process, with target releases being assigned by
> the roadmap (subject to revision) and project members opting-in to the
> endeavour to deliver for that release.  I don’t think it should lead to
> work progressing only on roadmap items, but that other major endeavours
> (i.e. those entailing large impact to the project, or requiring lots of
> cross-org input) could be put on hold until the earlier roadmap items were
> properly resourced (or the roadmap revised).
>
>
>
> What do people think?
>
>






Project Roadmap

2021-03-01 Thread Benedict Elliott Smith
A while back somebody privately raised the idea of a project roadmap to me, and 
I’d like to propose we formally consider it as a project now that 4.0 is 
approaching completion.

 

I think there are two major benefits to agreeing a roadmap:

 

1) It helps us to coordinate finite project resources between multiple 
entities, as we can signal to each other what our priorities are, agree to 
prioritise items on the roadmap, and plan cross-organisation capacity necessary 
for each roadmap item.

2) It signals to the wider user community what to expect, facilitating 
confidence in project health and direction. I think this will be particularly 
helpful as 4.0 is announced, given the extraordinary amount of time that passed 
between 3.11 and 4.0.

 

I think of a roadmap as a pre-CEP activity for upcoming releases, items thereon 
beginning the CEP process, with target releases being assigned by the roadmap 
(subject to revision) and project members opting-in to the endeavour to deliver 
for that release.  I don’t think it should lead to work progressing only on 
roadmap items, but that other major endeavours (i.e. those entailing large 
impact to the project, or requiring lots of cross-org input) could be put on 
hold until the earlier roadmap items were properly resourced (or the roadmap 
revised).

 

What do people think?



Re: Cassandra 4.0 Status 2021-02-25

2021-02-26 Thread Benedict Elliott Smith
Fair enough.

On 26/02/2021, 20:45, "Mick Semb Wever"  wrote:

> Should we wait for e.g. five clean CI runs in a row?  Historically flaky
> tests have been a real issue for the project, and CI success probably
> shouldn't be taken instantaneously for releases.



There are tickets for flaky tests that have been pushed to fixVersion
4.0-rc intentionally, making this difficult to achieve.

A green run will be a huge achievement for the project, something we
haven't seen in a very long time. My understanding of the "we see one clean
CI run" position was taking it as a stake-in-the-ground, knowing (and
expecting) that the situation improves (with the ongoing work by many)
towards GA.

Might we instead apply the criterion to one of the RC releases before GA?






Re: New Cassandra website for review

2021-02-26 Thread Benedict Elliott Smith
Very nice.

On 26/02/2021, 21:36, "Melissa Logan"  wrote:

Hi all,

We are excited to share the almost-complete Cassandra website design
(CASSANDRA-16115). Huge thanks to Lorina Poland, Anthony Grosso, Mick Semb
Wever, Josh Levy, Chris Thornett, Diogenese Topper, and a few others who
contributed to this effort.

Note: There are a few updates to be made prior to launch, but we wanted to
share to get initial input and signoff to begin the final port to Antora.

To be completed:

   - *Homepage: *The logos are placeholders -- they're being updated and
   resized (pulled from case studies page).
   - *Docs* will be added once 4.0 documentation is complete. Design
   will match new site.
   - *Case Studies * logos are being updated and resized, so ignore broken
   links.

If you have case studies or resources -- or community photos --
please reply to me and we'll add.

Site for review: https://cassandra.staged.apache.org/

https://issues.apache.org/jira/browse/CASSANDRA-16115

Melissa Logan






Re: Cassandra 4.0 Status 2021-02-25

2021-02-26 Thread Benedict Elliott Smith
Should we wait for e.g. five clean CI runs in a row?  Historically flaky tests 
have been a real issue for the project, and CI success probably shouldn't be 
taken instantaneously for releases.
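
To make the two criteria under discussion concrete ("one clean CI run" versus "five clean CI runs in a row"), here is a minimal sketch of such a release gate. It is an illustration only, not project tooling: the function name `has_clean_streak` and the boolean run history are assumptions, and real results would have to be pulled from the CI systems themselves.

```python
from typing import Sequence


def has_clean_streak(results: Sequence[bool], required: int = 5) -> bool:
    """Return True if the most recent `required` CI runs were all green.

    `results` is ordered oldest to newest; True means a fully clean run.
    """
    if required <= 0:
        raise ValueError("required must be positive")
    if len(results) < required:
        # Not enough history yet to satisfy the criterion.
        return False
    return all(results[-required:])


# Hypothetical run history, oldest first: one flaky failure, then four green runs.
history = [True, True, False, True, True, True, True]

one_clean_run = has_clean_streak(history, required=1)
five_in_a_row = has_clean_streak(history, required=5)
print(one_clean_run, five_in_a_row)
```

With this history the single-run criterion is met while the five-in-a-row criterion is not, which is exactly the gap a flaky test can hide in.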

On 26/02/2021, 19:38, "Michael Semb Wever"  wrote:


> * We’re within line-of-sight to closing out beta scope. Any work people
> can do on remaining blockers will accelerate the project toward RC.


So, last call for last minute concerns, the plan is to cut 4.0-rc1 once
 - those beta tickets are resolved,
 - the gremlins around v5 native protocol (+python driver frames) are fixed, and
 - we see one clean CI run

Which looks like it might happen next week.








Re: [DISCUSS] Releases after 4.0

2021-02-05 Thread Benedict Elliott Smith
.0 maintenance branch.  We
> > make future 4.0.1 4.0.2 4.0.3 releases from this branch.
> >
> > Trunk continues development, some new features are added there.  After a
> > few months we release 4.1.0 from trunk, we do not cut a cassandra-4.1
> > branch.  Development continues along on trunk, some new features get in
> > so we bump the version in the branch to 4.2.0.  A few months go by we
> > release 4.2.0 from trunk.  Some bug fixes go into trunk with no new
> > features, the version on the branch bumps to 4.2.1, we decide to make a
> > release from trunk, and only fixes have gone into trunk since the last
> > release, so we release 4.2.1 from trunk.
> >
> > We continue on this way releasing 4.3.0, 4.4.0, 4.4.1 …. We decide it is
> > time for a new maintenance branch to be cut.  So with the release of
> > 4.5.0 we also cut the cassandra-4.5 branch.  This branch will get patch
> > releases made from it 4.5.1 4.5.2 4.5.3.
> >
> > Trunk continues on as 4.6.0, 4.7.0, 4.8.0 …. At some point the project
> > decides it wants to drop support for some deprecated feature, trunk gets
> > bumped to 5.0.0. More releases happen from trunk 5.0.0, 5.1.0, 5.2.0,
> > 5.2.1, development on trunk continues on.  Time for a new maintenance
> > branch with 5.3.0 so cassandra-5.3 gets cut...
> >
> > This does kind of look like what we tried for tick/tock, but it is not
> > the same.  If we wanted to name this something, I would call it something
> > like "releasable trunk+periodic maintenance branching".  This is what many
> > projects that release from trunk look like.
> >
> > -Jeremiah
> >
> >
> > > On Jan 28, 2021, at 10:31 AM, Benedict Elliott Smith
> > > <bened...@apache.org> wrote:
> > >
> > > But, as discussed, we previously agreed to limit features in a minor
> > > version, as per the release lifecycle (and I continue to endorse this
> > > decision)
> > >
> > > On 28/01/2021, 16:04, "Mick Semb Wever"  wrote:
> > >
> > >> if there's no such features, or anything breaking compatibility
> > >>
> > >> What do you envisage being delivered in such a release, besides bug
> > >> fixes?  Do we have the capacity as a project for releases dedicated to
> > >> whatever falls between those two gaps?
> > >>
> > >
> > >
> > > All releases that don't break any compatibilities as our documented
> > > guidelines dictate (wrt. upgrades, api, cql, native protocol, etc).
> > > Even new features can be introduced without compatibility breakages
> > > (and should be as often as possible).
> > >
> > > Honouring semver does not imply more releases, to the contrary it is
> > > just that a number of those existing releases will be minor instead of
> > > major. That is, it is an opportunity cost to not recognise minor
> > > releases.
> > >
> > >
> > >
> >
> >
> >
> >
>
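
The "releasable trunk + periodic maintenance branching" model quoted above boils down to a small version-bump rule, sketched below. This is an illustration under stated assumptions, not anything the project ships: the names `Change`, `Version`, `next_trunk_version` and `maintenance_branch` are hypothetical, and the only details taken from the thread are the bump rules (fixes only bump the patch number, new features bump the minor, dropping a deprecated feature bumps the major) and the `cassandra-X.Y` branch naming.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Change(Enum):
    """Kinds of change merged to trunk since the last release (hypothetical)."""
    FIX = "fix"
    FEATURE = "feature"
    REMOVAL = "removal"  # dropping support for a deprecated feature


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def next_trunk_version(last: Version, changes: Iterable[Change]) -> Version:
    """Pick the next trunk version from the most significant change merged."""
    kinds = set(changes)
    if not kinds:
        raise ValueError("nothing merged since the last release")
    if Change.REMOVAL in kinds:
        return Version(last.major + 1, 0, 0)
    if Change.FEATURE in kinds:
        return Version(last.major, last.minor + 1, 0)
    return Version(last.major, last.minor, last.patch + 1)


def maintenance_branch(release: Version) -> str:
    """Name of the maintenance branch cut alongside a release."""
    return f"cassandra-{release.major}.{release.minor}"


# Walking through the examples given in the quoted message.
print(next_trunk_version(Version(4, 1, 0), [Change.FEATURE, Change.FIX]))
print(next_trunk_version(Version(4, 2, 0), [Change.FIX]))
print(next_trunk_version(Version(4, 8, 0), [Change.REMOVAL, Change.FEATURE]))
print(maintenance_branch(Version(4, 5, 0)))
```

Whether a given release also gets a maintenance branch is a separate, periodic decision in this model, which is why `maintenance_branch` is not called from the bump rule.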






Re: [DISCUSS] Releases after 4.0

2021-01-28 Thread Benedict Elliott Smith
But, as discussed, we previously agreed to limit features in a minor version, as 
per the release lifecycle (and I continue to endorse this decision)

On 28/01/2021, 16:04, "Mick Semb Wever"  wrote:

> if there's no such features, or anything breaking compatibility
>
> What do you envisage being delivered in such a release, besides bug
> fixes?  Do we have the capacity as a project for releases dedicated to
> whatever falls between those two gaps?
>


All releases that don't break any compatibilities as our documented
guidelines dictate (wrt. upgrades, api, cql, native protocol, etc).  Even
new features can be introduced without compatibility breakages (and should
be as often as possible).

Honouring semver does not imply more releases, to the contrary it is just
that a number of those existing releases will be minor instead of major.
That is, it is an opportunity cost to not recognise minor releases.






Re: [DISCUSS] Releases after 4.0

2021-01-28 Thread Benedict Elliott Smith
> if there's no such features, or anything breaking compatibility

What do you envisage being delivered in such a release, besides bug fixes?  Do 
we have the capacity as a project for releases dedicated to whatever falls 
between those two gaps?

I'd like to see us have three branches: life support (critical fixes), stable 
(fixes), and development. Minors don't fit very well into that IMO.

> I am a bit scared of a continuous delivery approach for a database 
> due to the lack of guarantee on the APIs and protocols as you mentioned

Well, this could be resolved by marking features as unstable, then 
experimental, as we have begun doing. So that API stability is tied to features 
more tightly than releases.  I'm actually warming to this configuration, but I 
think we're all circling ideas in a similar vicinity and I suspect none of us 
are tightly wed to the specifics.

> we should allow ourselves to cut a release sooner. 

The only issue here is that we then create an extra maintenance overhead, as we 
have more releases to manage.  This is one advantage of the CD approach - we 
nominate a release much less frequently for long term support, at which point 
we rollover to a new major (presumably also only deprecating across such a 
boundary).

I suppose in practice all this wouldn't be too different to tick-tock, just 
with a better state of QA, a higher bar to merge and (perhaps) no fixed release 
cadence. This realisation makes me less keen on it, for some reason.




On 28/01/2021, 14:23, "Mick Semb Wever"  wrote:

> We have had a very problematic history with release versioning, such that
> our users probably think the numbers are meaningless.
>


Completely agree with this. But it feels that we're throwing the baby out
with the bath water…

I do think we can do semver with just a minimal amount of dev guidelines in
place, to great benefit to users (and for ourselves).



> However, in the new release lifecycle document (and in follow-up
> discussions around qualifying releases) we curtail _features_ once a
> release is GA, and also stipulate that a new major version is associated
> with a release.
>


The following aspects remain open questions for me…
 - if there's no such features, or anything breaking compatibility, isn't
there benefit in releasing a minor,
 - can we better indicate to users the upgrade path (and how we simplify
which upgrade paths we have to support),
 - the practice of deprecating an API one release and then removing it the
following,
 - we have CQL, Native Protocol, SSTable versioning, can they tie in to our
semver (especially their majors, which are also about compatibility)

I would have thought we have enough here to provide a set of guidelines to
the dev community about when a release is either a major or minor. The
missing piece here is how do we apply a branching strategy. I would suggest
the same branching strategy that we would do under your suggestion of
, so that the decision about a release being a major or a
minor can be made lazily. It may be in practice that this starts off with
every release being a major, as you have suggested, but we would keep the
minor numbers and semver there to use when we see fit. If our practices
improve, with the dev guidelines in place, we may see that releases become
mostly minors.

And I see this increasing in relevance if we introduce more SPIs into the
codebase and have a bigger dev ecosystem around us, e.g. storage engine,
compaction, indexes, thin-coordinators… Already today we know we have
consumers of our maven artifacts, and dependency hell is a big part of
semver's value, ref: https://semver.org/



> > Why make a release at a fixed time each year?
> > Stable-trunk is more popularly associated with Continuously Delivered
> > (CD) codebases
>
> We need to pick some points in time to provide stability/patch support
> for, and an annual cadence provides some certainty to the community.
>


What we can do depends on how much time the community has to contribute.
That is a changing and responsive thing. We can aim for an objective, and
improve/streamline processes and guidelines.

So, my suggestion is to…
 - keep semver,
 - we decide whether a release is major or minor when we agree to cut a
release, and
 - that decision is primarily based on documented dev guidelines.






Re: [DISCUSS] Releases after 4.0

2021-01-28 Thread Benedict Elliott Smith
We have had a very problematic history with release versioning, such that our 
users probably think the numbers are meaningless.

However, in the new release lifecycle document (and in follow-up discussions 
around qualifying releases) we curtail _features_ once a release is GA, and 
also stipulate that a new major version is associated with a release.

This happens to accord with my preference, namely that we eliminate the concept 
of a minor release in semver terms. We have feature releases and patch 
releases. i.e., 4.0's first bug fix release is 4.1, and in a year we ship 5.0.  
There has been support voiced for this in a couple of forums (including on this 
list back in 2019), but it was perhaps never fully discussed/settled.
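
For comparison with semver, the two-level scheme described in the paragraph above (feature releases and patch releases only) can be sketched as follows. The `next_version` helper and the `(feature, patch)` tuple are hypothetical, chosen only to illustrate the numbering; nothing here reflects an agreed project convention.

```python
from typing import Tuple

# A version is (feature, patch), e.g. (4, 0) for "4.0". Hypothetical representation.
FeaturePatch = Tuple[int, int]


def next_version(current: FeaturePatch, feature_release: bool) -> FeaturePatch:
    """Bump the feature number for a feature release, else the patch number."""
    feature, patch = current
    if feature_release:
        return (feature + 1, 0)
    return (feature, patch + 1)


first_bugfix = next_version((4, 0), feature_release=False)   # (4, 1)
next_feature = next_version(first_bugfix, feature_release=True)  # (5, 0)
print(first_bugfix, next_feature)
```

Under this numbering there is no slot for a compatible feature release between 4.x and 5.0, which is the trade-off the rest of the thread debates.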

> Why make a release at a fixed time each year?
> Stable-trunk is more popularly associated with Continuously Delivered (CD) 
> codebases

We need to pick some points in time to provide stability/patch support for, and 
an annual cadence provides some certainty to the community.  Perhaps it 
wouldn't make sense if we aim for true continuous delivery of trunk. However, 
there is value in flexibility to experiment/revisit decisions before committing 
to APIs and feature behaviours long term. By providing continuous delivery of 
builds that do not guarantee API stability for new features/APIs, users that 
are able to accept some small risk in this regard (e.g. during development, or 
where they do not intend to use the new features) may still benefit from access 
to high quality releases quickly, and the project gets more rapid feedback.

Perhaps we can have a flexible approach though, wherein we have continuous 
delivery of release candidates, and on arbitrary timelines cut releases that 
create API-stability points, and periodically nominate such releases to receive 
3+ years of support. 



On 28/01/2021, 11:42, "Mick Semb Wever"  wrote:

> I'd like to pair this with stricter merge criteria, so that we maintain a
> ~shippable trunk, [snip]. We might have to get happy with reverting commits
> that break things.


Yes and yes! The work we have done, started on, and uncovered in the 4.0
Quality Testing Epics should continue.

Our CI systems have also improved. Folk are using both CircleCI and
ci-cassandra, and I think the combination brings an advantage.  Though
there's still a lot to do. CircleCI doesn't cover everything, and
with ci-cassandra there are a few things still to do. For example: arm64,
jmh, jacoco, dtest-large-novnode, and putting dtest-upgrade into the
pipeline. Jira tickets exist for these. Another issue we have is reliably
identifying flaky tests and test history. All test results and logs are now
getting stored in nightlies.a.o, so the data is there to achieve it, but
searching it remains overly raw.

If such efforts continue, as they have, we should definitely be able to
avoid repeating the feature freeze requirement.


> My preference is for a simple annual major release cadence, with minors
as necessary. This is simple for the user community and developer
community: support = x versions = x years.

This raises a number of questions for me.

Why make a release at a fixed time each year?
The idea of one major release a year contradicts in practice any efforts
towards a stable-trunk. Stable-trunk is more popularly associated with
Continuously Delivered (CD) codebases. Yearly releases are not quite that,
and I can't see stricter merge criteria compensating enough. I have put
effort into the release process, and encouraged the community to have more
active release managers, so that when we need a release we can get one. We
should be looking into cutting patch releases as often as possible.

For how many years shall we support major versions?
Currently we maintain three release branches, plus one limited to security
issues, and the oldest has been supported for 5 years. I think 5 years is
too long for the current community and would suggest bringing it down to 3
years. The project is maturing, and along with efforts towards a
stable-trunk, I would expect upgrades to be getting easier. Asking users to
upgrade at least once every three years shouldn't be a big deal for an OSS
project.


> I understood us to have agreed to drop semver, because our major/minor
history has been a meaningless distinction…


I am not sure that I understand that point. Is there a reference to this
agreement?
Not releasing minor versions doesn't mean we drop semver, we still have the
three numbers there in our versions. In your first reply you wrote that we
would do "minors as necessary", what were your thoughts on what a minor was
there? Was it just a relabelling of patch versions?

Now that we have our Release Lifecycle guidelines defined, which included
discussions on compatibility issues, isn't it a good 

Re: [DISCUSS] Releases after 4.0

2021-01-27 Thread Benedict Elliott Smith
I understood us to have agreed to drop semver, because our major/minor history 
has been a meaningless distinction, and instead to go major/patch (or 
major/minor - with minor for patches), depending how you want to slice it.

But there have been a lot of discussions over the past year or so, so I may be 
misremembering.

On 27/01/2021, 13:25, "Benjamin Lerer"  wrote:

>
> My preference is for a simple annual major release cadence, with minors as
> necessary.
>

I do not think that I fully understand your proposal. How do you define a
major and a minor release?
My understanding of a major release was a version that broke some of the
compatibilities. By consequence, once a breaking change has been introduced
it will not be possible to release a minor and we will have to wait for a
major release. In a similar way if no breaking change has been introduced,
does it make sense to release a major?




On Wed, Jan 27, 2021 at 11:21 AM Benedict Elliott Smith wrote:

> Perhaps we could also consider quarterly "develop" releases, so that we
> have pressure to maintain a shippable trunk? This provides some opportunity
> for more releases without incurring the project maintenance costs or user
> coordination costs. Something like a feature-incomplete mid-cycle RC, that
> a user wanting shiny features can grab, providing feedback throughout the
> development cycle.
>
> On 26/01/2021, 14:11, "Benedict Elliott Smith" wrote:
>
> My preference is for a simple annual major release cadence, with
> minors as necessary. This is simple for the user community and developer
> community: support = x versions = x years.
>
> I'd like to pair this with stricter merge criteria, so that we
> maintain a ~shippable trunk, and we cut a release at ~the same time every
> year, whatever features are merged. We might have to get happy with
> reverting commits that break things.
>
> I think faster cadences impose too much burden on the developer
> community for maintenance and the user community for both upgrades and
> making sense of what's going on. I think slower cadences collapse, as the
> release window begins to collect too many hopes and dreams.
>
> My hope is that we get to a point where snapshots of trunk are safe to
> run, and that major contributors are ahead of the release window for
> internal consumption, rather than behind - this might also alleviate
> pressure for hitting release windows with features.
>
>
>
>
> On 26/01/2021, 13:56, "Benjamin Lerer" wrote:
>
> Hi everybody
>
> It seems that there is a need to discuss how we will deal with releases
> after 4.0.
> We are now relatively close to the 4.0 RC release so it makes sense to me
> to start discussing that subject, especially as it has some impact on
> things like dropping support for Python 2.
>
> The main questions are in my opinion:
> 1) What release cadence do we want to use for major/minor versions?
> 2) How do we plan to ensure the quality of the releases?
>
> It might make sense to try a release cadence and see how it works out in
> practice, revisiting our decision if we feel the need for it.
>
> One important thing to discuss with the cadence is the amount of time we
> want to support the releases. 2.2 has been supported for more than 5 years;
> we might not be able to support releases for a similar time frame if we
> release a version every 6 months, for example.
> To be sure that we are all on the same page regarding what minor and major
> versions are and their naming: 4.1 would be a minor version (improvements
> and features that don't break compatibility) and 5.0 would be a major
> version (compatibility breakages).
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>
>
>
>
>






Re: [DISCUSS] Releases after 4.0

2021-01-27 Thread Benedict Elliott Smith
Perhaps we could also consider quarterly "develop" releases, so that we have 
pressure to maintain a shippable trunk? This provides some opportunity for more 
releases without incurring the project maintenance costs or user coordination 
costs. Something like a feature-incomplete mid-cycle RC, that a user wanting 
shiny features can grab, providing feedback throughout the development cycle.

On 26/01/2021, 14:11, "Benedict Elliott Smith"  wrote:

My preference is for a simple annual major release cadence, with minors as 
necessary. This is simple for the user community and developer community: 
support = x versions = x years.

I'd like to pair this with stricter merge criteria, so that we maintain a 
~shippable trunk, and we cut a release at ~the same time every year, whatever 
features are merged. We might have to get happy with reverting commits that 
break things.

I think faster cadences impose too much burden on the developer community 
for maintenance and the user community for both upgrades and making sense of 
what's going on. I think slower cadences collapse, as the release window begins 
to collect too many hopes and dreams.

My hope is that we get to a point where snapshots of trunk are safe to run, 
and that major contributors are ahead of the release window for internal 
consumption, rather than behind - this might also alleviate pressure for 
hitting release windows with features.




On 26/01/2021, 13:56, "Benjamin Lerer"  wrote:

 Hi everybody

It seems that there is a need to discuss how we will deal with releases
after 4.0.
We are now relatively close to the 4.0 RC release so it makes sense to me
to start discussing that subject, especially as it has some impact on
things like dropping support for Python 2.

The main questions are in my opinion:
1) What release cadence do we want to use for major/minor versions?
2) How do we plan to ensure the quality of the releases?

It might make sense to try a release cadence and see how it works out in
practice, revisiting our decision if we feel the need for it.

One important thing to discuss with the cadence is the amount of time we
want to support the releases. 2.2 has been supported for more than 5 years;
we might not be able to support releases for a similar time frame if we
release a version every 6 months, for example.
To be sure that we are all on the same page regarding what minor and major
versions are and their naming: 4.1 would be a minor version (improvements
and features that don't break compatibility) and 5.0 would be a major
version (compatibility breakages).










Re: [DISCUSS] Releases after 4.0

2021-01-26 Thread Benedict Elliott Smith
My preference is for a simple annual major release cadence, with minors as 
necessary. This is simple for the user community and developer community: 
support = x versions = x years.

I'd like to pair this with stricter merge criteria, so that we maintain a 
~shippable trunk, and we cut a release at ~the same time every year, whatever 
features are merged. We might have to get happy with reverting commits that 
break things.

I think faster cadences impose too much burden on the developer community for 
maintenance and the user community for both upgrades and making sense of what's 
going on. I think slower cadences collapse, as the release window begins to 
collect too many hopes and dreams.

My hope is that we get to a point where snapshots of trunk are safe to run, and 
that major contributors are ahead of the release window for internal 
consumption, rather than behind - this might also alleviate pressure for 
hitting release windows with features.




On 26/01/2021, 13:56, "Benjamin Lerer"  wrote:

 Hi everybody

It seems that there is a need to discuss how we will deal with releases
after 4.0.
We are now relatively close to the 4.0 RC release so it makes sense to me
to start discussing that subject, especially as it has some impact on
things like dropping support for Python 2.

The main questions are in my opinion:
1) What release cadence do we want to use for major/minor versions?
2) How do we plan to ensure the quality of the releases?

It might make sense to try a release cadence and see how it works out in
practice revisiting our decision if we feel the need for it.

One important thing to discuss with the cadence is the amount of time we
want to support the releases. 2.2 has been supported for more than 5 years,
we might not be able to support releases for a similar time frame if we
release a version every 6 months for example.
To be sure that we are all on the same page regarding what minor and major
versions are and their naming: 4.1 would be a minor version (improvements
and features that don't break compatibility) and 5.0 would be a major
version (compatibility breakages)






Re: Which fix version should be used for the Quality Testing tickets

2020-12-11 Thread Benedict Elliott Smith
Yes, I meant of those tickets we agreed at ApacheCon a year ago, blocking at 
most GA seems reasonable - roughly in accordance with what Benjamin was saying. 
 Not that the entire codebase should be brought up to our preferred standards 
before any new releases are cut.  I'm also not opposed to some modification of 
scope of those tasks, if we anticipate release dragging on too much longer.


On 11/12/2020, 17:07, "Benjamin Lerer"  wrote:

>
> As an aside, I disagree about this blocking GA. We have a decade or so of
> debt and this is essentially a category without a ceiling. Under this
> umbrella we could feasibly delay 4.0 for another multiple years.


I do not think that anybody wants to delay 4.0 more than needed.
Now, nothing prevents us from having another discussion about the scope of
what is reasonable to tackle in 4.0.
I can start that discussion beginning of January if people feel the need
for it.


On Fri, Dec 11, 2020 at 5:05 PM Joshua McKenzie 
wrote:

> >
> > - Anticipated not to find serious bugs (e.g. old unchanged but poorly
> > tested features): Block GA
>
>  As an aside, I disagree about this blocking GA. We have a decade or so of
> debt and this is essentially a category without a ceiling. Under this
> umbrella we could feasibly delay 4.0 for another multiple years.
>
>
> On Fri, Dec 11, 2020 at 10:43 AM Joshua McKenzie 
> wrote:
>
> > Reasonable categories. We haven't discussed what qualifies where for 4.0
> > have we? (new lacking | changed modest | old lacking)
> >
> > On Fri, Dec 11, 2020 at 9:14 AM Benedict Elliott Smith <
> > bened...@apache.org> wrote:
> >
> >> In my opinion...
> >>
> >> - Expected to find serious bugs (e.g. new poorly tested features):
> >> Block beta
> >> - Anticipated to possibly find serious bugs (e.g. extensively changed
> >> features with modest testing): Block RC
> >> - Anticipated not to find serious bugs (e.g. old unchanged but poorly
> >> tested features): Block GA
> >>
> >> Which mostly accords with what you're saying re: today's state of play
> >> I think.
> >>
> >>
> >> On 11/12/2020, 13:03, "Benjamin Lerer" 
> >> wrote:
> >>
> >> It looks like my question raised more questions than I had in mind.
> >>
> >> 1. What is the meaning of the fix version?
> >> 2. When do we move from beta to RC?
> >> 3. Where do the Quality tickets fit in all that?
> >>
> >>
> >> *What is the meaning of the fix version?*
> >>
> >> It looks like we should just pick a definition and document it.
> >>
> >> My preference would be 'The version in which the item must be fixed'
> >> (e.g. 4.0-beta if the ticket must be fixed in a beta release).
> >>
> >>
> >> *When do we move from beta to RC?*
> >>
> >> The 2 things that I can get from the Release Lifecycle page are:
> >>
> >> 1. *No flaky tests - All tests (Unit Tests and DTests) should pass
> >> consistently.*
> >> 2. *If there are no known bugs to be fixed before release, we promote
> >> to RC.*
> >>
> >> The first point is pretty clear, the second is a bit more vague. The
> >> main reason for that is probably that there is a choice to make here.
> >> There are some bugs that we cannot or chose to not fix in 4.0 (e.g.
> >> some LWT consistency issues during cluster changes).
> >> By consequence, I do not know if we can be more precise. We have to
> >> agree on which known bugs we want to fix in the release. Once they are
> >> fixed and we have tests that pass consistently, we should be able to
> >> cut an RC release.
> >>
> >>
> >> *Where do the Quality tickets fit in all that?*
> >>
> >> That is for me the tricky question because the `Quality tickets` are
> >> really about extending the test coverage and we probably did not think
> >> about that type of work when the Release Lifecycle page was written.

Re: Which fix version should be used for the Quality Testing tickets

2020-12-11 Thread Benedict Elliott Smith
In my opinion...

- Expected to find serious bugs (e.g. new poorly tested features): Block beta
- Anticipated to possibly find serious bugs (e.g. extensively changed features 
with modest testing): Block RC
- Anticipated not to find serious bugs (e.g. old unchanged but poorly tested 
features): Block GA

Which mostly accords with what you're saying re: today's state of play I think.
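
As a rough illustration, the three categories above can be written down as a lookup from expected bug risk to the milestone a ticket would block. This is only a sketch: the `BugRisk` enum and `blocking_milestone` helper are hypothetical names, not anything that exists in Jira or the project tooling.

```python
from enum import Enum


class BugRisk(Enum):
    """How likely a quality-testing ticket is to surface serious bugs."""
    EXPECTED = "new, poorly tested features"
    POSSIBLE = "extensively changed features with modest testing"
    UNLIKELY = "old, unchanged but poorly tested features"


# Milestone that each category of ticket would block, per the list above.
BLOCKS = {
    BugRisk.EXPECTED: "beta",
    BugRisk.POSSIBLE: "RC",
    BugRisk.UNLIKELY: "GA",
}


def blocking_milestone(risk: BugRisk) -> str:
    """Return the release milestone a ticket with this risk level blocks."""
    return BLOCKS[risk]
```

The higher the chance of finding something serious, the earlier in the lifecycle the ticket has to be closed.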


On 11/12/2020, 13:03, "Benjamin Lerer"  wrote:

It looks like my question raised more questions than I had in mind.

1. What is the meaning of the fix version?
2. When do we move from beta to RC?
3. Where do the Quality tickets fit in all that?


*What is the meaning of the fix version?*

It looks like we should just pick a definition and document it.

My preference would be 'The version in which the item must be fixed' (e.g.
4.0-beta if the ticket must be fixed in a beta release).


*When do we move from beta to RC?*

The 2 things that I can get from the Release Lifecycle page are:

1. *No flaky tests - All tests (Unit Tests and DTests) should pass
consistently.*
2. *If there are no known bugs to be fixed before release, we promote to
RC.*

The first point is pretty clear, the second is a bit more vague. The main
reason for that is probably that there is a choice to make here. There are
some bugs that we cannot or chose to not fix in 4.0 (e.g. some LWT
consistency issues during cluster changes).
By consequence, I do not know if we can be more precise. We have to agree
on which known bugs we want to fix in the release. Once they are fixed and
we have tests that pass consistently, we should be able to cut an RC
release.


*Where do the Quality tickets fit in all that?*

That is for me the tricky question because the `Quality tickets` are really
about extending the test coverage and we probably did not think about that
type of work when the Release Lifecycle page was written.

We can decide to have (flaky tests + known bugs + increasing test coverage)
in beta and fix the documentation tickets in RC while waiting for people to
raise bugs or have (flaky tests + known bugs) in beta and (increasing test
coverage + documentation tickets) in RC.

I tend to prefer the (flaky tests + known bugs) in beta and (increasing test
coverage + documentation tickets) in RC approach. It reduces the scope for
the beta release and will increase our focus. Hopefully it might help us to
move faster.
I also hope that an RC release will push more people to test the release,
increasing our confidence in the stability of 4.0.

What do you think?






Re: [DISCUSS] Bringing protocol v5 out of beta and dropping support from 3.11.x

2020-12-08 Thread Benedict Elliott Smith
Perhaps we should skip v5, and move to v6 for the new protocol to avoid this 
issue?

On 08/12/2020, 10:53, "Sam Tunnicliffe"  wrote:

CASSANDRA-15299 has revised the wire format of CQL native protocol to add a 
framing layer around the existing CQL messages. This is targeted at protocol 
v5, which is (still) currently in beta. There's a small problem with this 
though; while v5-beta is supported in Cassandra 3.11.x and 4.0.0, the new wire 
format is only committed to trunk. This means that if clients upgrade to a 
version of their driver which supports the new wire format (3.10.0 for the java 
driver, python driver is not yet officially released), connections to Cassandra 
3.11.x nodes will break if the client specifies v5-beta as the preferred 
protocol version. 

Of course, any protocol changes that landed in v5-beta have had the 
potential to cause this breakage and in some ways this particular change has a 
better failure mode as it prevents incompatible clients from connecting at all. 
As we have no intention of backporting the new wire format to 3.11.x, and 
because v5-beta has always been characterised as an unsupported, dev-only 
preview, I'm proposing we remove support for it from the 3.11 line. At the same 
time, we should promote v5 from beta and create a new v6-beta for future 
development (CASSANDRA-14973).
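
The failure modes described above can be sketched with a toy model. This is not how any driver or server actually negotiates; it assumes, purely for illustration, that each side is a mapping from protocol version to the wire format it uses for that version. All names here (`connect`, `NEW_DRIVER`, and so on) are hypothetical, and "v5" stands for v5/v5-beta.

```python
LEGACY, FRAMED = "legacy", "framed"


def connect(client: dict, server: dict, preferred: str) -> str:
    """Classify the outcome of a client asking for its preferred version.

    'ok'       -> both sides speak the version with the same wire format
    'rejected' -> the server does not offer the version at all (clean failure)
    'broken'   -> both sides claim the version but disagree on the wire format
    """
    if preferred not in client:
        raise ValueError(f"client does not support {preferred}")
    if preferred not in server:
        return "rejected"
    if client[preferred] != server[preferred]:
        return "broken"
    return "ok"


# Hypothetical peers for the scenario in this thread.
NEW_DRIVER = {"v4": LEGACY, "v5": FRAMED}            # driver with the new framing
CASSANDRA_3_11_TODAY = {"v4": LEGACY, "v5": LEGACY}  # v5-beta, old wire format
CASSANDRA_3_11_PROPOSED = {"v4": LEGACY}             # v5-beta support removed
TRUNK = {"v4": LEGACY, "v5": FRAMED}                 # new wire format committed
```

In this model the proposal turns a wire-format mismatch on 3.11.x into a plain "version not supported" answer, after which a client could (as an assumption about client behaviour) retry with v4.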

If there are no objections, I'll file a JIRA for 3.11.x and post a patch 
shortly.

Thanks,
Sam









Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-24 Thread Benedict Elliott Smith
I think the keyword there is "normally" - if we can't say _certainly_, then 
this is probably an unsafe change to make.

I can imagine any number of hacky upgrade processes that would be dangerous 
with this change.

But, happy to defer to the consensus of others.



On 24/11/2020, 11:04, "Paulo Motta"  wrote:

 In this case the breaking change is a feature, not a bug. The exact
intention of this is to require manual intervention to raise awareness
about the potential performance degradation. This sounds reasonable, since
we already broke the contract of not introducing performance regressions in
a minor.

I don't see how this can pose an outage risk to the cluster given upgrades
are normally performed in a rolling restart fashion, so the worst that
could happen is the first node in the sequence not starting, so the upgrade
would not proceed. In my view this would be far less harmful than figuring
out about a performance regression after all your nodes are upgraded.
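
The argument above relies on a rolling upgrade stopping at the first node that fails to start. A minimal sketch of that assumption follows; `rolling_upgrade` and the `restart` callback are hypothetical, and real upgrade tooling may well behave differently, which is exactly the concern raised in the reply.

```python
from typing import Callable, List, Optional, Tuple


def rolling_upgrade(
    nodes: List[str],
    restart: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Upgrade nodes one at a time, halting at the first failed restart.

    Returns the nodes that were upgraded successfully and the node that
    failed to start (or None if every node came back up).
    """
    upgraded: List[str] = []
    for node in nodes:
        if not restart(node):
            # Stop here: the remaining nodes keep running the old version.
            return upgraded, node
        upgraded.append(node)
    return upgraded, None
```

If the first restart fails because the yaml property is missing, only one node is down and the rest of the cluster is untouched. A process that restarts nodes in parallel, or ignores failures, would not have this safety property.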

Nevertheless, I'm pretty fine on retracting the suggestion to move forward
with the proposal if you feel strongly about it.

On Tue, Nov 24, 2020 at 07:26, Benedict Elliott Smith <
bened...@apache.org> wrote:

> In my parlance the config property would be a breaking change, whereas the
> LWT behaviour would be a performance regression.  This latter might cause
> partial outages or service degradation, but refusing to start a prod
> cluster without manual intervention is potentially a much worse situation,
> and even more surprising for a patch upgrade.
>
> On 24/11/2020, 01:05, "Paulo Motta"  wrote:
>
> Isn't the plan to change LWT implementation (and performance expectation)
> in a patch version? This is a breaking change by itself, I'm just proposing
> to make the trade-off choice explicit in the yaml to prevent unexpected
> performance degradation during upgrade (for users who are not aware of the
> change).
>
> Just to make it clear, I'm proposing having a "lwt_legacy_mode: false"
> uncommented in the default yaml with a descriptive comment about
> CASSANDRA-12126, so new users will always get the new behavior, but users
> using a yaml template based on a previous 3.X version will not be able to
> start the node because this property will be missing. I believe the
> majority of operators will just update their yaml with "lwt_legacy_mode:
> false" and move on with their upgrades, but people wanting to keep the
> previous performance will become aware of the breaking change and set it to
> true.
>
> On Mon, Nov 23, 2020 at 21:07, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> > What do you mean by minor upgrade? We can't break patch upgrades for any
> > of 3.x, as this could also cause surprise outages.
> >
> > On 23/11/2020, 23:51, "Paulo Motta" wrote:
> >
> > I was thinking about the YAML requirement during the 3.X minor upgrade
> > to make the decision explicit (need to update yaml) rather than implicit
> > (by upgrading you agree with the change), since the latter can go
> > unnoticed by those who don't pay attention to NEWS.txt
> >
> > On Mon, Nov 23, 2020 at 20:03, Benedict Elliott Smith <
> > bened...@apache.org> wrote:
> >
> > > What's the value of the yaml? The user is likely to have upgraded to
> > > latest 3.x as part of the upgrade process to 4.0, so they'll already
> > > have had a decision made for them. If correctness didn't break
> > > anything, there doesn't any longer seem much point in offering a
> > > choice?
> > >
> > > On 23/11/2020, 22:45, "Brandon Williams" wrote:
> > >
> > > +1 to both as well.
> > >
> > > On Mon, Nov 23, 2020, 4:42 PM Blake Eggleston
> > > 
> > > wrote:
> > >
> > > > +1 to correctness, and I like the yaml idea
> > > >
> > > > > On Nov 23, 2020, at 4:20 AM, Paulo Motta <
> 

Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-24 Thread Benedict Elliott Smith
In my parlance the config property would be a breaking change, whereas the LWT 
behaviour would be a performance regression.  This latter might cause partial 
outages or service degradation, but refusing to start a prod cluster without 
manual intervention is potentially a much worse situation, and even more 
surprising for a patch upgrade. 

On 24/11/2020, 01:05, "Paulo Motta"  wrote:

Isn't the plan to change LWT implementation (and performance expectation)
in a patch version? This is a breaking change by itself, I'm just proposing
to make the trade-off choice explicit in the yaml to prevent unexpected
performance degradation during upgrade (for users who are not aware of the
change).

Just to make it clear, I'm proposing having a "lwt_legacy_mode: false"
uncommented in the default yaml with a descriptive comment about
CASSANDRA-12126, so new users will always get the new behavior, but users
using a yaml template based on a previous 3.X version will not be able to
start the node because this property will be missing. I believe the
majority of operators will just update their yaml with "lwt_legacy_mode:
false" and move on with their upgrades, but people wanting to keep the
previous performance will become aware of the breaking change and set it to
true.
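
A minimal sketch of the mandatory-property idea, assuming the yaml has already been parsed into a dict. `lwt_legacy_mode` is the property name from the proposal above; the `ConfigurationError` class and `read_lwt_legacy_mode` function are hypothetical and do not reflect Cassandra's actual config loading.

```python
class ConfigurationError(Exception):
    """Raised when the node should refuse to start because of its config."""


def read_lwt_legacy_mode(config: dict) -> bool:
    """Return the operator's explicit choice, or refuse to start.

    There is deliberately no default: a yaml carried over from an older
    3.X version lacks the property, so startup fails until someone decides.
    """
    if "lwt_legacy_mode" not in config:
        raise ConfigurationError(
            "lwt_legacy_mode must be set explicitly (see CASSANDRA-12126)"
        )
    value = config["lwt_legacy_mode"]
    if not isinstance(value, bool):
        raise ConfigurationError("lwt_legacy_mode must be true or false")
    return value
```

The trade-off debated in this thread is visible in the first branch: the missing-key case is what raises awareness, and it is also what stops an unattended patch upgrade from starting the node.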

On Mon, Nov 23, 2020 at 21:07, Benedict Elliott Smith <
bened...@apache.org> wrote:

> What do you mean by minor upgrade? We can't break patch upgrades for any
> of 3.x, as this could also cause surprise outages.
>
> On 23/11/2020, 23:51, "Paulo Motta" wrote:
>
> I was thinking about the YAML requirement during the 3.X minor upgrade
> to make the decision explicit (need to update yaml) rather than implicit
> (by upgrading you agree with the change), since the latter can go
> unnoticed by those who don't pay attention to NEWS.txt
>
> On Mon, Nov 23, 2020 at 20:03, Benedict Elliott Smith <
> bened...@apache.org> wrote:
>
> > What's the value of the yaml? The user is likely to have upgraded to
> > latest 3.x as part of the upgrade process to 4.0, so they'll already
> > have had a decision made for them. If correctness didn't break
> > anything, there doesn't any longer seem much point in offering a
> > choice?
> >
> > On 23/11/2020, 22:45, "Brandon Williams"  wrote:
> >
> > +1 to both as well.
> >
> > On Mon, Nov 23, 2020, 4:42 PM Blake Eggleston
> > 
> > wrote:
> >
> > > +1 to correctness, and I like the yaml idea
> > >
> > > > > On Nov 23, 2020, at 4:20 AM, Paulo Motta <pauloricard...@gmail.com>
> > > > wrote:
> > > > >
> > > > > +1 to defaulting for correctness.
> > > > >
> > > > > In addition to that, how about making it a mandatory cassandra.yaml
> > > > > property defaulting to correctness? This would make upgrades with
> > > > > an old cassandra.yaml fail unless an option is explicitly
> > > > > specified, making operators aware of the issue and forcing them to
> > > > > make a choice.
> > > > >
> > > > >> On Mon, Nov 23, 2020 at 07:30, Benjamin Lerer <
> > > > >> benjamin.le...@datastax.com> wrote:
> > > > >>
> > > > >> Thank you very much to everybody that provided feedback. It
> > > > >> helped a lot to limit our options.
> > > > >>
> > > > >> Unfortunately, it seems that some poor soul (me, really!!!) will
> > > > >> have to make the final call between #3 and #4.
> > > > >>
> > > > >> If I reformulate the question to: Do we default to *correctness*
> > > > >> or to *performance*?
> > > > >>
> > > > >> I would choose to default to *correctness*.
> > > > >>
> > > > >> Of course the situation is more complex than that but it seems
> > > > >> that
> > > >> somebody ha

Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-23 Thread Benedict Elliott Smith
What do you mean by minor upgrade? We can't break patch upgrades for any of 
3.x, as this could also cause surprise outages.

On 23/11/2020, 23:51, "Paulo Motta"  wrote:

 I was thinking about the YAML requirement during the 3.X minor upgrade to
make the decision explicit (need to update yaml) rather than implicit (by
upgrading you agree with the change), since the latter can go unnoticed by
those who don't pay attention to NEWS.txt

On Mon, Nov 23, 2020 at 20:03, Benedict Elliott Smith <
bened...@apache.org> wrote:

> What's the value of the yaml? The user is likely to have upgraded to
> latest 3.x as part of the upgrade process to 4.0, so they'll already have
> had a decision made for them. If correctness didn't break anything, there
> doesn't any longer seem much point in offering a choice?
>
> On 23/11/2020, 22:45, "Brandon Williams"  wrote:
>
> +1 to both as well.
>
> On Mon, Nov 23, 2020, 4:42 PM Blake Eggleston
> 
> wrote:
>
> > +1 to correctness, and I like the yaml idea
> >
> > > > On Nov 23, 2020, at 4:20 AM, Paulo Motta
> > > wrote:
> > > >
> > > > +1 to defaulting for correctness.
> > > >
> > > > In addition to that, how about making it a mandatory cassandra.yaml
> > > > property defaulting to correctness? This would make upgrades with
> > > > an old cassandra.yaml fail unless an option is explicitly specified,
> > > > making operators aware of the issue and forcing them to make a choice.
> > > >
> > > >> On Mon, Nov 23, 2020 at 07:30, Benjamin Lerer <
> > > >> benjamin.le...@datastax.com> wrote:
> > > >>
> > > >> Thank you very much to everybody that provided feedback. It
> > > >> helped a lot to limit our options.
> > > >>
> > > >> Unfortunately, it seems that some poor soul (me, really!!!) will
> > > >> have to make the final call between #3 and #4.
> > > >>
> > > >> If I reformulate the question to: Do we default to *correctness*
> > > >> or to *performance*?
> > > >>
> > > >> I would choose to default to *correctness*.
> > > >>
> > > >> Of course the situation is more complex than that but it seems
> > > >> that somebody has to make a call and live with it. It seems to me
> > > >> that being blamed for choosing correctness is easier to live with ;-)
> > >>
> > >> Benjamin
> > >>
> > >> PS: I tried to push the choice on Sylvain but he dodged the
> bullet.
> > >>
> > >> On Sat, Nov 21, 2020 at 12:30 AM Benedict Elliott Smith <
> > >> bened...@apache.org>
> > >> wrote:
> > >>
> > > >>> I think I meant #4
> > > >>>
> > > >>> On 20/11/2020, 21:11, "Blake Eggleston" wrote:
> > > >>>
> > > >>> I’d also prefer #3 over #4
> > > >>>
> > > >>>> On Nov 20, 2020, at 10:03 AM, Benedict Elliott Smith <
> > > >>>> bened...@apache.org> wrote:
> > > >>>>
> > > >>>> Well, I expressed a preference for #3 over #4, particularly for
> > > >>>> the 3.x series.  However at this point, I think the lack of a
> > > >>>> clear project decision means we can punt it back to you and
> > > >>>> Sylvain to make the final call.
> > > >>>>
> > > >>>> On 20/11/2020, 16:23, "Benjamin Lerer" <
> > > >>>> benjamin.le...@datastax.com> wrote:
> > > >>>>
> > > >>>>   I will try to summarize the discussion to clarify the outcome.
> > > >>>>
> > >>>>   Mick is in favor of #4
> > >>>>   Summanth is in favor of #4
> > >>>>   Sylvain answer was not clear for me. I understood it like I
> > >>> prefer #3 to #4
>

Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-23 Thread Benedict Elliott Smith
What's the value of the yaml? The user is likely to have upgraded to latest 3.x 
as part of the upgrade process to 4.0, so they'll already have had a decision 
made for them. If correctness didn't break anything, there doesn't any longer 
seem much point in offering a choice?

On 23/11/2020, 22:45, "Brandon Williams"  wrote:

+1 to both as well.

On Mon, Nov 23, 2020, 4:42 PM Blake Eggleston 
wrote:

> +1 to correctness, and I like the yaml idea
>
> > On Nov 23, 2020, at 4:20 AM, Paulo Motta 
> wrote:
> >
> > +1 to defaulting for correctness.
> >
> > In addition to that, how about making it a mandatory cassandra.yaml
> > property defaulting to correctness? This would make upgrades with an old
> > cassandra.yaml fail unless an option is explicitly specified, making
> > operators aware of the issue and forcing them to make a choice.
> >
> >> Em seg., 23 de nov. de 2020 às 07:30, Benjamin Lerer <
> >> benjamin.le...@datastax.com> escreveu:
> >>
> >> Thank you very much to everybody that provided feedback. It helped a lot to
> >> limit our options.
> >>
> >> Unfortunately, it seems that some poor soul (me, really!!!) will have to
> >> make the final call between #3 and #4.
> >>
> >> If I reformulate the question to: Do we default to *correctness* or to
> >> *performance*?
> >>
> >> I would choose to default to *correctness*.
> >>
> >> Of course the situation is more complex than that but it seems that
> >> somebody has to make a call and live with it. It seems to me that being
> >> blamed for choosing correctness is easier to live with ;-)
> >>
> >> Benjamin
> >>
> >> PS: I tried to push the choice on Sylvain but he dodged the bullet.
> >>
> >> On Sat, Nov 21, 2020 at 12:30 AM Benedict Elliott Smith <bened...@apache.org>
> >> wrote:
> >>
> >>> I think I meant #4
> >>>
> >>> On 20/11/2020, 21:11, "Blake Eggleston" wrote:
> >>>
> >>>I’d also prefer #3 over #4
> >>>
> >>>> On Nov 20, 2020, at 10:03 AM, Benedict Elliott Smith <
> >>> bened...@apache.org> wrote:
> >>>>
> >>>> Well, I expressed a preference for #3 over #4, particularly for the
> >>>> 3.x series.  However at this point, I think the lack of a clear project
> >>>> decision means we can punt it back to you and Sylvain to make the final
> >>>> call.
> >>>>
> >>>> On 20/11/2020, 16:23, "Benjamin Lerer" <
> >> benjamin.le...@datastax.com>
> >>> wrote:
> >>>>
> >>>>   I will try to summarize the discussion to clarify the outcome.
> >>>>
> >>>>   Mick is in favor of #4
> >>>>   Summanth is in favor of #4
> >>>>   Sylvain's answer was not clear to me. I understood it as "I prefer #3
> >>>>   to #4 and I am also fine with #1"
> >>>>   Jeff is in favor of #3 and will understand #4
> >>>>   David is in favor of #3 (fix bug and add flag to roll back to old
> >>>>   behavior) in 4.0 and #4 in 3.0 and 3.11
> >>>>
> >>>>   Do not hesitate to correct me if I misunderstood your answer.
> >>>>
> >>>>   Based on these answers it seems clear that most people prefer to go
> >>>>   for #3 or #4.
> >>>>
> >>>>   The choice between #3 (fix correctness, opt-in to current behavior)
> >>>>   and #4 (current behavior, opt-in to correctness) is a bit less clear,
> >>>>   especially if we consider the 3.X branches or 4.0.
> >>>>
> >>>>   Does anybody have some idea on how to choose between those 2 choices,
> >>>>   or some extra opinions on #3 versus #4?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>   On Wed, Nov 18, 2020 at 9:45 PM David Capwell <
 

Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-20 Thread Benedict Elliott Smith
I think I meant #4

On 20/11/2020, 21:11, "Blake Eggleston"  wrote:

I’d also prefer #3 over #4

> On Nov 20, 2020, at 10:03 AM, Benedict Elliott Smith wrote:
> 
> Well, I expressed a preference for #3 over #4, particularly for the 3.x 
series.  However at this point, I think the lack of a clear project decision 
means we can punt it back to you and Sylvain to make the final call.
> 
> On 20/11/2020, 16:23, "Benjamin Lerer"  
wrote:
> 
>    I will try to summarize the discussion to clarify the outcome.
> 
>    Mick is in favor of #4
>    Summanth is in favor of #4
>    Sylvain's answer was not clear to me. I understood it as "I prefer #3 to
>    #4 and I am also fine with #1"
>    Jeff is in favor of #3 and will understand #4
>    David is in favor of #3 (fix bug and add flag to roll back to old
>    behavior) in 4.0 and #4 in 3.0 and 3.11
> 
>    Do not hesitate to correct me if I misunderstood your answer.
> 
>    Based on these answers it seems clear that most people prefer to go for
>    #3 or #4.
> 
>    The choice between #3 (fix correctness, opt-in to current behavior) and
>    #4 (current behavior, opt-in to correctness) is a bit less clear,
>    especially if we consider the 3.X branches or 4.0.
> 
>    Does anybody have some idea on how to choose between those 2 choices, or
>    some extra opinions on #3 versus #4?
> 
> 
> 
> 
> 
> 
>>On Wed, Nov 18, 2020 at 9:45 PM David Capwell  
wrote:
>> 
>> I feel that #4 (fix bug and add flag to roll back to old behavior) is best.
>> 
>> About the alternative implementation, I am fine adding it to 3.x and 4.0,
>> but should treat it as a different path disabled by default that you can
>> opt-into, with a plan to opt-in by default "eventually".
>> 
>> On Wed, Nov 18, 2020 at 11:10 AM Benedict Elliott Smith <
>> bened...@apache.org>
>> wrote:
>> 
>>> Perhaps there might be broader appetite to weigh in on which major
>>> releases we might target for work that fixes the correctness bug without
>>> serious performance regression?
>>> 
>>> i.e., if we were to fix the correctness bug now, introducing a serious
>>> performance regression (either opt-in or opt-out), but were to land work
>>> without this problem for 5.0, would there be appetite to backport this
>> work
>>> to any of 4.0, 3.11 or 3.0?
>>> 
>>> 
>>> On 18/11/2020, 18:31, "Jeff Jirsa"  wrote:
>>> 
>>>    This is complicated and relatively few people on earth understand it,
>>>    so having little feedback is mostly expected, unfortunately.
>>> 
>>>    My normal emotional response is "correctness is required, opt-in to
>>>    performance improvements that sacrifice strict correctness", but I'm
>>>    also sure this is going to surprise people, and would understand /
>>>    accept #4 (default to current, opt-in to correct).
>>> 
>>> 
>>>On Wed, Nov 18, 2020 at 4:54 AM Benedict Elliott Smith <
>>> bened...@apache.org>
>>>wrote:
>>> 
>>>> It doesn't seem like there's much enthusiasm for any of the options
>>>> available here...
>>>> 
>>>> On 12/11/2020, 14:37, "Benedict Elliott Smith" <
>> bened...@apache.org
>>>> 
>>>> wrote:
>>>> 
>>>>> Is the new implementation a separate, distinctly modularized new
>>>>> body of work
>>>> 
>>>>    It’s primarily a distinct, modularised and new body of work, however
>>>>    there is some shared code that has been modified - namely PaxosState, in
>>>>    which legacy code is maintained but modified for compatibility, and the
>>>>    system.paxos table (which receives a new column, and slightly modified
>>>>    serialization code).  It is conceptually an optimised version of the
>>>>    existing algorithm.
>>>> 
>>>>If there's a chance of being of value to 4.0, I can try to put
>>> up a
>>>> patch next week alongside a high level descri

Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-20 Thread Benedict Elliott Smith
Well, I expressed a preference for #3 over #4, particularly for the 3.x series. 
 However at this point, I think the lack of a clear project decision means we 
can punt it back to you and Sylvain to make the final call.
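
For context, options #3 and #4 differ only in the default: both ship the fix together with a switch, as the summary quoted below describes. A minimal sketch of that difference, assuming a single boolean override set by the operator (the names `option_3`, `option_4` and the function are hypothetical, not identifiers from the patch):

```python
from typing import Optional

# Hypothetical labels for the two candidate defaults discussed in this thread.
DEFAULT_CORRECT = "option_3"  # fix enabled by default, flag restores old behaviour
DEFAULT_LEGACY = "option_4"   # old behaviour by default, flag enables the fix


def serial_read_uses_extra_round_trip(policy: str, override: Optional[bool]) -> bool:
    """Decide whether a SERIAL/LOCAL_SERIAL read pays the extra round trip.

    `override` models an operator-set flag; None means the flag was not set,
    so the release default implied by `policy` applies.
    """
    if override is not None:
        return override
    return policy == DEFAULT_CORRECT
```

An operator who never touches the flag gets the correct but slower read path under #3 and the current behaviour under #4; an explicit override behaves identically under both.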

On 20/11/2020, 16:23, "Benjamin Lerer"  wrote:

I will try to summarize the discussion to clarify the outcome.

Mick is in favor of #4
Summanth is in favor of #4
Sylvain's answer was not clear to me. I understood it as "I prefer #3 to #4
and I am also fine with #1"
Jeff is in favor of #3 and will understand #4
David is in favor of #3 (fix bug and add flag to roll back to old behavior) in
4.0 and #4 in 3.0 and 3.11

Do not hesitate to correct me if I misunderstood your answer.

Based on these answers it seems clear that most people prefer to go for #3
or #4.

The choice between #3 (fix correctness, opt-in to current behavior) and #4
(current behavior, opt-in to correctness) is a bit less clear, especially if
we consider the 3.X branches or 4.0.

Does anybody have some idea on how to choose between those 2 choices, or some
extra opinions on #3 versus #4?






On Wed, Nov 18, 2020 at 9:45 PM David Capwell  wrote:

> I feel that #4 (fix bug and add flag to roll back to old behavior) is best.
>
> About the alternative implementation, I am fine adding it to 3.x and 4.0,
> but should treat it as a different path disabled by default that you can
> opt-into, with a plan to opt-in by default "eventually".
>
    > On Wed, Nov 18, 2020 at 11:10 AM Benedict Elliott Smith <
> bened...@apache.org>
> wrote:
>
> > Perhaps there might be broader appetite to weigh in on which major
> > releases we might target for work that fixes the correctness bug without
> > serious performance regression?
> >
> > i.e., if we were to fix the correctness bug now, introducing a serious
> > performance regression (either opt-in or opt-out), but were to land work
> > without this problem for 5.0, would there be appetite to backport this
> work
> > to any of 4.0, 3.11 or 3.0?
> >
> >
> > On 18/11/2020, 18:31, "Jeff Jirsa"  wrote:
> >
> > This is complicated and relatively few people on earth understand it,
> > so having little feedback is mostly expected, unfortunately.
> >
> > My normal emotional response is "correctness is required, opt-in to
> > performance improvements that sacrifice strict correctness", but I'm
> > also sure this is going to surprise people, and would understand /
> > accept #4 (default to current, opt-in to correct).
> >
> >
> > On Wed, Nov 18, 2020 at 4:54 AM Benedict Elliott Smith <
    > > bened...@apache.org>
> > wrote:
> >
> > > It doesn't seem like there's much enthusiasm for any of the options
> > > available here...
> > >
> > > On 12/11/2020, 14:37, "Benedict Elliott Smith" <
> bened...@apache.org
> > >
> > > wrote:
> > >
> > > > Is the new implementation a separate, distinctly modularized new
> > > > body of work
> > >
> > > It’s primarily a distinct, modularised and new body of work, however
> > > there is some shared code that has been modified - namely PaxosState, in
> > > which legacy code is maintained but modified for compatibility, and the
> > > system.paxos table (which receives a new column, and slightly modified
> > > serialization code).  It is conceptually an optimised version of the
> > > existing algorithm.
> > >
> > > If there's a chance of being of value to 4.0, I can try to put up a
> > > patch next week alongside a high level description of the changes.
> > >
> > > > But a performance regression is a regression, I'm not shrugging it
> > > > off.
> > >
> > > I don't want to give the impression I'm shrugging off the correctness
> > > issue either. It's a serious issue to fix, but since all successful updates
> > > to the database are linearizable, I think it's likely that many
> > > applications behave correctly with the present semantics, or at
   

Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-18 Thread Benedict Elliott Smith
Perhaps there might be broader appetite to weigh in on which major releases we 
might target for work that fixes the correctness bug without serious 
performance regression?

i.e., if we were to fix the correctness bug now, introducing a serious 
performance regression (either opt-in or opt-out), but were to land work 
without this problem for 5.0, would there be appetite to backport this work to 
any of 4.0, 3.11 or 3.0? 


On 18/11/2020, 18:31, "Jeff Jirsa"  wrote:

This is complicated and relatively few people on earth understand it, so
having little feedback is mostly expected, unfortunately.

My normal emotional response is "correctness is required, opt-in to
performance improvements that sacrifice strict correctness", but I'm also
sure this is going to surprise people, and would understand / accept #4
(default to current, opt-in to correct).


On Wed, Nov 18, 2020 at 4:54 AM Benedict Elliott Smith wrote:

> It doesn't seem like there's much enthusiasm for any of the options
> available here...
>
> On 12/11/2020, 14:37, "Benedict Elliott Smith" wrote:
>
> > Is the new implementation a separate, distinctly modularized new
> > body of work
>
> It’s primarily a distinct, modularised and new body of work, however
> there is some shared code that has been modified - namely PaxosState, in
> which legacy code is maintained but modified for compatibility, and the
> system.paxos table (which receives a new column, and slightly modified
> serialization code).  It is conceptually an optimised version of the
> existing algorithm.
>
> If there's a chance of being of value to 4.0, I can try to put up a
> patch next week alongside a high level description of the changes.
>
> > But a performance regression is a regression, I'm not shrugging it
> > off.
>
> I don't want to give the impression I'm shrugging off the correctness
> issue either. It's a serious issue to fix, but since all successful updates
> to the database are linearizable, I think it's likely that many
> applications behave correctly with the present semantics, or at least
> encounter only transient errors. No doubt many also do not, but I have no
> idea of the ratio.
>
> The regression isn't itself a simple issue either - depending on the
> topology and message latencies it is not difficult to produce inescapable
> contention, i.e. guaranteed timeouts - that might persist as long as
> clients continue to retry. It could be quite a serious degradation of
> service to impose on our users.
>
> I don't pretend to know the correct way to make a decision balancing
> these considerations, but I am perhaps more concerned about imposing
> service outages than I am temporarily maintaining semantics our users have
> apparently accepted for years - though I absolutely share your
> embarrassment there.
>
>
> On 12/11/2020, 12:41, "Joshua McKenzie"  wrote:
>
> Is the new implementation a separate, distinctly modularized new body of
> work or does it make substantial changes to existing implementation and
> subsume it?
>
> On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne <
> lebre...@gmail.com> wrote:
>
> > Regarding option #4, I'll remark that experience tends to suggest users
> > don't consistently read the `NEWS.txt` file on upgrade, so option #4 will
> > likely essentially mean "LWT has a correctness issue, but once it broke
> > your data enough that you'll notice, you'll be able to dig up the proper
> > flag to fix it for next time". I guess it's better than nothing, of course,
> > but I'll admit that defaulting to "opt-in correctness", especially for a
> > feature (LWT) that exists uniquely to provide additional guarantees, is
> > something I have a hard time rallying behind.
> >
> > But a performance regression is a regression, I'm not shrugging it off.
> > Still, I feel we shouldn't leave LWT with a fairly serious known
> > correctness bug and I frankly feel bad for "the project" that this has been
> > known for so long without action, so I'm a bit biased in wanting to get it
> > fixed asap.
> >
> > But maybe I'm overstating the urgenc

Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-18 Thread Benedict Elliott Smith
It doesn't seem like there's much enthusiasm for any of the options available 
here...

On 12/11/2020, 14:37, "Benedict Elliott Smith"  wrote:

> Is the new implementation a separate, distinctly modularized new body of work

It’s primarily a distinct, modularised and new body of work, however there 
is some shared code that has been modified - namely PaxosState, in which legacy 
code is maintained but modified for compatibility, and the system.paxos table 
(which receives a new column, and slightly modified serialization code).  It is 
conceptually an optimised version of the existing algorithm.

If there's a chance of being of value to 4.0, I can try to put up a patch 
next week alongside a high level description of the changes.

> But a performance regression is a regression, I'm not shrugging it off.

I don't want to give the impression I'm shrugging off the correctness issue 
either. It's a serious issue to fix, but since all successful updates to the 
database are linearizable, I think it's likely that many applications behave 
correctly with the present semantics, or at least encounter only transient 
errors. No doubt many also do not, but I have no idea of the ratio.

The regression isn't itself a simple issue either - depending on the 
topology and message latencies it is not difficult to produce inescapable 
contention, i.e. guaranteed timeouts - that might persist as long as clients 
continue to retry. It could be quite a serious degradation of service to impose 
on our users.

I don't pretend to know the correct way to make a decision balancing these 
considerations, but I am perhaps more concerned about imposing service outages 
than I am temporarily maintaining semantics our users have apparently accepted 
for years - though I absolutely share your embarrassment there.


On 12/11/2020, 12:41, "Joshua McKenzie"  wrote:

Is the new implementation a separate, distinctly modularized new body of
work or does it make substantial changes to existing implementation and
subsume it?

On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne  
wrote:

> Regarding option #4, I'll remark that experience tends to suggest users
> don't consistently read the `NEWS.txt` file on upgrade, so option #4 will
> likely essentially mean "LWT has a correctness issue, but once it broke
> your data enough that you'll notice, you'll be able to dig up the proper
> flag to fix it for next time". I guess it's better than nothing, of course,
> but I'll admit that defaulting to "opt-in correctness", especially for a
> feature (LWT) that exists uniquely to provide additional guarantees, is
> something I have a hard time rallying behind.
>
> But a performance regression is a regression, I'm not shrugging it off.
> Still, I feel we shouldn't leave LWT with a fairly serious known
> correctness bug and I frankly feel bad for "the project" that this has been
> known for so long without action, so I'm a bit biased in wanting to get it
> fixed asap.
>
> But maybe I'm overstating the urgency here, and maybe option #1 is a better
> way forward.
>
> --
> Sylvain
>



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org







Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-12 Thread Benedict Elliott Smith
> Is the new implementation a separate, distinctly modularized new body of work

It’s primarily a distinct, modularised and new body of work, however there is 
some shared code that has been modified - namely PaxosState, in which legacy 
code is maintained but modified for compatibility, and the system.paxos table 
(which receives a new column, and slightly modified serialization code).  It is 
conceptually an optimised version of the existing algorithm.

If there's a chance of being of value to 4.0, I can try to put up a patch next 
week alongside a high level description of the changes.

> But a performance regression is a regression, I'm not shrugging it off.

I don't want to give the impression I'm shrugging off the correctness issue 
either. It's a serious issue to fix, but since all successful updates to the 
database are linearizable, I think it's likely that many applications behave 
correctly with the present semantics, or at least encounter only transient 
errors. No doubt many also do not, but I have no idea of the ratio.

The regression isn't itself a simple issue either - depending on the topology 
and message latencies it is not difficult to produce inescapable contention, 
i.e. guaranteed timeouts - that might persist as long as clients continue to 
retry. It could be quite a serious degradation of service to impose on our 
users.

I don't pretend to know the correct way to make a decision balancing these 
considerations, but I am perhaps more concerned about imposing service outages 
than I am temporarily maintaining semantics our users have apparently accepted 
for years - though I absolutely share your embarrassment there.


On 12/11/2020, 12:41, "Joshua McKenzie"  wrote:

Is the new implementation a separate, distinctly modularized new body of
work or does it make substantial changes to existing implementation and
subsume it?

On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne  wrote:

> Regarding option #4, I'll remark that experience tends to suggest users
> don't consistently read the `NEWS.txt` file on upgrade, so option #4 will
> likely essentially mean "LWT has a correctness issue, but once it broke
> your data enough that you'll notice, you'll be able to dig up the proper
> flag to fix it for next time". I guess it's better than nothing, of course,
> but I'll admit that defaulting to "opt-in correctness", especially for a
> feature (LWT) that exists uniquely to provide additional guarantees, is
> something I have a hard time rallying behind.
>
> But a performance regression is a regression, I'm not shrugging it off.
> Still, I feel we shouldn't leave LWT with a fairly serious known
> correctness bug and I frankly feel bad for "the project" that this has been
> known for so long without action, so I'm a bit biased in wanting to get it
> fixed asap.
>
> But maybe I'm overstating the urgency here, and maybe option #1 is a better
> way forward.
>
> --
> Sylvain
>






Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-11 Thread Benedict Elliott Smith
In my opinion, a similar calculus should be applied to 3.0 and 3.11.  This is 
a(n arguably quite serious) bug, so whatever is not overly onerous to backport 
should be considered while they are supported. The work under discussion has 
two components: a replacement to the core consensus algorithm, and mechanisms 
to ensure safety across range movements. The latter might be more invasive for 
3.x, but the former should be quite easy to backport and as such probably quite 
well justified.

> can it also be pluggable (either opt-in or opt-out)?

I think pluggable means something different to opt-in/opt-out, at least to me.  
I'm all for more pluggability, and also for more optionality, but the decision 
is very sensitive to context. We need to be able to select between our options, 
which for consensus practically means supporting live migration - which is 
exceptionally challenging in any general sense (and perhaps inherently 
non-pluggable).

As to future development for consensus, I personally hope the work we are 
discussing here will be a strong platform for it, but obviously that's for the 
community to decide later on. I think the work to take it forwards to something 
epaxos-like will not be that herculean, with some incremental milestones en 
route. But that's a totally different discussion for the future, and either a 
CEP or a small intercollegiate working group.


On 11/11/2020, 18:48, "Michael Semb Wever"  wrote:


> Regarding CASSANDRA-12126 and 4.0 we are facing several options and
> Benedict, Sylvain and I wanted to get the community feedback on them.
> 
> We can:
> 
>    1. Try to use Benedict's proposal for 4.0 if the community has the
>    appetite for it. The main issue there is some potential extra delay
>    for 4.0.
>    2. Do nothing for 4.0, meaning do not commit the current patch. We have
>    lived a long time with that issue and we can probably wait a bit more
>    for a proper solution.
>    3. Commit the patch as such, fixing the correctness but potentially
>    introducing some performance issue until we release a better solution.
>    4. Change the patch to default to the current behavior but allow
>    people to enable the new one if the correctness is a problem for them.
> 


If these options are for 4.0, is it then (4) that is getting applied to 3.0
and 3.11?

If that is the case then I would vote on also applying (4) to 4.0, given we
are now in front of beta4. Please let's not further delay 4.0.

Post 4.0, if (1) is as described "a parallel implementation of the same
underlying Paxos algorithm", can it also be pluggable (either opt-in or
opt-out)? And would/could EPaxos become pluggable too in a similar manner (if
it eventuates)? I'm in favour of providing more pluggable interfaces into C*,
along with the code quality improvements that will have to accompany them.










Re: [DISCUSS] CASSANDRA-12126: LWTs correcteness and performance

2020-11-11 Thread Benedict Elliott Smith
It's been there since the beginning.

If we were to consider the alternative proposal for 4.0, it would not have to 
be blocking for release. I had planned to come forward after 4.0, primarily 
because I did not want to create further political complexities for the project 
at this time, but also because I do not presently have the time to produce all 
of the documentation we might like for such a proposal. However, the work is 
ready, has already been reviewed by multiple committers, has had more extensive 
testing than any feature I'm aware of to date, and could be made available for 
4.0 in fairly short order. While the work itself is non-trivial, the work to 
integrate it is not complex.  It would also be optional, and configurable at 
runtime.

The only likely blocker would be the process of review, and any other due 
diligence the project might want to undertake.  Absolutely not something I 
advocate for or against an accelerated timescale on.  I have no personal 
preference for the approach taken, just providing this for context.


On 11/11/2020, 16:18, "Joshua McKenzie"  wrote:

How old is the C-12126 surfaced defect? i.e. is this a thing we've had
since initial introduction of paxos or is it a regression we introduced
somewhere along the way?

On Wed, Nov 11, 2020 at 11:03 AM Benjamin Lerer wrote:

> CASSANDRA-12126 addresses one correctness issue of Light Weight
> Transactions. Unfortunately, the current patch developed by Sylvain and
> Benedict requires an extra round trip between the coordinator and the
> replicas for SERIAL and LOCAL_SERIAL reads.
> After some experimentation, Benedict discovered that this extra round trip
> could lead to a significant increase in timeouts for read-heavy workloads.
>
> Users for whom this behavior is a problem will be able to switch back to
> the old behavior using a system property, therefore choosing performance
> over correctness.
>
> On the side, Benedict has worked on another approach that does not suffer
> from that performance problem and also addresses some LWT correctness
> issues that can happen when adding or removing nodes. He initially intended
> to deliver that improvement in 4.X but can try to incorporate it into 4.0.
>
> Regarding CASSANDRA-12126 and 4.0 we are facing several options and
> Benedict, Sylvain and I wanted to get the community feedback on them.
>
> We can:
>
>    1. Try to use Benedict's proposal for 4.0 if the community has the
>    appetite for it. The main issue there is some potential extra delay
>    for 4.0.
>    2. Do nothing for 4.0, meaning do not commit the current patch. We have
>    lived a long time with that issue and we can probably wait a bit more
>    for a proper solution.
>    3. Commit the patch as such, fixing the correctness but potentially
>    introducing some performance issue until we release a better solution.
>    4. Change the patch to default to the current behavior but allow
>    people to enable the new one if the correctness is a problem for them.
>
>   Thanks in advance for your feedback.
>






Re: Supported upgrade path for 4.0

2020-10-10 Thread Benedict Elliott Smith
This sounds eminently sensible to me.

On 09/10/2020, 19:42, "Joshua McKenzie"  wrote:

Fair point on uncertainties and delaying decisions until strictly required
so we have more data.

I want to nuance my earlier proposal and what we document (sorry for the
multiple messages; my time is fragmented enough these days that I only have
thin slices to engage w/stuff like this).

I think we should do a "From → To" model for both testing and supporting
upgrades and have a point of view as a project for each currently supported
version of C* in the "From" list. Specifically - we test and recommend the
following paths:

   1. 2.1 → 3.0 → 4.0
   2. 3.0 → 4.0 (subset of 1)
   3. 3.11 → 4.0

There's no value whatsoever in hopping through an interim version if a
leapfrog is expected to be as tested and stable. The only other alternative
would be to recommend 2.1 → 3.11 → 4.0 (as Mick alluded to) but that just
exposes to more deltas from the tick-tock .X line for no added value as you
mentioned.

We could re-apply the "from-to" testing and support model in future
releases w/whatever is supported at that time. That way users will be able
to have a single source of truth on what the project recommends and vets
for going from wherever they are to the latest.
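
The "From → To" model described above can be pictured as a lookup from the version an operator is currently running to the recommended sequence of hops. The following is a minimal sketch under that reading; the dictionary and function name are illustrative assumptions, and only the three paths listed in the message are included.

```python
from typing import Dict, List

# The three paths proposed in the message above. The data structure and the
# function name are illustrative assumptions, not project tooling.
RECOMMENDED_PATHS: Dict[str, List[str]] = {
    "2.1": ["2.1", "3.0", "4.0"],
    "3.0": ["3.0", "4.0"],
    "3.11": ["3.11", "4.0"],
}


def recommended_path(current: str) -> List[str]:
    """Return the tested sequence of versions from `current` up to 4.0."""
    if current not in RECOMMENDED_PATHS:
        raise ValueError(f"no tested upgrade path to 4.0 from {current}")
    # Return a copy so callers cannot mutate the shared table.
    return list(RECOMMENDED_PATHS[current])
```

For example, `recommended_path("2.1")` yields the two-hop route through 3.0, while a version absent from the table is rejected rather than guessed at.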


On Fri, Oct 09, 2020 at 12:05 PM, Benedict Elliott Smith <
bened...@apache.org> wrote:

> There is a sizeable cohort of us who I expect to be primarily focused on
> 3.0->4.0, so if you have a cohort focusing primarily on 3.11->4.0 I think
> we'll be in good shape.
>
> For all subsequent major releases, we test and officially support only 1
> major back
>
> I think we should wait to see what happens before committing ourselves to
> something like this - things like release cadence etc will matter a lot.
> That is *definitely* not to say that I disagree with you, just that I think
> more project future-context is needed to make a decision like this. I
> expect we'll have lots more fun (hopefully positive) conversations around
> topics like this in the coming year, as I have no doubt we all want to
> evolve our approach to releases, and there's no knowing what we'll end up
> deciding (we have done some crazy things in the past).
>
> On 09/10/2020, 16:46, "Joshua McKenzie"  wrote:
>
> I think it's a clean and simple heuristic for the project to say "you can
> safely upgrade to adjacent major versions".
>
> The difficulty we face with 3.0 is that it has made many contributors very
> wary of pre 4.0 code and with good reason. Your point about conservative
> users upgrading later in a cycle resonates Benedict, and reflects on the
> confidence we should or should not have in 3.11. I think it's also
> important to realize that many cluster upgrades can take months, so it's
> not a transient exposure to unknowns in a release.
>
> I propose the following compromise:
>
> 1. For 4.0 GA, we consider the following upgrade paths "tested and
> supported": 2.1 → 3.0 → 3.11 → 4.0, and 2.1 → 3.0 → 4.0
> 2. For all subsequent major releases, we test and officially support only
> 1 major back
> 3. Any contributor can optionally meet whatever bar we set for "tested and
> supported" to allow leapfrogging versions, but we won't constrain GA on
> that.
>
> We have to pay down our debt right now, but if we have to continue to do
> this in the future we're not learning from our mistakes.
>
> Speaking for DataStax, we don't have enough resources to work through the
> new testing work on 40_quality_test, the defects that David is surfacing
> like crazy (well done!), and validating 2 major upgrade paths. If you and a
> set of contributors could take on the 3.0 → 4.0 path Benedict, that'd be a
> great help. I also assume we could all collaborate on the tooling / infra /
> approaches we use for this validation so it wouldn't be a complete re-work.
>
> On Fri, Oct 09, 2020 at 11:02 AM, Benedict Elliott Smith < benedict@
> apache.org> wrote:
>
> Since email is very unclear and context gets lost, I'm personally OK with
> officially supporting all of these upgrade paths, but the spectre was
> raised that this might lead to lost labour due to an increased support
> burden. My view is that 3.0->4.0 is probably a safer upgrade path for users
> and as a result a lower support cost to the project, so I would be happy to
> deprecate 3.0->3.11 if this helps alleviate the concerns of others that
> this wou

Re: Supported upgrade path for 4.0

2020-10-09 Thread Benedict Elliott Smith
There is a sizeable cohort of us who I expect to be primarily focused on 
3.0->4.0, so if you have a cohort focusing primarily on 3.11->4.0 I think we'll 
be in good shape.

> For all subsequent major releases, we test and officially support only 1 
> major back

I think we should wait to see what happens before committing ourselves to 
something like this - things like release cadence etc will matter a lot.  That 
is *definitely* not to say that I disagree with you, just that I think more 
project future-context is needed to make a decision like this.  I expect we'll 
have lots more fun (hopefully positive) conversations around topics like this 
in the coming year, as I have no doubt we all want to evolve our approach to 
releases, and there's no knowing what we'll end up deciding (we have done some 
crazy things in the past).


On 09/10/2020, 16:46, "Joshua McKenzie"  wrote:

I think it's a clean and simple heuristic for the project to say "you can
safely upgrade to adjacent major versions".

The difficulty we face with 3.0 is that it has made many contributors very
wary of pre 4.0 code and with good reason. Your point about conservative
users upgrading later in a cycle resonates Benedict, and reflects on the
confidence we should or should not have in 3.11. I think it's also
important to realize that many cluster upgrades can take months, so it's
not a transient exposure to unknowns in a release.

I propose the following compromise:

   1. For 4.0 GA, we consider the following upgrade paths "tested and
   supported": 2.1 → 3.0 → 3.11 → 4.0, and 2.1 → 3.0 → 4.0
   2. For all subsequent major releases, we test and officially support
   only 1 major back
   3. Any contributor can optionally meet whatever bar we set for "tested
   and supported" to allow leapfrogging versions, but we won't constrain GA on
   that.

We have to pay down our debt right now, but if we have to continue to do
this in the future we're not learning from our mistakes.

Speaking for DataStax, we don't have enough resources to work through the
new testing work on 40_quality_test, the defects that David is surfacing
like crazy (well done!), and validating 2 major upgrade paths. If you and a
set of contributors could take on the 3.0 → 4.0 path Benedict, that'd be a
great help. I also assume we could all collaborate on the tooling / infra /
approaches we use for this validation so it wouldn't be a complete re-work.
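
As a rough illustration of the compromise above, the two "tested and supported" paths for 4.0 GA can be written down as a set of version-to-version hops and checked mechanically. This is only a sketch under stated assumptions: the names `SUPPORTED_EDGES` and `is_supported` are hypothetical and come from no Cassandra tooling, and the edge list encodes nothing beyond the two paths named in point 1.

```python
# Hypothetical sketch: edges encode only the two paths proposed for 4.0 GA,
# 2.1 -> 3.0 -> 3.11 -> 4.0 and 2.1 -> 3.0 -> 4.0. Names are illustrative.
SUPPORTED_EDGES = {
    ("2.1", "3.0"),
    ("3.0", "3.11"),
    ("3.11", "4.0"),
    ("3.0", "4.0"),
}


def is_supported(path):
    """Return True if every consecutive hop in `path` is a supported edge."""
    if len(path) < 2:
        # A single version (or nothing) is not an upgrade.
        return False
    return all((a, b) in SUPPORTED_EDGES for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    print(is_supported(["2.1", "3.0", "3.11", "4.0"]))
    print(is_supported(["2.1", "3.0", "4.0"]))
    print(is_supported(["2.1", "4.0"]))
```

Modelling hops rather than whole paths means that supporting a leapfrog (point 3) would amount to adding one more edge, which is roughly the extra validation burden being debated.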



    On Fri, Oct 09, 2020 at 11:02 AM, Benedict Elliott Smith <
bened...@apache.org> wrote:

> Since email is very unclear and context gets lost, I'm personally OK with
> officially supporting all of these upgrade paths, but the spectre was
> raised that this might lead to lost labour due to an increased support
> burden. My view is that 3.0->4.0 is probably a safer upgrade path for users
> and as a result a lower support cost to the project, so I would be happy to
> deprecate 3.0->3.11 if this helps alleviate the concerns of others that
> this would be costly to the project. Alternatively, if we want to support
> both but some feel 3.0->4.0 is burdensome, I would be happy to focus on
> 3.0->4.0 while they focus on the paths I would be happy to deprecate.
>
> On 09/10/2020, 15:49, "Benedict Elliott Smith" 
> wrote:
>
> Yeah, and perhaps even drop 2.1 (2.2) -> 3.11 when 4.0 appears.
>
> I think there's anyway a big difference between supported and encouraged.
> I think we should encourage 2.1->3.0->4.0, while maintaining support for
> 2.2->3.0 and 3.0->3.11 for critical bugs only, and 3.11->4.0 in the normal
> way given the userbase that is already on 3.11.
>
> we can expect it to be *stable enough to upgrade through*
>
> I don't know that this is true at all. Most bugs are not found by the
> general userbase, and the most conservative (hence most likely to spot
> problems on upgrade) are generally very late to the party. 2.1(2.2)->3.0 is
> still discovering bugs today, many years after this metric was passed for
> 3.0 - largely as the more sophisticated users upgrade.
>
> On 09/10/2020, 15:40, "Marcus Eriksson"  wrote:
>
> My suggestion for "supported" upgrade paths would be;
>
> 2.1 (2.2) -> 3.0 -> 4.0
> 2.1 (2.2) -> 3.11 -> 4.0
>
> and drop support for 3.0 -> 3.11 when we release 4.0
>
> /Marcus
>
> On 9 October 2020 at 16:12:12, Joshua McKenzie (jmcken...@apache.org)
> wrote:
>
> Some data that I believe is relevant here.
>
> Numerically it's safe to assume there's over

Re: Supported upgrade path for 4.0

2020-10-09 Thread Benedict Elliott Smith
Since email is very unclear and context gets lost, I'm personally OK with 
officially supporting all of these upgrade paths, but the spectre was raised 
that this might lead to lost labour due to an increased support burden. My view 
is that 3.0->4.0 is probably a safer upgrade path for users and as a result a 
lower support cost to the project, so I would be happy to deprecate 3.0->3.11 
if this helps alleviate the concerns of others that this would be costly to the 
project. Alternatively, if we want to support both but some feel 3.0->4.0 is 
burdensome, I would be happy to focus on 3.0->4.0 while they focus on the paths 
I would be happy to deprecate.


On 09/10/2020, 15:49, "Benedict Elliott Smith"  wrote:

Yeah, and perhaps even drop 2.1 (2.2) -> 3.11 when 4.0 appears.

I think there's anyway a big difference between supported and encouraged.  
I think we should encourage 2.1->3.0->4.0, while maintaining support for 
2.2->3.0 and 3.0->3.11 for critical bugs only, and 3.11->4.0 in the normal way 
given the userbase that is already on 3.11.

> we can expect it to be *stable enough to upgrade through*

I don't know that this is true at all.  Most bugs are not found by the 
general userbase, and the most conservative (hence most likely to spot problems 
on upgrade) are generally very late to the party.  2.1(2.2)->3.0 is still 
discovering bugs today, many years after this metric was passed for 3.0 - 
largely as the more sophisticated users upgrade.


On 09/10/2020, 15:40, "Marcus Eriksson"  wrote:

My suggestion for "supported" upgrade paths would be;

2.1 (2.2) -> 3.0 -> 4.0
2.1 (2.2) -> 3.11 -> 4.0

and drop support for 3.0 -> 3.11 when we release 4.0

/Marcus



On 9 October 2020 at 16:12:12, Joshua McKenzie (jmcken...@apache.org) wrote:
> Some data that I believe is relevant here.
>
> Numerically it's safe to assume there's over 10,000 ASF C* clusters out in
> the world (5,500 in China alone). In surveys (both informal polling and
> primary research), at least 1/3rd of folks are running the 3.X latest if I
> recall correctly.
>
> Basic conclusions we can draw from these data points:
> 1) There are thousands of clusters running some form of post 3.0, so we can
> expect it to be *stable enough to upgrade through*
> 2) We have to support at least 3.11 → 4.0
>
> If 1/3rd of our users are running 2.1, 1/3rd 3.0, and 1/3rd 3.11
> (hand-waving, probably more in the 25 vs. 40 etc but splitting hairs),
> there's clearly a significant value-add in usability of skipping majors
> (3.0->4.0). Depending on how we define "done" and "supported" for upgrade
> testing, this will represent a significant development burden.
>
> From a *functional MVP* perspective on what upgrade paths we need to
> support, the absolute minimum would be 2.1 → 3.0 → 3.11 → 4.0
>
> If anyone wants to step in and officially support the 3.0 → 4.0 line,
> that's fantastic both for the project community and for users. But as far
> as basic table stakes, I can't think of a logical reason 3.0 → 4.0 as an
> upgraded path should be considered a blocker for releasing 4.0 GA.
>
> On Fri, Oct 09, 2020 at 9:53 AM, Mick Semb Wever wrote:
>
> > At The Last Pickle we have always recommended avoiding 3.0, including
> > upgrading from 2.2 directly to 3.11.
> > We (now DataStax) will continue to recommend that folk upgrade to the
> > latest 3.11 before upgrading to 4.0.
> >
> > To clarify that^, if it wasn't obvious, I wasn't making a statement about
> > DataStax at large, but about those of us at TLP and now the team
> > providing the consulting for Apache Cassandra from DataStax.


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org







Re: Supported upgrade path for 4.0

2020-10-09 Thread Benedict Elliott Smith
Yeah, and perhaps even drop 2.1 (2.2) -> 3.11 when 4.0 appears.

I think there's anyway a big difference between supported and encouraged.  I 
think we should encourage 2.1->3.0->4.0, while maintaining support for 2.2->3.0 
and 3.0->3.11 for critical bugs only, and 3.11->4.0 in the normal way given the 
userbase that is already on 3.11.

> we can expect it to be *stable enough to upgrade through*

I don't know that this is true at all.  Most bugs are not found by the general 
userbase, and the most conservative (hence most likely to spot problems on 
upgrade) are generally very late to the party.  2.1(2.2)->3.0 is still 
discovering bugs today, many years after this metric was passed for 3.0 - 
largely as the more sophisticated users upgrade.


On 09/10/2020, 15:40, "Marcus Eriksson"  wrote:

My suggestion for "supported" upgrade paths would be;

2.1 (2.2) -> 3.0 -> 4.0
2.1 (2.2) -> 3.11 -> 4.0

and drop support for 3.0 -> 3.11 when we release 4.0

/Marcus



On 9 October 2020 at 16:12:12, Joshua McKenzie (jmcken...@apache.org) wrote:
> Some data that I believe is relevant here.
>  
> Numerically it's safe to assume there's over 10,000 ASF C* clusters out in
> the world (5,500 in China alone). In surveys (both informal polling and
> primary research), at least 1/3rd of folks are running the 3.X latest if I
> recall correctly.
>  
> Basic conclusions we can draw from these data points:
> 1) There are thousands of clusters running some form of post 3.0, so we can
> expect it to be *stable enough to upgrade through*
> 2) We have to support at least 3.11 → 4.0
>  
> If 1/3rd of our users are running 2.1, 1/3rd 3.0, and 1/3rd 3.11
> (hand-waving, probably more in the 25 vs. 40 etc but splitting hairs),
> there's clearly a significant value-add in usability of skipping majors
> (3.0->4.0). Depending on how we define "done" and "supported" for upgrade
> testing, this will represent a significant development burden.
>  
> From a *functional MVP* perspective on what upgrade paths we need to
> support, the absolute minimum would be 2.1 → 3.0 → 3.11 → 4.0
>  
> If anyone wants to step in and officially support the 3.0 → 4.0 line,
> that's fantastic both for the project community and for users. But as far
> as basic table stakes, I can't think of a logical reason 3.0 → 4.0 as an
> upgraded path should be considered a blocker for releasing 4.0 GA.
>  
>  
>  
> On Fri, Oct 09, 2020 at 9:53 AM, Mick Semb Wever wrote:
>  
> > At The Last Pickle we have always recommended avoiding 3.0, including
> > upgrading from 2.2 directly to 3.11.
> > We (now DataStax) will continue to recommend that folk upgrade to the
> > latest 3.11 before upgrading to 4.0.
> >
> > To clarify that^, if it wasn't obvious, I wasn't making a statement about
> > DataStax at large, but about those of us at TLP and now the team
> > providing the consulting for Apache Cassandra from DataStax.
> >
>  





Re: Supported upgrade path for 4.0

2020-10-09 Thread Benedict Elliott Smith
> Would it be necessary to go from 3.0 to 3.11 on the way to 4.0? I didn't
> think that was required.

That's what's being discussed, and Mick is proposing requiring it officially, 
to reduce support burden.

> What has been fixed in 3.0 that hasn't been merged into 3.11 ?

Nothing that I'm aware of, but how many bugs are unique to 3.11 that have not 
been discovered?

> Dropping support for upgrading from 3.0 to 3.11 

Nobody is proposing dropping support, but my personal preference would be to 
officially endorse encouraging users to go directly 3.0->4.0, which would 
reduce the support burden for 3.0->3.11 and 3.11->4.0, as many users will skip 
3.11 entirely if we encourage them to do so.  If you would prefer to officially 
encourage 3.0->3.11->4.0, and I would prefer to officially encourage 3.0->4.0, 
it seems reasonable to split the support burden for the paths we want to 
officially endorse, and endorse both?


On 09/10/2020, 09:47, "Mick Semb Wever"  wrote:

> I would personally prefer the community to officially recommend skipping
> 3.11 to users that have not yet upgraded, as 3.0 and 4.0 have each had much
> more attention given to them over the past several years.



What has been fixed in 3.0 that hasn't been merged into 3.11 ?

Dropping support for upgrading from 3.0 to 3.11 would be a weird deviation
from the general practice of being able to upgrade from one major version
to the next.






Re: Supported upgrade path for 4.0

2020-10-09 Thread Benedict Elliott Smith
I would personally prefer the community to officially recommend skipping 3.11 
to users that have not yet upgraded, as 3.0 and 4.0 have each had much more 
attention given to them over the past several years.  This would naturally lead 
to fewer issues filed for 3.0->3.11 and 3.11->4.0, as fewer users take this 
upgrade path.  

Perhaps if others want to explicitly encourage the 3.0->3.11->4.0 upgrade path, 
we can split our resources accordingly?



On 09/10/2020, 07:49, "Mick Semb Wever"  wrote:

> Anyone have an opinion here or any formal prior art for us to build on?


Maybe this question should be more phrased as to which upgrade paths each
individual has time in helping and fixing users out?

If you are voting for official support for the 3.0 upgrade path then that
should imply you are putting up your hand in helping
provide that official support in the community.  (Whatever officially
supported is deemed to be)

I am only for supporting upgrades from latest 3.11. It makes life a lot
simpler for all of us, and helps focus our community time on CEPs and
otherwise maintaining our supported branches.

At The Last Pickle we have always recommended avoiding 3.0, including
upgrading from 2.2 directly to 3.11.

We (now DataStax) will continue to recommend that folk upgrade to the
latest 3.11 before upgrading to 4.0.

regards,
Mick






Re: [VOTE] Release dtest-api 0.0.5

2020-09-25 Thread Benedict Elliott Smith
+1

On 25/09/2020, 15:45, "Oleksandr Petrov"  wrote:

Proposing the test build of in-jvm dtest API 0.0.5 for release.

Repository:

https://gitbox.apache.org/repos/asf?p=cassandra-in-jvm-dtest-api.git;a=shortlog;h=refs/tags/0.0.5
Candidate SHA:

https://github.com/apache/cassandra-in-jvm-dtest-api/commit/f900334d2f61f0b10640ba7ae15958f26df72d92
tagged with 0.0.5
Artifact:

https://repository.apache.org/content/repositories/orgapachecassandra-1219/org/apache/cassandra/dtest-api/0.0.5/

Key signature: 9E66CEC6106D578D0B1EB9BFF1000962B7F6840C

Changes since last release:

  * CASSANDRA-16109: If user has not set nodeCount, use the node id
    topology size
  * CASSANDRA-16057: Update in-jvm dtest to expose stdout and stderr for
    nodetool
  * CASSANDRA-16120: Add ability for jvm-dtest to grep instance logs
  * CASSANDRA-16101: Add method to ignore uncaught throwables
  * CASSANDRA-16109: Collect dc/rack information and validate when building
  * CASSANDRA-15386: Default to 3 datadirs in in-jvm dtests
  * CASSANDRA-16101: Add method to fetch uncaught exceptions

The vote will be open for 24 hours. Everyone who has tested the build is
invited to vote. Votes by PMC members are considered binding. A vote passes
if there are at least three binding +1s.

-- Alex






Re: Creating a branch for 5.0 …?

2020-09-24 Thread Benedict Elliott Smith
The discussion on the present topic has not concluded, and if we are making an 
exception to 4.0 only then it really needs to.

Members of one organisation have been pushing hard for feature development to 
proceed, arguing it harms unnamed third parties.  A request that these third 
parties be asked to participate in the discussion has so far gone unanswered.  
It is reasonable that this is answered before a vote, since this is the entire 
basis of the argument in favour of branching.

Given this is the basis of argument, I would also propose a less contentious 
vote, should one be undertaken: to create a cassandra-5.0 branch that is open 
only to contributions from those unaffiliated by employment with any existing 
committers.  This seems to alleviate the concerns precipitating this 
discussion, while mitigating the concerns of those who are opposed to it.



On 24/09/2020, 17:02, "Jake Luciani"  wrote:

The vote was to unfreeze new changes at beta, so logically that means
non-bugfix work goes into trunk.

Jordan, thanks.   That is a more recent vote so thanks.  That being said,
under that line Benedict comments this needs to be discussed.
So how about we just have a Vote on branching cassandra-4.0 and the issue
will be decided?

Jake




On Thu, Sep 24, 2020 at 11:53 AM Benedict Elliott Smith wrote:

> I'm not sure what you are referring to here, that vote said nothing about
> branching at beta.
>
> The most recent vote on the topic anyway was for the Release Lifecycle
> process, which stipulates branching at GA.
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>
> We can vote to modify this document, or to make an exception, but I am
> aware of no other vote stipulating anything about the point at which we
> branch.
>
>
> On 24/09/2020, 16:49, "Jake Luciani"  wrote:
>
> > Today the community still has in force an explicit vote prohibiting the
> > merge of this work.
>
> You referred to an explicit vote here.  I assume that was the one you were
> referring to?  Yes, the community should decide.
> Call a vote if you think the community thinks we should continue the freeze
> vs continuing to rely on beliefs about the community.
>
> I'm simply pointing out the branching of 4.0 post beta was the plan of last
> record.
>
> Jake
>
> On Thu, Sep 24, 2020 at 11:44 AM Benedict Elliott Smith <
> bened...@apache.org>
> wrote:
>
> > The community does everything through discussion and consensus.  Does that
> > include branching, or not?
> >
> > If there is no consensus, a vote is held.  Whether or not you consider the
> > vote from 2018 still valid, you still need to seek the consent of the
> > community for your action today.  Or is that not sacrosanct anymore?
> >
> >
> > On 24/09/2020, 16:22, "Jake Luciani"  wrote:
> >
> > I'm sorry I see no issue with branching 4.0 as it was the thing we voted on
> > back in 2018.  If you wish to extend the freeze you should call a new vote.
> >
> > On Thu, Sep 24, 2020 at 11:15 AM Benedict Elliott Smith <
> > bened...@apache.org>
> > wrote:
> >
> > > Nobody has any problem with an external repository being maintained.  Just
> > > bear in mind the normal process will need to take place to merge to the ASF
> > > repository, and that there may be feedback and review requests to address,
> > > so merge order and diffs will probably change.
> > >
> > >
> > > On 24/09/2020, 16:05, "Brandon Williams" wrote:
> > >
> > > On Thu, Sep 24, 2020 at 9:55 AM Benedict Elliott Smith
> > >  wrote:
> > > >
> > > > You do not have the authority to unilaterally overrule the community
> > > > process.  This is a serious breach of your responsibilities as a member
> > > > of the PMC.
> > >
> > > Feel free to complain that I'm creating branches we intend

Re: Creating a branch for 5.0 …?

2020-09-24 Thread Benedict Elliott Smith
I'm not sure what you are referring to here, that vote said nothing about 
branching at beta.

The most recent vote on the topic anyway was for the Release Lifecycle process, 
which stipulates branching at GA.

https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

We can vote to modify this document, or to make an exception, but I am aware of 
no other vote stipulating anything about the point at which we branch.


On 24/09/2020, 16:49, "Jake Luciani"  wrote:

> Today the community still has in force an explicit vote prohibiting the
> merge of this work.

You referred to an explicit vote here.  I assume that was the one you were
referring to?  Yes, the community should decide.
Call a vote if you think the community thinks we should continue the freeze
vs continuing to rely on beliefs about the community.

I'm simply pointing out the branching of 4.0 post beta was the plan of last
record.

Jake

On Thu, Sep 24, 2020 at 11:44 AM Benedict Elliott Smith wrote:

> The community does everything through discussion and consensus.  Does that
> include branching, or not?
>
> If there is no consensus, a vote is held.  Whether or not you consider the
> vote from 2018 still valid, you still need to seek the consent of the
> community for your action today.  Or is that not sacrosanct anymore?
>
>
> On 24/09/2020, 16:22, "Jake Luciani"  wrote:
>
> I'm sorry I see no issue with branching 4.0 as it was the thing we voted on
> back in 2018.  If you wish to extend the freeze you should call a new vote.
>
> On Thu, Sep 24, 2020 at 11:15 AM Benedict Elliott Smith <
> bened...@apache.org>
> wrote:
>
> > Nobody has any problem with an external repository being maintained.  Just
> > bear in mind the normal process will need to take place to merge to the ASF
> > repository, and that there may be feedback and review requests to address,
> > so merge order and diffs will probably change.
> >
> >
> > On 24/09/2020, 16:05, "Brandon Williams"  wrote:
> >
> > On Thu, Sep 24, 2020 at 9:55 AM Benedict Elliott Smith
> >  wrote:
> > >
> > > You do not have the authority to unilaterally overrule the community
> > > process.  This is a serious breach of your responsibilities as a member
> > > of the PMC.
> >
> > Feel free to complain that I'm creating branches we intend to someday,
> > perhaps even in 2020, release.
> >
> > > I have deleted this branch, and will do so again if you repeat this.
> >
> > This would create some interesting tickets for INFRA, but I won't
> > waste their time with you either. Whether either of us has the
> > authority to do such on ASF infrastructure is irrelevant, since that
> > is the only thing that can be argued here.  The ASL absolutely allows
> > people to innovate on their own with the code, so let's just move the
> > bits.
> >
> > Those who wish to innovate,
> > https://github.com/driftx/cassandra/tree/cassandra-5.0 is now open for
> > business, PRs accepted. This will be maintained to track trunk on the
> > ASF servers.
> >
> > I guess this is the apache way.
> >
> >
>
> --
> http://twitter.com/tjake
>
>
>
>
>

-- 
http://twitter.com/tjake






Re: Creating a branch for 5.0 …?

2020-09-24 Thread Benedict Elliott Smith
The community does everything through discussion and consensus.  Does that 
include branching, or not?

If there is no consensus, a vote is held.  Whether or not you consider the vote 
from 2018 still valid, you still need to seek the consent of the community for 
your action today.  Or is that not sacrosanct anymore?


On 24/09/2020, 16:22, "Jake Luciani"  wrote:

I'm sorry I see no issue with branching 4.0 as it was the thing we voted on
back in 2018.  If you wish to extend the freeze you should call a new vote.

On Thu, Sep 24, 2020 at 11:15 AM Benedict Elliott Smith wrote:

> Nobody has any problem with an external repository being maintained.  Just
> bear in mind the normal process will need to take place to merge to the ASF
> repository, and that there may be feedback and review requests to address,
> so merge order and diffs will probably change.
>
>
> On 24/09/2020, 16:05, "Brandon Williams"  wrote:
>
> On Thu, Sep 24, 2020 at 9:55 AM Benedict Elliott Smith
>  wrote:
> >
> > You do not have the authority to unilaterally overrule the community
> > process.  This is a serious breach of your responsibilities as a member of
> > the PMC.
>
> Feel free to complain that I'm creating branches we intend to someday,
> perhaps even in 2020, release.
>
> > I have deleted this branch, and will do so again if you repeat this.
>
> This would create some interesting tickets for INFRA, but I won't
> waste their time with you either. Whether either of us has the
> authority to do such on ASF infrastructure is irrelevant, since that
> is the only thing that can be argued here.  The ASL absolutely allows
> people to innovate on their own with the code, so let's just move the
> bits.
>
> Those who wish to innovate,
> https://github.com/driftx/cassandra/tree/cassandra-5.0 is now open for
> business, PRs accepted. This will be maintained to track trunk on the
> ASF servers.
>
> I guess this is the apache way.
>
>
>

-- 
http://twitter.com/tjake






Re: Creating a branch for 5.0 …?

2020-09-24 Thread Benedict Elliott Smith
Nobody has any problem with an external repository being maintained.  Just bear 
in mind the normal process will need to take place to merge to the ASF 
repository, and that there may be feedback and review requests to address, so 
merge order and diffs will probably change.


On 24/09/2020, 16:05, "Brandon Williams"  wrote:

On Thu, Sep 24, 2020 at 9:55 AM Benedict Elliott Smith wrote:
>
> You do not have the authority to unilaterally overrule the community
> process.  This is a serious breach of your responsibilities as a member of
> the PMC.

Feel free to complain that I'm creating branches we intend to someday,
perhaps even in 2020, release.

> I have deleted this branch, and will do so again if you repeat this.

This would create some interesting tickets for INFRA, but I won't
waste their time with you either. Whether either of us has the
authority to do such on ASF infrastructure is irrelevant, since that
is the only thing that can be argued here.  The ASL absolutely allows
people to innovate on their own with the code, so let's just move the
bits.

Those who wish to innovate,
https://github.com/driftx/cassandra/tree/cassandra-5.0 is now open for
business, PRs accepted. This will be maintained to track trunk on the
ASF servers.

I guess this is the apache way.




Re: Creating a branch for 5.0 …?

2020-09-24 Thread Benedict Elliott Smith
You do not have the authority to unilaterally overrule the community process.  
This is a serious breach of your responsibilities as a member of the PMC.

I have deleted this branch, and will do so again if you repeat this.

As discussed, nobody can police what you work on, but the community does decide 
what is merged.  Today the community still has in force an explicit vote 
prohibiting the merge of this work.  You must conduct a vote to rescind this
decision.  

Given the proposers of this policy have failed to respond to the most recent 
query on the topic, that would also seem very problematic to me.



On 24/09/2020, 15:31, "Brandon Williams"  wrote:

It's been a while now for this thread, but it seems to me that it has
been established:

1. This is an opensource project and anyone is free to work on any
part of it that they choose. Nobody has authority over this other than
the contributor.
2. Some people are concerned that allowing innovation (in code) will
make 4.0 take longer to release and cause them the pain of merging up
yet another branch.

So, here's what I've done, in an effort to make a space for both of
these groups to operate: the exact same thing we've done for every
release in the past.  I created a branch for the 4.0 release.

WHAT HAVE YOU DONE?!

Alright, calm down.  You're in group 2, don't worry:  you have your
4.0 branch! You can completely focus on it, and if that's all you want
to do, that's fine! YOU DO NOT NEED TO MERGE TO TRUNK.  Those who wish
to volunteer their time merging from 4.0 to trunk will pick up this
mantle.  If nobody does and I get exhausted, we'll just abort it and
delete the branch, no big deal.  One more time to make this crystal
clear: IF YOU DO NOT WANT TO MERGE FROM 4.0 TO TRUNK, YOU ARE ABSOLVED
FROM THIS DUTY.  The branch is named, unsurprisingly, 'cassandra-4.0'

As for those who wish to begin working on new features, trunk is now
open for business.

The merge order is now 2.1->2.2->3.0->3.11->4.0 and _optionally_,
should you _choose_, trunk.

The show must go on.

On Wed, Sep 16, 2020 at 12:08 PM Dinesh Joshi  wrote:
>
> Maybe you should ask these people to bring their contributions or issues 
directly to the dev list. You don’t need to disclose their names or contact 
information.
>
> Contributing to the project involves engaging the community. We’re still 
open to discussions even if the patches may not land immediately.
>
> If they don’t talk to the dev list and won’t make a case for their 
contribution (assuming it’s a big one) we can’t discuss possible ways forward 
to accept it.
>
> It also seems that these folks are interested in contributing new 
features to Cassandra. When the community is working towards stabilizing 4.0, 
contributing to that effort will help build goodwill. We’re averse to one off, 
drive-by contributions. I am not assuming that’s the case but the fact is that 
we’re discussing 5.0 here.
>
> Dinesh
>
> > On Sep 16, 2020, at 6:00 AM, Joshua McKenzie wrote:
> >
> > People aren't lining up waiting to contribute to the project until we
> > accept non-4.0 quality-based contributions. There is a discrete window of
> > opportunity where we as a project can make a first impression on folks
> > interested in joining our community, and signals from people, the data we
> > have available about contributors, as well as basic logic are all
> > consistent that we are turning away new contributors, likely permanently.
> > They're moving on to other projects, since "apparently the Cassandra
> > project isn't interested in new contributions" (interviewee's words 2 weeks
> > ago, not mine). Or same sentiment expressed by multiple major companies
> > looking to find a storage coordination layer to put in front of their
> > storage offerings, for instance.
> >
> > And sorry I can't give you specific names, dates, quotations, and/or
> > contact information Benedict; it seems this rankles you as you continue to
> > use terms like "hypothetical" and "mythical" to describe the very real
> > humans I have spoken with over the course of the past year on this topic.
> > If my constraints of confidentiality from the people I've interacted with
> > are unacceptable for you in a discussion like this and you don't trust me
> > enough to know I wouldn't overtly lie to try and shift an Overton Window,
> > we should probably go ahead and agree to disagree on this conversation and
> > let committers go forward and do what they think best for the project.
> >
> >

Re: [DISCUSS] CEP-7 Storage Attached Index

2020-09-23 Thread Benedict Elliott Smith
FWIW, I personally look forward to receiving that contribution when the time is 
right.

On 23/09/2020, 18:45, "Josh McKenzie"  wrote:

> talking about that would involve some bits of information DataStax might
> not be ready to share?

At the risk of derailing, I've been poking and prodding this week at we
contributors at DS getting our act together w/a draft CEP for donating the
trie-based indices to the ASF project.

More to come; the intention is certainly to contribute that code. The lack
of a destination to merge it into (i.e. no 5.0-dev branch) is removing
significant urgency from the process as well (not to open a 3rd Pandora's
box), but there's certainly an interrelatedness to the conversations going
on.

---
Josh McKenzie



On Wed, Sep 23, 2020 at 12:48 PM, Caleb Rackliffe 
wrote:

> As long as we can construct the on-disk indexes efficiently/directly from
> a Memtable-attached index on flush, there's room to try other data
> structures. Most of the innovation in SAI is around the layout of postings
> (something we can expand on if people are interested) and having a
> natively row-oriented design that scales w/ multiple indexed columns on
> single SSTables. There are some broader implications of using the trie that
> reach outside SAI itself, but talking about that would involve some bits of
> information DataStax might not be ready to share?
>
> On Wed, Sep 23, 2020 at 11:00 AM Jeremiah D Jordan < jeremiah.jordan@
> gmail.com> wrote:
>
> Short question: looking forward, how are we going to maintain three 2i
> implementations: SASI, SAI, and 2i?
>
> I think one of the goals stated in the CEP is for SAI to have parity with
> 2i such that it could eventually replace it.
>
> On Sep 23, 2020, at 10:34 AM, Oleksandr Petrov <
>
> oleksandr.pet...@gmail.com> wrote:
>
> Short question: looking forward, how are we going to maintain three 2i
> implementations: SASI, SAI, and 2i?
>
> Another thing I think this CEP is missing is rationale and motivation
> about why trie-based indexes were chosen over, say, B-Tree. We did have a
> short discussion about this on Slack, but both arguments that I've heard
> (space-saving and keeping a small subset of nodes in memory) work only for
> the most primitive implementation of a B-Tree. Fully-occupied prefix B-Tree
> can have similar properties. There's been a lot of research on B-Trees and
> optimisations in those. Unfortunately, I do not have an implementation
> sitting around for a direct comparison, but I can imagine situations when
> B-Trees may perform better because of simpler construction.
>
> Maybe we should even consider prototyping a prefix B-Tree to have a more
> fair comparison.
>
> Thank you,
> -- Alex
>
> On Thu, Sep 10, 2020 at 9:12 AM Jasonstack Zhao Yang < jasonstack.zhao@
> gmail.com> wrote:
>
> Thank you Patrick for hosting Cassandra Contributor Meeting for CEP-7 SAI.
>
> The recorded video is available here:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-01+Apache+Cassandra+Contributor+Meeting
>
> On Tue, 1 Sep 2020 at 14:34, Jasonstack Zhao Yang < jasonstack.zhao@gmail.
> com>
> wrote:
>
> Thank you, Charles and Patrick
>
> On Tue, 1 Sep 2020 at 04:56, Charles Cao  wrote:
>
> Thank you, Patrick!
>
> On Mon, Aug 31, 2020 at 12:59 PM Patrick McFadin 
> wrote:
>
> I just moved it to 8AM for this meeting to better accommodate APAC. Please
> see the update here:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>
> Patrick
>
> On Mon, Aug 31, 2020 at 10:04 AM Charles Cao 
>
> wrote:
>
> Patrick,
>
> 11AM PST is a bad time for the people in the APAC timezone. Can we move it
> to 7 or 8AM PST in the morning to accommodate their needs ?
>
> ~Charles
>
> On Fri, Aug 28, 2020 at 4:37 PM Patrick McFadin 
> wrote:
>
> Meeting scheduled.
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/2020-08-01+Apache+Cassandra+Contributor+Meeting
>
> Tuesday September 1st, 11AM PST. I added a basic bullet for the agenda but
> if there is more, edit away.
>
> Patrick
>
> On Thu, Aug 27, 2020 at 11:31 AM Jasonstack Zhao Yang < jasonstack.zhao@
> gmail.com> wrote:
>
> +1
>
> On Thu, 27 Aug 2020 at 04:52, Ekaterina Dimitrova <
>
> e.dimitr...@gmail.com>
>
> wrote:
>
> +1
>
> On Wed, 26 Aug 2020 at 16:48, 

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread Benedict Elliott Smith
Perhaps it helps to widen the field of discussion to the dev list?

It might help if each of the stakeholder organisations state their view on the 
situation, including why they would or would not support a given 
approach/operator, and what (preferably specific) circumstances might lead them 
to change their mind?

I realise there are meeting logs, but getting a wider discourse with 
non-stakeholder input might help to build a community consensus?  It doesn't 
seem like it can hurt at this point, anyway.


On 23/09/2020, 17:13, "John Sanda"  wrote:

I want to point out that pretty much everything being discussed in this
thread has been discussed at length during the SIG meetings. I think it is
worth noting because we are pretty much still having the same conversation.

On Wed, Sep 23, 2020 at 12:03 PM Benedict Elliott Smith wrote:

> I don't think there's anything about a code drop that's not "The Apache
> Way"
>
> If there's a consensus (or even strong majority) amongst invested parties,
> I don't see why we could not adopt an operator directly into the project.
>
> It's possible a green field approach might lead to fewer hard feelings, as
> everyone is in the same boat. Perhaps all operators are also suboptimal and
> could be improved with a rewrite? But I think coordinating a lot of
> different entities around an empty codebase is particularly challenging. I
> actually think it could be better for cohesion and collaboration to have a
> suboptimal but substantive starting point.
>
>
> On 23/09/2020, 16:11, "Stefan Miklosovic" <
> stefan.mikloso...@instaclustr.com> wrote:
>
> I think that from Instaclustr it was stated quite clearly multiple
> times that we are "fine to throw it away" if there is something better
> and more widespread. Indeed, we have invested a lot of time in the
> operator but it was not useless at all, we gained a lot of quite
> unique knowledge how to put all pieces together. However, I think that
> this space is going to be quite fragmented and "balkanized", which is
> not always a bad thing, but in a quite narrow area as Kubernetes
> operator is, I just do not see how 4 operators are going to be
> beneficial for ordinary people ("official" from community, ours,
> Datastax one and CassKop (without any significant order)). Sure,
> innovation and healthy competition is important but to what extent ...
> One can start a Cassandra cluster on Kubernetes just so many times
> differently and nobody really likes a vendor lock-in. People wanting
> to run a cluster on K8S realise that there are three operators, each
> backed by a private business entity, and the community operator is not
> there ... Huh, interesting ... One may even start to question what is
> wrong with these folks that it takes three companies to build their
> own solution.
>
> Having said that, to my perception, Cassandra community just does not
> have enough engineers nor contributors to keep 4 operators alive at
> the same time (I wish I was wrong) so the idea of selecting the best
> one or to merge obvious things and approaches together is
> understandable, even if it meant we eventually sunset ours. In
> addition, nobody from big players is going to contribute to the code
> base of the other one, for obvious reasons, so channeling and
> directing this effort into something common for a community seems to
> be the only reasonable way of cooperation.
>
> It is quite hard to bootstrap this if the donation of the code in big
> chunks / whole repo is out of the question as it is not the "Apache way"
> (there was some thread running here about this in more depth a while
> ago) and we basically need to start from scratch, which is quite
> demotivating; we are just reinventing the wheel and nobody is up to it.
> It is like people are waiting for that to happen so they can jump in
> "once it is the thing" but it will never materialise or at least the
> hurdle to kick it off is unnecessarily high. Nobody is going to invest
> in this heavily if there is already a working operator from companies
> mentioned above. As I understood it, one reason for not choosing the
> way of donating it all is that "the learning and community building
> should happen in organic manner and we just can not accept the
> donation", but isn't it true that it is easier to buil

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread Benedict Elliott Smith
I don't think there's anything about a code drop that's not "The Apache Way"

If there's a consensus (or even strong majority) amongst invested parties, I 
don't see why we could not adopt an operator directly into the project.

It's possible a green field approach might lead to fewer hard feelings, as 
everyone is in the same boat. Perhaps all operators are also suboptimal and 
could be improved with a rewrite? But I think coordinating a lot of different 
entities around an empty codebase is particularly challenging.  I actually 
think it could be better for cohesion and collaboration to have a suboptimal 
but substantive starting point.


On 23/09/2020, 16:11, "Stefan Miklosovic" wrote:

I think that from Instaclustr it was stated quite clearly multiple
times that we are "fine to throw it away" if there is something better
and more widespread. Indeed, we have invested a lot of time in the
operator but it was not useless at all, we gained a lot of quite
unique knowledge how to put all pieces together. However, I think that
this space is going to be quite fragmented and "balkanized", which is
not always a bad thing, but in a quite narrow area as Kubernetes
operator is, I just do not see how 4 operators are going to be
beneficial for ordinary people ("official" from community, ours,
Datastax one and CassKop (without any significant order)). Sure,
innovation and healthy competition is important but to what extent ...
One can start a Cassandra cluster on Kubernetes just so many times
differently and nobody really likes a vendor lock-in. People wanting
to run a cluster on K8S realise that there are three operators, each
backed by a private business entity, and the community operator is not
there ... Huh, interesting ... One may even start to question what is
wrong with these folks that it takes three companies to build their
own solution.

Having said that, to my perception, Cassandra community just does not
have enough engineers nor contributors to keep 4 operators alive at
the same time (I wish I was wrong) so the idea of selecting the best
one or to merge obvious things and approaches together is
understandable, even if it meant we eventually sunset ours. In
addition, nobody from big players is going to contribute to the code
base of the other one, for obvious reasons, so channeling and
directing this effort into something common for a community seems to
be the only reasonable way of cooperation.

It is quite hard to bootstrap this if the donation of the code in big
chunks / whole repo is out of the question as it is not the "Apache way"
(there was some thread running here about this in more depth a while
ago) and we basically need to start from scratch, which is quite
demotivating; we are just reinventing the wheel and nobody is up to it.
It is like people are waiting for that to happen so they can jump in
"once it is the thing" but it will never materialise or at least the
hurdle to kick it off is unnecessarily high. Nobody is going to invest
in this heavily if there is already a working operator from companies
mentioned above. As I understood it, one reason for not choosing the
way of donating it all is that "the learning and community building
should happen in organic manner and we just can not accept the
donation", but isn't it true that it is easier to build a community
around something which is already there rather than trying to build it
around an idea which is quite hard to commit to?

On Wed, 23 Sep 2020 at 15:28, Joshua McKenzie  wrote:
>
> > I think there's significant value to the community in trying to coalesce
> > on a single approach,
> I agree. Unfortunately in this case, the parties with a vested interest and
> written operators came to the table and couldn't agree to coalesce on a
> single approach. John Sanda attempted to start an initiative to write a
> best-of-breed combining choice parts of each operator, but that effort did
> not gain traction.
>
> Which is where my hypothesis comes from that if there were a clear "better
> fit" operator to start from we wouldn't be in a deadlock; the correct
> choice would be obvious. Reasonably so, every engineer that's written
> something is going to want that something to be used and not thrown away in
> favor of another something without strong evidence as to why that's the
> better choice.
>
> As far as I know, nobody has made a clear case as to a more compelling
> place to start in terms of an operator donation the project then
> collaborates on. There's no mass adoption evidence nor feature enumeration
> that I know of for any of the appro

Re: [DISCUSS] Next steps for Kubernetes operator SIG

2020-09-23 Thread Benedict Elliott Smith
I think there's significant value to the community in trying to coalesce on a 
single approach, sooner rather than later.  This is an opportunity to expand the 
number of active organisations involved directly in the Apache Cassandra 
project, as well as to more quickly expand the project's functionality into an 
area we consider urgent and important.  I think it would be a real shame to 
waste this opportunity.  No doubt it will be hard, as organisations have 
certain built-in investments in their own approaches.

I haven't participated in these calls as I do not consider myself to have the 
relevant experience and expertise, and have other focuses on the project.  I 
just wanted to voice a vote in favour of trying to bring the different 
organisations together on a single approach if possible.  Is there anything the 
project can do to help this happen?
 

On 23/09/2020, 03:04, "Ben Bromhead"  wrote:

I think there is certainly an appetite to donate and standardise on a given
operator (as mentioned in this thread).

I personally found the SIG hard to participate in due to time zones and the
synchronous nature of it.

So while it was a great forum to dive into certain details for a subset of
participants and a worthwhile endeavour, I wouldn't paint it as an accurate
reflection of community intent.

I don't think that any participants want to continue down the path of "let
a thousand flowers bloom". That's why we are looking towards CassKop (as
well as a number of technical reasons).

Some of the recorded meetings and outputs can also be found if you are
interested in some primary sources

https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG
.

> From what I understand second-hand from talking to people on the SIG calls,
> there was a general inability to agree on an existing operator as a
> starting point and not much engagement on taking best of breed from the
> various to combine them. Seems to leave us in the "let a thousand flowers
> bloom" stage of letting operators grow in the ecosystem and seeing which
> ones meet the needs of end users before talking about adopting one into the
> foundation.
>
> Great to hear that you folks are joining forces though! Bodes well for C*
> users that are wanting to run things on k8s.
>
>
>
> On Tue, Sep 22, 2020 at 4:26 AM, Ben Bromhead wrote:
>
> > For what it's worth, a quick update from me:
> >
> > CassKop now has at least two organisations working on it substantially
> > (Orange and Instaclustr) as well as the numerous other contributors.
> >
> > Internally we will also start pointing others towards CassKop once a few
> > things get merged. While we are not sunsetting our operator yet, it is
> > certainly looking that way.
> >
> > I'd love to see the community adopt it as a starting point for working
> > towards whatever level of functionality is desired.
> >
> > Cheers
> >
> > Ben
> >
> > On Fri, Sep 11, 2020 at 2:37 PM John Sanda  wrote:
> >
> > On Thu, Sep 10, 2020 at 5:27 PM Josh McKenzie 
> > wrote:
> >
> > There's basically 1 java driver in the C* ecosystem. We have 3? 4? or more
> > operators in the ecosystem. Has one of them hit a clear supermajority of
> > adoption that makes it the de facto default and makes sense to pull it into
> > the project?
> >
> > We as a project community were pretty slow to move on building a PoV around
> > kubernetes so we find ourselves in a situation with a bunch of contenders
> > for inclusion in the project. It's not clear to me what heuristics we'd use
> > to gauge which one would be the best fit for inclusion outside letting
> > community adoption speak.
> >
> > ---
> > Josh McKenzie
> >
> > We actually talked a good bit on the SIG call earlier today about
> > heuristics. We need to document what functionality an operator should
> > include at level 0, level 1, etc. We did discuss this a good bit during
> > some of the initial SIG meetings, but I guess it wasn't really a focal
> > point at the time. I think we should also provide references to existing
> > operator projects and possibly other related projects. This would benefit
> > both community users as well as people working on these projects.
> >
> > - John
> >
> > --
> >
> > Ben Bromhead
> >
> > Instaclustr | www.instaclustr.com | @instaclustr
> >  | (650) 284 9692
> >
>


-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
 | (650) 284 9692




Re: [VOTE] Accept the Harry donation

2020-09-16 Thread Benedict Elliott Smith
+1

On 16/09/2020, 10:45, "Mick Semb Wever"  wrote:

This vote is about officially accepting the Harry donation from Alex Petrov
and Benedict Elliott Smith, that was worked on in CASSANDRA-15348.

The Incubator IP Clearance has been filled out at
http://incubator.apache.org/ip-clearance/apache-cassandra-harry.html

This vote is a required part of the IP Clearance process. It follows the
same voting rules as releases, i.e. from the PMC a minimum of three +1s and
no -1s.

Please cast your votes:
   [ ] +1 Accept the contribution into Cassandra
   [ ] -1 Do not



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Creating a branch for 5.0 …?

2020-09-16 Thread Benedict Elliott Smith
> I know.  I recognise that is a frustrating aspect of this discussion.  It
> is something hard to move on.

So how about we wait until there's a concrete example we can discuss as a 
community?  If we don't have one, it doesn't seem pressing.


On 16/09/2020, 08:23, "Mick Semb Wever"  wrote:

> Can you provide some concrete examples of your own?



On a tangent, I really appreciate the work done in the post-mortem analysis
of the 3.0 storage rewrite and just how long that took to find and fix bugs
it caused.  The more of that we do the better our QA process will become
and the more we will feel justified/safe in raising concerns about large
patches coming in at the wrong time/place.



> Ironically, this entire proposal so far rests on hypothetical lost
> contributions by hypothetical companies and individuals.
>


I know.  I recognise that is a frustrating aspect of this discussion.  It
is something hard to move on.



> I would also like to take issue with a talking point running through much
> of this discussion, that those who are focused on quality assurance have
> "different priorities" to those who now want to ship features into 5.0: we
> also want to ship features, we're just doing the work the project agreed
> upon as a prerequisite to that.
>


Yes, we have to keep bringing this back to the context that this is an
exception we would be making for specific new contributors we recognise we
would otherwise lose.

An analogy I see here is how the open source work is done out in the open
but sometimes with new contributors we may make the exception to mentor
them through a patch or two in private to give them a safe space to build
confidence before meeting community rules and precedence.

I'm hoping that the community transcends the "QA vs New Features"
dichotomy, e.g. with good CI/CD.  I think this is now the project's biggest
potential with how the PMC is now spread.  That said, AFAIK we are still
waiting on testing/QA requirements/clarifications for 4.0-rc.  The best
opportunity we have for QA/CI improvements that will be foundational post
4.0 is now.






Re: Creating a branch for 5.0 …?

2020-09-15 Thread Benedict Elliott Smith
> But I would suggest that we are more productive when
> raising and discussing concrete examples and specific patches

You make a good point.  Can you provide some concrete examples of your own? 
Ironically, this entire proposal so far rests on hypothetical lost 
contributions by hypothetical companies and individuals.

I would also like to take issue with a talking point running through much of 
this discussion, that those who are focused on quality assurance have 
"different priorities" to those who now want to ship features into 5.0: we also 
want to ship features, we're just doing the work the project agreed upon as a 
prerequisite to that.


On 15/09/2020, 22:00, "Mick Semb Wever"  wrote:

> We know we are turning away more and more contributions and new potential
> dev community with our 4.0 feature freeze, and it has been going on for a
> while now.
>
> I would like to suggest we create a cassandra-5.0 branch where we can
> start to queue up all reviewed and ready-to-go post-4.0 commits.
>




I am going to take a stab at closing the loop on this thread.

So far no one has indicated any desire to maintain a cassandra-5.0 branch,
while people have expressed concerns about what it would mean for the
release date and quality of 4.0-rc. As a community we don't have an answer
to these concerns. But I would suggest that we are more productive when
raising and discussing concrete examples and specific patches wherever we
see a potential impact, like we have done with the messaging system
rewrite, those bugs that slipped 4.0-alpha, and the byte array backed cells
rewrite.

Since a number of people have asked off-list for more detail and
clarification on how the cassandra-5.0 branch would work in a way that
doesn't require community voting/approval, and in case anyone does step up
to take it on, the following is a more detailed write-up of the workflow I
was thinking of…

1. Patches are reviewed by two Committers on tickets that are marked `4.x`.
   a. These patches are not relevant for any current versions (2.2, 3.0,
3.11, 4.0)
   b. If these patches require a CEP, then they must have first passed the
CEP.
   c. These are patches from new contributors that we would otherwise lose.
   d. Reviewers are not retreating from 4.0-rc efforts.

2. When successfully reviewed, the single commit that makes the patch is
committed to the cassandra-5.0 branch.
3. The ticket is transitioned to "Ready to Commit", and a comment added
that the patch now resides in the cassandra-5.0 branch.

4. At regular intervals, the cassandra-5.0 maintainers rebase (and rerere)
the branch off trunk.
   a. ci-cassandra.a.o runs CI on the cassandra-5.0 branch.

5. When 4.0 is branched and the feature freeze is announced over, an email
to the dev ML is sent that the patches parked in the cassandra-5.0 branch
will soon be committed.
   a. There needs to be a balance here between appreciating late-reviewers
who were busy Doing The Right Thing being given a chance to provide
feedback, and that two trusted committers have already signed off on the
patch.

6. The cassandra-5.0 branch is fast-forward merged into trunk (minus any
commits that have had reviews re-opened on them).
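
Step 6 can be pictured as an ordered filter over the parked commits. The sketch below is only an illustration under that assumption: `ParkedCommit`, its fields, and the ticket ids are hypothetical and do not correspond to any project tooling. Note that leaving commits out generally means rewriting the branch first, since a plain fast-forward carries every commit along.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParkedCommit:
    ticket: str                    # hypothetical ticket id
    review_reopened: bool = False  # True if a late review re-opened the ticket


def commits_to_land(parked):
    """Keep parked commits in their original order, dropping re-opened ones."""
    return [c for c in parked if not c.review_reopened]


parked = [
    ParkedCommit("TICKET-1"),
    ParkedCommit("TICKET-2", review_reopened=True),
    ParkedCommit("TICKET-3"),
]
print([c.ticket for c in commits_to_land(parked)])  # ['TICKET-1', 'TICKET-3']
```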






Re: Creating a branch for 5.0 …?

2020-09-12 Thread Benedict Elliott Smith
> Also, fwiw, it seems unlikely this conversation will be any different than
> the one we had on the same topic 11 weeks ago:
> https://lists.apache.org/thread.html/raf3592f2297abfb120563d216eeea26bfb3a6e048b246492815954ff%40%3Cdev.cassandra.apache.org%3E
> .
>
> Jordan
>
> On Fri, Sep 11, 2020 at 5:44 AM Benedict Elliott Smith wrote:
>
> if we do these contributions in secret
>
> Are you aware of any work happening (or expected to happen) in this way?
> This seems a very different problem than the one the thread was opened
> with.
>
> it will be even harder for folk to put in late reviews
>
> It is always harder to revert and revisit committed work, than to review
> work that has not been merged. So the flood gates you expect to open will
> still flood those people working on 4.0, only worse. There is also no such
> thing as a "late review" in this context; the review happens, at whatever
> pace is necessary, as agreed recently by the community. If an organisation
> drops several huge patches, progress will quite reasonably be slow. The
> best way to mitigate this would be to invest more of those secret resources
> into shipping 4.0, so the project can be on an even keel.
>
> On 11/09/2020, 13:06, "Mick Semb Wever"  wrote:
>
> For significant new feature work, the option of working in a public,
> long-running, trunk-based feature branch is available. If we look at a
> specific example like CEP-7/SAI, I’m not sure how it would benefit much
> from a 5.0 branch, at least until it fundamentally depended on other
> 5.0-targeted work.
>
> Caleb, I'm seeing an important value to the branch (given there's no
> inter-dependencies between patches) is the CI builds on the cassandra-5.0
> branch, and the efforts of rebasing centralised from many feature branches
> to one preview branch.
>
> Raising the CEP process is interesting. Anything significant enough to
> warrant a CEP still has to go through that process (which has limited
> throughput atm) and I can't imagine anything that size making it to the
> cassandra-5.0 before we got to 4.0-rc (which is hopefully in a few months).
> But we are sending the clear signal that we are no longer shutting out
> these contributions.
>
> Maybe the effort should be done in the area of getting more people on
> board technically so they can start to review things themselves (which
> indeed takes a lot of time and patience) instead of creating a new branch
> so they can pile up their stuff there.
>
> Stefan, the cassandra-5.0 is not a substitute for reviews. Good habits in
> preparation for reviews: like rebasing your feature branch, having CI
> results ready to view; and the review process itself remains exactly the
> same, and will take the same time as before.
>
> You do have strong review preparation habits in place. I can see that the
> CI builds (not just a selection of tests but the whole complete pipeline)
> being part of the value you are taking advantage of here. We want to
> re-apply that value also to the cassandra-5.0 branch with its patches that
> are post-review yet, not yet merged to trunk. That CI would help smoke out
> the combination (sequence) of reviewed patches all put together, and easing
> the burden of the re-review of those patches, before they land in trunk.
>
> Again… if the feature freeze is now a quickly shortening window, it's going
> to be very limited to what might make it into such a branch, so mostly
> about sending the signal that this final hurdle can be worked around if it
> means we retain any such significant new contributions.
>
> Work conducted without the engagement of the community can also expect to
> be heavily revised when the community finally engages with it, as signalled
> with the CEP process.
>
> Benedict, good point and it loops into what Caleb touches on. The CEP
> intends to bring out community involvement earlier in the development
> cycle, to avoid the late revisions. And under the feature freeze the CEP
> process is an obvious bottleneck and I don't think we can get around that.
>
   

Re: Creating a branch for 5.0 …?

2020-09-11 Thread Benedict Elliott Smith
> if we do these contributions in secret

Are you aware of any work happening (or expected to happen) in this way?  This 
seems a very different problem than the one the thread was opened with.

> it will be even harder for folk to put in late reviews

It is always harder to revert and revisit committed work, than to review work 
that has not been merged.  So the flood gates you expect to open will still 
flood those people working on 4.0, only worse. There is also no such thing as a 
"late review" in this context; the review happens, at whatever pace is 
necessary, as agreed recently by the community.  If an organisation drops 
several huge patches, progress will quite reasonably be slow.  The best way to 
mitigate this would be to invest more of those secret resources into shipping 
4.0, so the project can be on an even keel.



On 11/09/2020, 13:06, "Mick Semb Wever"  wrote:

> For significant new feature work, the option of working in a public,
> long-running, trunk-based feature branch is available. If we look at a
> specific example like CEP-7/SAI, I’m not sure how it would benefit much
> from a 5.0 branch, at least until it fundamentally depended on other
> 5.0-targeted work.
>


Caleb, I'm seeing an important value to the branch (given there's no
inter-dependencies between patches) is the CI builds on the cassandra-5.0
branch, and the efforts of rebasing centralised from many feature branches
to one preview branch.

Raising the CEP process is interesting. Anything significant enough to
warrant a CEP still has to go through that process (which has limited
throughput atm) and I can't imagine anything that size making it to the
cassandra-5.0 before we got to 4.0-rc (which is hopefully in a few months).
But we are sending the clear signal that we are no longer shutting out
these contributions.


> Maybe the effort should be done in the area of getting more people on
> board technically so they can start to review things themselves (which
> indeed takes a lot of time and patience) instead of creating a new
> branch so they can pile up their stuff there.



Stefan, the cassandra-5.0 branch is not a substitute for reviews. Good habits
in preparation for reviews (like rebasing your feature branch and having CI
results ready to view) and the review process itself remain exactly the
same, and will take the same time as before.

You do have strong review preparation habits in place. I can see that the
CI builds (not just a selection of tests but the complete pipeline)
are part of the value you are taking advantage of here.  We want to
re-apply that value to the cassandra-5.0 branch, with its patches that
are post-review yet not merged to trunk. That CI would help smoke out
problems in the combination (sequence) of reviewed patches all put together,
easing the burden of re-reviewing those patches before they land in trunk.

Again… if the feature freeze is now a quickly shortening window, very little
is likely to make it into such a branch, so this is mostly
about sending the signal that this final hurdle can be worked around if it
means we retain any such significant new contributions.


> Work conducted without the engagement of the community can also expect to
> be heavily revised when the community finally engages with it, as signalled
> with the CEP process.



Benedict, good point and it loops into what Caleb touches on. The CEP
intends to bring out community involvement earlier in the development
cycle, to avoid the late revisions. And under the feature freeze the CEP
process is an obvious bottleneck and I don't think we can get around that.

As far as dev involvement goes, it doesn't stop just because something is
merged to trunk, commits in trunk can also be re-reviewed and then
reverted, but that's something we want to avoid.  So yes, ofc there will be
those that want to have their say on things sitting in the 5.0 branch that
have otherwise met reviewer requirements, at the same time (as long as the
branch remains limited in its scope) this does lengthen out the dev cycle
for these contributions providing more patience and soak time for all. I
would expect that the maintainers of the branch extend the opportunity for
late reviewing to those that were doing The Right Thing focusing all their
time on getting 4.0 out, before those commits go into trunk. Opposed to
this, if we do these contributions in secret to avoid these types of
discussions, only raising them once the feature-freeze is lifted, there may
be a flood-gates rush and it will be even harder for folk to put in late
reviews. I would certainly rather see exceptions made and things done in
public (even if in a fork), though the main concern we are hearing is folk
simply walking away altogether.

I 

Re: Creating a branch for 5.0 …?

2020-09-11 Thread Benedict Elliott Smith
As I said before

> The more significant cost to the project is distracting contributors focused 
> on 4.0

This is a conversation we keep coming back to, so I will highlight this phrase for 
future repetition: Work does not happen in a vacuum. The whole community bears 
a cost when new work is integrated.

I am personally not interested in further breaking the backs of those 
individuals who have been working to exhaustion for some time to fix historical 
issues, so that we can accommodate some mythical organisation that has never 
yet contributed, and only won't because they don't care to participate in the 
project's goals.  If they really do want to get involved, they can either wait 
until the project is ready or get involved in shipping 4.0.


On 11/09/2020, 10:53, "Benjamin Lerer"  wrote:

>
> People might be itching to get out, particularly those who are unlikely to
> be harmed, but most agree to stay put for the benefit of the community.
>

The freeze has been there for quite a while now and as far as I can see the
goal of all those working on C* right now is to have 4.0 out. Will there
really be a move of resources knowing that the new work will never be
released if 4.0 is not?

On one side we have the fear of delaying 4.0, on the other the fear of losing
people who would want to contribute but are not interested in contributing
to testing.

I trust that there are people or companies interested in contributing but
not in testing. Amazon if I recall correctly mentioned that they wanted to
contribute and are probably not so much interested in testing 4.0. I
imagine that it might be the same for some other vendors.
As a developer looking to contribute to an open source project, I would
probably avoid projects that have been frozen for more than a year.

Allowing people with diverse goals to get involved can only help the
project in my opinion, as once you have been involved you will want your
contribution to be released.

As far as I am concerned a new branch will not change my main goal which is
to have 4.0 out of the door.


On Fri, Sep 11, 2020 at 11:03 AM Benedict Elliott Smith wrote:

> This is a social enterprise, and we are all able to enter into a social
> contract/convention.  This doesn't prevent someone from breaking the
> convention, or not agreeing to it, of course, but this entails social
> costs.  This is exactly how the feature-freeze has worked until now,
> curtailing development - not just merging.
>
> Work conducted without the engagement of the community can also expect to
> be heavily revised when the community finally engages with it, as signalled
> with the CEP process.
>
> I personally do not condone a total relaxation of the freeze, even to a
> volunteer-maintained repository.  We can perhaps think of the freeze like a
> pandemic lockdown: if we relax before we have the correct measures in
> place, much of the good work will be undone.  People might be itching to
> get out, particularly those who are unlikely to be harmed, but most agree
> to stay put for the benefit of the community.  However, the community might
> together agree to a partial-relaxation if it can be done safely.
>
>
>
>
> On 11/09/2020, 04:09, "Jeff Jirsa"  wrote:
> > On Sep 10, 2020, at 2:42 PM, Benedict Elliott Smith <bened...@apache.org> wrote:
> >
> >>
> >> As I understand Sankalp's primary (and quite reasonable) argument
> >> the last time we discussed this
> >
> > The more significant cost to the project is distracting contributors
> > focused on 4.0.  The project is bandwidth constrained right now.  Feature
> > development doesn't happen in a vacuum, and some of that bandwidth will
> > have to go to participating in any new feature development.  So, if feature
> > development begins in earnest, the 4.0 ship date will slip - by how much,
> > who knows?
> >
> > Of course, the new features will also get less attention than they
> > should.  So it's a lose-lose in that respect.
> >
> > I think if we are to consider this, any ticket or project for 5.0
> > should be subject to a consensus vote before work begins.  Work that a
> > contributor - focused on the more urgent and less rewarding job of shipping
> > 4.0 - would participate in can be deferred.  Uncontentious work, or work
> > where all relevant contributors are free to participate, can make progress.
>
> I have no opinion on branching, but I think we all know it’s not
 

Re: Creating a branch for 5.0 …?

2020-09-11 Thread Benedict Elliott Smith
This is a social enterprise, and we are all able to enter into a social 
contract/convention.  This doesn't prevent someone from breaking the 
convention, or not agreeing to it, of course, but this entails social costs.  
This is exactly how the feature-freeze has worked until now, curtailing 
development - not just merging.

Work conducted without the engagement of the community can also expect to be 
heavily revised when the community finally engages with it, as signalled with 
the CEP process.

I personally do not condone a total relaxation of the freeze, even to a 
volunteer-maintained repository.  We can perhaps think of the freeze like a 
pandemic lockdown: if we relax before we have the correct measures in place, 
much of the good work will be undone.  People might be itching to get out, 
particularly those who are unlikely to be harmed, but most agree to stay put 
for the benefit of the community.  However, the community might together agree 
to a partial-relaxation if it can be done safely.




On 11/09/2020, 04:09, "Jeff Jirsa"  wrote:
> On Sep 10, 2020, at 2:42 PM, Benedict Elliott Smith wrote:
>
>>
>> As I understand Sankalp's primary (and quite reasonable) argument the
>> last time we discussed this
>
> The more significant cost to the project is distracting contributors
> focused on 4.0.  The project is bandwidth constrained right now.  Feature
> development doesn't happen in a vacuum, and some of that bandwidth will have to
> go to participating in any new feature development.  So, if feature development
> begins in earnest, the 4.0 ship date will slip - by how much, who knows?
>
> Of course, the new features will also get less attention than they
> should.  So it's a lose-lose in that respect.
>
> I think if we are to consider this, any ticket or project for 5.0 should
> be subject to a consensus vote before work begins.  Work that a contributor -
> focused on the more urgent and less rewarding job of shipping 4.0 - would
> participate in can be deferred.  Uncontentious work, or work where all relevant
> contributors are free to participate, can make progress.

I have no opinion on branching, but I think we all know it’s not reasonable 
to say what people can and can’t work on in any open source project. PMC 
members and committers get an opinion on what goes in the repo, but not what 
gets worked on or reviewed by other committers. 
-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org







Re: Creating a branch for 5.0 …?

2020-09-10 Thread Benedict Elliott Smith
> As I understand Sankalp's primary (and quite reasonable) argument the last 
> time we discussed this

The more significant cost to the project is distracting contributors focused on 
4.0.  The project is bandwidth constrained right now.  Feature development 
doesn't happen in a vacuum, and some of that bandwidth will have to go to 
participating in any new feature development.  So, if feature development 
begins in earnest, the 4.0 ship date will slip - by how much, who knows?

Of course, the new features will also get less attention than they should.  So 
it's a lose-lose in that respect.

I think if we are to consider this, any ticket or project for 5.0 should be 
subject to a consensus vote before work begins.  Work that a contributor - 
focused on the more urgent and less rewarding job of shipping 4.0 - would 
participate in can be deferred.  Uncontentious work, or work where all relevant 
contributors are free to participate, can make progress.


On 10/09/2020, 22:10, "Joshua McKenzie"  wrote:

I can offer my anecdata: I know of two major enterprises as well as have
had two interviewees unsolicited bring up to me that they have walked away
from or bounced off the project due to the feature freeze / branching
strategy. I may be the anomaly given the volume of people in the ecosystem
I interact with, but I assume it's more than just the ones I've seen.

Mick, correct me if I'm wrong but the merge path for a bugfix to 2.1 that
applies up to 4.0, as an example, would look like:

2.1 → 3.0 → 3.11 → trunk

With no extra work required and an accumulation of a backlog of things on
trunk not on the cassandra-5.0 branch.

If someone else were working on something in the cassandra-5.0 branch,
they'd need to rebase/rerere the 5.0 branch on trunk before making whatever
changes. In effect this would move the burden from merge resolution to
whomever was choosing to work on the cassandra-5.0 branch instead of
maintainers working on 4.0.

> Is there such a backlog of tickets that have been reviewed and not going
> into 4.0.0?
Chicken or egg debate right? Nobody is going to review code for post 4.0
right now if there's nowhere to put it since it'll just atrophy and need
constant rebasing and thus re-review. Or it ends up in a fork somewhere.

As I understand Sankalp's primary (and quite reasonable) argument the last
time we discussed this, the concern was the extra work needed to merge
forward for people working on 4.0. A cassandra-5.0 branch in-tree where the
burden of maintenance would fall on people using the branch seems to
mitigate that concern.

Also, when 4.0 GA'ed wouldn't trunk just become a 4.0 branch and then
cassandra-5.0 become trunk?




On Thu, Sep 10, 2020 at 4:32 PM, Benedict Elliott Smith  wrote:

> We know we are turning away more and more contributions
>
> Do we? I haven't been aware of much of this occurring at all.
>
> On 10/09/2020, 20:58, "Mick Semb Wever"  wrote:
>
> We know we are turning away more and more contributions and new potential
> dev community with our 4.0 feature freeze, and it has been going on for a
> while now.
>
> I would like to suggest we create a cassandra-5.0 branch where we can
> start to queue up all reviewed and ready-to-go post-4.0 commits.
>
> This is not to distract from getting 4.0 out, where our primary focus is,
> but as a stop-gap in losing those contributions. The effort of the
> cassandra-5.0 branch maintenance: rebasing (git rerere); is just upon those
> that wish to take it on, and the branch can be located in whatever GH fork
> those individuals wish to keep it in. Tickets that have been reviewed and are
> (aside from the feature-freeze) ready to be committed, can be committed to
> the `cassandra-5.0` branch while their tickets remain in "Ready to Commit"
> status. The goal of this effort would be, a) we are giving the signal to
> contributors to get involved again (even while our primary focus is on
> stabilisation and testing efforts), and b) maintaining CI status on the
> sequence of commits that are ready to go into trunk post 4.0-rc.
>
> My questions are…
> - who would be willing to help maintain this cassandra-5.0 branch?
> - should it be kept external in a GH fork? Or would you rather have the
> branch in our main git repository?
>
> regards,
> Mick
>
>






Re: Creating a branch for 5.0 …?

2020-09-10 Thread Benedict Elliott Smith
> We know we are turning away more and more contributions

Do we? I haven't been aware of much of this occurring at all. 

On 10/09/2020, 20:58, "Mick Semb Wever"  wrote:

We know we are turning away more and more contributions and new potential
dev community with our 4.0 feature freeze, and it has been going on for a
while now.

I would like to suggest we create a cassandra-5.0 branch where we can start
to queue up all reviewed and ready-to-go post-4.0 commits.

This is not to distract from getting 4.0 out, where our primary focus is,
but as a stop-gap in losing those contributions. The effort of the
cassandra-5.0 branch maintenance: rebasing (git rerere); is just upon those
that wish to take it on, and the branch can be located in whatever GH fork
those individuals wish to keep it in. Tickets that have been reviewed and are
(aside from the feature-freeze) ready to be committed, can be committed to
the `cassandra-5.0` branch while their tickets remain in "Ready to Commit"
status. The goal of this effort would be, a) we are giving the signal to
contributors to get involved again (even while our primary focus is on
stabilisation and testing efforts), and b) maintaining CI status on the
sequence of commits that are ready to go into trunk post 4.0-rc.

My questions are…
 - who would be willing to help maintain this cassandra-5.0 branch?
 - should it be kept external in a GH fork? Or would you rather have the
branch in our main git repository?

regards,
Mick






Re: [DISCUSS] Change style guide to recommend use of @Override

2020-09-01 Thread Benedict Elliott Smith
+1

On 01/09/2020, 20:09, "Caleb Rackliffe"  wrote:

+1

On Tue, Sep 1, 2020, 2:00 PM Jasonstack Zhao Yang wrote:

> +1
>
> On Wed, 2 Sep 2020 at 02:45, Dinesh Joshi  wrote:
>
> > +1
> >
> > > On Sep 1, 2020, at 11:27 AM, David Capwell  wrote:
> > >
> > > Currently our style guide recommends to avoid using @Override and updates
> > > intellij's code style to exclude it by default; I would like to propose we
> > > change this recommendation to use it and to update intellij's style to
> > > include it by default.
> > >
> > > @Override is used by javac to enforce that a method is in fact overriding
> > > from an abstract class or an interface and if this stops being true (such
> > > as a refactor happens) then a compiler error is thrown; when we default to
> > > excluding, it makes it harder to detect that a refactor catches all
> > > implementations and can lead to subtle and hard to track down bugs.
> > >
> > > This proposal is for new code and would not be to go rewrite all code at
> > > once, but would recommend new code adopt this style, and to pull old code
> > > forward which is related to changes being made (similar to our stance on
> > > imports).
> > >
> > > If people are ok with this, I will file a JIRA, update the docs, and
> > > update intellij's formatting.
> > >
> > > Thanks for your time!
> >
> >
> >
> >
>






Re: [DISCUSS] CEP-7 Storage Attached Index

2020-08-18 Thread Benedict Elliott Smith
> SAI will follow the same QA/Testing guideline as in CASSANDRA-15536.

CASSANDRA-15536 might set some good examples for retrospectively shoring up our 
quality assurance, but offers no prescriptions for how we approach the testing 
of new work.  I think the project needs to conclude the discussions that keep 
being started around the "definition of done" before determining what 
sufficient quality assurance looks like for this feature.

I've briefly set out some of my views in an earlier email chain that was 
initiated by Josh, that unfortunately received no response.  The project is 
generally very busy right now as we approach 4.0 release, which is partially I 
assume why there has been no movement.  Assuming no further activity from 
others, as we get closer to 4.0 (and I have more time) I will try to produce a 
more formal proposal for quality assurance for the project, to be debated and 
agreed.



On 18/08/2020, 12:02, "Jasonstack Zhao Yang"  wrote:

Mick thanks for your questions.

> During the 4.0 beta phase this was intended to be addressed, i.e.
> defining more specific QA guidelines for 4.0-rc. This would be an important
> step towards QA guidelines for all changes and CEPs post-4.0.

Agreed, I think CASSANDRA-15536 (4.0 Quality:
Components and Test Plans) has set a good example of QA/Testing.

>  - How will this be tested, how will its QA status and lifecycle be
> defined? (per above)

SAI will follow the same QA/Testing guideline as in CASSANDRA-15536.

>  - With existing C* code needing to be changed, what is the proposed plan
> for making those changes ensuring maintained QA, e.g. is there separate QA
> cycles planned for altering the SPI before adding a new SPI implementation?

The plan is to have interface changes and their new implementations
reviewed/tested/merged at once to reduce overhead.

But if having interface changes reviewed/tested/merged separately helps
quality, I don't think anyone will object.

> - Despite being out of scope, it would be nice to have some idea from the
> CEP author of when users might still choose afresh 2i or SASI over SAI

I'd like SAI to be the only index for users, but this is a decision to be
made by the community.

> - Who fills the roles involved?

Contributors that are still active on C* or related projects:

Andres de la Peña
Caleb Rackliffe
Dan LaRocque
Jason Rutherglen
Mike Adamson
Rocco Varela
Zhao Yang

I will shepherd.

Anyone that is interested in C* index, feel free to join us at slack
#cassandra-sai.

> - Is there a preference to use gdoc instead of the project's wiki, and
> why? (the CEP process suggest a wiki page, and feedback on why another
> approach is considered better helps evolve the CEP process itself)

Didn't notice wiki is required. Will port CEP to wiki.


On Tue, 18 Aug 2020 at 17:39, Mick Semb Wever  wrote:

> >
> > We are looking forward to the community's feedback and suggestions.
> >
>
>
> What comes immediately to mind is testing requirements. It has been
> mentioned already that the project's testability and QA guidelines are
> inadequate to successfully introduce new features and refactorings to the
> codebase. During the 4.0 beta phase this was intended to be addressed, i.e.
> defining more specific QA guidelines for 4.0-rc. This would be an important
> step towards QA guidelines for all changes and CEPs post-4.0.
>
> Questions from me
>  - How will this be tested, how will its QA status and lifecycle be
> defined? (per above)
>  - With existing C* code needing to be changed, what is the proposed plan
> for making those changes ensuring maintained QA, e.g. is there separate QA
> cycles planned for altering the SPI before adding a new SPI implementation?
>  - Despite being out of scope, it would be nice to have some idea from the
> CEP author of when users might still choose afresh 2i or SASI over SAI,
>  - Who fills the roles involved? Who are the contributors in this DataStax
> team? Who is the shepherd? Are there other stakeholders willing to be
> involved?
>  - Is there a preference to use gdoc instead of the project's wiki, and
> why? (the CEP process suggest a wiki page, and feedback on why another
> approach is considered better helps evolve the CEP process itself)
>
> cheers,
> Mick
>






Re: [Vote] Remove Windows support from 4.0+

2020-08-10 Thread Benedict Elliott Smith
I think it would seem a bit more legitimate to remove support without a normal 
deprecation cycle if we've reached out with an explicit request for donation of 
support, making it explicit that we will be removing support if none is 
forthcoming.  But just my 2c; like most people on this list, even discussing 
Windows support barely registers on my priority list.

On 10/08/2020, 17:53, "Jordan West"  wrote:

It wasn't directly regarding removing support but we did reach out to
cassandra-users@ for testing 4.0 on Windows and got no response:
https://www.mail-archive.com/user@cassandra.apache.org/msg60234.html

Jordan


On Mon, Aug 10, 2020 at 4:16 AM Benedict Elliott Smith wrote:

> Have we considered first asking the user list if there's anyone willing to
> donate resources to maintain compatibility?
>
> I know I have in the (distant) past handled Jira filed by (production)
> Windows users.  I don’t know how prevalent they are, but perhaps we should
> offer them a chance to step up before cutting them off?  I understand
> nobody presently involved has the resources or inclination to maintain
> them, but if the effort is low it is not infeasible that somebody else
> might.
>
> On 10/08/2020, 12:11, "Aleksey Yeshchenko" wrote:
>
> +1
>
> > On 10 Aug 2020, at 04:14, Yuki Morishita  wrote:
> >
> > As per the discussion(*), I propose to remove Windows support from 4.0
> > release and onward.
> >
> > Windows scripts are not maintained and we lack Windows test
> > environments. Windows users can use docker or cloud environments to
> > set up Cassandra application development.
> >
> > If the vote passes, I will create the following tickets to officially
> > remove Windows support from 4.0:
> >
> > - Remove Windows scripts and add notice to NEWS.txt
> > - Update "Getting Started" documents for Windows users (to direct them
> > to use docker or cloud)
> >
> > Regards,
> > Yuki
> >
> > --
> > *:
> > https://mail-archives.apache.org/mod_mbox/cassandra-dev/202007.mbox/%3CCAGM0Up_3GoPucCP-U18L1akzBXS1eJoKbui997%3DajcCfKJQdng%40mail.gmail.com%3E
> >
> >
>
>
>
>
>
>
>
>





