Re: discuss: add to_human_size function

2024-04-18 Thread Ariel Weisberg
Hi,

I think it’s a good quality of life improvement, but I am someone who believes 
in a rich set of built-in functions being a good thing.

A format function is a bit more scope and kind of orthogonal. It would still be 
good to have shorthand functions for things like size.
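To make the shape of such a shorthand concrete, here is a rough sketch of the byte-count conversion in Python (illustrative only — the actual CASSANDRA-19546 patch is Java/CQL and may behave differently, e.g. in unit choice and rounding):

```python
def to_human_size(size_bytes: int) -> str:
    """Convert a raw byte count into a human-friendly string."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(size_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            # No decimals for plain bytes, two decimals for larger units.
            return f"{value:.0f} {unit}" if unit == "B" else f"{value:.2f} {unit}"
        value /= 1024
```

So a stored value like 1536 would render as "1.50 KiB" instead of forcing the reader to do the arithmetic.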

Ariel

On Tue, Apr 9, 2024, at 8:09 AM, Štefan Miklošovič wrote:
> Hi,
> 
> I want to propose CASSANDRA-19546, which would make it possible to convert raw 
> numbers to something human-friendly. 
> There are cases where we write just a number of bytes in our system tables, and 
> these numbers are hard to parse visually. Users can indeed use this for 
> their tables too if they find it useful.
> 
> Also, a user can indeed write a UDF for this but I would prefer if we had 
> something baked in.
> 
> Does this make sense to people? Are there any other approaches to do this? 
> 
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
> 
> Regards


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Ariel Weisberg
Hi,

If there is a faster/better way to replace a node, why not have Cassandra 
support that natively, without the sidecar, so people who aren’t running the 
sidecar can benefit?

Copying files over a network shouldn’t be slow in C* and it would also already 
have all the connectivity issues solved.

Regards,
Ariel

On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
> Hi all,
> 
> I have filed CEP-40 [1] for live migrating Cassandra instances using the 
> Cassandra Sidecar.
> 
> When someone needs to move all or a portion of the Cassandra nodes belonging 
> to a cluster to different hosts, the traditional approach of Cassandra node 
> replacement can be time-consuming due to repairs and the bootstrapping of new 
> nodes. Depending on the volume of the storage service load, replacements 
> (repair + bootstrap) may take anywhere from a few hours to days.
> 
> Proposing a Sidecar based solution to address these challenges. This solution 
> proposes transferring data from the old host (source) to the new host 
> (destination) and then bringing up the Cassandra process at the destination, 
> to enable fast instance migration. This approach would help to minimise node 
> downtime, as it is based on a Sidecar solution for data transfer and avoids 
> repairs and bootstrap.
> 
> Looking forward to the discussions.
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
> 
> Thanks!
> Hari


Re: Harry in-tree (Forked from "Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?")

2024-01-02 Thread Ariel Weisberg

Thanks for your work on this. Excited to have an easier way to write tests that 
leverage generated schemas and data and that cover more of the space.

Ariel
On Sat, Dec 23, 2023, at 9:17 AM, Alex Petrov wrote:
> Thanks everyone, Harry is now in tree! Of course, that's just a small 
> milestone, hope it'll prove as useful as I expect it to be.
> 
> https://github.com/apache/cassandra/commit/439d1b122af334bf68c159b82ef4e4879c210bd5
> 
> Happy holidays!
> --Alex
> 
> On Sat, Dec 23, 2023, at 11:10 AM, Mick Semb Wever wrote:
>>
>>   
>>> I strongly believe that bringing Harry in-tree will help to lower the 
>>> barrier for fuzz test and simplify co-development of Cassandra and Harry. 
>>> Previously, it has been rather difficult to debug edge cases because I had 
>>> to either re-compile an in-jvm dtest jar and bring it to Harry, or 
>>> re-compile a Harry jar and bring it to Cassandra, which is both tedious and 
>>> time consuming. Moreover, I believe we have missed at very least one RT 
>>> regression [2] because Harry was not in-tree, as its tests would've caught 
>>> the issue even with the model that existed.
>>> 
>>> For other recently found issues, I think having Harry in-tree would have 
>>> substantially lowered a turnaround time, and allowed me to share repros 
>>> with developers of corresponding features much quicker.
>> 
>> 
>> Agree, looking forward to getting to know and writing Harry tests.  Thank 
>> you Alex, happy holidays :) 
>> 
> 


Re: [DISCUSS] CEP-39: Cost Based Optimizer

2024-01-02 Thread Ariel Weisberg
Hi,

I am burying the lede, but it's important to keep an eye on runtime-adaptive vs 
planning time optimization as the cost/benefits vary greatly between the two 
and runtime adaptive can be a game changer. Basically CBO optimizes for query 
efficiency and startup time at the expense of not handling some queries well 
and runtime adaptive is cheap/free for expensive queries and can handle cases 
that CBO can't.

Generally speaking I am +1 on the introduction of a CBO, since it seems like 
there exists things that would benefit from it materially (and many of the 
associated refactors/cleanup) and it aligns with my north star that includes 
joins.

Do we all have the same north star that Cassandra should eventually support 
joins? Just curious if that is controversial.

I don't feel like this CEP in particular should need to really nail down 
exactly how distributed estimates work since we can start with using local 
estimates as a proxy for the entire cluster and then improve. If someone has 
bandwidth to do a separate CEP for that then sure that would be great, but this 
seems big enough in scope already.

RE testing, continuity of performance of queries is going to be really 
important. I would really like to see that we have fuzzed the space 
deterministically and via a collection of hand-rolled cases, and can compare 
performance between versions to catch queries that regress. Hopefully we can 
agree on a baseline for releasing where we know what prior release to compare 
to and what acceptable changes in performance are.

RE prepared statements - It feels to me like trying to send the plan blob back 
and forth to get more predictable, but not absolutely predictable, plans is not 
worth it? Feels like a lot for an incremental improvement over a baseline that 
doesn't exist yet, IOW it doesn't feel like something for V1. Maybe it ends up 
in YAGNI territory.

The north star of predictable behavior for queries is a *very* important one 
because it means the world to users, but CBO is going to make mistakes all over 
the place. It's simply unachievable even with accurate statistics because it's 
very hard to tell how predicates will behave on a column.

This segues nicely into the importance of adaptive execution :-) It's how you 
rescue the queries that CBO doesn't handle well for any reason such as bugs, 
bad statistics, missing features. Re-ordering predicate evaluation, switching 
indexes, and re-ordering joins can all be done on the fly.
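To illustrate what on-the-fly predicate re-ordering means, here is a toy sketch (Python for brevity; the names and structure are invented for illustration and bear no relation to the actual C* execution engine). The idea: track how often each predicate rejects a row, and periodically move the most selective predicates to the front so the expensive or unselective ones run less often:

```python
def adaptive_filter(rows, predicates):
    """Evaluate predicates most-selective-first, re-ordering
    periodically based on observed reject rates."""
    # stats[i] = [times_evaluated, times_rejected] for predicates[i]
    stats = [[1, 1] for _ in predicates]  # optimistic prior
    order = list(range(len(predicates)))
    out = []
    for n, row in enumerate(rows):
        if n % 100 == 0:
            # Predicates that reject the most rows run first,
            # so later ones are evaluated on fewer rows.
            order.sort(key=lambda i: stats[i][1] / stats[i][0], reverse=True)
        keep = True
        for i in order:
            stats[i][0] += 1
            if not predicates[i](row):
                stats[i][1] += 1
                keep = False
                break
        if keep:
            out.append(row)
    return out
```

The result set is the same regardless of evaluation order; only the amount of wasted predicate evaluation changes, which is exactly why this kind of adaptation is "cheap/free" relative to planning-time mistakes.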

CBO is really a performance optimization since adaptive approaches will allow 
any query to complete with some wasted resources.

If my pager were waking me up at night and I wanted to stem the bleeding I 
would reach for runtime adaptive over CBO because I know it will catch more 
cases even if it is slower to execute up front.

What is the nature of the queries we are looking to solve right now? Are they long 
running heavy hitters, or short queries that explode if run incorrectly, or a 
mix of both?

Ariel

On Tue, Dec 12, 2023, at 8:29 AM, Benjamin Lerer wrote:
> Hi everybody,
> 
> I would like to open the discussion on the introduction of a cost based 
> optimizer to allow Cassandra to pick the best execution plan based on the 
> data distribution, thereby improving the overall query performance.
> 
> This CEP should also lay the groundwork for the future addition of features 
> like joins, subqueries, OR/NOT and index ordering.
> 
> The proposal is here: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
> 
> Thank you in advance for your feedback.


Re: Future direction for the row cache and OHC implementation

2023-12-18 Thread Ariel Weisberg
Hi,

Thanks for the generous offer. Before you do that, can you give me a chance to 
add back support for Caffeine for the row cache, so you can test the option of 
switching back to an on-heap row cache?

Ariel

On Thu, Dec 14, 2023, at 9:28 PM, Jon Haddad wrote:
> I think we should probably figure out how much value it actually provides by 
> getting some benchmarks around a few use cases along with some profiling.  
> tlp-stress has a --rowcache flag that I added a while back to be able to do 
> this exact test.  I was looking for a use case to profile and write up so 
> this is actually kind of perfect for me.  I can take a look in January when 
> I'm back from the holidays.
> 
> Jon
> 
> On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever  wrote:
>>
>>
>> 
>>> I would avoid taking away a feature even if it works in narrow set of 
>>> use-cases. I would instead suggest -
>>> 
>>> 1. Leave it disabled by default.
>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn 
>>> it off. Cassandra should ideally detect this and do it automatically.
>>> 3. Move to Caffeine instead of OHC.
>>> 
>>> I would suggest having this as the middle ground.
>> 
>> 
>>  
>> Yes, I'm ok with this. (2) can also be a guardrail: soft value when to warn, 
>> hard value when to disable.


Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Ariel Weisberg
Hi,

I did get one response from Robert indicating that he didn’t want to do the 
work to contribute it.

I offered to do the work and asked for permission to contribute it and no 
response. Followed up later with a ping and also no response.

Ariel

On Fri, Dec 15, 2023, at 9:58 PM, Josh McKenzie wrote:
>> I have reached out to the original maintainer about it and it seems like if 
>> we want to keep using it we will need to start releasing it under a new 
>> package from a different repo.
> 
>> the current maintainer is not interested in donating it to the ASF
> Is that the case Ariel or could you just not reach Robert?
> 
> On Fri, Dec 15, 2023, at 11:55 AM, Jeremiah Jordan wrote:
>>> from a maintenance and
>>> integration testing perspective I think it would be better to keep the
>>> ohc in-tree, so we will be aware of any issues immediately after the
>>> full CI run.
>> 
>> From the original email bringing OHC in tree is not an option because the 
>> current maintainer is not interested in donating it to the ASF.  Thus the 
>> option 1 of some set of people forking it to their own github org and 
>> maintaining a version outside of the ASF C* project.
>> 
>> -Jeremiah
>> 
>> On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov  wrote:
>>> Ariel,
>>> thank you for bringing this topic to the ML.
>>> 
>>> I may be missing something, so correct me if I'm wrong somewhere in
>>> the management of the Cassandra ecosystem.  As I see it, the problem
>>> right now is that if we fork the ohc and put it under its own root,
>>> the use of that row cache is still not well tested (the same as it is
>>> now). I am particularly emphasising the dependency management side, as
>>> any version change/upgrade in Cassandra and, as a result of that
>>> change a new set of libraries in the classpath should be tested
>>> against this integration.
>>> 
>>> So, unless it is being widely used by someone else outside of the
>>> community (which it doesn't seem to be), from a maintenance and
>>> integration testing perspective I think it would be better to keep the
>>> ohc in-tree, so we will be aware of any issues immediately after the
>>> full CI run.
>>> 
>>> I'm also +1 for not deprecating it, even if it is used in narrow
>>> cases, while the cost of maintaining its source code remains quite low
>>> and it brings some benefits.
>>> 
>>> On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> To add some additional context.
>>>> 
>>>> The row cache is disabled by default and it is already pluggable, but 
>>>> there isn’t a Caffeine implementation present. I think one used to exist 
>>>> and could be resurrected.
>>>> 
>>>> I personally also think that people should be able to scratch their own 
>>>> itch row cache wise so removing it entirely just because it isn’t commonly 
>>>> used isn’t the right move unless the feature is very far out of scope for 
>>>> Cassandra.
>>>> 
>>>> Auto enabling/disabling the cache is a can of worms that could result in 
>>>> performance and reliability inconsistency as the DB enables/disables the 
>>>> cache based on heuristics when you don’t want it to. It being off by 
>>>> default seems good enough to me.
>>>> 
>>>> RE forking, we could create a GitHub org for OHC and then add people to 
>>>> it. There are some examples of dependencies that haven’t been contributed 
>>>> to the project that live outside like CCM and JAMM.
>>>> 
>>>> Ariel
>>>> 
>>>> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>>>> 
>>>> I would avoid taking away a feature even if it works in narrow set of 
>>>> use-cases. I would instead suggest -
>>>> 
>>>> 1. Leave it disabled by default.
>>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn 
>>>> it off. Cassandra should ideally detect this and do it automatically.
>>>> 3. Move to Caffeine instead of OHC.
>>>> 
>>>> I would suggest having this as the middle ground.
>>>> 
>>>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in 
>>>> a later release
>>>> 
>>>> 
>>>> 
>>>> 
>>>> I'm for deprecating and removing it.
>>>> It constantly trips users up and just causes pain.
>>>> 
>>>> Yes it works in some very narrow situations, but those situations often 
>>>> change over time and again just bites the user.  Without the row-cache I 
>>>> believe users would quickly find other, more suitable and lasting, 
>>>> solutions.
>>>> 
>>>> 
> 


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Ariel Weisberg
Hi,

To add some additional context.

The row cache is disabled by default and it is already pluggable, but there 
isn’t a Caffeine implementation present. I think one used to exist and could be 
resurrected.

I personally also think that people should be able to scratch their own itch 
row cache wise so removing it entirely just because it isn’t commonly used 
isn’t the right move unless the feature is very far out of scope for Cassandra.

Auto enabling/disabling the cache is a can of worms that could result in 
performance and reliability inconsistency as the DB enables/disables the cache 
based on heuristics when you don’t want it to. It being off by default seems 
good enough to me.
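For what it's worth, the warn-on-low-hit-rate idea suggested below needs only trivial bookkeeping. A hypothetical sketch (class name and thresholds invented; Python for brevity — any real version would hang off C*'s existing cache metrics):

```python
class RowCacheMonitor:
    """Track row cache hit rate and flag when it stays low."""

    def __init__(self, warn_below=0.1, min_requests=1000):
        self.hits = 0
        self.misses = 0
        self.warn_below = warn_below      # hit-rate threshold for warning
        self.min_requests = min_requests  # don't warn on tiny samples

    def record(self, hit: bool):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def should_warn(self) -> bool:
        total = self.hits + self.misses
        return total >= self.min_requests and self.hit_rate() < self.warn_below
```

The `min_requests` floor is the interesting design point: it keeps the guardrail from firing during warmup, which matters whether the outcome is a soft warning or a hard disable.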

RE forking, we could create a GitHub org for OHC and then add people to it. 
There are some examples of dependencies that haven’t been contributed to the 
project that live outside like CCM and JAMM.

Ariel

On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
> I would avoid taking away a feature even if it works in narrow set of 
> use-cases. I would instead suggest -
> 
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it 
> off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
> 
> I would suggest having this as the middle ground.
> 
>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>> 
>>   
>>   
>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
>>> later release
>> 
>> 
>> 
>> I'm for deprecating and removing it.
>> It constantly trips users up and just causes pain.
>> 
>> Yes it works in some very narrow situations, but those situations often 
>> change over time and again just bites the user.  Without the row-cache I 
>> believe users would quickly find other, more suitable and lasting, solutions.


Future direction for the row cache and OHC implementation

2023-12-14 Thread Ariel Weisberg
Hi,

Now seems like a good time to discuss the future direction of the row cache and 
its only implementation OHC (https://github.com/snazy/ohc).

OHC is currently unmaintained and we don’t have the ability to release maven 
artifacts for it or commit to the original repo. I have reached out to the 
original maintainer about it and it seems like if we want to keep using it we 
will need to start releasing it under a new package from a different repo.

I see four directions we could pursue.

1. Fork OHC and start publishing under a new package name and continue to use it
2. Replace OHC with a different cache implementation like Caffeine which would 
move it on heap
3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
later release
4. Do work to make a row cache not necessary and deprecate it later (or maybe 
now)

I would like to find out what people know about row cache usage in the wild so 
we can use that to inform the future direction as well as the general thinking 
about what we should do with it going forward.

Thanks,
Ariel


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Ariel Weisberg
Hi,

Support for multiple storage backends including remote storage backends is a 
pretty high value piece of functionality. I am happy to see there is interest 
in that.

I think that `ChannelProxyFactory` as an integration point is going to quickly 
turn into a dead end as we get into really using multiple storage backends. We 
need to be able to list files and really the full range of filesystem 
interactions that Java supports should work with any backend to make 
development, testing, and using existing code straightforward.

It's a little more work to get C* to create paths for alternate backends where 
appropriate, but that work is probably necessary even with 
`ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
Filesystems). There will probably also be backend-specific behaviors that show 
up above the `ChannelProxy` layer that will depend on the backend.

Ideally there would be some config to specify several backend filesystems and 
their individual configuration that can be used, as well as configuration and 
support for a "backend file router" for file creation (and opening) that can be 
used to route files to the backend most appropriate.
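One way to picture such a "backend file router": a longest-prefix rule table mapping path prefixes to named backends. This is entirely hypothetical — the names and config shape are invented for illustration, and sketched in Python for brevity rather than as a real C* interface:

```python
from pathlib import PurePosixPath


class FileRouter:
    """Route logical file paths to named storage backends
    using longest-prefix-match rules."""

    def __init__(self, rules, default="local"):
        # rules: {path_prefix: backend_name}; longest prefix wins,
        # so more specific rules shadow broader ones.
        self.rules = sorted(rules.items(), key=lambda kv: -len(kv[0]))
        self.default = default

    def backend_for(self, path: str) -> str:
        p = str(PurePosixPath(path))
        for prefix, backend in self.rules:
            if p.startswith(prefix):
                return backend
        return self.default
```

With a rule table like `{"/data/cold": "s3", "/data": "local_ssd"}`, cold-path files route to the remote backend while everything else under `/data` stays on local SSD — the kind of per-file routing decision that would need to happen at file creation and opening time.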

Regards,
Ariel

On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.  
> 
> There are two desires  driving this change:
>  1. The ability to temporarily move some keyspaces/tables to storage outside 
> the normal directory tree to other disk so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing to the moved data can not be suspended.
>  2. The ability to store infrequently used data on slower cheaper storage 
> layers.
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
> 
> I look forward to productive discussions,
> Claude
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory 
> 
> 


Re: [DISCUSS] Lift MessagingService.minimum_version to 40 in trunk

2023-03-21 Thread Ariel Weisberg
Hi,

I am pretty strongly in favor just to keep the amount of code kept around for 
serialization/deserialization and caching serialized sizes for different 
versions under control.

5.0 will have changes necessitating using another version so it will be adding 
to the clutter.

Ariel

On Mon, Mar 13, 2023, at 9:05 AM, Mick Semb Wever wrote:
> If we do not recommend and do not test direct upgrades from 3.x to
> 5.x, we have the opportunity to clean up a fair chunk of code by
> making `MessagingService.minimum_version=40`
>
> As Cassandra versions 4.x and  5.0 are all on
> `MessagingService.current_version=40` this would mean lifting
> MessagingService.minimum_version would make it equal to the
> current_version.
>
> Today already we don't allow mixed-version streaming.  The only
> argument I can see for keeping minimum_version=30 is for supporting
> non-streaming messages between 3.x and 5.0 nodes, which I can't find a
> basis for.
>
> An _example_ of the code that can be cleaned up is in the patch
> attached to the ticket:
> CASSANDRA-18314 – Lift MessagingService.minimum_version to 40
>
> What do you think?


Re: Welcome Patrick McFadin as Cassandra Committer

2023-02-09 Thread Ariel Weisberg
Welcome Patrick! Thank you for your years of contributions to the community.

On Thu, Feb 2, 2023, at 12:58 PM, Benjamin Lerer wrote:
> The PMC members are pleased to announce that Patrick McFadin has accepted
> the invitation to become committer today.
> 
> Thanks a lot, Patrick, for everything you have done for this project and its 
> community through the years.
> 
> Congratulations and welcome!
> 
> The Apache Cassandra PMC members


Re: [VOTE] CEP-21 Transactional Cluster Metadata

2023-02-06 Thread Ariel Weisberg
+1

On Mon, Feb 6, 2023, at 11:15 AM, Sam Tunnicliffe wrote:
> Hi everyone,
> 
> I would like to start a vote on this CEP.
> 
> Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
> 
> Discussion:
> https://lists.apache.org/thread/h25skwkbdztz9hj2pxtgh39rnjfzckk7
> 
> The vote will be open for 72 hours.
> A vote passes if there are at least three binding +1s and no binding vetoes.
> 
> Thanks,
> Sam


Re: CASSANDRA-14482

2019-02-15 Thread Ariel Weisberg
Hi,

I am +1 since it's an additional compressor and not the default.

Ariel

On Fri, Feb 15, 2019, at 11:41 AM, Dinesh Joshi wrote:
> Hey folks,
> 
> Just wanted to get a pulse on whether we can proceed with ZStd support. 
> The consensus on the ticket was that it’s a very valuable addition 
> without any risk of destabilizing 4.0. It’s ready to go if there aren’t 
> any objections.
> 
> Dinesh
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 




Re: cqlsh tests and Python 3

2019-02-11 Thread Ariel Weisberg
Hi,

Do you mean Python 2/3 compatibility? 

This has been discussed earlier and I think that being compatible with both is 
an easier sell.

Ariel

> On Feb 11, 2019, at 1:24 PM, dinesh.jo...@yahoo.com.INVALID 
>  wrote:
> 
> Hey all,
> We've gotten the cqlsh tests running in the Cassandra repo (these are 
> distinct from the cqlsh tests in dtests repo). They're in Python 2.7 and 
> using the nosetests. We'd like to make them consistent with the rest of the 
> tests which means moving them to Python 3 & Pytest framework. However this 
> would involve migrating cqlsh to Python 3. Does anybody have any concerns if 
> we move cqlsh to Python 3? Please note that Python 2 is EOL'd and will be 
> unsupported in about 10 months.
> So here are the options:
> 1. Leave cqlsh in Python 2.7 & nosetests. Just make sure they're running as 
> part of the build process.
> 2. Move cqlsh to Python 3 & pytests.
> 3. Leave cqlsh in Python 2.7 but move to Pytests. This option doesn't really 
> add much value though.
> Thanks,
> Dinesh





Re: [VOTE] Release Apache Cassandra 3.11.4

2019-02-07 Thread Ariel Weisberg
Hi,

Vinay, thank you for diagnosing this.

+1 on the release then, since this is a test bug and 13004 has already extensively 
litigated the UX of this.

Ariel

On Thu, Feb 7, 2019, at 3:49 AM, Vinay Chella wrote:
> Hi Ariel,
> 
> test_simple_bootstrap_mixed_versions issue is related to CASSANDRA-13004
> <https://issues.apache.org/jira/browse/CASSANDRA-13004>, which introduced
> "cassandra.force_3_0_protocol_version" for schema migrations during
> upgrades from 3.0.14 upwards. This flag is missing in
> `test_simple_bootstrap_mixed_versions` upgrade test while we are
> adding/bootstrapping 3.11.4 node to an existing 3.5 version of C* node.
> This resulted in `ks` keyspace schema/data not being bootstrapped to the
> new node.
> 
> I debugged and confirmed that MigrationManager::is30Compatible
> <https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/MigrationManager.java#L181-L185>
> is returning false which is forcing 
> MigrationManager::shouldPullSchemaFrom
> <https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/MigrationManager.java#L168-L177>
> to return false as well.
> 
> *From debug logs:*
> DEBUG [GossipStage:1] 2019-02-06 23:20:47,392 MigrationManager.java:115 -
> Not pulling schema because versions match or shouldPullSchemaFrom returned
> false
> 
> Here is the updated dtest branch:
> https://github.com/vinaykumarchella/cassandra-dtest/tree/fix_failing_upgradetest
> 
> dtests on CircleCI: https://circleci.com/gh/vinaykumarchella/cassandra/345
> 
> P.S: While MigrationManager
> <https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/MigrationManager.java#L181-L185>
> confirms that schema migrations from pre 3.11 are not allowed without
> `cassandra.force_3_0_protocol_version` option, release notes for 3.11 
> are
> confusing - docs
> <https://github.com/apache/cassandra/blob/cassandra-3.11/NEWS.txt#L173-L174>
> 
> Let me know if this looks good to you, I will send a patch to
> cassandra-dtest
> 
> 
> 
> Thanks,
> Vinay Chella
> 
> 
> On Wed, Feb 6, 2019 at 8:07 PM Vinay Chella  wrote:
> 
> > Hi Ariel,
> >
> > Sure, I am volunteering to debug this. Will update the progress here.
> >
> > Thanks,
> > Vinay
> >
> >
> > On Wed, Feb 6, 2019 at 1:41 PM Ariel Weisberg  wrote:
> >
> >> Hi,
> >>
> >> It fails consistently. I don't know why the data is not evenly
> >> distributed. Can someone volunteer to debug this failing test to make sure
> >> there isn't an issue with bootstrap in 3.11?
> >>
> >> https://circleci.com/gh/aweisberg/cassandra/2593
> >>
> >> Thanks,
> >> Ariel
> >> On Wed, Feb 6, 2019, at 3:11 PM, Ariel Weisberg wrote:
> >> > Hi,
> >> >
> >> > -0
> >> >
> >> > bootstrap_upgrade_test.py test_simple_bootstrap_mixed_versions fails
> >> > because it doesn't see the expected on disk size within 30% of the
> >> > expected value. It's bootstrapping a new version node and runs cleanup
> >> > on the existing node. If the data were evenly distributed the on disk
> >> > size should be similar.
> >> >
> >> > https://circleci.com/gh/aweisberg/cassandra/2591#tests/containers/40
> >> >
> >> > I don't have time to see if this reproduces manually. I kicked off the
> >> > tests again to see if reproduces.
> >> > https://circleci.com/gh/aweisberg/cassandra/2593
> >> >
> >> > Ariel
> >> >
> >> > On Wed, Feb 6, 2019, at 5:02 AM, Marcus Eriksson wrote:
> >> > > +1
> >> > >
> >> > > Den ons 6 feb. 2019 kl 10:52 skrev Benedict Elliott Smith <
> >> > > bened...@apache.org>:
> >> > >
> >> > > > +1
> >> > > >
> >> > > > > On 6 Feb 2019, at 08:01, Tommy Stendahl <
> >> tommy.stend...@ericsson.com>
> >> > > > wrote:
> >> > > > >
> >> > > > > +1 (non-binding)
> >> > > > >
> >> > > > > /Tommy
> >> > > > >
> >> > > > > On lör, 2019-02-02 at 18:31 -0600, Michael Shuler wrote:
> >> > > > >
> >> > > > > I propose the following artifacts for release as 3.11.4.
> >> > > > >
> >> > > > > sha1: fd47391aae13bcf4ee995abcde1b0e180372d193
> >> > > > > Git:
> >>

Re: [VOTE] Release Apache Cassandra 3.11.4

2019-02-06 Thread Ariel Weisberg
Hi,

It fails consistently. I don't know why the data is not evenly distributed. Can 
someone volunteer to debug this failing test to make sure there isn't an issue 
with bootstrap in 3.11? 

https://circleci.com/gh/aweisberg/cassandra/2593

Thanks,
Ariel
On Wed, Feb 6, 2019, at 3:11 PM, Ariel Weisberg wrote:
> Hi,
> 
> -0
> 
> bootstrap_upgrade_test.py test_simple_bootstrap_mixed_versions fails 
> because it doesn't see the expected on disk size within 30% of the 
> expected value. It's bootstrapping a new version node and runs cleanup 
> on the existing node. If the data were evenly distributed the on disk 
> size should be similar.
> 
> https://circleci.com/gh/aweisberg/cassandra/2591#tests/containers/40
> 
> I don't have time to see if this reproduces manually. I kicked off the 
> tests again to see if reproduces. 
> https://circleci.com/gh/aweisberg/cassandra/2593
> 
> Ariel
> 
> On Wed, Feb 6, 2019, at 5:02 AM, Marcus Eriksson wrote:
> > +1
> > 
> > Den ons 6 feb. 2019 kl 10:52 skrev Benedict Elliott Smith <
> > bened...@apache.org>:
> > 
> > > +1
> > >
> > > > On 6 Feb 2019, at 08:01, Tommy Stendahl 
> > > wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > /Tommy
> > > >
> > > > On lör, 2019-02-02 at 18:31 -0600, Michael Shuler wrote:
> > > >
> > > > I propose the following artifacts for release as 3.11.4.
> > > >
> > > > sha1: fd47391aae13bcf4ee995abcde1b0e180372d193
> > > > Git:
> > > >
> > > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.4-tentative
> > > > Artifacts:
> > > >
> > > https://repository.apache.org/content/repositories/orgapachecassandra-1170/org/apache/cassandra/apache-cassandra/3.11.4/
> > > > Staging repository:
> > > >
> > > https://repository.apache.org/content/repositories/orgapachecassandra-1170/
> > > >
> > > > The Debian and RPM packages are available here:
> > > > http://people.apache.org/~mshuler
> > > >
> > > > The vote will be open for 72 hours (longer if needed).
> > > >
> > > > [1]: CHANGES.txt:
> > > >
> > > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
> > > > [2]: NEWS.txt:
> > > >
> > > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
> > > >
> > > >
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > >
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 




Re: [VOTE] Release Apache Cassandra 3.11.4

2019-02-06 Thread Ariel Weisberg
Hi,

-0

bootstrap_upgrade_test.py test_simple_bootstrap_mixed_versions fails because it 
doesn't see the expected on disk size within 30% of the expected value. It's 
bootstrapping a new version node and runs cleanup on the existing node. If the 
data were evenly distributed the on disk size should be similar.

https://circleci.com/gh/aweisberg/cassandra/2591#tests/containers/40

I don't have time to see if this reproduces manually. I kicked off the tests 
again to see if it reproduces. https://circleci.com/gh/aweisberg/cassandra/2593

Ariel

On Wed, Feb 6, 2019, at 5:02 AM, Marcus Eriksson wrote:
> +1
> 
> Den ons 6 feb. 2019 kl 10:52 skrev Benedict Elliott Smith <
> bened...@apache.org>:
> 
> > +1
> >
> > > On 6 Feb 2019, at 08:01, Tommy Stendahl 
> > wrote:
> > >
> > > +1 (non-binding)
> > >
> > > /Tommy
> > >
> > > On lör, 2019-02-02 at 18:31 -0600, Michael Shuler wrote:
> > >
> > > I propose the following artifacts for release as 3.11.4.
> > >
> > > sha1: fd47391aae13bcf4ee995abcde1b0e180372d193
> > > Git:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.4-tentative
> > > Artifacts:
> > >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1170/org/apache/cassandra/apache-cassandra/3.11.4/
> > > Staging repository:
> > >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1170/
> > >
> > > The Debian and RPM packages are available here:
> > > http://people.apache.org/~mshuler
> > >
> > > The vote will be open for 72 hours (longer if needed).
> > >
> > > [1]: CHANGES.txt:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
> > > [2]: NEWS.txt:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
> > >
> > >
> >
> >
> >
> >




Re: [VOTE] Release Apache Cassandra 3.0.18

2019-02-06 Thread Ariel Weisberg
Hi,

+1

Upgrade tests look OK. There are failures I also found against 2.2, plus some 
other test bugs. Nothing that looks like a product bug.
https://circleci.com/gh/aweisberg/cassandra/2589

Ariel

On Wed, Feb 6, 2019, at 5:02 AM, Marcus Eriksson wrote:
> +1
> 
> Den ons 6 feb. 2019 kl 10:53 skrev Benedict Elliott Smith <
> bened...@apache.org>:
> 
> > +1
> >
> > > On 6 Feb 2019, at 08:01, Tommy Stendahl 
> > wrote:
> > >
> > > +1 (non-binding)
> > >
> > > /Tommy
> > >
> > > On lör, 2019-02-02 at 18:32 -0600, Michael Shuler wrote:
> > >
> > > I propose the following artifacts for release as 3.0.18.
> > >
> > > sha1: edd52cef50a6242609a20d0d84c8eb74c580035e
> > > Git:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.18-tentative
> > > Artifacts:
> > >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1171/org/apache/cassandra/apache-cassandra/3.0.18/
> > > Staging repository:
> > >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1171/
> > >
> > > The Debian and RPM packages are available here:
> > > http://people.apache.org/~mshuler
> > >
> > > The vote will be open for 72 hours (longer if needed).
> > >
> > > [1]: CHANGES.txt:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.18-tentative
> > > [2]: NEWS.txt:
> > >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.18-tentative
> > >
> > >
> >
> >
> >
> >




Re: [VOTE] Release Apache Cassandra 2.2.14

2019-02-06 Thread Ariel Weisberg
Hi,

+1

Upgrade tests:
https://circleci.com/gh/aweisberg/cassandra/2587

Known issue https://issues.apache.org/jira/browse/CASSANDRA-14155, which is not 
a blocker.
There is a test failure on a Thrift connection not being open. It might be a 
test bug; if it's a product bug, it's probably not that serious.

Ariel
On Wed, Feb 6, 2019, at 5:03 AM, Marcus Eriksson wrote:
> +1
> 
> Den ons 6 feb. 2019 kl 10:53 skrev Benedict Elliott Smith <
> bened...@apache.org>:
> 
> > +1
> >
> > > On 6 Feb 2019, at 05:09, Jeff Jirsa  wrote:
> > >
> > > +1
> > >
> > > On Sat, Feb 2, 2019 at 4:32 PM Michael Shuler 
> > > wrote:
> > >
> > >> I propose the following artifacts for release as 2.2.14.
> > >>
> > >> sha1: af91658353ba601fc8cd08627e8d36bac62e936a
> > >> Git:
> > >>
> > >>
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.14-tentative
> > >> Artifacts:
> > >>
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1172/org/apache/cassandra/apache-cassandra/2.2.14/
> > >> Staging repository:
> > >>
> > https://repository.apache.org/content/repositories/orgapachecassandra-1172/
> > >>
> > >> The Debian and RPM packages are available here:
> > >> http://people.apache.org/~mshuler
> > >>
> > >> The vote will be open for 72 hours (longer if needed).
> > >>
> > >> [1]: CHANGES.txt:
> > >>
> > >>
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
> > >> [2]: NEWS.txt:
> > >>
> > >>
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
> > >>
> > >>
> >
> >
> >
> >




Re: [VOTE] Release Apache Cassandra 2.2.14

2019-02-05 Thread Ariel Weisberg
Hi,

Can we also run the upgrade tests? We should do that as part of the release 
process. I can do that tomorrow.

Ariel

On Tue, Feb 5, 2019, at 1:11 PM, Joseph Lynch wrote:
> 2.2.14-tentative unit and dtest run:
> https://circleci.com/gh/jolynch/cassandra/tree/2.2.14-tentative
> 
> unit tests: 0 failures
> dtests: 5 failures
> * test_closing_connections - thrift_hsha_test.TestThriftHSHA (
> https://issues.apache.org/jira/browse/CASSANDRA-14595)
> * test_multi_dc_tokens_default - token_generator_test.TestTokenGenerator
> * test_multi_dc_tokens_murmur3 - token_generator_test.TestTokenGenerator
> * test_multi_dc_tokens_random - token_generator_test.TestTokenGenerator
> * test_multiple_repair - repair_tests.incremental_repair_test.TestIncRepair
> (flake?)
> 
> I've cut https://issues.apache.org/jira/browse/CASSANDRA-15012 for fixing
> the TestTokenGenerator tests, it looks straightforward.
> 
> +1 non binding
> 
> -Joey
> 
> On Sat, Feb 2, 2019 at 4:32 PM Michael Shuler 
> wrote:
> 
> > I propose the following artifacts for release as 2.2.14.
> >
> > sha1: af91658353ba601fc8cd08627e8d36bac62e936a
> > Git:
> >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.14-tentative
> > Artifacts:
> >
> > https://repository.apache.org/content/repositories/orgapachecassandra-1172/org/apache/cassandra/apache-cassandra/2.2.14/
> > Staging repository:
> > https://repository.apache.org/content/repositories/orgapachecassandra-1172/
> >
> > The Debian and RPM packages are available here:
> > http://people.apache.org/~mshuler
> >
> > The vote will be open for 72 hours (longer if needed).
> >
> > [1]: CHANGES.txt:
> >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
> > [2]: NEWS.txt:
> >
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
> >
> >




Re: Implementing an Abstract Replication Strategy

2019-01-29 Thread Ariel Weisberg
Hi,

Cassandra expects a replication strategy to accept a description of a 
consistent hash ring and then use that description to determine which ranges on 
the ring each node replicates.

If you implement that API, those operations should all just work.
I'm not sure what the implicit expectations around rebalancing, add/remove 
node, and so on are. This is despite the fact that I was staring at that code 
for six months, six months ago. Most of the code basically looks at a before 
picture from the replication strategy and an after picture, and moves data 
around until those two match.

Depending on the changes your replication strategy makes in response to changes 
in the ring, that code might not ship the data around correctly. There are 
assumptions, for example that when you move a node, the data that needs to be 
streamed and fetched can all be handled at that one node. You need to make sure 
that whatever state changes occur on ring changes can actually be realized by 
the add/remove/rebalance code. They also need to happen online, in a system 
that is continuing to accept reads and writes, so things like overlapping group 
memberships need to be taken into account.

It's a hard problem, but easier to talk about once we know what you want the 
replication strategy to do.
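The ring-description-in, replica-ranges-out contract described above can be 
sketched in miniature. The snippet below is a toy model of replica placement on 
a consistent hash ring, not Cassandra's actual Java API (which centers on 
AbstractReplicationStrategy); the function name, ring layout, and node labels 
are invented for illustration.

```python
import bisect

def natural_endpoints(token, ring, rf):
    """Given a sorted ring of (token, node) pairs, return the rf distinct
    nodes replicating the range that ends at `token`: start at the token's
    owner and walk the ring clockwise, skipping nodes already chosen."""
    tokens = [t for t, _ in ring]
    distinct = len({n for _, n in ring})
    i = bisect.bisect_left(tokens, token) % len(ring)
    replicas = []
    while len(replicas) < min(rf, distinct):
        node = ring[i % len(ring)][1]
        if node not in replicas:   # a node may own several vnode tokens
            replicas.append(node)
        i += 1
    return replicas

# Three nodes with two vnode tokens each on a 0..99 toy ring.
ring = sorted([(5, "A"), (20, "B"), (35, "C"), (55, "A"), (70, "B"), (90, "C")])
print(natural_endpoints(12, ring, rf=2))   # owner is B at token 20, then C
print(natural_endpoints(95, ring, rf=3))   # wraps around past the top token
```

The rebalancing machinery Ariel mentions can then be thought of as diffing two 
such before/after replica maps and streaming data until they agree.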

Ariel


On Tue, Jan 29, 2019, at 3:52 PM, Seyed Hossein Mortazavi wrote:
> I'm working on changing Cassandra for an academic project where the goal is
> to change the replicas are determined for each partition using static
> parameters that are set outside of Cassandra. I've read online that this
> can be achieved by extending the AbstractReplicationStrategy class. I have
> the following questions
> 
> 1- If we add/remove nodes, and Cassandra goes through the process of
> re-balancing, are functions from my class called?
> 2- For Paxos lightweight transactions, are my functions called?
> 3- Can I run into other problems? If yes, where?
> 
> Thank you very much




Re: Git Repo Migration

2019-01-04 Thread Ariel Weisberg
+1

On Fri, Jan 4, 2019, at 5:49 AM, Sam Tunnicliffe wrote:
> As per the announcement on 7th December 2018[1], ASF infra are planning 
> to shutdown the service behind git-wip-us.apache.org and migrate all 
> existing repos to gitbox.apache.org 
> 
> There are further details in the original mail, but apparently one of 
> the benefits of the migration is that we'll have full write access via 
> Github, including the ability finally to close PRs. This affects the 
> cassandra, cassandra-dtest and cassandra-build repos (but not the new 
> cassandra-sidecar repo).
> 
> A pre-requisite of the migration is to demonstrate consensus within the 
> community, so to satisfy that formality I'm starting this thread to 
> gather any objections or specific requests regarding the timing of the 
> move.
> 
> I'll collate responses in a week or so and file the necessary INFRA Jira.
> 
> Thanks,
> Sam
> 
> [1] 
> https://lists.apache.org/thread.html/667772efdabf49a0a23d585539c127f335477e033f1f9b6f5079aced@%3Cdev.cassandra.apache.org%3E
> 
> 
> 




Re: [VOTE] Change Jira Workflow

2018-12-18 Thread Ariel Weisberg
+1

On Mon, Dec 17, 2018, at 10:19 AM, Benedict Elliott Smith wrote:
> I propose these changes 
> *
>  
> to the Jira Workflow for the project.  The vote will be open for 72 
> hours**.
> 
> I am, of course, +1.
> 
> * With the addendum of the mailing list discussion 
> ;
>  
> in case of any conflict arising from a mistake on my part in the wiki, 
> the consensus reached by polling the mailing list will take precedence.
> ** I won’t be around to close the vote, as I will be on vacation.  
> Everyone is welcome to ignore the result until I get back in a couple of 
> weeks, or if anybody is eager feel free to close the vote and take some 
> steps towards implementation.




Re: Revisit the proposal to use github PR

2018-12-13 Thread Ariel Weisberg
Hi,

Sorry I missed that point. I agree github PRs are not useful for merging.

What I do is force push the feature/bug fix branches (which is fine, github 
remembers the old versions in the PR) with everything updated and ready to 
merge, and then push those branches from my local repo to the apache repo with 
--atomic.
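The --atomic behavior is easy to demonstrate against a throwaway bare 
repository: every ref named in a single `git push --atomic` is updated on the 
remote, or none are. A minimal sketch (assumes `git` is on the PATH; the 
repository and branch names here are invented):

```python
import os, subprocess, tempfile

def git(*args, cwd):
    # Run a git command, raising on failure, and return its stdout.
    return subprocess.run(("git",) + args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

top = tempfile.mkdtemp()
upstream = os.path.join(top, "upstream.git")   # stand-in for the apache repo
git("init", "--bare", upstream, cwd=top)
work = os.path.join(top, "work")
git("clone", upstream, work, cwd=top)
git("config", "user.email", "dev@example.com", cwd=work)
git("config", "user.name", "Dev", cwd=work)
git("commit", "--allow-empty", "-m", "base", cwd=work)
for branch in ("cassandra-3.11-fix", "trunk-fix"):   # hypothetical names
    git("checkout", "-b", branch, cwd=work)
    git("commit", "--allow-empty", "-m", "patch for " + branch, cwd=work)
# --atomic: the remote updates every listed ref, or rejects the whole push.
git("push", "--atomic", "origin", "cassandra-3.11-fix", "trunk-fix", cwd=work)
branches = git("branch", "--format=%(refname:short)", cwd=upstream).split()
print(sorted(branches))
```

Both branches appear on the remote together; if any one ref update were 
rejected, neither would land, which is what makes this safe for Cassandra's 
multi-branch merge commits.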

Ariel

On Thu, Dec 13, 2018, at 1:00 PM, Jason Brown wrote:
> To clarify my position: Github PRs are great for *reviewing* code, and the
> commentary is much easier to follow imo. But for *merging* code, esp into
> our multi-branch strategy, PRs don't fit well, unless there's some
> technique I and perhaps others are unaware of.
> 
> On Thu, Dec 13, 2018 at 9:47 AM Ariel Weisberg  wrote:
> 
> > Hi,
> >
> > I'm not clear on what github makes worse. It preserves more history then
> > the JIRA approach. When people invitably force push their branches you
> > can't tell from the link to a branch on JIRA. Github preserves the comments
> > and force push history so you know what version of the code each comment
> > applied to. Github also tracks when requests for changes are acknowledged
> > and resolved.  I have had to make the same change request many times and
> > keep track independently whether it was resolved. This has also resulting
> > in mistakes getting merged when I missed a comment that was ignored.
> >
> > Now that github can CC JIRA that also CCs to commits@. It's better then
> > JIRA comments because each comment includes a small diff of the code the
> > comment applies to. To do that in JIRA I have to manually link to the code
> > the PR and most people don't do that for every comment so some of them are
> > inscrutable after the fact. Also manually created links sometimes refer to
> > references that disappear or get force pushed. It's a bit tricky to get
> > right.
> >
> > To me arguing against leveraging a better code review workflow (whether
> > github or some other tool) is like arguing against using source control
> > tools. Sure the filesystem and home grown scripts can be used to work
> > around lack of source control, but why would you?
> >
> > I see two complaints so far. One is that github PRs encourage nitpicking.
> > I don't see tool based solution to that off the cuff. Another is that
> > github doesn't by default CC JIRA. Maybe we can just refuse to accept
> > improperly formatted PRs and look into auto rejecting ones that don't refer
> > to a ticket?
> >
> > Ariel
> >
> > On Thu, Dec 13, 2018, at 12:20 PM, Aleksey Yeschenko wrote:
> > > There are some nice benefits to GH PRs, one of them is that we could
> > > eventually set up CircleCI hooks that would explicitly prevent commits
> > > that don’t pass the tests.
> > >
> > > But handling multiple branches would indeed be annoying. Would have to
> > > either submit 1 PR per branch - which is both tedious and non-atomic -
> > > or do a mixed approach, with a PR for the oldest branch, then a manual
> > > merge upwards. The latter would be kinda meh, especially when commits
> > > for different branches diverge.
> > >
> > > For me personally, the current setup works quite well, and I mostly
> > > share Sylvain’s opinion above, for the same reasons listed.
> > >
> > > —
> > > AY
> > >
> > > > On 13 Dec 2018, at 08:15, Sylvain Lebresne  wrote:
> > > >
> > > > Fwiw, I personally find it very useful to have all discussion, review
> > > > comments included, in the same place (namely JIRA, since for better or
> > > > worse, that's what we use for tracking tickets). Typically, that means
> > > > everything gets consistently pushed to the  commits@ mailing list,
> > which I
> > > > find extremely convenient to keep track of things. I also have a theory
> > > > that the inline-comments type of review github PR give you is very
> > > > convenient for nitpicks, shallow or spur-of-the-moment comments, but
> > > > doesn't help that much for deeper reviews, and that it thus to favor
> > the
> > > > former kind of review.
> > > >
> > > > Additionally, and to Benedict's point, I happen to have first hand
> > > > experience with a PR-based process for a multi-branch workflow very
> > similar
> > > > to the one of this project, and suffice to say that I hate it with a
> > > > passion.
> > > >
> > > > Anyway, very much personal opinion here.
> > > > --
> > > > Sylvain
> > > >
> > > >
> > >

Re: Revisit the proposal to use github PR

2018-12-13 Thread Ariel Weisberg
Hi,

I'm not clear on what github makes worse. It preserves more history than the 
JIRA approach. When people inevitably force-push their branches, you can't tell 
from a link to a branch on JIRA. Github preserves the comments and force-push 
history, so you know what version of the code each comment applied to. Github 
also tracks when requests for changes are acknowledged and resolved. I have had 
to make the same change request many times and track independently whether it 
was resolved. This has also resulted in mistakes getting merged when I missed a 
comment that was ignored.

Now that github can CC JIRA, that also CCs to commits@. It's better than JIRA 
comments because each comment includes a small diff of the code the comment 
applies to. To do that in JIRA I have to manually link the code in the PR, and 
most people don't do that for every comment, so some comments are inscrutable 
after the fact. Also, manually created links sometimes refer to references that 
disappear or get force-pushed. It's a bit tricky to get right.

To me arguing against leveraging a better code review workflow (whether github 
or some other tool) is like arguing against using source control tools. Sure 
the filesystem and home grown scripts can be used to work around lack of source 
control, but why would you?

I see two complaints so far. One is that github PRs encourage nitpicking; I 
don't see a tool-based solution to that off the cuff. Another is that github 
doesn't CC JIRA by default. Maybe we can just refuse to accept improperly 
formatted PRs and look into auto-rejecting ones that don't refer to a ticket?

Ariel

On Thu, Dec 13, 2018, at 12:20 PM, Aleksey Yeschenko wrote:
> There are some nice benefits to GH PRs, one of them is that we could 
> eventually set up CircleCI hooks that would explicitly prevent commits 
> that don’t pass the tests.
> 
> But handling multiple branches would indeed be annoying. Would have to 
> either submit 1 PR per branch - which is both tedious and non-atomic - 
> or do a mixed approach, with a PR for the oldest branch, then a manual 
> merge upwards. The latter would be kinda meh, especially when commits 
> for different branches diverge.
> 
> For me personally, the current setup works quite well, and I mostly 
> share Sylvain’s opinion above, for the same reasons listed.
> 
> —
> AY
> 
> > On 13 Dec 2018, at 08:15, Sylvain Lebresne  wrote:
> > 
> > Fwiw, I personally find it very useful to have all discussion, review
> > comments included, in the same place (namely JIRA, since for better or
> > worse, that's what we use for tracking tickets). Typically, that means
> > everything gets consistently pushed to the  commits@ mailing list, which I
> > find extremely convenient to keep track of things. I also have a theory
> > that the inline-comments type of review github PR give you is very
> > convenient for nitpicks, shallow or spur-of-the-moment comments, but
> > doesn't help that much for deeper reviews, and that it thus to favor the
> > former kind of review.
> > 
> > Additionally, and to Benedict's point, I happen to have first hand
> > experience with a PR-based process for a multi-branch workflow very similar
> > to the one of this project, and suffice to say that I hate it with a
> > passion.
> > 
> > Anyway, very much personal opinion here.
> > --
> > Sylvain
> > 
> > 
> > On Thu, Dec 13, 2018 at 2:13 AM dinesh.jo...@yahoo.com.INVALID
> >  wrote:
> > 
> >> I've been already using github PRs for some time now. Once you specify the
> >> ticket number, the comments and discussion are persisted in Apache Jira as
> >> work log so it can be audited if desired. However, committers usually
> >> squash and commit the changes once the PR is approved. We don't use the
> >> merge feature in github. I don't believe github we can merge the commit
> >> into multiple branches through the UI. We would need to merge it into one
> >> branch and then manually merge that commit into other branches. The big
> >> upside of using github PRs is that it makes collaborating a lot easier.
> >> Downside is that it makes it very difficult to follow along the progress in
> >> Apache Jira. The messages that github posts back include huge diffs and are
> >> aweful.
> >> Dinesh
> >> 
> >>On Thursday, December 13, 2018, 1:10:12 AM GMT+5:30, Benedict Elliott
> >> Smith  wrote:
> >> 
> >> Perhaps somebody could summarise the tradeoffs?  I’m a little concerned
> >> about how it would work for our multi-branch workflow.  Would we open
> >> multiple PRs?
> >> 
> >> Could we easily link with external CircleCI?
> >> 
> >> It occurs to me, in JIRA proposal mode, that an extra required field for a
> >> permalink to GitHub for the patch would save a lot of time I spend hunting
> >> for a branch in the comments.
> >> 
> >> 
> >> 
> >> 
> >>> On 12 Dec 2018, at 19:20, jay.zhu...@yahoo.com.INVALID wrote:
> >>> 
> >>> It was discussed 1 year's ago:
> >> https://www.mail-archive.com/dev@cassandra.apache.org/msg11810.html

Re: JIRA Workflow Proposals

2018-12-12 Thread Ariel Weisberg
Hi,

Updating to reflect the new options for 1; 2, 3, and 4 remain unchanged.

1. E, D, C,  B, A

2. B, C, A

3. A

4. -.5

Ariel
On Tue, Dec 11, 2018, at 10:55 AM, Ariel Weisberg wrote:
> Hi,
> 
> Sorry I was just slow on the uptake as to what auto-populate meant RE #2.
> 
> 1. -1, while restricting editing on certain fields or issues that people 
> did not submit themselves is OK I don't think  it's reasonable to block 
> edits to subject, or description on issues a user has submitted. 
> 
> Do we actually have a problem that needs solving with restricting edits? 
> I feel like we aren't being harmed right now by the current power people 
> are wielding?
> 
> 2. B, C, A
> 
> 3. A 
> 
> 4. -.5, I really don't see Wish as something other then a synonym for 
> low priority. Only -.5 because I don't think it's that harmful either.
> 
> Ariel
> 
> On Mon, Dec 10, 2018, at 8:51 PM, Benedict Elliott Smith wrote:
> > On 10 Dec 2018, at 16:21, Ariel Weisberg  wrote:
> > > 
> > > Hi,
> > > 
> > > RE #1, does this mean if you submit a ticket and you are not a 
> > > contributor you can't modify any of the fields including description or 
> > > adding/removing attachments?
> > 
> > Attachment operations have their own permissions, like comments.  
> > Description would be prohibited though.  I don’t see this as a major 
> > problem, really; it is generally much more useful to add comments.  If 
> > we particularly want to make a subset of fields editable there is a 
> > workaround, though I’m not sure anybody would have the patience to 
> > implement it:  
> > https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html
> >  
> > <https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html>
> > 
> > > RE #2, while bugs don't necessarily have a priority it's helpful to have 
> > > it sort logically with other issue types on that field. Seems like 
> > > ideally what we want to preserve is a useful sort order without having to 
> > > populate the field manually.
> > 
> > Do you have a suggestion that achieves this besides auto-populating (if 
> > that’s even possible)?  More than happy to add suggestions to the list.
> > 
> > > RE #4, Do we need to keep wish at all?
> > 
> > I’m unclear on what you’re asking?  I included exactly this question, 
> > directly in response to your opinion that it should not be kept.  If you 
> > have more to add to your earlier view, please feel free to share it.
> > 
> > > Not voting yet just because I'm not sure on some.
> > > 
> > > Ariel
> > > 
> > > On Mon, Dec 10, 2018, at 7:43 AM, Benedict Elliott Smith wrote:
> > >> New questions.  This is the last round, before I call a proper vote on 
> > >> the modified proposal (so we can take a mandate to Infra to modify our 
> > >> JIRA workflows).  
> > >> 
> > >> Thanks again to everyone following and contributing to this discussion.  
> > >> I’m not sure any of these remaining questions are critical, but for the 
> > >> best democratic outcome it’s probably worth running them through the 
> > >> same process.  I also forgot to include (1) on the prior vote.
> > >> 
> > >> 1. Limit edits to JIRA ‘contributor’ role: +1/-1
> > >> 2. Priority on Bug issue type: (A) remove it; (B) auto-populate it; (C) 
> > >> leave it.  Please rank.
> > >> 3. Top priority: (A) Urgent; (B) Blocker.  See here for my explanation 
> > >> of why I chose Urgent 
> > >> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E
> > >>  
> > >> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E>>.
> > >> 4. Priority keep ‘Wish’ (to replace issue type): +1/-1
> > >> 
> > >> For 2, if we cannot remove it, we can make it non-editable and default 
> > >> to Normal; for auto-populate I propose using Severity (Low->Low, Normal-
> > >>> Normal, Critical->Urgent).  No guarantees entirely on what we can 
> > >> achieve, so a ranked choice would be ideal.
> > >> 
> > >> I have avoided splitting out another vote on the Platform field, since 
> > >> everyone was largely meh on the question of mandatoriness; it won by 
> > >> only a slim margin, because everyone was 

Re: JIRA Workflow Proposals

2018-12-11 Thread Ariel Weisberg
Hi,

Sorry I was just slow on the uptake as to what auto-populate meant RE #2.

1. -1. While restricting editing on certain fields, or on issues that people did 
not submit themselves, is OK, I don't think it's reasonable to block edits to the 
subject or description on issues a user has submitted.

Do we actually have a problem that needs solving with restricting edits? I feel 
like we aren't being harmed right now by the power people currently wield.

2. B, C, A

3. A 

4. -.5. I really don't see Wish as anything other than a synonym for low 
priority. Only -.5 because I don't think it's that harmful either.

Ariel

On Mon, Dec 10, 2018, at 8:51 PM, Benedict Elliott Smith wrote:
> On 10 Dec 2018, at 16:21, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > RE #1, does this mean if you submit a ticket and you are not a contributor 
> > you can't modify any of the fields including description or adding/removing 
> > attachments?
> 
> Attachment operations have their own permissions, like comments.  
> Description would be prohibited though.  I don’t see this as a major 
> problem, really; it is generally much more useful to add comments.  If 
> we particularly want to make a subset of fields editable there is a 
> workaround, though I’m not sure anybody would have the patience to 
> implement it:  
> https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html
>  
> <https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html>
> 
> > RE #2, while bugs don't necessarily have a priority it's helpful to have it 
> > sort logically with other issue types on that field. Seems like ideally 
> > what we want to preserve is a useful sort order without having to populate 
> > the field manually.
> 
> Do you have a suggestion that achieves this besides auto-populating (if 
> that’s even possible)?  More than happy to add suggestions to the list.
> 
> > RE #4, Do we need to keep wish at all?
> 
> I’m unclear on what you’re asking?  I included exactly this question, 
> directly in response to your opinion that it should not be kept.  If you 
> have more to add to your earlier view, please feel free to share it.
> 
> > Not voting yet just because I'm not sure on some.
> > 
> > Ariel
> > 
> > On Mon, Dec 10, 2018, at 7:43 AM, Benedict Elliott Smith wrote:
> >> New questions.  This is the last round, before I call a proper vote on 
> >> the modified proposal (so we can take a mandate to Infra to modify our 
> >> JIRA workflows).  
> >> 
> >> Thanks again to everyone following and contributing to this discussion.  
> >> I’m not sure any of these remaining questions are critical, but for the 
> >> best democratic outcome it’s probably worth running them through the 
> >> same process.  I also forgot to include (1) on the prior vote.
> >> 
> >> 1. Limit edits to JIRA ‘contributor’ role: +1/-1
> >> 2. Priority on Bug issue type: (A) remove it; (B) auto-populate it; (C) 
> >> leave it.  Please rank.
> >> 3. Top priority: (A) Urgent; (B) Blocker.  See here for my explanation 
> >> of why I chose Urgent 
> >> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E
> >>  
> >> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E>>.
> >> 4. Priority keep ‘Wish’ (to replace issue type): +1/-1
> >> 
> >> For 2, if we cannot remove it, we can make it non-editable and default 
> >> to Normal; for auto-populate I propose using Severity (Low->Low, Normal-
> >>> Normal, Critical->Urgent).  No guarantees entirely on what we can 
> >> achieve, so a ranked choice would be ideal.
> >> 
> >> I have avoided splitting out another vote on the Platform field, since 
> >> everyone was largely meh on the question of mandatoriness; it won by 
> >> only a slim margin, because everyone was +/- 0, and nobody responded to 
> >> back Ariel’s dissenting view.
> >> 
> >> My votes are:
> >> 1: +1
> >> 2: B,C,A
> >> 3: A
> >> 4: +0.5
> >> 
> >> 
> >> For tracking, the new consensus from the prior vote is:
> >> 1: A (+10)
> >> 2: +9 -0.1
> >> 3: +10
> >> 4: +6 -2 (=+4)
> >> 5: +2; a lot of meh.
> >> 6: +9
> >> 
> >> 
> >> 
> >>> On 7 Dec 2018, at 17:52, Ariel Weisberg  wrote:
> >>> 
> >>> Hi,
> >

Re: JIRA Workflow Proposals

2018-12-10 Thread Ariel Weisberg
Hi,

RE #1, does this mean that if you submit a ticket and you are not a contributor, 
you can't modify any of the fields, including the description or adding/removing 
attachments?

RE #2, while bugs don't necessarily have a priority, it's helpful to have them 
sort logically with other issue types on that field. Ideally we want to preserve 
a useful sort order without having to populate the field manually.

RE #4, Do we need to keep wish at all?

Not voting yet just because I'm not sure on some.

Ariel

On Mon, Dec 10, 2018, at 7:43 AM, Benedict Elliott Smith wrote:
> New questions.  This is the last round, before I call a proper vote on 
> the modified proposal (so we can take a mandate to Infra to modify our 
> JIRA workflows).  
> 
> Thanks again to everyone following and contributing to this discussion.  
> I’m not sure any of these remaining questions are critical, but for the 
> best democratic outcome it’s probably worth running them through the 
> same process.  I also forgot to include (1) on the prior vote.
> 
> 1. Limit edits to JIRA ‘contributor’ role: +1/-1
> 2. Priority on Bug issue type: (A) remove it; (B) auto-populate it; (C) 
> leave it.  Please rank.
> 3. Top priority: (A) Urgent; (B) Blocker.  See here for my explanation 
> of why I chose Urgent 
> <https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E>.
> 4. Priority keep ‘Wish’ (to replace issue type): +1/-1
> 
> For 2, if we cannot remove it, we can make it non-editable and default 
> to Normal; for auto-populate I propose using Severity (Low->Low, Normal-
> >Normal, Critical->Urgent).  No guarantees entirely on what we can 
> achieve, so a ranked choice would be ideal.
> 
> I have avoided splitting out another vote on the Platform field, since 
> everyone was largely meh on the question of mandatoriness; it won by 
> only a slim margin, because everyone was +/- 0, and nobody responded to 
> back Ariel’s dissenting view.
> 
> My votes are:
> 1: +1
> 2: B,C,A
> 3: A
> 4: +0.5
> 
> 
> For tracking, the new consensus from the prior vote is:
> 1: A (+10)
> 2: +9 -0.1
> 3: +10
> 4: +6 -2 (=+4)
> 5: +2; a lot of meh.
> 6: +9
> 
> 
> 
> > On 7 Dec 2018, at 17:52, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > Late but.
> > 
> > 1. A
> > 2. +1
> > 3. +1
> > 4. -1
> > 5. -0
> > 6. +1
> > 
> > RE 4, I think blocker is an important priority. High and urgent mean the 
> > same thing to me. Wish is fine, but that is too similar to low if you ask 
> > me. My ideal would be low, medium, high, blocker. Medium feels weird, but 
> > it's a real thing, it's not high priority and we really want it done, but 
> > it's not low enough that we might skip it/not get to it anytime soon.
> > 
> > RE 5. I don't think I have ever used the environment field or used the 
> > contents populated in it. Doesn't mean someone else hasn't, but in terms of 
> > making the easy things easy it seems like making it required isn't so high 
> > value? I don't populate it myself usually I put it in the description or 
> > the subject without thinking.
> > 
> > It seems like the purpose of a field is to make it indexable and possibly 
> > structured. How often do we search or require structure on this field?
> > 
> > Ariel
> > 
> > On Tue, Dec 4, 2018, at 2:12 PM, Benedict Elliott Smith wrote:
> >> Ok, so after an initial flurry everyone has lost interest :)
> >> 
> >> I think we should take a quick poll (not a vote), on people’s positions 
> >> on the questions raised so far.  If people could try to take the time to 
> >> stake a +1/-1, or A/B, for each item, that would be really great.  This 
> >> poll will not be the end of discussions, but will (hopefully) at least 
> >> draw a line under the current open questions.
> >> 
> >> I will start with some verbiage, then summarise with options for 
> >> everyone to respond to.  You can scroll to the summary immediately if 
> >> you like.
> >> 
> >> - 1. Component: Multi-select or Cascading-select (i.e. only one 
> >> component possible per ticket, but neater UX)
> >> - 2. Labels: rather than litigate people’s positions, I propose we do 
> >> the least controversial thing, which is to simply leave labels intact, 
> >> and only supplement them with the new schema information.  We can later 
> >> revisit if we decide it’s getting messy.
> >> - 3. "First review completed; second review ongoing": I don’t think we 
> >> need to complicate the

Re: JIRA Workflow Proposals

2018-12-07 Thread Ariel Weisberg
Hi,

I think I managed to not get confused. I evaluated the two separately. I don't 
like or use the environment field, either in terms of populating it or searching 
on it. That information could go in the description and be just as useful to me 
personally.

I have no problem with an optional platform field that is an improvement on 
environment in that it is more structured and searchable. My bar for optional 
fields is low. I guess I'm not convinced I want either, though. If other people 
find it useful because they search on it, then yes, we should do a better, more 
structured version.

Question 5 groups feature impact and platform. It's platform that I think is 
less useful. I am +1 on feature impacts, as we have impacts on things like CCM 
and drivers that we need to keep track of, and I do forget them at times.

Ariel

On Fri, Dec 7, 2018, at 1:17 PM, Benedict Elliott Smith wrote:
> 
> 
> > On 7 Dec 2018, at 17:52, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > Late but.
> 
> No harm in them continuing to roll in, I’m just cognisant of needing to 
> annoy everyone with a second poll, so no point perpetuating it past a 
> likely unassailable consensus.
> 
> > 
> > 1. A
> > 2. +1
> > 3. +1
> > 4. -1
> > 5. -0
> > 6. +1
> > 
> > RE 4, I think blocker is an important priority. High and urgent mean the 
> > same thing to me. Wish is fine, but that is too similar to low if you ask 
> > me. My ideal would be low, medium, high, blocker. Medium feels weird, but 
> > it's a real thing, it's not high priority and we really want it done, but 
> > it's not low enough that we might skip it/not get to it anytime soon.
> 
> It seems like people have really strong (and divergent) opinions about 
> Priority!  
> 
> So, to begin: I don’t think Medium is any different to Normal, in the 
> proposal?  Except Normal is, well, more accurate I think?  It is the 
> default priority, and should be used unless strong reasons otherwise.
> 
> As for Blocker vs Urgent, I obviously disagree (but not super strongly):  
> Urgent conveys more information IMO.  Blocker only says we cannot 
> release without this.  Urgent also says we must release with this, and 
> ASAP.  The meaning of a priority is anyway distinct from its name, and 
> the meaning of Urgent is described in the proposal to make this clear.  
> But, it’s easy to add a quick poll item for the top priority name.  Any 
> other suggestions, besides Urgent and Blocker?
> 
> Of course, if we remove Priority from the Bug type, I agree with others 
> that the top level priority ceases to mean anything, and there probably 
> shouldn’t be one.
> 
> Wish will already be included in the next poll.
> 
> > RE 5. I don't think I have ever used the environment field or used the 
> > contents populated in it. Doesn't mean someone else hasn't, but in terms of 
> > making the easy things easy it seems like making it required isn't so high 
> > value? I don't populate it myself usually I put it in the description or 
> > the subject without thinking.
> > It seems like the purpose of a field is to make it indexable and possibly 
> > structured. How often do we search or require structure on this field?
> 
> Are you conflating this with Q6?  The environment field was not 
> discussed, only the potential Platform field, which we _hope_ to make a 
> multi-select list.  This would make the information quite useful for 
> reporting and searching.
> 
> Environment is being removed because it is unstructured and poorly used, 
> and it looks like you have voted in favour of this?
> 
> If Platform cannot be made into an editable multi-select list, we will 
> probably not make it mandatory. Here we’re trying to gauge an ideal end 
> state - some things may need revisiting if JIRA does not play ball, 
> though that should not affect many items.
> 
> > 
> > Ariel
> > 
> > On Tue, Dec 4, 2018, at 2:12 PM, Benedict Elliott Smith wrote:
> >> Ok, so after an initial flurry everyone has lost interest :)
> >> 
> >> I think we should take a quick poll (not a vote), on people’s positions 
> >> on the questions raised so far.  If people could try to take the time to 
> >> stake a +1/-1, or A/B, for each item, that would be really great.  This 
> >> poll will not be the end of discussions, but will (hopefully) at least 
> >> draw a line under the current open questions.
> >> 
> >> I will start with some verbiage, then summarise with options for 
> >> everyone to respond to.  You can scroll to the summary immediately if 
> >> you like.
> >> 
> >> - 1. Component: Multi-select or Cascading-select (i.e. only one 
> >

Re: JIRA Workflow Proposals

2018-12-07 Thread Ariel Weisberg
Hi,

Late but.

1. A
2. +1
3. +1
4. -1
5. -0
6. +1

RE 4, I think blocker is an important priority. High and urgent mean the same 
thing to me. Wish is fine, but that is too similar to low if you ask me. My 
ideal would be low, medium, high, blocker. Medium feels weird, but it's a real 
thing, it's not high priority and we really want it done, but it's not low 
enough that we might skip it/not get to it anytime soon.

RE 5. I don't think I have ever used the environment field or used the contents 
populated in it. Doesn't mean someone else hasn't, but in terms of making the 
easy things easy it seems like making it required isn't so high value? I don't 
populate it myself usually I put it in the description or the subject without 
thinking.

It seems like the purpose of a field is to make it indexable and possibly 
structured. How often do we search or require structure on this field?

Ariel

On Tue, Dec 4, 2018, at 2:12 PM, Benedict Elliott Smith wrote:
> Ok, so after an initial flurry everyone has lost interest :)
> 
> I think we should take a quick poll (not a vote), on people’s positions 
> on the questions raised so far.  If people could try to take the time to 
> stake a +1/-1, or A/B, for each item, that would be really great.  This 
> poll will not be the end of discussions, but will (hopefully) at least 
> draw a line under the current open questions.
> 
> I will start with some verbiage, then summarise with options for 
> everyone to respond to.  You can scroll to the summary immediately if 
> you like.
> 
> - 1. Component: Multi-select or Cascading-select (i.e. only one 
> component possible per ticket, but neater UX)
> - 2. Labels: rather than litigate people’s positions, I propose we do 
> the least controversial thing, which is to simply leave labels intact, 
> and only supplement them with the new schema information.  We can later 
> revisit if we decide it’s getting messy.
> - 3. "First review completed; second review ongoing": I don’t think we 
> need to complicate the process; if there are two reviews in flight, the 
> first reviewer can simply comment that they are done when ready, and the 
> second reviewer can move the status once they are done.  If the first 
> reviewer wants substantive changes, they can move the status to "Change 
> Request” before the other reviewer completes, if they like.  Everyone 
> involved can probably negotiate this fairly well, but we can introduce 
> some specific guidance on how to conduct yourself here in a follow-up.  
> - 4. Priorities: Option A: Wish, Low, Normal, High, Urgent; Option B: 
> Wish, Low, Normal, Urgent
> - 5. Mandatory Platform and Feature. Make mandatory by introducing new 
> “All” and “None” (respectively) options, so always possible to select an 
> option.
> - 6. Environment field: Remove?
> 
> I think this captures everything that has been brought up so far, except 
> for the suggestion to make "Since Version” a “Version” - but that needs 
> more discussion, as I don’t think there’s a clear alternative proposal 
> yet.
> 
> Summary:
> 
> 1: Component. (A) Multi-select; (B) Cascading-select
> 2: Labels: leave alone +1/-1
> 3: No workflow changes for first/second review: +1/-1
> 4: Priorities: Including High +1/-1
> 5: Mandatory Platform and Feature: +1/-1
> 6: Remove Environment field: +1/-1
> 
> I will begin.
> 
> 1: A
> 2: +1
> 3: +1
> 4: +1
> 5: Don’t mind
> 6: +1
> 
> 
> 
> 
> > On 29 Nov 2018, at 22:04, Scott Andreas  wrote:
> > 
> > If I read Josh’s reply right, I think the suggestion is to periodically 
> > review active labels and promote those that are demonstrably useful to 
> > components (cf. folksonomy -> 
> > taxonomy).
> >  I hadn’t read the reply as indicating that labels should be zero’d out 
> > periodically. In any case, I agree that reviewing active labels and 
> > re-evaluating our taxonomy from time to time sounds great; I don’t think 
> > I’d zero them, though.
> > 
> > Responding to a few comments:
> > 
> > –––
> > 
> > – To Joey’s question about issues languishing in Triage: I like the idea of 
> > an SLO for the “triage” state. I am happy to commit time and resources to 
> > triaging newly-reported issues, and to JIRA pruning/gardening in general. I 
> > spent part of the weekend before last adding components to a few hundred 
> > open issues and preparing the Confluence reports mentioned in the other 
> > thread. It was calming. We can also figure out how to rotate / share this 
> > responsibility.
> > 
> > – Labels discussion: If we adopt a more structured component hierarchy to 
> > treat as our primary method of organization, keep labels around for people 
> > to use as they’d like (e.g., for custom JQL queries useful to their 
> > workflows), and periodically promote those that are widely useful, I think 
> > that sounds like a fine outcome.
> > 
> > – On Sankalp’s question of issue reporter / new contributor burden: I 
> > actually think 

Re: Request to review feature-freeze proposed tickets

2018-11-20 Thread Ariel Weisberg
Hi,

I would like to get as many of these in as is feasible. Of the 17 JIRAs that 
were patch available before the feature freeze started, only 1 was reviewed and 
committed.

If you didn’t have access to reviewers and committers, as the one out of the 17 
did, it has been essentially impossible to get your problems with Cassandra 
fixed in 4.0.

This is basically the same as saying that even though Cassandra is open source, 
it does you no good, because it will be years before the issues impacting you 
get fixed, even if you contribute the fixes yourself.

Pulling up the ladder after getting “your own” fixes in is a sure fire way to 
fracture the community into a collection of private forks containing the fixes 
people can’t live without, and pushing people to look at alternatives.

Private forks are a serious threat to the project. The people on them are at 
risk of getting left behind and Cassandra stagnates for them and becomes 
uncompetitive. Those with the resources to maintain a seriously diverged fork 
are also the ones better positioned to be active contributors.

Regards,
Ariel

> On Nov 18, 2018, at 9:18 PM, Vinay Chella  wrote:
> 
> Hi,
> 
> We still have 15 Patch Available/ open tickets which were requested for
> reviews before the Sep 1, 2018 freeze. I am starting this email thread to
> resurface and request a review of community tickets as most of these
> tickets address vital correctness, performance, and usability bugs that
> help avoid critical production issues. I tried to provide context on why we
> feel these tickets are important to get into 4.0. If you would like to
> discuss the technical details of a particular ticket, let's try to do that
> in JIRA.
> 
> CASSANDRA-14525: Cluster enters an inconsistent state after bootstrap
> failures. (Correctness bug, Production impact, Ready to Commit)
> 
> CASSANDRA-14459: DES sends requests to the wrong nodes routinely. (SLA
> breaking latencies, Production impact, Review in progress)
> 
> CASSANDRA-14303 and CASSANDRA-14557: Currently production 3.0+ clusters
> cannot be rebuilt after node failure due to 3.0’s introduction of the
> system_auth keyspace with rf of 1. These tickets both fix the regression
> introduced in 3.0 by letting operators configure rf=3 and prevent future
> outages (Usability bug, Production impact, Patch Available).
> 
> CASSANDRA-14096: Cassandra 3.11.1 Repair Causes Out of Memory. We believe
> this may also impact 3.0 (Title says it all, Production impact, Patch
> Available)
> 
> CASSANDRA-10023: It is impossible to accurately determine local read/write
> calls on C*. This patch allows users to detect when they are choosing
> incorrect coordinators. (Usability bug (troubleshoot), Review in progress)
> 
> CASSANDRA-10789: There is no way to safely stop bad clients bringing down
> C* nodes. This patch would give operators a very important tool to use
> during production incidents to mitigate impact. (Usability bug, Production
> Impact (recovery), Patch Available)
> 
> CASSANDRA-13010: No visibility into which disk is being compacted to.
> (Usability bug, Production Impact (troubleshoot), Review in progress)
> 
> CASSANDRA-12783 - Break up large MV mutations to prevent OOMs (Title says
> it all, Production Impact, Patch InProgress/ Awaiting Feedback)
> 
> CASSANDRA-14319 - nodetool rebuild from DC lets you pass invalid
> datacenters (Usability bug, Production impact, Patch available)
> 
> CASSANDRA-13841 - Smarter nodetool rebuild. Kind of a bug but would be nice
> to get it in 4.0. (Production Impact (recovery), Patch Available)
> 
> CASSANDRA-9452: Cleanup of old configuration, confusing to new C*
> operators. (Cleanup, Patch Available)
> 
> CASSANDRA-14309: Hint window persistence across the record. This way hints
> that are accumulated over a period of time when nodes are creating are less
> likely to take down the entire cluster. (Potential Production Impact, Patch
> Available)
> 
> CASSANDRA-14291: Bug from CASSANDRA-11163? (Usability Bug, Patch Available)
> 
> CASSANDRA-10540: RangeAware compaction. 256 vnode clusters really need this
> to be able to do basic things like repair. The patch needs some rework
> after transient replication (Production impact, needs contributor time)
> 
> URL for all the tickets: JIRA
> 
> 
> 
> Let me know.
> Thanks,
> Vinay Chella


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-29 Thread Ariel Weisberg
Hi,

Seeing too many -1s for changing the representation and essentially no +1s, I 
submitted a patch for just changing the default. I could use a reviewer for 
https://issues.apache.org/jira/browse/CASSANDRA-13241

I created https://issues.apache.org/jira/browse/CASSANDRA-14857  "Use a more 
space efficient representation for compressed chunk offsets" for post 4.0.

Regards,
Ariel
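
As a hypothetical back-of-the-envelope sketch (not from the thread) of why the offset representation matters alongside the default change: Cassandra keeps one offset per compressed chunk in memory, so halving the chunk size doubles the number of offsets held. Assuming 8 bytes per offset for illustration:

```python
# Hypothetical sketch: estimate the in-memory size of the compressed-chunk
# offset table for one TiB of uncompressed data, assuming one 8-byte offset
# is kept per chunk (an illustrative figure, not a measured one).
TIB = 1024 ** 4

def offset_table_bytes(data_bytes: int, chunk_kb: int, bytes_per_offset: int = 8) -> int:
    # One offset per chunk; smaller chunks mean proportionally more offsets.
    chunks = data_bytes // (chunk_kb * 1024)
    return chunks * bytes_per_offset

for chunk_kb in (64, 16, 4):
    mib = offset_table_bytes(TIB, chunk_kb) / (1024 ** 2)
    print(f"{chunk_kb:>2} KiB chunks -> {mib:,.0f} MiB of offsets per TiB")
# → 64 KiB chunks -> 128 MiB of offsets per TiB
# → 16 KiB chunks -> 512 MiB of offsets per TiB
# →  4 KiB chunks -> 2,048 MiB of offsets per TiB
```

Under these assumptions, dropping the default from 64 KiB toward 4 KiB multiplies the offset-table footprint by 16, which is why a more space-efficient representation (CASSANDRA-14857) was filed alongside the default change.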

On Tue, Oct 23, 2018, at 11:46 AM, Ariel Weisberg wrote:
> Hi,
> 
> To summarize who we have heard from so far
> 
> WRT to changing just the default:
> 
> +1:
> Jon Haddad
> Ben Bromhead
> Alain Rodriguez
> Sankalp Kohli (not explicit)
> 
> -0:
> Sylvain Lebresne
> Jeff Jirsa
> 
> Not sure:
> Kurt Greaves
> Joshua McKenzie
> Benedict Elliott Smith
> 
> WRT to changing the representation:
> 
> +1:
> There are only conditional +1s at this point
> 
> -0:
> Sylvain Lebresne
> 
> -.5:
> Jeff Jirsa
> 
> This 
> (https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
>  
> is a rough cut of the change for the representation. It needs better 
> naming, unit tests, javadoc etc. but it does implement the change.
> 
> Ariel
> On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
> > Sorry, to be clear - I'm +1 on changing the configuration default, but I
> > think changing the compression in memory representations warrants further
> > discussion and investigation before making a case for or against it yet.
> > An optimization that reduces in memory cost by over 50% sounds pretty good
> > and we never were really explicit that those sort of optimizations would be
> > excluded after our feature freeze.  I don't think they should necessarily
> > be excluded at this time, but it depends on the size and risk of the patch.
> > 
> > On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad  wrote:
> > 
> > > I think we should try to do the right thing for the most people that we
> > > can.  The number of folks impacted by 64KB is huge.  I've worked on a lot
> > > of clusters created by a lot of different teams, going from brand new to
> > > pretty damn knowledgeable.  I can't think of a single time over the last 2
> > > years that I've seen a cluster use non-default settings for compression.
> > > With only a handful of exceptions, I've lowered the chunk size 
> > > considerably
> > > (usually to 4 or 8K) and the impact has always been very noticeable,
> > > frequently resulting in hardware reduction and cost savings.  Of all the
> > > poorly chosen defaults we have, this is one of the biggest offenders that 
> > > I
> > > see.  There's a good reason ScyllaDB  claims they're so much faster than
> > > Cassandra - we ship a DB that performs poorly for 90+% of teams because we
> > > ship for a specific use case, not a general one (time series on memory
> > > constrained boxes being the specific use case)
> > >
> > > This doesn't impact existing tables, just new ones.  More and more teams
> > > are using Cassandra as a general purpose database, we should acknowledge
> > > that adjusting our defaults accordingly.  Yes, we use a little bit more
> > > memory on new tables if we just change this setting, and what we get out 
> > > of
> > > it is a massive performance win.
> > >
> > > I'm +1 on the change as well.
> > >
> > >
> > >
> > > On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli 
> > > wrote:
> > >
> > >> (We should definitely harden the definition for freeze in a separate
> > >> thread)
> > >>
> > >> My thinking is that this is the best time to do this change as we have
> > >> not even cut alpha or beta. All the people involved in the test will
> > >> definitely be testing it again when we have these releases.
> > >>
> > >> > On Oct 19, 2018, at 8:00 AM, Michael Shuler 
> > >> wrote:
> > >> >
> > >> >> On 10/19/18 9:16 AM, Joshua McKenzie wrote:
> > >> >>
> > >> >> At the risk of hijacking this thread, when are we going to transition
> > >> from
> > >> >> "no new features, change whatever else you want including refactoring
> > >> and
> > >> >> changing years-old defaults" to "ok, we think we have something that's
> > >> >> stable, time to start testing"?
> > >> >
> > >> > Creating a cassandra-4.0 branch would allow trunk to, for instance, g

Re: Proposed changes to CircleCI testing workflow

2018-10-26 Thread Ariel Weisberg
Hi,

Thank you for working on this. These all sound like good changes to me.

Ariel

On Fri, Oct 26, 2018, at 10:49 AM, Stefan Podkowinski wrote:
> I'd like to give you a quick update on the work that has been done
> lately on running tests using CircleCI. Please let me know if you have
> any objections or don't think this is going into the right direction, or
> have any other feedback!
> 
> We've been using CircleCI for a while now, and results are used on a 
> constant basis for new patches, not only by committers, but also by 
> casual contributors running unit tests. It looks like people find the 
> service valuable and we should keep using it. Therefore I'd like to make 
> some improvements that will make it easier to add new tests and to
> continue making CircleCI an option for all contributors, both on paid
> and free plans.
> 
> The general idea of the changes implemented in #14806, is to consolidate
> the existing config to make it more modular and have smaller jobs that
> can be scheduled ad-hoc by the developer, instead of running a few big
> jobs on every commit. Reorganizing and breaking up the existing config
> was done using the new 2.1 config features. Starting jobs on request,
> instead of automatically, is done using the manual approval feature,
> i.e. you now have to click on that job in the workflow page in order to
> start it. I'd like to see us having smaller, more specialized groups of
> tests that we can run more selectively during development, while still
> being able to run bigger tests before committing, or firing up all of
> them during testing and releasing. Other example of smaller jobs would
> be testing coverage (#14788) or cqlsh tests (#14298). But also
> individual jobs for different ant targets, like burn, stress or benchmarks.
> 
> We'd now also be able to run tests using different docker images and
> different JDKs. I've already updated the used image to also include Java
> 11 and added unit and dtest jobs to the config for that. It's now really
> easy to run tests on Java 11, although these won't pass yet. It seems to
> be important to me to have this kind of flexibility, given the
> increasingly diverse ecosystem of Java distributions. We can also add
> jobs for packaging and doing smoke tests by installing and starting
> packages on different docker images (Redhat, Debian, Ubuntu,..) at a
> later point.
> 
> As for the paid vs free plans issue, I'd also like us to discuss how we
> can make tests faster and less resource intensive in general. As a
> desired consequence, we'd be able to move away from multi-node dtests,
> to something that can be run using the free plan. I'm looking forward to
> see if #14821 can get us into that direction. Ideally we can add these
> tests into a job that can be completed on the free plan and encourage
> contributors to add new tests there, instead of having to write a dtest,
> which they won't be able to run on CircleCI without a paid plan.
> 
> Whats changing for you as a CircleCI user?
> * All tests, except unit tests, will need to be started manually and
> will not run on every commit (this can be further discussed and changed
> anytime if needed)
> * Updating the config.yml file now requires using the CircleCI cli tool
> and should not be done directly (see #14806 for technical details)
> * High resource settings can be enabled using a script/patch, either run
> manually or as commit hook (again see ticket for details)
> * Both free and paid plan users now have more tests to run
> 
> As already mentioned, please let me know if you have any thoughts on
> this, or if you think this is going into the wrong direction.
> 
> Thanks.
> 
> 

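
For readers who haven't used it, CircleCI's manual-approval feature mentioned above works by inserting a job of `type: approval` into a workflow; downstream jobs only run once someone clicks the held job in the UI. A minimal hypothetical sketch (job names, the ant targets, and the docker image are illustrative, not the project's actual config):

```yaml
version: 2.1
jobs:
  unit_tests:
    docker:
      - image: circleci/openjdk:11-jdk   # illustrative image, not the project's
    steps:
      - checkout
      - run: ant test
  dtests:
    docker:
      - image: circleci/openjdk:11-jdk
    steps:
      - checkout
      - run: ant dtest                   # illustrative target name
workflows:
  build_and_test:
    jobs:
      - unit_tests                       # runs automatically on every commit
      - start_dtests:                    # approval gate: click this job in the UI
          type: approval
          requires: [unit_tests]
      - dtests:
          requires: [start_dtests]       # only runs after manual approval
```

This is the pattern that lets the smaller, specialized jobs described above exist in the config without running (and consuming credits) on every commit.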



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Ariel Weisberg
Hi,

I just asked Jeff. He is -0 and -0.5 respectively.

Ariel

On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote:
> I’m +1 change of default.  I think Jeff was -1 on that though.
> 
> 
> > On 23 Oct 2018, at 16:46, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > To summarize who we have heard from so far
> > 
> > WRT to changing just the default:
> > 
> > +1:
> > Jon Haddad
> > Ben Bromhead
> > Alain Rodriguez
> > Sankalp Kohli (not explicit)
> > 
> > -0:
> > Sylvain Lebresne
> > Jeff Jirsa
> > 
> > Not sure:
> > Kurt Greaves
> > Joshua McKenzie
> > Benedict Elliott Smith
> > 
> > WRT to changing the representation:
> > 
> > +1:
> > There are only conditional +1s at this point
> > 
> > -0:
> > Sylvain Lebresne
> > 
> > -.5:
> > Jeff Jirsa
> > 
> > This 
> > (https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
> >  is a rough cut of the change for the representation. It needs better 
> > naming, unit tests, javadoc etc. but it does implement the change.
> > 
> > Ariel
> > On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
> >> Sorry, to be clear - I'm +1 on changing the configuration default, but I
> >> think changing the compression in memory representations warrants further
> >> discussion and investigation before making a case for or against it yet.
> >> An optimization that reduces in memory cost by over 50% sounds pretty good
> >> and we never were really explicit that those sort of optimizations would be
> >> excluded after our feature freeze.  I don't think they should necessarily
> >> be excluded at this time, but it depends on the size and risk of the patch.
> >> 
> >> On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad  wrote:
> >> 
> >>> I think we should try to do the right thing for the most people that we
> >>> can.  The number of folks impacted by 64KB is huge.  I've worked on a lot
> >>> of clusters created by a lot of different teams, going from brand new to
> >>> pretty damn knowledgeable.  I can't think of a single time over the last 2
> >>> years that I've seen a cluster use non-default settings for compression.
> >>> With only a handful of exceptions, I've lowered the chunk size 
> >>> considerably
> >>> (usually to 4 or 8K) and the impact has always been very noticeable,
> >>> frequently resulting in hardware reduction and cost savings.  Of all the
> >>> poorly chosen defaults we have, this is one of the biggest offenders that 
> >>> I
> >>> see.  There's a good reason ScyllaDB  claims they're so much faster than
> >>> Cassandra - we ship a DB that performs poorly for 90+% of teams because we
> >>> ship for a specific use case, not a general one (time series on memory
> >>> constrained boxes being the specific use case)
> >>> 
> >>> This doesn't impact existing tables, just new ones.  More and more teams
> >>> are using Cassandra as a general purpose database, we should acknowledge
> >>> that adjusting our defaults accordingly.  Yes, we use a little bit more
> >>> memory on new tables if we just change this setting, and what we get out 
> >>> of
> >>> it is a massive performance win.
> >>> 
> >>> I'm +1 on the change as well.
> >>> 
> >>> 
> >>> 
> >>> On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli 
> >>> wrote:
> >>> 
> >>>> (We should definitely harden the definition for freeze in a separate
> >>>> thread)
> >>>> 
> >>>> My thinking is that this is the best time to do this change as we have
> >>>> not even cut alpha or beta. All the people involved in the test will
> >>>> definitely be testing it again when we have these releases.
> >>>> 
> >>>>> On Oct 19, 2018, at 8:00 AM, Michael Shuler 
> >>>> wrote:
> >>>>> 
> >>>>>> On 10/19/18 9:16 AM, Joshua McKenzie wrote:
> >>>>>> 
> >>>>>> At the risk of hijacking this thread, when are we going to transition
> >>>> from
> >>>>>> "no new features, change whatever else you want including refactoring
> >>>> and
> >>>>>> changing years-old defaults" to "ok, we think we have somethin

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Ariel Weisberg
Hi,

To summarize who we have heard from so far

WRT to changing just the default:

+1:
Jon Haddad
Ben Bromhead
Alain Rodriguez
Sankalp Kohli (not explicit)

-0:
Sylvain Lebresne
Jeff Jirsa

Not sure:
Kurt Greaves
Joshua McKenzie
Benedict Elliott Smith

WRT to changing the representation:

+1:
There are only conditional +1s at this point

-0:
Sylvain Lebresne

-.5:
Jeff Jirsa

This 
(https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
 is a rough cut of the change for the representation. It needs better naming, 
unit tests, javadoc etc. but it does implement the change.

Ariel
On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
> Sorry, to be clear - I'm +1 on changing the configuration default, but I
> think changing the compression in memory representations warrants further
> discussion and investigation before making a case for or against it yet.
> An optimization that reduces in memory cost by over 50% sounds pretty good
> and we never were really explicit that those sort of optimizations would be
> excluded after our feature freeze.  I don't think they should necessarily
> be excluded at this time, but it depends on the size and risk of the patch.
> 
> On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad  wrote:
> 
> > I think we should try to do the right thing for the most people that we
> > can.  The number of folks impacted by 64KB is huge.  I've worked on a lot
> > of clusters created by a lot of different teams, going from brand new to
> > pretty damn knowledgeable.  I can't think of a single time over the last 2
> > years that I've seen a cluster use non-default settings for compression.
> > With only a handful of exceptions, I've lowered the chunk size considerably
> > (usually to 4 or 8K) and the impact has always been very noticeable,
> > frequently resulting in hardware reduction and cost savings.  Of all the
> > poorly chosen defaults we have, this is one of the biggest offenders that I
> > see.  There's a good reason ScyllaDB  claims they're so much faster than
> > Cassandra - we ship a DB that performs poorly for 90+% of teams because we
> > ship for a specific use case, not a general one (time series on memory
> > constrained boxes being the specific use case)
> >
> > This doesn't impact existing tables, just new ones.  More and more teams
> > are using Cassandra as a general purpose database, we should acknowledge
> > that adjusting our defaults accordingly.  Yes, we use a little bit more
> > memory on new tables if we just change this setting, and what we get out of
> > it is a massive performance win.
> >
> > I'm +1 on the change as well.
> >
> >
> >
> > On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli 
> > wrote:
> >
> >> (We should definitely harden the definition for freeze in a separate
> >> thread)
> >>
> >> My thinking is that this is the best time to do this change as we have
> >> not even cut alpha or beta. All the people involved in the test will
> >> definitely be testing it again when we have these releases.
> >>
> >> > On Oct 19, 2018, at 8:00 AM, Michael Shuler 
> >> wrote:
> >> >
> >> >> On 10/19/18 9:16 AM, Joshua McKenzie wrote:
> >> >>
> >> >> At the risk of hijacking this thread, when are we going to transition
> >> from
> >> >> "no new features, change whatever else you want including refactoring
> >> and
> >> >> changing years-old defaults" to "ok, we think we have something that's
> >> >> stable, time to start testing"?
> >> >
> >> > Creating a cassandra-4.0 branch would allow trunk to, for instance, get
> >> > a default config value change commit and get more testing. We might
> >> > forget again, from what I understand of Benedict's last comment :)
> >> >
> >> > --
> >> > Michael
> >> >
> >> > -
> >> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >> >
> >>
> >>
> >>
> >
> > --
> > Jon Haddad
> > http://www.rustyrazorblade.com
> > twitter: rustyrazorblade
> >
> 
> 
> -- 
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade




Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Ariel Weisberg
Hi,

I ran some benchmarks on my laptop
https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16656821&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16656821

For a random read workload, varying chunk size:
Chunk size  Time
       64k  25:20
       64k  25:33
       32k  20:01
       16k  19:19
       16k  19:14
        8k  16:51
        4k  15:39
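One way to read the trend above: a point read has to decompress at least one whole chunk, so for small rows the bytes decompressed per read are roughly chunk size divided by row size. A toy model of that (the 200-byte row size here is a made-up illustration, not a number from the benchmark):

```java
public class ReadAmplification {
    // Bytes decompressed to serve one small read ~= one whole chunk,
    // so amplification for a rowBytes-sized row is chunkBytes / rowBytes.
    static double amplification(int chunkBytes, int rowBytes) {
        return (double) chunkBytes / rowBytes;
    }

    public static void main(String[] args) {
        for (int kb : new int[] { 4, 8, 16, 64 })
            System.out.printf("%dk chunks: %.0fx for a 200-byte row%n",
                              kb, amplification(kb * 1024, 200));
    }
}
```

It's a crude model (it ignores caching and rows spanning chunks), but it matches the direction of the timings: smaller chunks, less wasted decompression.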

Ariel
On Thu, Oct 18, 2018, at 2:55 PM, Ariel Weisberg wrote:
> Hi,
> 
> For those who were asking about the performance impact of block size on 
> compression I wrote a microbenchmark.
> 
> https://pastebin.com/RHDNLGdC
> 
>  [java] Benchmark                                               Mode  Cnt          Score          Error  Units
>  [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k    thrpt   15  331190055.685 ±  8079758.044  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k    thrpt   15  353024925.655 ±  7980400.003  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k    thrpt   15  365664477.654 ± 10083336.038  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k     thrpt   15  305518114.172 ± 11043705.883  ops/s
>  [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k  thrpt   15  688369529.911 ± 25620873.933  ops/s
>  [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k  thrpt   15  703635848.895 ±  5296941.704  ops/s
>  [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k  thrpt   15  695537044.676 ± 17400763.731  ops/s
>  [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k   thrpt   15  727725713.128 ±  4252436.331  ops/s
> 
> To summarize, compression is 8.5% slower and decompression is 1% faster. 
> This is measuring the impact on compression/decompression not the huge 
> impact that would occur if we decompressed data we don't need less 
> often.
> 
> I didn't test decompression of Snappy and LZ4 high, but I did test 
> compression.
> 
> Snappy:
>  [java] CompactIntegerSequenceBench.benchCompressSnappy16k   thrpt    2  196574766.116  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressSnappy32k   thrpt    2  198538643.844  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressSnappy64k   thrpt    2  194600497.613  ops/s
>  [java] CompactIntegerSequenceBench.benchCompressSnappy8k    thrpt    2  186040175.059  ops/s
> 
> LZ4 high compressor:
>  [java] CompactIntegerSequenceBench.bench16k  thrpt    2  20822947.578  ops/s
>  [java] CompactIntegerSequenceBench.bench32k  thrpt    2  12037342.253  ops/s
>  [java] CompactIntegerSequenceBench.bench64k  thrpt    2   6782534.469  ops/s
>  [java] CompactIntegerSequenceBench.bench8k   thrpt    2  32254619.594  ops/s
> 
> LZ4 high is the one instance where block size mattered a lot. It's a bit 
> suspicious really when you look at the ratio of performance to block 
> size being close to 1:1. I couldn't spot a bug in the benchmark though.
> 
> Compression ratios with LZ4 fast for the text of Alice in Wonderland was:
> 
> Chunk size 8192, ratio 0.709473
> Chunk size 16384, ratio 0.667236
> Chunk size 32768, ratio 0.634735
> Chunk size 65536, ratio 0.607208
> 
> By way of comparison I also ran deflate with maximum compression:
> 
> Chunk size 8192, ratio 0.426434
> Chunk size 16384, ratio 0.402423
> Chunk size 32768, ratio 0.381627
> Chunk size 65536, ratio 0.364865
> 
> Ariel
>  
> On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote:
> > FWIW, I’m not -0, just think that long after the freeze date a change 
> > like this needs a strong mandate from the community.  I think the change 
> > is a good one.
> > 
> > 
> > 
> > 
> > 
> > > On 17 Oct 2018, at 22:09, Ariel Weisberg  wrote:
> > > 
> > > Hi,
> > > 
> > > It's really not appreciably slower compared to the decompression we are 
> > > going to do which is going to take several microseconds. Decompression is 
> > > also going to be faster because we are going to do less unnecessary 
> > > decompression and the decompression itself may be faster since it may fit 
> > > in a higher level cache better. I ran a microbenchmark comparing them.
> > > 
> > > https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988
> > > 
> > > Fetching a long from memory:   56 nanoseconds
> > > Compact integer sequence   :   80 nanoseconds
> > > Summing integer sequence   :  165 nanoseconds

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-18 Thread Ariel Weisberg
Hi,

For those who were asking about the performance impact of block size on 
compression I wrote a microbenchmark.

https://pastebin.com/RHDNLGdC

 [java] Benchmark                                               Mode  Cnt          Score          Error  Units
 [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k    thrpt   15  331190055.685 ±  8079758.044  ops/s
 [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k    thrpt   15  353024925.655 ±  7980400.003  ops/s
 [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k    thrpt   15  365664477.654 ± 10083336.038  ops/s
 [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k     thrpt   15  305518114.172 ± 11043705.883  ops/s
 [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k  thrpt   15  688369529.911 ± 25620873.933  ops/s
 [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k  thrpt   15  703635848.895 ±  5296941.704  ops/s
 [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k  thrpt   15  695537044.676 ± 17400763.731  ops/s
 [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k   thrpt   15  727725713.128 ±  4252436.331  ops/s

To summarize, compression is 8.5% slower and decompression is 1% faster. This 
is measuring the impact on compression/decompression not the huge impact that 
would occur if we decompressed data we don't need less often.

I didn't test decompression of Snappy and LZ4 high, but I did test compression.

Snappy:
 [java] CompactIntegerSequenceBench.benchCompressSnappy16k   thrpt    2  196574766.116  ops/s
 [java] CompactIntegerSequenceBench.benchCompressSnappy32k   thrpt    2  198538643.844  ops/s
 [java] CompactIntegerSequenceBench.benchCompressSnappy64k   thrpt    2  194600497.613  ops/s
 [java] CompactIntegerSequenceBench.benchCompressSnappy8k    thrpt    2  186040175.059  ops/s

LZ4 high compressor:
 [java] CompactIntegerSequenceBench.bench16k  thrpt    2  20822947.578  ops/s
 [java] CompactIntegerSequenceBench.bench32k  thrpt    2  12037342.253  ops/s
 [java] CompactIntegerSequenceBench.bench64k  thrpt    2   6782534.469  ops/s
 [java] CompactIntegerSequenceBench.bench8k   thrpt    2  32254619.594  ops/s

LZ4 high is the one instance where block size mattered a lot. It's a bit 
suspicious really when you look at the ratio of performance to block size being 
close to 1:1. I couldn't spot a bug in the benchmark though.

Compression ratios with LZ4 fast for the text of Alice in Wonderland was:

Chunk size 8192, ratio 0.709473
Chunk size 16384, ratio 0.667236
Chunk size 32768, ratio 0.634735
Chunk size 65536, ratio 0.607208

By way of comparison I also ran deflate with maximum compression:

Chunk size 8192, ratio 0.426434
Chunk size 16384, ratio 0.402423
Chunk size 32768, ratio 0.381627
Chunk size 65536, ratio 0.364865
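For anyone who wants to reproduce this style of per-chunk ratio measurement, here is a minimal sketch. It is not the benchmark code used above: it uses the JDK's built-in Deflater (LZ4 needs an external jar), and the repeated-sentence input is just a stand-in for the Alice in Wonderland text.

```java
import java.util.zip.Deflater;

public class ChunkRatio {
    // Compress `input` in independent fixed-size chunks (as the sstable
    // compressor does) and return total compressed size / input size.
    static double ratio(byte[] input, int chunkSize) {
        byte[] out = new byte[chunkSize + 512]; // scratch buffer per deflate call
        long compressed = 0;
        for (int off = 0; off < input.length; off += chunkSize) {
            int len = Math.min(chunkSize, input.length - off);
            Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
            d.setInput(input, off, len);
            d.finish();
            while (!d.finished())
                compressed += d.deflate(out);
            d.end();
        }
        return (double) compressed / input.length;
    }

    public static void main(String[] args) {
        // Stand-in for the Alice in Wonderland text used in the numbers above.
        byte[] text = "All in the golden afternoon, full leisurely we glide. "
                .repeat(4096).getBytes();
        for (int kb : new int[] { 8, 16, 32, 64 })
            System.out.printf("Chunk size %d, ratio %f%n",
                              kb * 1024, ratio(text, kb * 1024));
    }
}
```

Because each chunk is compressed independently, smaller chunks pay per-chunk overhead and lose cross-chunk matches, which is why the ratios above worsen as chunk size shrinks.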

Ariel
 
On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote:
> FWIW, I’m not -0, just think that long after the freeze date a change 
> like this needs a strong mandate from the community.  I think the change 
> is a good one.
> 
> 
> 
> 
> 
> > On 17 Oct 2018, at 22:09, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > It's really not appreciably slower compared to the decompression we are 
> > going to do which is going to take several microseconds. Decompression is 
> > also going to be faster because we are going to do less unnecessary 
> > decompression and the decompression itself may be faster since it may fit 
> > in a higher level cache better. I ran a microbenchmark comparing them.
> > 
> > https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988
> > 
> > Fetching a long from memory:   56 nanoseconds
> > Compact integer sequence   :   80 nanoseconds
> > Summing integer sequence   :  165 nanoseconds
> > 
> > Currently we have one +1 from Kurt to change the representation and 
> > possibly a -0 from Benedict. That's not really enough to make an exception 
> > to the code freeze. If you want it to happen (or not) you need to speak up 
> > otherwise only the default will change.
> > 
> > Regards,
> > Ariel
> > 
> > On Wed, Oct 17, 2018, at 6:40 AM, kurt greaves wrote:
> >> I think if we're going to drop it to 16k, we should invest in the compact
> >> sequencing as well. Just lowering it to 16k will have potentially a painful
> >> impact on anyone running low memory nodes, but if we can do it without the
> >> memory impact I don't think there's any reason to wait another major
> >> version to implement it.
> >> 
> >> Having said that, we should probably benchmark the two representations
> >> Ariel has come up with.

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-17 Thread Ariel Weisberg
Hi,

It's really not appreciably slower compared to the decompression we are going 
to do which is going to take several microseconds. Decompression is also going 
to be faster because we are going to do less unnecessary decompression and the 
decompression itself may be faster since it may fit in a higher level cache 
better. I ran a microbenchmark comparing them.

https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=16653988&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16653988

Fetching a long from memory:   56 nanoseconds
Compact integer sequence   :   80 nanoseconds
Summing integer sequence   :  165 nanoseconds

Currently we have one +1 from Kurt to change the representation and possibly a 
-0 from Benedict. That's not really enough to make an exception to the code 
freeze. If you want it to happen (or not) you need to speak up otherwise only 
the default will change.

Regards,
Ariel

On Wed, Oct 17, 2018, at 6:40 AM, kurt greaves wrote:
> I think if we're going to drop it to 16k, we should invest in the compact
> sequencing as well. Just lowering it to 16k will have potentially a painful
> impact on anyone running low memory nodes, but if we can do it without the
> memory impact I don't think there's any reason to wait another major
> version to implement it.
> 
> Having said that, we should probably benchmark the two representations
> Ariel has come up with.
> 
> On Wed, 17 Oct 2018 at 20:17, Alain RODRIGUEZ  wrote:
> 
> > +1
> >
> > I would guess a lot of C* clusters/tables have this option set to the
> > default value, and not many of them are having the need for reading so big
> > chunks of data.
> > I believe this will greatly limit disk overreads for a fair amount (a big
> > majority?) of new users. It seems fair enough to change this default value,
> > I also think 4.0 is a nice place to do this.
> >
> > Thanks for taking care of this Ariel and for making sure there is a
> > consensus here as well,
> >
> > C*heers,
> > ---
> > Alain Rodriguez - al...@thelastpickle.com
> > France / Spain
> >
> > The Last Pickle - Apache Cassandra Consulting
> > http://www.thelastpickle.com
> >
> > Le sam. 13 oct. 2018 à 08:52, Ariel Weisberg  a écrit :
> >
> > > Hi,
> > >
> > > This would only impact new tables, existing tables would get their
> > > chunk_length_in_kb from the existing schema. It's something we record in
> > a
> > > system table.
> > >
> > > I have an implementation of a compact integer sequence that only requires
> > > 37% of the memory required today. So we could do this with only slightly
> > > more than doubling the memory used. I'll post that to the JIRA soon.
> > >
> > > Ariel
> > >
> > > On Fri, Oct 12, 2018, at 1:56 AM, Jeff Jirsa wrote:
> > > >
> > > >
> > > > I think 16k is a better default, but it should only affect new tables.
> > > > Whoever changes it, please make sure you think about the upgrade path.
> > > >
> > > >
> > > > > On Oct 12, 2018, at 2:31 AM, Ben Bromhead 
> > wrote:
> > > > >
> > > > > This is something that's bugged me for ages, tbh the performance gain
> > > for
> > > > > most use cases far outweighs the increase in memory usage and I would
> > > even
> > > > > be in favor of changing the default now, optimizing the storage cost
> > > later
> > > > > (if it's found to be worth it).
> > > > >
> > > > > For some anecdotal evidence:
> > > > > 4kb is usually what we end setting it to, 16kb feels more reasonable
> > > given
> > > > > the memory impact, but what would be the point if practically, most
> > > folks
> > > > > set it to 4kb anyway?
> > > > >
> > > > > Note that chunk_length will largely be dependent on your read sizes,
> > > but 4k
> > > > > is the floor for most physical devices in terms of ones block size.
> > > > >
> > > > > +1 for making this change in 4.0 given the small size and the large
> > > > > improvement to new users experience (as long as we are explicit in
> > the
> > > > > documentation about memory consumption).
> > > > >
> > > > >
> > > > >> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg 
> > > wrote:
> > > > >>
> > > > >> Hi,
> > > > >>
> > > > >

Re: Implicit Casts for Arithmetic Operators

2018-10-12 Thread Ariel Weisberg
Hi,

From reading the spec. Precision is always implementation defined. The spec 
specifies scale in several cases, but never precision for any type or 
operation (addition/subtraction, multiplication, division).

So we don't implement anything remotely approaching precision and scale in CQL 
when it comes to numbers I think? So we aren't going to follow the spec for 
scale. We are already pretty far down that road so I would leave it alone. 

I don't think the spec is asking for the most approximate type. It's just 
saying the result is approximate, and the precision is implementation defined. 
We could return either float or double. I think if one of the operands is a 
double we should return a double because clearly the schema thought a double 
was required to represent that number. I would also be in favor of returning a 
double all the time so that people can expect a consistent type from 
expressions involving approximate numbers.

I am a big fan of widening for arithmetic expressions in a database to avoid 
having to error on overflow. You can go to the trouble of only widening the 
minimum amount, but I think it's simpler if we always widen to bigint and 
double. This would be something the spec allows.
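A quick plain-Java illustration (not Cassandra code) of why widening to double is safe here: double represents every int and every float exactly, so widening both operands before the add is lossless, whereas evaluating at float precision forces the int down through a lossy cast first.

```java
public class Widening {
    // true iff evaluating i + f at float precision equals the
    // result after widening both operands to double.
    static boolean floatEvalIsExact(int i, float f) {
        double atFloat  = (float) i + f;            // int squeezed into float first
        double atDouble = (double) i + (double) f;  // both widened losslessly
        return atFloat == atDouble;
    }

    public static void main(String[] args) {
        System.out.println(floatEvalIsExact(1, 0.5f));                  // true
        System.out.println(floatEvalIsExact(Integer.MAX_VALUE, 0.5f));  // false
    }
}
```

The same reasoning does not extend to bigint + double (a long has more mantissa bits than a double), which is where a decimal widening would come in.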

Definitely if we can make overflow not occur we should and the spec allows 
that. We should also not return different types for the same operand types just 
to work around overflow if we detect we need more precision.

Ariel
On Fri, Oct 12, 2018, at 12:45 PM, Benedict Elliott Smith wrote:
> If it’s in the SQL spec, I’m fairly convinced.  Thanks for digging this 
> out (and Mike for getting some empirical examples).
> 
> We still have to decide on the approximate data type to return; right 
> now, we have float+bigint=double, but float+int=float.  I think this is 
> fairly inconsistent, and either the approximate type should always win, 
> or we should always upgrade to double for mixed operands.
> 
> The quoted spec also suggests that decimal+float=float, and decimal
> +double=double, whereas we currently have decimal+float=decimal, and 
> decimal+double=decimal
> 
> If we’re going to go with an approximate operand implying an approximate 
> result, I think we should do it consistently (and consistent with the 
> SQL92 spec), and have the type of the approximate operand always be the 
> return type.
> 
> This would still leave a decision for float+double, though.  The most 
> consistent behaviour with that stated above would be to always take the 
> most approximate type to return (i.e. float), but this would seem to me 
> to be fairly unexpected for the user.
> 
> 
> > On 12 Oct 2018, at 17:23, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > I agree with what's been said about expectations regarding expressions 
> > involving floating point numbers. I think that if one of the inputs is 
> > approximate then the result should be approximate.
> > 
> > One thing we could look at for inspiration is the SQL spec. Not to follow 
> > dogmatically necessarily.
> > 
> > From the SQL 92 spec regarding assignment 
> > http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
> > "
> > Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
> > FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
> > comparable and mutually assignable. If an assignment would result
> > in a loss of the most significant digits, an exception condition
> > is raised. If least significant digits are lost, implementation-
> > defined rounding or truncating occurs with no exception condition
> > being raised. The rules for arithmetic are generally governed by
> > Subclause 6.12, "numeric value expression".
> > "
> > 
> > Section 6.12 numeric value expressions:
> > "
> > 1) If the data type of both operands of a dyadic arithmetic opera-
> >tor is exact numeric, then the data type of the result is exact
> >numeric, with precision and scale determined as follows:
> > ...
> > 2) If the data type of either operand of a dyadic arithmetic op-
> >erator is approximate numeric, then the data type of the re-
> >sult is approximate numeric. The precision of the result is
> >implementation-defined.
> > "
> > 
> > And this makes sense to me. I think we should only return an exact result 
> > if both of the inputs are exact.
> > 
> > I think we might want to look closely at the SQL spec and especially when 
> > the spec requires an error to be generated. Those are sometimes in the spec 
> > to prevent subtle paths to wrong answers. Any time we deviate from the spec 
> > we should be asking why is it in the spec and why are we deviating.

Re: Implicit Casts for Arithmetic Operators

2018-10-12 Thread Ariel Weisberg
Hi,

I agree with what's been said about expectations regarding expressions 
involving floating point numbers. I think that if one of the inputs is 
approximate then the result should be approximate.

One thing we could look at for inspiration is the SQL spec. Not to follow 
dogmatically necessarily.

From the SQL 92 spec regarding assignment 
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt section 4.6:
"
 Values of the data types NUMERIC, DECIMAL, INTEGER, SMALLINT,
 FLOAT, REAL, and DOUBLE PRECISION are numbers and are all mutually
 comparable and mutually assignable. If an assignment would result
 in a loss of the most significant digits, an exception condition
 is raised. If least significant digits are lost, implementation-
 defined rounding or truncating occurs with no exception condition
 being raised. The rules for arithmetic are generally governed by
 Subclause 6.12, "numeric value expression".
"

Section 6.12 numeric value expressions:
"
 1) If the data type of both operands of a dyadic arithmetic opera-
tor is exact numeric, then the data type of the result is exact
numeric, with precision and scale determined as follows:
...
 2) If the data type of either operand of a dyadic arithmetic op-
erator is approximate numeric, then the data type of the re-
sult is approximate numeric. The precision of the result is
implementation-defined.
"

And this makes sense to me. I think we should only return an exact result if 
both of the inputs are exact.
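The "approximate in, approximate out" rule can be seen directly in plain Java (again just an illustration, not Cassandra internals): once a value has passed through a float, exactness is already gone, so there is nothing for an exact result type to preserve.

```java
public class Approximate {
    // Does the int round-trip through a float unchanged?
    static boolean survivesFloat(int i) {
        return (int) (float) i == i;
    }

    public static void main(String[] args) {
        System.out.println(survivesFloat(16777216)); // 2^24:     true
        System.out.println(survivesFloat(16777217)); // 2^24 + 1: false (becomes 16777216)
    }
}
```

16777217 is the first int that a float cannot represent: the 24-bit mantissa runs out, and the value silently rounds to its even neighbour.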

I think we might want to look closely at the SQL spec and especially when the 
spec requires an error to be generated. Those are sometimes in the spec to 
prevent subtle paths to wrong answers. Any time we deviate from the spec we 
should be asking why is it in the spec and why are we deviating.

Another issue besides overflow handling is how we determine precision and scale 
for expressions involving two exact types.
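On the precision-and-scale question, one possible model (an observation, not a proposal the thread settled on) is Java's BigDecimal, which follows the SQL-style rules: the scale of a sum is the larger operand scale, and the scale of a product is the sum of the operand scales.

```java
import java.math.BigDecimal;

public class ExactScale {
    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.10");  // scale 2
        BigDecimal b = new BigDecimal("2.345"); // scale 3

        System.out.println(a.add(b));              // 3.445   (scale max(2, 3) = 3)
        System.out.println(a.multiply(b));         // 2.57950 (scale 2 + 3 = 5)
        System.out.println(a.add(b).scale());      // 3
        System.out.println(a.multiply(b).scale()); // 5
    }
}
```

Trailing zeros are kept (2.57950, not 2.5795), which is exactly the kind of observable detail that would be part of the public API contract.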

Ariel

On Fri, Oct 12, 2018, at 11:51 AM, Michael Burman wrote:
> Hi,
> 
> I'm not sure if I would prefer the Postgres way of doing things, which is
> returning just about any type depending on the order of operators.
> Considering it actually mentions in the docs that using numeric/decimal is
> slow and also multiple times that floating points are inexact. So doing
> some math with Postgres (9.6.5):
> 
> SELECT 2147483647::bigint*1.0::double precision returns double
> precision 2147483647
> SELECT 2147483647::bigint*1.0 returns numeric 2147483647.0
> SELECT 2147483647::bigint*1.0::real returns double
> SELECT 2147483647::double precision*1::bigint returns double 2147483647
> SELECT 2147483647::double precision*1.0::bigint returns double 2147483647
> 
> With + - we can get the same amount of mixture of returned types. There's
> no difference in those calculations, just some casting. To me
> floating-point math indicates inexactness and has errors and whoever mixes
> up two different types should understand that. If one didn't want exact
> numeric type, why would the server return such? The floating point value
> itself could be wrong already before the calculation - trying to say we do
> it lossless is just wrong.
> 
> Fun with 2.65:
> 
> SELECT 2.65::real * 1::int returns double 2.6509536743
> SELECT 2.65::double precision * 1::int returns double 2.65
> 
> SELECT round(2.65) returns numeric 4
> SELECT round(2.65::double precision) returns double 4
> 
> SELECT 2.65 * 1 returns double 2.65
> SELECT 2.65 * 1::bigint returns numeric 2.65
> SELECT 2.65 * 1.0 returns numeric 2.650
> SELECT 2.65 * 1.0::double precision returns double 2.65
> 
> SELECT round(2.65) * 1 returns numeric 3
> SELECT round(2.65) * round(1) returns double 3
> 
> So as we're going to have silly values in any case, why pretend something
> else? Also, exact calculations are slow if we crunch large amount of
> numbers. I guess I slightly deviated towards Postgres' implementation in this
> case, but I wish it wasn't used as a benchmark in this case. And most
> importantly, I would definitely want the exact same type returned each time
> I do a calculation.
> 
>   - Micke
> 
> On Fri, Oct 12, 2018 at 4:29 PM Benedict Elliott Smith 
> wrote:
> 
> > As far as I can tell we reached a relatively strong consensus that we
> > should implement lossless casts by default?  Does anyone have anything more
> > to add?
> >
> > Looking at the emails, everyone who participated and expressed a
> > preference was in favour of the “Postgres approach” of upcasting to decimal
> > for mixed float/int operands?
> >
> > I’d like to get a clear-cut decision on this, so we know what we’re doing
> > for 4.0.  Then hopefully we can move on to a collective decision on Ariel’s
> > concerns about overflow, which I think are also pressing - particularly for
> > tinyint and smallint.  This does also impact implicit casts for mixed
> > integer type operations, but an approach for 

Re: Tested to upgrade to 4.0

2018-10-12 Thread Ariel Weisberg
Hi,

Thanks for reporting this. I'll get this fixed today.

Ariel

On Fri, Oct 12, 2018, at 7:21 AM, Tommy Stendahl wrote:
> Hi,
> 
> I tested to upgrade to Cassandra 4.0. I had an existing cluster with 
> 3.0.15 and upgraded the first node but it fails to start due to a 
> NullPointerException.
> 
> The problem is the new table option "speculative_write_threshold", when 
> it doesn’t exist we get a NullPointerException.
> 
> I created a jira for this 
> CASSANDRA-14820.
> 
> Regards,
> Tommy




CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-11 Thread Ariel Weisberg
Hi,

This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241

This ticket has languished for a while. IMO it's too late in 4.0 to implement a 
more memory efficient representation for compressed chunk offsets. However I 
don't think we should put out another release with the current 64k default as 
it's pretty unreasonable.

I propose that we lower the value to 16kb. 4k might never be the correct 
default anyways as there is a cost to compression and 16k will still be a large 
improvement.

Benedict and Jon Haddad are both +1 on making this change for 4.0. In the past 
there has been some consensus about reducing this value although maybe with 
more memory efficiency.

The napkin math for what this costs is:
"If you have 1TB of uncompressed data, with 64k chunks that's 16M chunks at 8 
bytes each (128MB).
With 16k chunks, that's 512MB.
With 4k chunks, it's 2G.
Per terabyte of data (pre-compression)."
https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621

By way of comparison memory mapping the files has a similar cost per 4k page of 
8 bytes. Multiple mappings makes this more expensive. With a default of 16kb 
this would be 4x less expensive than memory mapping a file. I only mention this 
to give a sense of the costs we are already paying. I am not saying they are 
directly related.
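The per-terabyte figures quoted above are easy to re-derive: one 8-byte long offset per compressed chunk. A tiny sketch of the arithmetic:

```java
public class ChunkOffsetMath {
    // Memory needed for chunk offsets: one 8-byte long per chunk.
    static long offsetBytes(long uncompressedBytes, int chunkKb) {
        long chunks = uncompressedBytes / (chunkKb * 1024L);
        return chunks * 8L;
    }

    public static void main(String[] args) {
        long oneTb = 1L << 40;
        System.out.println(offsetBytes(oneTb, 64) >> 20); // 128  (MB)
        System.out.println(offsetBytes(oneTb, 16) >> 20); // 512  (MB)
        System.out.println(offsetBytes(oneTb, 4)  >> 20); // 2048 (MB), i.e. 2G
    }
}
```

Halving the chunk size doubles the chunk count and hence the offset memory, which is the whole trade-off being weighed in this thread.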

I'll wait a week for discussion and if there is consensus make the change.

Regards,
Ariel




Re: Implicit Casts for Arithmetic Operators

2018-10-02 Thread Ariel Weisberg
Hi,

I think overflow and the role of widening conversions are pretty linked so I'll 
continue to inject that into this discussion. Also overflow is much worse since 
most applications won't be impacted by a loss of precision when an expression 
involves an int and float, but will care quite a bit if they get some nonsense 
wrapped number in an integer only expression.

For VoltDB in practice we didn't run into issues with applications not making 
progress due to exceptions with real data due to the widening conversions. The 
range of double and long are pretty big and that hides wrap around/infinity. 

I think the proposal of having all operations return a decimal is attractive in 
that these expressions always result in a consistent type. Two pain points 
might be whether client languages have decimal support and whether there is a 
performance issue? The nice thing about always returning decimal is we can 
sidestep the issue of overflow.

I would start with seeing if that's acceptable, and if it isn't then look at 
other approaches like returning a variety of types, such as when doing int + int 
return a bigint or int + float return a double.
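To make the wraparound vs. throw-on-overflow trade-off concrete, here is the behaviour in plain Java (the same semantics the aggregations currently inherit): `Math.addExact` is the "error on overflow" option, and widening to long is the "widening conversion" one.

```java
public class Overflow {
    public static void main(String[] args) {
        int a = Integer.MAX_VALUE;

        System.out.println(a + 1);        // -2147483648: silent wraparound
        System.out.println((long) a + 1); // 2147483648: widening avoids it

        try {
            Math.addExact(a, 1);          // throw-on-overflow alternative
        } catch (ArithmeticException e) {
            System.out.println("overflow: " + e.getMessage());
        }
    }
}
```

Widening only postpones the problem (long + long can still overflow), which is why the decimal-everywhere option keeps coming up as the only way to sidestep it entirely.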

If we take an approach that allows overflow the ideal end state IMO would be to 
get all users to run Cassandra in way that overflow results in an error even in 
the context of aggregation. The road to get there is tricky, but maybe start by 
having it as an opt in tunable in cassandra.yaml. I don't know how/when we 
could ever change that as a default and it's unfortunate having an option like 
this that 99% won't know they should flip.

It seems like having the default throw on overflow is not as bad as it sounds 
if you do the widening conversions since most people won't run into them. The 
change in the column types of results sets actually sounds worse if we want to 
also improve aggregrations. Many applications won't notice if the client 
library abstracts that away, but I think there are still cases where people 
would notice the type changing.

Ariel

On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
> This (overflow) is an excellent point, but this also affects 
> aggregations which were introduced a long time ago.  They already 
> inherit Java semantics for all of the relevant types (silent wrap 
> around).  We probably want to be consistent, meaning either changing 
> aggregations (which incurs a cost for changing API) or continuing the 
> java semantics here.
> 
> This is why having these discussions explicitly in the community before 
> a release is so critical, in my view.  It’s very easy for these semantic 
> changes to go unnoticed on a JIRA, and then ossify.
> 
> 
> > On 2 Oct 2018, at 15:48, Ariel Weisberg  wrote:
> > 
> > Hi,
> > 
> > I think we should decide based on what is least surprising as you mention, 
> > but isn't overridden by some other concern.
> > 
> > It seems to me the priorities are
> > 
> > * Correctness
> > * Performance
> > * User visible complexity
> > * Developer visible complexity
> > 
> > Defaulting to silent implicit data loss is not ideal from a correctness 
> > standpoint.
> > 
> > Doing something better like using wider types doesn't seem like a 
> > performance issue.
> > 
> > From a user standpoint doing something less lossy doesn't look more complex 
> > as long as it's consistent, and documented and doesn't change from version 
> > to version.
> > 
> > There is some developer complexity, but this is a public API and we only 
> > get one shot at this. 
> > 
> > I wonder about how overflow is handled as well. In VoltDB I think we threw 
> > on overflow and tended to just do widening conversions to make that less 
> > common. We didn't imitate another database (as far as I know) we just went 
> > with what least likely to silently corrupt data.
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213 
> > <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213>
> > https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764 
> > <https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764>
> > 
> > Ariel
> > 
> > On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> >> A recent ticket introduced arithmetic operators, and alongside these 
> >> came implicit casts for their operands.  There is a semantic decision to 
> >> be made, and I think the project would do well to explicitly raise this 
> >> kind of question for wider input before release, since the project is 
> >> bound by them forever more.
> >> 
> >> In this case, the choice is between lossy and lossless casts for 
> >

Re: Implicit Casts for Arithmetic Operators

2018-10-02 Thread Ariel Weisberg
Hi,

I think we should decide based on what is least surprising as you mention, but 
isn't overridden by some other concern.

It seems to me the priorities are

* Correctness
* Performance
* User visible complexity
* Developer visible complexity

Defaulting to silent implicit data loss is not ideal from a correctness 
standpoint.

Doing something better like using wider types doesn't seem like a performance 
issue.

From a user standpoint doing something less lossy doesn't look more complex as 
long as it's consistent, and documented and doesn't change from version to 
version.

There is some developer complexity, but this is a public API and we only get 
one shot at this. 

I wonder about how overflow is handled as well. In VoltDB I think we threw on 
overflow and tended to just do widening conversions to make that less common. 
We didn't imitate another database (as far as I know) we just went with what 
least likely to silently corrupt data.
https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L2213
https://github.com/VoltDB/voltdb/blob/master/src/ee/common/NValue.hpp#L3764

Ariel

On Tue, Oct 2, 2018, at 7:30 AM, Benedict Elliott Smith wrote:
> A recent ticket introduced arithmetic operators, and alongside these 
> came implicit casts for their operands.  There is a semantic decision to 
> be made, and I think the project would do well to explicitly raise this 
> kind of question for wider input before release, since the project is 
> bound by them forever more.
> 
> In this case, the choice is between lossy and lossless casts for 
> operations involving integers and floating point numbers.  In essence, 
> should:
> 
> (1) float + int = float, double + bigint = double; or
> (2) float + int = double, double + bigint = decimal; or
> (3) float + int = decimal, double + bigint = decimal
> 
> Option 1 performs a lossy implicit cast from int -> float, or bigint -> 
> double.  Simply casting between these types changes the value.  This is 
> what MS SQL Server does.
> Options 2 and 3 cast without loss of precision, and 3 (or thereabouts) 
> is what PostgreSQL does.
> 
> The question I’m interested in is not just which is the right decision, 
> but how the right decision should be arrived at.  My view is that we 
> should primarily aim for least surprise to the user, but I’m keen to 
> hear from others.
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Recommended circleci settings for DTest

2018-09-29 Thread Ariel Weisberg
Hi,

Yes I think it is. I can do it Monday.

Ariel

On Fri, Sep 28, 2018, at 7:09 PM, Jay Zhuang wrote:
> Great, thanks Ariel. I assume it also works for uTest, right? Do you 
> think
> it worth updating the doc for that
> https://github.com/apache/cassandra/blob/trunk/doc/source/development/testing.rst#circleci
> 
> 
> 
> On Fri, Sep 28, 2018 at 2:55 PM Ariel Weisberg  wrote:
> 
> > Hi,
> >
> > Apply the following diff and if you have access to the higher memory
> > containers it should run the dtests with whatever you have. You may need to
> > adjust parallelism to match whatever you paid for.
> >
> > diff --git a/.circleci/config.yml b/.circleci/config.yml
> > index 5a84f724fc..76a2c9f841 100644
> > --- a/.circleci/config.yml
> > +++ b/.circleci/config.yml
> > @@ -58,16 +58,16 @@ with_dtest_jobs_only: &with_dtest_jobs_only
> >- build
> >  # Set env_settings, env_vars, and workflows/build_and_run_tests based on
> > environment
> >  env_settings: &env_settings
> > -<<: *default_env_settings
> > -#<<: *high_capacity_env_settings
> > +#<<: *default_env_settings
> > +<<: *high_capacity_env_settings
> >  env_vars: &env_vars
> > -<<: *resource_constrained_env_vars
> > -#<<: *high_capacity_env_vars
> > +#<<: *resource_constrained_env_vars
> > +<<: *high_capacity_env_vars
> >  workflows:
> >  version: 2
> > -build_and_run_tests: *default_jobs
> > +#build_and_run_tests: *default_jobs
> >  #build_and_run_tests: *with_dtest_jobs_only
> > -#build_and_run_tests: *with_dtest_jobs
> > +build_and_run_tests: *with_dtest_jobs
> >  docker_image: &docker_image kjellman/cassandra-test:0.4.3
> >  version: 2
> >  jobs:
> >
> > Ariel
> >
> > On Fri, Sep 28, 2018, at 5:47 PM, Jay Zhuang wrote:
> > > Hi,
> > >
> > > Do we have a recommended circleci setup for DTest? For example, what's
> > the
> > > minimal container number I need to finish the DTest in a reasonable
> > time. I
> > > know the free account (4 containers) is not good enough for the DTest.
> > But
> > > if the community member can pay for the cost, what's the recommended
> > > settings and steps to run that?
> > >
> > > Thanks,
> > > Jay
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >




Re: Recommended circleci settings for DTest

2018-09-28 Thread Ariel Weisberg
Hi,

Apply the following diff and if you have access to the higher memory containers 
it should run the dtests with whatever you have. You may need to adjust 
parallelism to match whatever you paid for.

diff --git a/.circleci/config.yml b/.circleci/config.yml
index 5a84f724fc..76a2c9f841 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -58,16 +58,16 @@ with_dtest_jobs_only: &with_dtest_jobs_only
   - build
 # Set env_settings, env_vars, and workflows/build_and_run_tests based on 
environment
 env_settings: &env_settings
-<<: *default_env_settings
-#<<: *high_capacity_env_settings
+#<<: *default_env_settings
+<<: *high_capacity_env_settings
 env_vars: &env_vars
-<<: *resource_constrained_env_vars
-#<<: *high_capacity_env_vars
+#<<: *resource_constrained_env_vars
+<<: *high_capacity_env_vars
 workflows:
 version: 2
-build_and_run_tests: *default_jobs
+#build_and_run_tests: *default_jobs
 #build_and_run_tests: *with_dtest_jobs_only
-#build_and_run_tests: *with_dtest_jobs
+build_and_run_tests: *with_dtest_jobs
 docker_image: &docker_image kjellman/cassandra-test:0.4.3
 version: 2
 jobs:

Ariel

On Fri, Sep 28, 2018, at 5:47 PM, Jay Zhuang wrote:
> Hi,
> 
> Do we have a recommended circleci setup for DTest? For example, what's the
> minimal container number I need to finish the DTest in a reasonable time. I
> know the free account (4 containers) is not good enough for the DTest. But
> if the community member can pay for the cost, what's the recommended
> settings and steps to run that?
> 
> Thanks,
> Jay




Re: Request for post-freeze merge exception

2018-09-04 Thread Ariel Weisberg
+1. Transient Replication had some rebase pain as well, but we were able to get 
through it at the last minute. The traffic over the last few days was pretty 
heavy, with several substantial commits.

On Tue, Sep 4, 2018, at 2:19 PM, Jeff Jirsa wrote:
> Seems like a reasonable thing to merge to me. Nothing else has been
> committed, it was approved pre-freeze, seems like the rush to merge was
> bound to have some number of rebase casualties.
> 
> On Tue, Sep 4, 2018 at 11:15 AM Sam Tunnicliffe  wrote:
> 
> > Hey all,
> >
> > On 2018-31-08 CASSANDRA-14145 had been +1'd by two reviewers and CI was
> > green, and so it was marked Ready To Commit. This was before the 4.0
> > feature freeze but before it landed, CASSANDRA-14408, which touched a few
> > common areas of the code, was merged. I didn't have chance to finish the
> > rebase over the weekend but in the end it turned out that most of the
> > conflicts were in test code and were straightforward to resolve. I'd like
> > to commit this now; the rebase is done (& has been re-reviewed), and the CI
> > is still green so I suspect most of the community would probably be ok with
> > that. We did vote for a freeze though and I don't want to subvert or
> > undermine that decision, so I wanted to check and give a chance for anyone
> > to raise objections before I did.
> >
> > I'll wait 24 hours, and if nobody objects before then I'll merge to trunk.
> >
> > Thanks,
> > Sam
> >




Re: Transient Replication 4.0 status update

2018-08-31 Thread Ariel Weisberg
Hi,

All nodes being the same (in terms of functionality) is something we wanted to 
stick with at least for now. I think we want a design that changes the 
operational, availability, and consistency story as little as possible when 
it's completed.

Ariel
On Fri, Aug 31, 2018, at 2:27 PM, Carl Mueller wrote:
> Sorry to spam this with two messages...
> 
> This ticket is also interesting because it is very close to what I imagined
> a useful use case of RF4 / RF6: being basically RF3 + hot spare where you
> marked (in the case of RF4) three nodes as primary and the fourth as hot
> standby, which may be equivalent if I understand the paper/protocol to
> RF3+1 transient.
> 
> On Fri, Aug 31, 2018 at 1:07 PM Carl Mueller 
> wrote:
> 
> > I put these questions on the ticket too... Sorry if some of them are
> > stupid.
> >
> > So are (basically) these transient nodes basically serving as centralized
> > hinted handoff caches rather than having the hinted handoffs cluttering up
> > full replicas, especially nodes that have no concern for the token range
> > involved? I understand that hinted handoffs aren't being replaced by this,
> > but is that kind of the idea?
> >
> > Are the transient nodes "sitting around"?
> >
> > Will the transient nodes have cheaper/lower hardware requirements?
> >
> > During cluster expansion, does the newly streaming node acquiring data
> > function as a temporary transient node until it becomes a full replica?
> > Likewise while shrinking, does a previously full replica function as a
> > transient while it streams off data?
> >
> > Can this help vnode expansion with multiple concurrent nodes? Admittedly
> > I'm not familiar with how much work has gone into fixing cluster expansion
> > with vnodes, it is my understanding that you typically expand only one node
> > at a time or in multiples of the datacenter size
> >
> > On Mon, Aug 27, 2018 at 12:29 PM Ariel Weisberg  wrote:
> >
> >> Hi all,
> >>
> >> I wanted to give everyone an update on how development of Transient
> >> Replication is going and where we are going to be as of 9/1. Blake
> >> Eggleston, Alex Petrov, Benedict Elliott Smith, and myself have been
> >> working to get TR implemented for 4.0. Up to now we have avoided merging
> >> anything related to TR to trunk because we weren't 100% sure we were going
> >> to make the 9/1 deadline and even minimal TR functionality requires
> >> significant changes (see 14405).
> >>
> >> We focused on getting a minimal set of deployable functionality working,
> >> and want to avoid overselling what's going to work in the first version.
> >> The feature is marked explicitly as experimental and has to be enabled via
> >> a feature flag in cassandra.yaml. The expected audience for TR in 4.0 is
> >> more experienced users who are ready to tackle deploying experimental
> >> functionality. As it is deployed by experienced users and we gain more
> >> confidence in it and remove caveats the # of users it will be appropriate
> >> for will expand.
> >>
> >> For 4.0 it looks like we will be able to merge TR with support for normal
> >> reads and writes without monotonic reads. Monotonic reads require blocking
> >> read repair and blocking read repair with TR requires further changes that
> >> aren't feasible by 9/1.
> >>
> >> Future TR support would look something like
> >>
> >> 4.0.next:
> >> * vnodes (https://issues.apache.org/jira/browse/CASSANDRA-14404)
> >>
> >> 4.next:
> >> * Monotonic reads (
> >> https://issues.apache.org/jira/browse/CASSANDRA-14665)
> >> * LWT (https://issues.apache.org/jira/browse/CASSANDRA-14547)
> >> * Batch log (https://issues.apache.org/jira/browse/CASSANDRA-14549)
> >> * Counters (https://issues.apache.org/jira/browse/CASSANDRA-14548)
> >>
> >> Possibly never:
> >> * Materialized views
> >>
> >> Probably never:
> >> * Secondary indexes
> >>
> >> The most difficult changes to support Transient Replication should be
> >> behind us. LWT, Batch log, and counters shouldn't be that hard to make
> >> transient replication aware. Monotonic reads require some changes to the
> >> read path, but are at least conceptually not that hard to support. I am
> >> confident that by 4.next TR will have fewer tradeoffs.
> >>
> >> If you want to take a peek the current feature branch is
> >> https://github.com/aweisberg/cassandra/tree/14409-7 although we will be
> >> moving to 14409-8 to rebase on to trunk.
> >>
> >> Regards,
> >> Ariel
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>




Re: Transient Replication 4.0 status update

2018-08-31 Thread Ariel Weisberg
Hi,

There are no transient nodes. All nodes are the same. If you have transient 
replication enabled each node will transiently replicate some ranges instead of 
fully replicating them.

Capacity requirements are reduced evenly across all nodes in the cluster.
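As a rough sketch of why the savings are even (the ring layout, node names, and single-token-per-node simplification below are invented for illustration; real placement is more involved):

```python
# With a replication factor of 3 of which 1 replica is transient, the
# last replica in ring order for each range serves it transiently.
ring = ["n1", "n2", "n3", "n4", "n5"]   # nodes in token order (toy ring)

def replicas_for(range_index, rf=3, transient=1):
    owners = [ring[(range_index + i) % len(ring)] for i in range(rf)]
    # (full replicas, transient replicas)
    return owners[:rf - transient], owners[rf - transient:]

transient_count = {n: 0 for n in ring}
for r in range(len(ring)):
    _, trans = replicas_for(r)
    for n in trans:
        transient_count[n] += 1

# Every node is the transient replica for exactly one range, so the
# capacity reduction is spread evenly across the cluster.
assert set(transient_count.values()) == {1}
```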

Nodes are not temporarily transient replicas during expansion. They need to 
stream data like a full replica for the transient range before they can serve 
reads. There is a pending state similar to how there is a pending state for 
full replicas. Transient replicas also always receive writes when they are 
pending. There may be some room to relax how that is handled, but for now we 
opt to send pending transient ranges a bit more data and avoid reading from 
them when maybe we could.

This doesn't change how expansion works with vnodes. The same restrictions 
still apply. We won't officially support vnodes until we have done more testing 
and really thought through the corner cases. It's quite possible we will relax 
the restriction on creating transient keyspaces with vnodes in 4.0.x.

Ariel

On Fri, Aug 31, 2018, at 2:07 PM, Carl Mueller wrote:
> I put these questions on the ticket too... Sorry if some of them are
> stupid.
> 
> So are (basically) these transient nodes basically serving as centralized
> hinted handoff caches rather than having the hinted handoffs cluttering up
> full replicas, especially nodes that have no concern for the token range
> involved? I understand that hinted handoffs aren't being replaced by this,
> but is that kind of the idea?
> 
> Are the transient nodes "sitting around"?
> 
> Will the transient nodes have cheaper/lower hardware requirements?
> 
> During cluster expansion, does the newly streaming node acquiring data
> function as a temporary transient node until it becomes a full replica?
> Likewise while shrinking, does a previously full replica function as a
> transient while it streams off data?
> 
> Can this help vnode expansion with multiple concurrent nodes? Admittedly
> I'm not familiar with how much work has gone into fixing cluster expansion
> with vnodes, it is my understanding that you typically expand only one node
> at a time or in multiples of the datacenter size
> 
> On Mon, Aug 27, 2018 at 12:29 PM Ariel Weisberg  wrote:
> 
> > Hi all,
> >
> > I wanted to give everyone an update on how development of Transient
> > Replication is going and where we are going to be as of 9/1. Blake
> > Eggleston, Alex Petrov, Benedict Elliott Smith, and myself have been
> > working to get TR implemented for 4.0. Up to now we have avoided merging
> > anything related to TR to trunk because we weren't 100% sure we were going
> > to make the 9/1 deadline and even minimal TR functionality requires
> > significant changes (see 14405).
> >
> > We focused on getting a minimal set of deployable functionality working,
> > and want to avoid overselling what's going to work in the first version.
> > The feature is marked explicitly as experimental and has to be enabled via
> > a feature flag in cassandra.yaml. The expected audience for TR in 4.0 is
> > more experienced users who are ready to tackle deploying experimental
> > functionality. As it is deployed by experienced users and we gain more
> > confidence in it and remove caveats the # of users it will be appropriate
> > for will expand.
> >
> > For 4.0 it looks like we will be able to merge TR with support for normal
> > reads and writes without monotonic reads. Monotonic reads require blocking
> > read repair and blocking read repair with TR requires further changes that
> > aren't feasible by 9/1.
> >
> > Future TR support would look something like
> >
> > 4.0.next:
> > * vnodes (https://issues.apache.org/jira/browse/CASSANDRA-14404)
> >
> > 4.next:
> > * Monotonic reads (
> > https://issues.apache.org/jira/browse/CASSANDRA-14665)
> > * LWT (https://issues.apache.org/jira/browse/CASSANDRA-14547)
> > * Batch log (https://issues.apache.org/jira/browse/CASSANDRA-14549)
> > * Counters (https://issues.apache.org/jira/browse/CASSANDRA-14548)
> >
> > Possibly never:
> > * Materialized views
> >
> > Probably never:
> > * Secondary indexes
> >
> > The most difficult changes to support Transient Replication should be
> > behind us. LWT, Batch log, and counters shouldn't be that hard to make
> > transient replication aware. Monotonic reads require some changes to the
> > read path, but are at least conceptually not that hard to support. I am
> > confident that by 4.next TR will have fewer tradeoffs.
> >
> > If you want to take a peek the current feature branch is
> > https://github.com/aweisberg/cassandra/tree/14409-7 although we will be
> > moving to 14409-8 to rebase on to trunk.

Transient Replication 4.0 status update

2018-08-27 Thread Ariel Weisberg
Hi all,

I wanted to give everyone an update on how development of Transient Replication 
is going and where we are going to be as of 9/1. Blake Eggleston, Alex Petrov, 
Benedict Elliott Smith, and myself have been working to get TR implemented for 
4.0. Up to now we have avoided merging anything related to TR to trunk because 
we weren't 100% sure we were going to make the 9/1 deadline and even minimal TR 
functionality requires significant changes (see 14405).

We focused on getting a minimal set of deployable functionality working, and 
want to avoid overselling what's going to work in the first version. The 
feature is marked explicitly as experimental and has to be enabled via a 
feature flag in cassandra.yaml. The expected audience for TR in 4.0 is more 
experienced users who are ready to tackle deploying experimental functionality. 
As it is deployed by experienced users and we gain more confidence in it and 
remove caveats the # of users it will be appropriate for will expand.

For 4.0 it looks like we will be able to merge TR with support for normal reads 
and writes without monotonic reads. Monotonic reads require blocking read 
repair and blocking read repair with TR requires further changes that aren't 
feasible by 9/1.

Future TR support would look something like

4.0.next:
* vnodes (https://issues.apache.org/jira/browse/CASSANDRA-14404)

4.next:
* Monotonic reads (https://issues.apache.org/jira/browse/CASSANDRA-14665)
* LWT (https://issues.apache.org/jira/browse/CASSANDRA-14547)
* Batch log (https://issues.apache.org/jira/browse/CASSANDRA-14549)
* Counters (https://issues.apache.org/jira/browse/CASSANDRA-14548)

Possibly never:
* Materialized views
 
Probably never:
* Secondary indexes

The most difficult changes to support Transient Replication should be behind 
us. LWT, Batch log, and counters shouldn't be that hard to make transient 
replication aware. Monotonic reads require some changes to the read path, but 
are at least conceptually not that hard to support. I am confident that by 
4.next TR will have fewer tradeoffs.

If you want to take a peek the current feature branch is 
https://github.com/aweisberg/cassandra/tree/14409-7 although we will be moving 
to 14409-8 to rebase on to trunk.

Regards,
Ariel




Re: upgrade guava on trunk before 9/1?

2018-08-15 Thread Ariel Weisberg
Hi,

They don't even do release notes after 23. Also no API diffs.  I mean I'm fine 
with it, but it's mostly just changing to another arbitrary version that won't 
match what is in apps.

Ariel

On Wed, Aug 15, 2018, at 10:48 AM, Jason Brown wrote:
> Hey Ariel,
> 
> Tbqh, not that much. I was mostly thinking from the "I have conflicts on
> guava versions in my app because I pull in cassandra and XYZ libraries, and
> the transitive dependencies on guava use different versions" POV. Further,
> we'll be on this version of guava for 4.0 for at least two years from now.
> 
> As I asked, "does anybody feeling strongly?". Personally, I'm sorta +0 to
> +0.5, but I was just throwing this out there in case someone does really
> think it best we upgrade (and wants to make a contribution).
> 
> -Jason
> 
> 
> 
> 
> On Wed, Aug 15, 2018 at 7:25 AM, Ariel Weisberg  wrote:
> 
> > Hi,
> >
> > What do we get from Guava in exchange for upgrading?
> >
> > Ariel
> >
> > On Wed, Aug 15, 2018, at 10:19 AM, Jason Brown wrote:
> > > Hey all,
> > >
> > > Does anyone feel strongly about upgrading guava on trunk before the 9/1
> > > feature freeze for 4.0? We are currently at 23.3 (thanks to
> > > CASSANDRA-13997), and the current is 26.0.
> > >
> > > I took a quick look, and there's about 17 compilation errors. They fall
> > > into two categories, both of which appear not too difficult to resolve (I
> > > didn't look too closely, tbh).
> > >
> > > If anyone wants to tackle this LHF I can rustle up some review time.
> > >
> > > Thanks,
> > >
> > > -Jason
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >




Re: upgrade guava on trunk before 9/1?

2018-08-15 Thread Ariel Weisberg
Hi,

What do we get from Guava in exchange for upgrading?

Ariel

On Wed, Aug 15, 2018, at 10:19 AM, Jason Brown wrote:
> Hey all,
> 
> Does anyone feel strongly about upgrading guava on trunk before the 9/1
> feature freeze for 4.0? We are currently at 23.3 (thanks to
> CASSANDRA-13997), and the current is 26.0.
> 
> I took a quick look, and there's about 17 compilation errors. They fall
> into two categories, both of which appear not too difficult to resolve (I
> didn't look too closely, tbh).
> 
> If anyone wants to tackle this LHF I can rustle up some review time.
> 
> Thanks,
> 
> -Jason




Re: GitHub PR ticket spam

2018-08-06 Thread Ariel Weisberg
Hi,

Great idea. +1 to moving it to the work log.

Thanks,
Ariel

On Mon, Aug 6, 2018, at 12:40 PM, Aleksey Yeshchenko wrote:
> Nice indeed. I assume it also doesn’t spam commits@ when done this way, 
> in which case double +1 from me.
> 
> —
> AY
> 
> On 6 August 2018 at 17:18:36, Jeremiah D Jordan 
> (jeremiah.jor...@gmail.com) wrote:
> 
> Oh nice. I like the idea of keeping it but moving it to the worklog tab. 
> +1 on that from me.  
> 
> > On Aug 6, 2018, at 5:34 AM, Stefan Podkowinski  wrote:  
> >  
> > +1 for worklog option  
> >  
> > Here's an example ticket from Arrow, where they seem to be using the  
> > same approach:  
> > https://issues.apache.org/jira/browse/ARROW-2583
> >  
> > On 05.08.2018 09:56, Mick Semb Wever wrote:  
> >>> I find this a bit annoying while subscribed to commits@,  
> >>> especially since we created pr@ for these kind of messages. Also I don't  
> >>> really see any value in mirroring all github comments to the ticket.  
> >>  
> >>  
> >> I agree with you Stefan. It makes the jira tickets quite painful to read. 
> >> And I tend to make comments on the commits rather than the PRs so to avoid 
> >> spamming back to the jira ticket.  
> >>  
> >> But the linking to the PR is invaluable. And I can see Ariel's point about 
> >> a chronological historical archive.  
> >>  
> >>  
> >>> Ponies would be for this to be mirrored to a tab  
> >>> separate from comments in JIRA.  
> >>  
> >>  
> >> Ariel, that would be the the "worklog" option.  
> >> https://reference.apache.org/pmc/github
> >>  
> >> If this works for you, and others, I can open an INFRA ticket to switch to 
> >> worklog.  
> >> wdyt?  
> >>  
> >>  
> >> Mick.  
> >>  
> >> -  
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org  
> >>  
> >  
> > -  
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org  
> > For additional commands, e-mail: dev-h...@cassandra.apache.org  




Re: GitHub PR ticket spam

2018-07-30 Thread Ariel Weisberg
Hi,

I really like having it mirrored. I would not be in favor of eliminating 
automated mirroring. What we are seeing is that removing the pain of commenting 
in JIRA is encouraging people to converse more in finer detail. That's a good 
thing.

I have also seen the pain of how various github workflows hide PRs. Rebasing, 
squashing, multiple branches, all of these can obfuscate the history of a 
review. So mirroring stuff to JIRA still makes sense to me as it's easier to 
untangle what happened in chronological order.

I think reducing verbosity to not include diffs is good. Especially if it 
contains a link to the comment. I do like being able to see the diff in JIRA 
(context switching bad) I just don't like to see it mixed in with regular 
comments. Ponies would be for this to be mirrored to a tab separate from 
comments in JIRA.

Regards,
Ariel

On Mon, Jul 30, 2018, at 1:25 PM, dinesh.jo...@yahoo.com.INVALID wrote:
> It is useful to have a historical record. However, it could definitely 
> be better (huge diffs are pointless).
> Thanks,
> Dinesh 
> 
> On Monday, July 30, 2018, 1:27:26 AM PDT, Stefan Podkowinski 
>  wrote:  
>  
>  Looks like we had some active PRs recently to discuss code changes in
> detail on GitHub, which I think is something we agreed is perfectly
> fine, in addition to the usual Jira ticket.
> 
> What bugs me a bit is that for some reasons any comments on the PR would
> be posted to the Jira ticket as well. I'm not sure what would be the
> exact reason for this, I guess it's because the PR is linked in the
> ticket? I find this a bit annoying while subscribed to commits@,
> especially since we created pr@ for these kind of messages. Also I don't
> really see any value in mirroring all github comments to the ticket.
> #14556 is a good example how you could end up with tons of unformatted
> code in the ticket that will also mess up search in jira. Does anyone
> think this is really useful, or can we stop linking the PR in the future
> (at least for highly active PRs)?
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 
>   




Re: Improve the performance of CAS

2018-05-16 Thread Ariel Weisberg
Hi,

I think you are looking at the right low hanging fruit.  Cassandra deserves a 
better consensus protocol, but it's a very big project.

Regards,
Ariel
On Wed, May 16, 2018, at 5:51 PM, Dikang Gu wrote:
> Cool, created a jira for it,
> https://issues.apache.org/jira/browse/CASSANDRA-14448. I have a draft patch
> working internally, will clean it up.
> 
> The EPaxos is more complicated, could be a long term effort.
> 
> Thanks
> Dikang.
> 
> On Wed, May 16, 2018 at 2:20 PM, sankalp kohli 
> wrote:
> 
> > Hi,
> > The idea of combining read with prepare sounds good. Regarding reducing
> > the commit round trip, it is possible today by giving a lower consistency
> > level for commit I think.
> >
> > Regarding EPaxos, it is a large change and will take longer to land. I
> > think we should do this as it will help lower the latencies a lot.
> >
> > Thanks,
> > Sankalp
> >
> > On Wed, May 16, 2018 at 2:15 PM, Jeremy Hanna 
> > wrote:
> >
> > > Hi Dikang,
> > >
> > > Have you seen Blake’s work on implementing egalitarian paxos or epaxos*?
> > > That might be helpful for the discussion.
> > >
> > > Jeremy
> > >
> > > * https://issues.apache.org/jira/browse/CASSANDRA-6246
> > >
> > > > On May 16, 2018, at 3:37 PM, Dikang Gu  wrote:
> > > >
> > > > Hello C* developers,
> > > >
> > > > I'm working on some performance improvements of the lightweight
> > > > transactions (compare and set), I'd like to hear your thoughts about it.
> > > >
> > > > As you know, current CAS requires 4 round trips to finish, which is not
> > > > efficient, especially in cross DC case.
> > > > 1) Prepare
> > > > 2) Quorum read current value
> > > > 3) Propose new value
> > > > 4) Commit
> > > >
> > > > I'm proposing the following improvements to reduce it to 2 round trips,
> > > > which is:
> > > > 1) Combine prepare and quorum read together, use only one round trip to
> > > > decide the ballot and also piggyback the current value in response.
> > > > 2) Propose new value, and then send out the commit request
> > > asynchronously,
> > > > so client will not wait for the ack of the commit. In case of commit
> > > > failures, we should still have chance to retry/repair it through hints
> > or
> > > > following read/cas events.
> > > >
> > > > After the improvement, we should be able to finish the CAS operation
> > > using
> > > > 2 rounds trips. There can be following improvements as well, and this
> > can
> > > > be a start point.
> > > >
> > > > What do you think? Did I miss anything?
> > > >
> > > > Thanks
> > > > Dikang
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > >
> >
> 
> 
> 
> -- 
> Dikang




Re: Evolving the client protocol

2018-04-22 Thread Ariel Weisberg
Hi,

> This doesn't work without additional changes, for RF>1. The token ring could 
> place two replicas of the same token range on the same physical server, even 
> though those are two separate cores of the same server. You could add another 
> element to the hierarchy (cluster -> datacenter -> rack -> node -> 
> core/shard), but that generates unneeded range movements when a node is added.

I have seen rack awareness used/abused to solve this.
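A toy sketch of that abuse (shard and host names invented; the skip-same-rack walk mirrors what rack-aware placement does, with the physical server standing in for the rack):

```python
# Each core/shard registers as its own "node"; without rack awareness
# two replicas of a range can land on shards of one physical server.
ring = [("s1", "hostA"), ("s2", "hostA"), ("s3", "hostB"),
        ("s4", "hostB"), ("s5", "hostC")]   # (shard, physical host)

def replicas(start, rf=2, rack_aware=True):
    chosen, seen_hosts, i = [], set(), 0
    while len(chosen) < rf and i < len(ring):
        shard, host = ring[(start + i) % len(ring)]
        if not rack_aware or host not in seen_hosts:
            chosen.append(shard)
            seen_hosts.add(host)
        i += 1
    return chosen

print(replicas(0, rack_aware=False))   # ['s1', 's2']: both on hostA
print(replicas(0, rack_aware=True))    # ['s1', 's3']: distinct hosts
```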

Regards,
Ariel

> On Apr 22, 2018, at 8:26 AM, Avi Kivity <a...@scylladb.com> wrote:
> 
> 
> 
>> On 2018-04-19 21:15, Ben Bromhead wrote:
>> Re #3:
>> 
>> Yup I was thinking each shard/port would appear as a discrete server to the
>> client.
> 
> This doesn't work without additional changes, for RF>1. The token ring could 
> place two replicas of the same token range on the same physical server, even 
> though those are two separate cores of the same server. You could add another 
> element to the hierarchy (cluster -> datacenter -> rack -> node -> 
> core/shard), but that generates unneeded range movements when a node is added.
> 
>> If the per port suggestion is unacceptable due to hardware requirements,
>> remembering that Cassandra is built with the concept scaling *commodity*
>> hardware horizontally, you'll have to spend your time and energy convincing
>> the community to support a protocol feature it has no (current) use for or
>> find another interim solution.
> 
> Those servers are commodity servers (not x86, but still commodity). In any 
> case 60+ logical cores are common now (hello AWS i3.16xlarge or even 
> i3.metal), and we can only expect logical core count to continue to increase 
> (there are 48-core ARM processors now).
> 
>> 
>> Another way, would be to build support and consensus around a clear
>> technical need in the Apache Cassandra project as it stands today.
>> 
>> One way to build community support might be to contribute an Apache
>> licensed thread per core implementation in Java that matches the protocol
>> change and shard concept you are looking for ;P
> 
> I doubt I'll survive the egregious top-posting that is going on in this list.
> 
>> 
>> 
>>> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>> 
>>> Hi,
>>> 
>>> So at technical level I don't understand this yet.
>>> 
>>> So you have a database consisting of single threaded shards and a socket
>>> for accept that is generating TCP connections and in advance you don't know
>>> which connection is going to send messages to which shard.
>>> 
>>> What is the mechanism by which you get the packets for a given TCP
>>> connection delivered to a specific core? I know that a given TCP connection
>>> will normally have all of its packets delivered to the same queue from the
>>> NIC because the tuple of source address + port and destination address +
>>> port is typically hashed to pick one of the queues the NIC presents. I
>>> might have the contents of the tuple slightly wrong, but it always includes
>>> a component you don't get to control.
>>> 
>>> Since it's hashing how do you manipulate which queue packets for a TCP
>>> connection go to and how is it made worse by having an accept socket per
>>> shard?
>>> 
>>> You also mention 160 ports as bad, but it doesn't sound like a big number
>>> resource wise. Is it an operational headache?
>>> 
>>> RE tokens distributed amongst shards. The way that would work right now is
>>> that each port number appears to be a discrete instance of the server. So
>>> you could have shards be actual shards that are simply colocated on the
>>> same box, run in the same process, and share resources. I know this pushes
>>> more of the complexity into the server vs the driver as the server expects
>>> all shards to share some client visible like system tables and certain
>>> identifiers.
>>> 
>>> Ariel
>>>> On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
>>>> Port-per-shard is likely the easiest option but it's too ugly to
>>>> contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t
>>>> IIRC), it will be just horrible to have 160 open ports.
>>>> 
>>>> 
> >>>> It also doesn't fit well with the NIC's ability to automatically
>>>> distribute packets among cores using multiple queues, so the kernel
>>>> would have to shuffle those packets around. Much better to have those
>>>>

Re: Evolving the client protocol

2018-04-19 Thread Ariel Weisberg
Hi,

So at technical level I don't understand this yet.

So you have a database consisting of single threaded shards and a socket for 
accept that is generating TCP connections and in advance you don't know which 
connection is going to send messages to which shard.

What is the mechanism by which you get the packets for a given TCP connection 
delivered to a specific core? I know that a given TCP connection will normally 
have all of its packets delivered to the same queue from the NIC because the 
tuple of source address + port and destination address + port is typically 
hashed to pick one of the queues the NIC presents. I might have the contents of 
the tuple slightly wrong, but it always includes a component you don't get to 
control.

Since it's hashing how do you manipulate which queue packets for a TCP 
connection go to and how is it made worse by having an accept socket per shard? 

You also mention 160 ports as bad, but it doesn't sound like a big number 
resource wise. Is it an operational headache?
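
[Editor's sketch] The hashing point above can be made concrete with a toy model. This is illustrative only: real NICs use a Toeplitz hash with a configurable key (not CRC32), and the queue count is hardware-dependent. The essential property is that the source port, chosen by the client's kernel, feeds the hash, so the server cannot steer a given connection to a chosen queue:

```python
import zlib

NUM_QUEUES = 8  # queues the NIC presents, ideally one per core

def pick_queue(src_ip, src_port, dst_ip, dst_port):
    # Toy stand-in for receive-side scaling (RSS): hash the connection
    # tuple to a queue index. Every packet of one TCP connection hashes
    # identically, so the whole connection lands on one queue -- but
    # which queue depends on the client-chosen source port.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_QUEUES

# Same 4-tuple -> same queue; a different client port may land anywhere.
q = pick_queue("10.0.0.5", 49152, "10.0.0.9", 9042)
```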

RE tokens distributed amongst shards. The way that would work right now is that 
each port number appears to be a discrete instance of the server. So you could 
have shards be actual shards that are simply colocated on the same box, run in 
the same process, and share resources. I know this pushes more of the 
complexity into the server vs the driver as the server expects all shards to 
share some client-visible state like system tables and certain identifiers.

Ariel
On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
> Port-per-shard is likely the easiest option but it's too ugly to 
> contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t 
> IIRC), it will be just horrible to have 160 open ports.
> 
> 
> It also doesn't fit well with the NIC's ability to automatically 
> distribute packets among cores using multiple queues, so the kernel 
> would have to shuffle those packets around. Much better to have those 
> packets delivered directly to the core that will service them.
> 
> 
> (also, some protocol changes are needed so the driver knows how tokens 
> are distributed among shards)
> 
> On 2018-04-19 19:46, Ben Bromhead wrote:
> > WRT to #3
> > To fit in the existing protocol, could you have each shard listen on a
> > different port? Drivers are likely going to support this due to
> > https://issues.apache.org/jira/browse/CASSANDRA-7544 (
> > https://issues.apache.org/jira/browse/CASSANDRA-11596).  I'm not super
> > familiar with the ticket so there might be something I'm missing but it
> > sounds like a potential approach.
> >
> > This would give you a path forward at least for the short term.
> >
> >
> > On Thu, Apr 19, 2018 at 12:10 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
> >
> >> Hi,
> >>
> >> I think that updating the protocol spec to Cassandra puts the onus on the
> >> party changing the protocol specification to have an implementation of the
> >> spec in Cassandra as well as the Java and Python driver (those are both
> >> used in the Cassandra repo). Until it's implemented in Cassandra we haven't
> >> fully evaluated the specification change. There is no substitute for trying
> >> to make it work.
> >>
> >> There are also realities to consider as to what the maintainers of the
> >> drivers are willing to commit.
> >>
> >> RE #1,
> >>
> >> I am +1 on the fact that we shouldn't require an extra hop for range scans.
> >>
> >> In JIRA Jeremiah made the point that you can still do this from the client
> >> by breaking up the token ranges, but it's a leaky abstraction to have a
> >> paging interface that isn't a vanilla ResultSet interface. Serial vs.
> >> parallel is kind of orthogonal as the driver can do either.
> >>
> >> I agree it looks like the current specification doesn't make what should
> >> be simple as simple as it could be for driver implementers.
> >>
> >> RE #2,
> >>
> >> +1 on this change assuming an implementation in Cassandra and the Java and
> >> Python drivers.
> >>
> >> RE #3,
> >>
> >> It's hard to be +1 on this because we don't benefit by boxing ourselves in
> >> by defining a spec we haven't implemented, tested, and decided we are
> >> satisfied with. Having it in ScyllaDB de-risks it to a certain extent, but
> >> what if Cassandra decides to go a different direction in some way?
> >>
> >> I don't think there is much discussion to be had without an example of the
> >> changes to the CQL specification to look at, but even then if it looks
> >> risky I am not likely to be in favor of it.

Re: Evolving the client protocol

2018-04-19 Thread Ariel Weisberg
Hi,

>That basically means a fork in the protocol (perhaps a temporary fork if 
>we go for mode 2 where Cassandra retroactively adopts our protocol 
>changes, if they fit well).
>
>Implementing a protocol change may be easy for some simple changes, but 
>in the general case, it is not realistic to expect it.

> Can you elaborate? No one is forcing driver maintainers to update their 
> drivers to support new features, either for Cassandra or Scylla, but 
> there should be no reason for them to reject a contribution adding that 
> support.

I think it's unrealistic to expect the next version of the protocol spec to 
include functionality that is not supported by either the server or drivers 
once a version of the server or driver supporting that protocol version is  
released. Putting something in the spec is making a hard commitment for the 
driver and server without also specifying who will do the work.

So yes a temporary fork is fine, but then you run into things like "we" don't 
like the spec change and find we want to change it again. For us it's fine 
because we never committed to supporting the fork either way. For the driver 
maintainers it's fine because they probably never accepted the spec change 
either and didn't update the drivers. This is because the maintainers aren't 
going to accept changes that are incompatible with what the Cassandra server 
implements.

So if you have a temporary fork of the spec you might also be committing to a 
temporary fork of the drivers as well as the headaches that come with the final 
version of the spec not matching your fork. We would do what we can to avoid 
that by having the conversation around the protocol design up front.

What I am largely getting at is that I think Apache Cassandra and its drivers 
can only truly commit to a spec where there is a released implementation in the 
server and drivers. Up until that point the spec is subject to change. We are 
less likely to change it if there is an implementation because we have already 
done the work and dug up most of the issues.

For sharding this is thorny and I think Ben makes a really good suggestion RE 
leveraging CASSANDRA-7544. For paging state and timeouts I think it's likely 
we could stick to what we work out spec wise and we are happy to have the 
discussion and learn from ScyllaDB de-risking protocol changes, but if no one 
commits to doing the work you might find we release the next protocol version 
without the tentative spec changes.

Ariel
On Thu, Apr 19, 2018, at 12:53 PM, Avi Kivity wrote:
> 
> 
> On 2018-04-19 19:10, Ariel Weisberg wrote:
> > Hi,
> >
> > I think that updating the protocol spec to Cassandra puts the onus on the 
> > party changing the protocol specification to have an implementation of the 
> > spec in Cassandra as well as the Java and Python driver (those are both 
> > used in the Cassandra repo). Until it's implemented in Cassandra we haven't 
> > fully evaluated the specification change. There is no substitute for trying 
> > to make it work.
> 
> That basically means a fork in the protocol (perhaps a temporary fork if 
> we go for mode 2 where Cassandra retroactively adopts our protocol 
> changes, if they fit well).
> 
> Implementing a protocol change may be easy for some simple changes, but 
> in the general case, it is not realistic to expect it.
> 
> > There are also realities to consider as to what the maintainers of the 
> > drivers are willing to commit.
> 
> Can you elaborate? No one is forcing driver maintainers to update their 
> drivers to support new features, either for Cassandra or Scylla, but 
> there should be no reason for them to reject a contribution adding that 
> support.
> 
> If you refer to a potential politically-motivated rejection by the 
> DataStax-maintained drivers, then those drivers should and will be 
> forked. That's not true open source. However, I'm not assuming that will 
> happen.
> 
> >
> > RE #1,
> >
> > I am +1 on the fact that we shouldn't require an extra hop for range scans.
> >
> > In JIRA Jeremiah made the point that you can still do this from the client 
> > by breaking up the token ranges, but it's a leaky abstraction to have a 
> > paging interface that isn't a vanilla ResultSet interface. Serial vs. 
> > parallel is kind of orthogonal as the driver can do either.
> >
> > I agree it looks like the current specification doesn't make what should be 
> > simple as simple as it could be for driver implementers.
> >
> > RE #2,
> >
> > +1 on this change assuming an implementation in Cassandra and the Java and 
> > Python drivers.
> 
> Those were just given as examples. Each would be discussed on its own, 
> assuming we are able to find a w

Re: Evolving the client protocol

2018-04-19 Thread Ariel Weisberg
Hi,

I think that updating the protocol spec to Cassandra puts the onus on the party 
changing the protocol specification to have an implementation of the spec in 
Cassandra as well as the Java and Python driver (those are both used in the 
Cassandra repo). Until it's implemented in Cassandra we haven't fully evaluated 
the specification change. There is no substitute for trying to make it work.

There are also realities to consider as to what the maintainers of the drivers 
are willing to commit.

RE #1,

I am +1 on the fact that we shouldn't require an extra hop for range scans.

In JIRA Jeremiah made the point that you can still do this from the client by 
breaking up the token ranges, but it's a leaky abstraction to have a paging 
interface that isn't a vanilla ResultSet interface. Serial vs. parallel is kind 
of orthogonal as the driver can do either.
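
[Editor's sketch] The client-side workaround Jeremiah describes can be sketched as follows. This is a hypothetical helper, not driver code: a real driver would derive splits from the cluster's actual token map rather than dividing the Murmur3 ring evenly.

```python
MIN_TOKEN = -2**63      # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1   # Murmur3Partitioner maximum token

def split_token_ring(num_splits):
    # Divide the full token range into contiguous (start, end] subranges.
    # Each subrange can then be scanned independently (serially or in
    # parallel) with: SELECT ... WHERE token(pk) > ? AND token(pk) <= ?
    span = (MAX_TOKEN - MIN_TOKEN) // num_splits
    ranges = []
    start = MIN_TOKEN
    for i in range(num_splits):
        end = MAX_TOKEN if i == num_splits - 1 else start + span
        ranges.append((start, end))
        start = end
    return ranges

ranges = split_token_ring(8)
```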

I agree it looks like the current specification doesn't make what should be 
simple as simple as it could be for driver implementers.

RE #2,

+1 on this change assuming an implementation in Cassandra and the Java and 
Python drivers.

RE #3,

It's hard to be +1 on this because we don't benefit by boxing ourselves in by 
defining a spec we haven't implemented, tested, and decided we are satisfied 
with. Having it in ScyllaDB de-risks it to a certain extent, but what if 
Cassandra decides to go a different direction in some way?

I don't think there is much discussion to be had without an example of the 
changes to the CQL specification to look at, but even then if it looks risky I 
am not likely to be in favor of it.

Regards,
Ariel

On Thu, Apr 19, 2018, at 9:33 AM, glom...@scylladb.com wrote:
>
>
> On 2018/04/19 07:19:27, kurt greaves  wrote:
> > >
> > > 1. The protocol change is developed using the Cassandra process in
> > >a JIRA ticket, culminating in a patch to
> > >doc/native_protocol*.spec when consensus is achieved.
> >
> > I don't think forking would be desirable (for anyone) so this seems
> > the most reasonable to me. For 1 and 2 it certainly makes sense but
> > can't say I know enough about sharding to comment on 3 - seems to me
> > like it could be locking in a design before anyone truly knows what
> > sharding in C* looks like. But hopefully I'm wrong and there are
> > devs out there that have already thought that through.
>
> Thanks. That is our view and is great to hear.
>
> About our proposal number 3: In my view, good protocol designs are
> future proof and flexible. We certainly don't want to propose a design
> that works just for Scylla, but would support reasonable
> implementations regardless of how they may look.
>
> >
> > Do we have driver authors who wish to support both projects?
> >
> > Surely, but I imagine it would be a minority. ​
> >
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Roadmap for 4.0

2018-04-12 Thread Ariel Weisberg
Hi,

+1 to September 1st. I know I will have much better availability then.

Ariel
On Thu, Apr 12, 2018, at 5:15 PM, Sankalp Kohli wrote:
> +1 with Sept 1st as I am seeing willingness for people to test it after it
> 
> > On Apr 12, 2018, at 13:59, Ben Bromhead  wrote:
> > 
> > While I would prefer earlier, if Sept 1 gets better buy-in and we can have
> > broader commitment to testing. I'm super happy with that. As Nate said,
> > having a solid line to work towards is going to help massively.
> > 
> > On Thu, Apr 12, 2018 at 4:07 PM Nate McCall  wrote:
> > 
> >>> If we push it to Sept 1 freeze, I'll personally spend a lot of time
> >> testing.
> >>> 
> >>> What can I do to help convince the Jun1 folks that Sept1 is acceptable?
> >> 
> >> I can come around to that. At this point, I really just want us to
> >> have a date we can start talking to/planning around.
> >> 
> >> 
> >> --
> > Ben Bromhead
> > CTO | Instaclustr 
> > +1 650 284 9692
> > Reliability at Scale
> > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> 
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Roadmap for 4.0

2018-04-11 Thread Ariel Weisberg
Hi,

What is the role of minor releases in Cassandra? I know that we have guarantees 
we make about minor releases that we don't make about major releases (is this 
summarized anywhere?), but is there anyone who actually thinks those guarantees 
are worth it vs having major releases on a shorter schedule?

If we had major releases on a shorter schedule they would be smaller and 
stabilize faster and I think that has already been brought up.

We don't do calendar based releases and I think that's a mistake. Maybe we 
don't cut the final version of a release based on the calendar, but I think we 
should release branch on a fixed cadence and release when ready.

I also don't see a place for minor releases as they exist today. It seems like 
they are almost all the overhead of a major release with unnecessary 
restrictions on what is possible.

Ariel

On Wed, Apr 11, 2018, at 12:42 PM, Ben Bromhead wrote:
> I'm on the side of freezing/branching earlier so we can really start the QA
> process, but I do understand the concerns.
> 
> As Kurt alluded to previously, given our current release velocity, 4.1/5.0
> will likely be some time away after 4.0. If we manage to release two high
> quality stable major versions back to back in a span of say 12 months, that
> would actually be pretty awesome. The upgrade cycle will be a minor
> complaint for just those major versions as the community settles on a
> better cadence as we learn and go through it.
> 
> Both Kurt and Jeff have advocated for key features that should be part of
> the next major update which seems to be a major part of the desire to push
> back against an early feature freeze. Interestingly most of these
> contribute to the theme of the release (stability) even though they are
> large changes. Particularly:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-9754  - "Birch" (changes
> file format)
> https://issues.apache.org/jira/browse/CASSANDRA-10540 - RangeAwareCompaction
> https://issues.apache.org/jira/browse/CASSANDRA-13426 - Idempotent schema
> changes
> 
> We haven't seen any actual binding -1s yet on June 1, despite obvious
> concerns and plenty of +1s
> 
> Having said that, the above issues are the only ones people have identified
> as:
> 
>- the issues is a desired feature,
>- the issue has clear progress on the tickets,
>- the issues fits the theme of the release, and
>- there is some concern about the issue making the June 1 deadline.
> 
> I would invite those working on / reviewing these tickets to comment (Michael
> Kjellman, Marcus Eriksson, Aleksey Yeschenko) specifically about inclusion
> into 4.0 and June 1.
> 
> If we want to delay the feature freeze for these, either a June 1 freeze
> with a carve-out/exception for those three (they can get committed after
> June 1 to 4.0) or a moderate push back of the freeze date (e.g. July 1) may
> be an appropriate compromise.
> 
> The carve-out/exception however is messy and opens a can of worms on the
> proposed testing process for a 4.0 branch, but it is an option.
> 
> I know this list doesn't include changes like transient replicas and
> strongly consistent schema changes (previously mentioned), but the state of
> the tickets is still in an architectural discussion, so I don't think its
> worth making them blockers. Pluggable storage was also raised as something
> worth including for 4.0, if someone working on those (Dikang, Blake?) had
> an opinion on it regarding 4.0, impact on stability and a June 1 deadline?
> 
> Ben
> 
> 
> On Wed, Apr 11, 2018 at 11:15 AM Blake Eggleston 
> wrote:
> 
> > I agree that not releasing semi-regularly is not good for the project. I
> > think our habit of releasing half working software is much worse though.
> > Our testing/stability story is not iron clad. I really think the bar for
> > releasing 4.0 should be that the people in this thread are running the code
> > in production, recommending their customers run it in production, or
> > offering and supporting it as part of their cloud service.
> >
> > In that context, the argument for waiting for some features is less about
> > trying to do all the things and more about making 4.0 something worth the
> > time and expense of validating for production.
> >
> > On 4/11/18, 1:06 AM, "Sylvain Lebresne"  wrote:
> >
> > On Wed, Apr 11, 2018 at 12:35 AM Jeff Jirsa  wrote:
> >
> > > Seriously, what's the rush to branch? Do we all love merging so much
> > we
> > > want to do a few more times just for the sake of merging? If nothing
> > > diverges, there's nothing gained from the branch, and if it did
> > diverge, we
> > > add work for no real gain.
> > >
> >
> > Again, to me, the "rush" is that 1) there is tons of changes sitting in
> > trunk
> > that some user (_not all_, granted)[1], especially new ones, would
> > likely
> > benefits, and sooner for those is better than later, 2) we want to
> > 

Re: Roadmap for 4.0

2018-04-05 Thread Ariel Weisberg
Hi,

+1 to having a feature freeze date. June 1st is earlier than I would have 
picked.

Ariel

On Thu, Apr 5, 2018, at 10:57 AM, Josh McKenzie wrote:
> +1 here for June 1.
> 
> On Thu, Apr 5, 2018 at 9:50 AM, Jason Brown  wrote:
> 
> > +1
> >
> > On Wed, Apr 4, 2018 at 8:31 PM, Blake Eggleston 
> > wrote:
> >
> > > +1
> > >
> > > On 4/4/18, 5:48 PM, "Jeff Jirsa"  wrote:
> > >
> > > Earlier than I’d have personally picked, but I’m +1 too
> > >
> > >
> > >
> > > --
> > > Jeff Jirsa
> > >
> > >
> > > > On Apr 4, 2018, at 5:06 PM, Nate McCall 
> > wrote:
> > > >
> > > > Top-posting as I think this summary is on point - thanks, Scott!
> > (And
> > > > great to have you back, btw).
> > > >
> > > > It feels to me like we are coalescing on two points:
> > > > 1. June 1 as a freeze for alpha
> > > > 2. "Stable" is the new "Exciting" (and the testing and dogfooding
> > > > implied by such before a GA)
> > > >
> > > > How do folks feel about the above points?
> > > >
> > > >
> > > >> Re-raising a point made earlier in the thread by Jeff and affirmed
> > > by Josh:
> > > >>
> > > >> –––
> > > >> Jeff:
> > >  A hard date for a feature freeze makes sense, a hard date for a
> > > release
> > >  does not.
> > > >>
> > > >> Josh:
> > > >>> Strongly agree. We should also collectively define what "Done"
> > > looks like
> > > >>> post freeze so we don't end up in bike-shedding hell like we have
> > > in the
> > > >>> past.
> > > >> –––
> > > >>
> > > >> Another way of saying this: ensuring that the 4.0 release is of
> > > high quality is more important than cutting the release on a specific
> > date.
> > > >>
> > > >> If we adopt Sylvain's suggestion of freezing features on a
> > "feature
> > > complete" date (modulo a "definition of done" as Josh suggested), that
> > will
> > > help us align toward the polish, performance work, and dog-fooding needed
> > > to feel great about shipping 4.0. It's a good time to start thinking
> > about
> > > the approaches to testing, profiling, and dog-fooding various
> > contributors
> > > will want to take on before release.
> > > >>
> > > >> I love how Ben put it:
> > > >>
> > > >>> An "exciting" 4.0 release to me is one that is stable and usable
> > > >>> with no perf regressions on day 1 and includes some of the big
> > > >>> internal changes mentioned previously.
> > > >>>
> > > >>> This will set the community up well for some awesome and exciting
> > > >>> stuff that will still be in the pipeline if it doesn't make it to
> > > 4.0.
> > > >>
> > > >> That sounds great to me, too.
> > > >>
> > > >> – Scott
> > > >
> > > > 
> > > >
> > >
> > > 
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: question on running cassandra-dtests

2018-03-28 Thread Ariel Weisberg
l]
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:296) 
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.util.concurrent.SingleThreadEventExecutor
> $5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.14.Final.jar:
> 4.1.14.Final]
>   at io.netty.util.concurrent.DefaultThreadFactory
> $DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) [netty-
> all-4.1.14.Final.jar:4.1.14.Final]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151], ERROR 
> [MessagingService-NettyOutbound-Thread-4-6] 2018-03-28 00:24:37,061 
> OutboundHandshakeHandler.java:209 - Failed to properly handshake with 
> peer 127.0.0.2:7000 (GOSSIP). Closing the channel.
> java.lang.NoSuchMethodError: 
> org.apache.cassandra.net.async.OutboundConnectionIdentifier.connectionAddress()Ljava/
> net/InetSocketAddress;
>   at 
> org.apache.cassandra.net.async.OutboundHandshakeHandler.channelActive(OutboundHandshakeHandler.java:107)
>  
> ~[main/:na]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213)
>  
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199)
>  
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192)
>  
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.DefaultChannelPipeline
> $HeadContext.channelActive(DefaultChannelPipeline.java:1330) [netty-
> all-4.1.14.Final.jar:4.1.14.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213)
>  
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199)
>  
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:910)
>  
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.epoll.AbstractEpollStreamChannel
> $EpollStreamUnsafe.fulfillConnectPromise(AbstractEpollStreamChannel.java:855) 
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.epoll.AbstractEpollStreamChannel
> $EpollStreamUnsafe.finishConnect(AbstractEpollStreamChannel.java:888) 
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.epoll.AbstractEpollStreamChannel
> $EpollStreamUnsafe.epollOutReady(AbstractEpollStreamChannel.java:907) 
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at 
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394) 
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:296) 
> [netty-all-4.1.14.Final.jar:4.1.14.Final]
>   at io.netty.util.concurrent.SingleThreadEventExecutor
> $5.run(SingleThreadEventExecutor.java:858) [netty-all-4.1.14.Final.jar:
> 4.1.14.Final]
>   at io.netty.util.concurrent.DefaultThreadFactory
> $DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) [netty-
> all-4.1.14.Final.jar:4.1.14.Final]
>   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151], ERROR 
> [MessagingService-NettyOutbound-Thread-4-7] 2018-03-28 00:24:37,063 
> OutboundHandshakeHandler.java:209 - Failed to properly handshake with 
> peer 127.0.0.3:7000 (GOSSIP). Closing the channel.
> 
> Can somebody help me figure out how I can run dtests successfully? Once 
> I am able to do that, I will be able to proceed with the implementation 
> of tests for the JIRA ticket I'm working on.
> 
> Thanks,
> Preetika
> 
> -Original Message-
> From: Ariel Weisberg [mailto:ar...@weisberg.ws] 
> Sent: Tuesday, March 27, 2018 7:15 PM
> To: dev@cassandra.apache.org
> Subject: Re: question on running cassandra-dtests
> 
> Hi,
> 
> Great! Glad you were able to get up and running. The dtests can be 
> tricky if you aren't already somewhat familiar with Python.
> 
> Ariel
> 
> On Mon, Mar 26, 2018, at 9:10 PM, Murukesh Mohanan wrote:
> > On Tue, Mar 27, 2018 at 6:47 Ariel Weisberg <ar...@weisberg.ws> wrote:
> > 
> > > Hi,
> > >
> > > Are you deleting the venv before creating it? You shouldn't really 
> > > need to use sudo for the virtualenv. That is going to make things 
> > > potentially wonky. Naming it cassandra-dtest might also do something 
> > > wonky if you have a cassandra-dtest directory already. I usually 
> > > name it just venv and place it in the same subdir as the require

Re: question on running cassandra-dtests

2018-03-27 Thread Ariel Weisberg
Hi,

Great! Glad you were able to get up and running. The dtests can be tricky if 
you aren't already somewhat familiar with Python.

Ariel

On Mon, Mar 26, 2018, at 9:10 PM, Murukesh Mohanan wrote:
> On Tue, Mar 27, 2018 at 6:47 Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> > Hi,
> >
> > Are you deleting the venv before creating it? You shouldn't really need to
> > use sudo for the virtualenv. That is going to make things potentially
> > wonky. Naming it cassandra-dtest might also do something wonky if you have
> > a cassandra-dtest directory already. I usually name it just venv and place
> > it in the same subdir as the requirements file.
> >
> > Also running sudo is going to create a new shell and then exit the shell
> > immediately so when you install the requirements it might be doing it not
> > in the venv, but in whatever is going on inside the sudo shell.
> 
> 
> Yep, looking at the logs, that's probably the issue. When activating a venv
> (with `source .../bin/activate`), it sets environment variables (`PATH`,
> `PYTHONHOME` etc.) so that the virtualenv's Python, pip are used instead of
> the system Python and pip. sudo defaults to using a clean PATH and
> resetting most of the user's environment, so the effects of the venv are
> lost when running in sudo.
> 
> 
> The advantage of virtualenv is not needing to mess with system packages at
> > all so sudo is inadvisable when creating, activating, and pip installing
> > things.
> >
> > You might need to use pip3 instead of pip, but I suspect that in a correct
> > venv pip is going to point to pip3.
> >
> > Ariel
> >
> > On Mon, Mar 26, 2018, at 5:31 PM, Tyagi, Preetika wrote:
> > > Yes, that's correct. I followed README and ran all below steps to create
> > > virtualenv. Attached is the output of all commands I ran successfully
> > > except the last one i.e. pytest.
> > >
> > > Could you please let me know if you see anything wrong or missing?
> > >
> > > Thanks,
> > > Preetika
> > >
> > > -Original Message-
> > > From: Ariel Weisberg [mailto:ar...@weisberg.ws]
> > > Sent: Monday, March 26, 2018 9:32 AM
> > > To: dev@cassandra.apache.org
> > > Subject: Re: question on running cassandra-dtests
> > >
> > > Hi,
> > >
> > > Your environment is python 2.7 when it should be python 3.
> > > See:
> > > >   File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/
> > > > rewrite.py", line 213, in load_module
> > >
> > > Are you using virtualenv to create a python 3 environment to use with
> > the tests?
> > >
> > > From README.md:
> > >
> > > **Note**: While virtualenv isn't strictly required, using virtualenv is
> > > almost always the quickest path to success as it provides common base
> > > setup across various configurations.
> > >
> > > 1. Install virtualenv: ``pip install virtualenv``
> > > 2. Create a new virtualenv: ``virtualenv --python=python3 --no-site-packages ~/dtest``
> > > 3. Switch/Activate the new virtualenv: ``source ~/dtest/bin/activate``
> > > 4. Install remaining DTest Python dependencies: ``pip install -r /path/to/cassandra-dtest/requirements.txt``
> > >
> > > Regards,
> > > Ariel
> > >
> > > On Mon, Mar 26, 2018, at 11:13 AM, Tyagi, Preetika wrote:
> > > > I was able to run requirements.txt with success. Below is the error I
> > get:
> > > >
> > > > Traceback (most recent call last):
> > > >   File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py",
> > > > line 371, in _importconftest
> > > > mod = conftestpath.pyimport()
> > > >   File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py",
> > > > line 668, in pyimport
> > > > __import__(modname)
> > > >   File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/
> > > > rewrite.py", line 213, in load_module
> > > > py.builtin.exec_(co, mod.__dict__)
> > > >   File "/usr/local/lib/python2.7/dist-packages/py/_builtin.py", line
> > > > 221, in exec_
> > > > exec2(obj, globals, locals)
> > > >   File "<string>", line 7, in exec2
> > > >   File "/home//conftest.py", line 11, in <module>
> > > > from itertools import zip_longest
> > > > ImportError: cannot import name zip_longest
> > > > ERROR: could not load /home//conftest.py

Re: question on running cassandra-dtests

2018-03-26 Thread Ariel Weisberg
Hi,

Your environment is python 2.7 when it should be python 3.
See:
>   File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/
> rewrite.py", line 213, in load_module

Are you using virtualenv to create a python 3 environment to use with the tests?

From README.md:

**Note**: While virtualenv isn't strictly required, using virtualenv is almost 
always the quickest
path to success as it provides common base setup across various configurations.

1. Install virtualenv: ``pip install virtualenv``
2. Create a new virtualenv: ``virtualenv --python=python3 --no-site-packages ~/dtest``
3. Switch/Activate the new virtualenv: ``source ~/dtest/bin/activate``
4. Install remaining DTest Python dependencies: ``pip install -r /path/to/cassandra-dtest/requirements.txt``
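Once the venv is active, a quick sanity check from inside Python confirms which interpreter the dtests will actually see (a minimal check, not part of the dtest suite itself):

```python
import sys

# Inside the activated venv this should be a Python 3 interpreter;
# the dtest framework's conftest.py is Python 3 only.
assert sys.version_info[0] >= 3, "dtests require Python 3; activate the venv first"

# sys.prefix should point at the venv (e.g. ~/dtest), not the system install.
print(sys.prefix)
```

If the assertion fails, or sys.prefix points at the system Python, the venv was not activated in the current shell.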

Regards,
Ariel

On Mon, Mar 26, 2018, at 11:13 AM, Tyagi, Preetika wrote:
> I was able to run requirements.txt with success. Below is the error I get:
> 
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/_pytest/config.py", line 
> 371, in _importconftest
> mod = conftestpath.pyimport()
>   File "/usr/local/lib/python2.7/dist-packages/py/_path/local.py", line 
> 668, in pyimport
> __import__(modname)
>   File "/usr/local/lib/python2.7/dist-packages/_pytest/assertion/
> rewrite.py", line 213, in load_module
> py.builtin.exec_(co, mod.__dict__)
>   File "/usr/local/lib/python2.7/dist-packages/py/_builtin.py", line 
> 221, in exec_
> exec2(obj, globals, locals)
>   File "<string>", line 7, in exec2
>   File "/home//conftest.py", line 11, in <module>
> from itertools import zip_longest
> ImportError: cannot import name zip_longest
> ERROR: could not load /home//conftest.py
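The ImportError above is itself diagnostic of the Python 2/3 mismatch: `itertools.zip_longest` exists only under that name in Python 3 (Python 2 called it `izip_longest`), so conftest.py fails on the first Python-3-only import. A minimal illustration of the rename, not taken from conftest.py:

```python
import sys
from itertools import zip_longest  # Python 3 name; Python 2 only had izip_longest

# Under Python 2 the equivalent import would have been:
#   from itertools import izip_longest as zip_longest
assert sys.version_info[0] >= 3

# zip_longest pads the shorter iterable with a fill value (None by default).
pairs = list(zip_longest([1, 2, 3], "ab"))
print(pairs)  # → [(1, 'a'), (2, 'b'), (3, None)]
```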
> 
> Thanks,
> Preetika
> 
> -Original Message-
> From: Murukesh Mohanan [mailto:murukesh.moha...@gmail.com] 
> Sent: Sunday, March 25, 2018 10:48 PM
> To: dev@cassandra.apache.org
> Subject: Re: question on running cassandra-dtests
> 
> The complete error is needed. I get something similar if I hadn't run 
> `pip3 install -r requirements.txt`:
> 
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", line 
> 328, in _getconftestmodules
> return self._path2confmods[path]
> KeyError: local('/home/muru/dev/cassandra-dtest')
> 
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", line 
> 359, in _importconftest
> return self._conftestpath2mod[conftestpath]
> KeyError: local('/home/muru/dev/cassandra-dtest/conftest.py')
> 
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/_pytest/config.py", line 
> 365, in _importconftest
> mod = conftestpath.pyimport()
>   File "/usr/local/lib/python3.6/site-packages/py/_path/local.py", line 
> 668, in pyimport
> __import__(modname)
>   File "/usr/local/lib/python3.6/site-packages/_pytest/assertion/
> rewrite.py", line 212, in load_module
> py.builtin.exec_(co, mod.__dict__)
>   File "/home/muru/dev/cassandra-dtest/conftest.py", line 13, in <module>
> from dtest import running_in_docker, 
> cleanup_docker_environment_before_test_execution
>   File "/home/muru/dev/cassandra-dtest/dtest.py", line 12, in <module>
> import cassandra
> ModuleNotFoundError: No module named 'cassandra'
> ERROR: could not load /home/muru/dev/cassandra-dtest/conftest.py
> 
> Of course, `pip3 install -r requirements.txt` creates an `src` directory 
> with appropriate branches of ccm and cassandra-driver checked out.
> 
> If you have run `pip3 install -r requirements.txt`, then something else 
> is wrong and we need the complete error log.
> 
> On 2018/03/23 20:22:47, "Tyagi, Preetika"  wrote: 
> > Hi All,
> > 
> > I am trying to setup and run Cassandra-dtests so that I can write some 
> > tests for a JIRA ticket I have been working on.
> > This is the repo I am using: https://github.com/apache/cassandra-dtest
> > I followed all the instructions and installed dependencies.
> > 
> > However, when I run "pytest --cassandra-dir=<path to cassandra directory>"
> > 
> > It throws the error "could not load /conftest.py.
> > 
> > I checked that this file (conftest.py) exists in Cassandra-dtest source 
> > root and I'm not sure why it cannot find it. Does anyone have any idea what 
> > might be going wrong here?
> > 
> > I haven't used dtests before so I wonder if I'm missing something here.
> > 
> > Thanks,
> > Preetika
> > 
> > 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 
> 
> 


Re: [DISCUSS] java 9 and the future of cassandra on the jdk

2018-03-21 Thread Ariel Weisberg
Hi,

I'm not clear on what building and bundling our own JRE/JDK accomplishes? What 
is our source for JRE updates going to be? Are we going to build our own and 
does Oracle release the source for their LTS releases? Are we going to extract 
LTS updates from CentOS?

If the goal of bundling is to simplify upgrading the JRE/JDK for users by 
synchronizing it with updating Cassandra, well I think that isn't so bad. Sure 
it's a responsibility on our part, but if we can extract the Ubuntu or CentOS 
build it's "just" a matter of getting the latest bug fix JRE/JDK every time we 
cut a minor release. However it's us taking responsibility for it when we 
didn't previously and I don't see why we can't just document where to get 
updates from instead of bundling it ourselves.

Is there a licensing issue that forces us to just ship the JRE vs JDK or is it 
about download size? For release builds just the JRE is fine, but for building 
from source it would be nice if it bundled or fetched the full JDK. If we can 
avoid checking in a large binary distribution that would also be good.

Ariel

On Wed, Mar 21, 2018, at 10:21 AM, Gerald Henriksen wrote:
> On Wed, 21 Mar 2018 14:04:39 +0100, you wrote:
> 
> >There's also another option, which I just want to mention here for the
> >sake of discussion.
> >
> >Quoting the Oracle Support Roadmap:
> >"Instead of relying on a pre-installed standalone JRE, we encourage
> >application developers to deliver JREs with their applications."
> >
> >I've played around with Java 9 a while ago and also tested creating a
> >self contained JRE using jlink, which you can bundle and ship with your
> >application. So there's a technical solution for that with Java 9. Of
> >course you'd have to clarify licensing issues (OpenJDK is GPLv2 +
> >Classpath exception) first.
> >
> >Bundling a custom JRE along with Cassandra, would be convenient in a way
> >that we can do all the testing against the bundled Java version. We
> >could also switch to a new Java version whenever it fits us.
> 
> To a certain extent though the issue isn't whether Cassandra works
> well with the given JRE but rather the issue of having a supported JRE
> in a production environment.
> 
> If Cassandra ships with a bundled JRE does that then mean the
> people/organizations downloading and using that product are going to
> expect the Cassandra project to provide bug and security updates to
> the JRE as well as Cassandra?
> 
> What happens if an organization gets hacked due to an issue in an out
> of date JRE that Cassandra bundled?  Yes, that can currently happen if
> the organization chooses to run Cassandra on an unsupported JRE.  But
> in that case the organization has made that decision, not Cassandra.
> 
> Essentially any security concious entity, whether a person or
> organization, running any software stack on top of Java (or I guess
> any of the other languages based on the JVM) is going to have to make
> a choice between constantly updating their JRE or going with a LTS
> version (either from Oracle or Red Hat or any other company that is
> willing to provide it).  Or maybe even move to .Net now that it is
> supported on Linux.
> 
> I don't think there are any great choices here for Cassandra or any of
> the other Java based projects but an easy solution (in terms of basing
> the project on a supported JRE that can be downloaded for free) would
> be to choose whatever version of OpenJDK is supported by Red Hat or
> any other Linux distribution that offers a LTS release.
> 
> So for example basing on OpenJDK 8 gets you support until October 2020
> with paying for Red Hat, or for free (with mainly security updates) by
> using Centos.
> 
> 
> 




Re: [DISCUSS] java 9 and the future of cassandra on the jdk

2018-03-20 Thread Ariel Weisberg
Hi,

Synchronizing with Oracle LTS releases is kind of low value if it's a paid 
offering. But if someone in the community doesn't want to upgrade and pays 
Oracle we don't want to get in the way of that.

Which is how you end up with what Jordan and ElasticSearch suggest. I'm still 
+1 on that although in my heart of hearts I want  to only support the latest 
OpenJDK on trunk and after we cut a release only change the JDK if there is a 
serious issue.

It's going to be annoying once we have a serious security or correctness issue 
and we need to move to a later OpenJDK. The majority won't be paying Oracle for 
LTS. I don't think that will happen that often though.

Regards,
Ariel

On Tue, Mar 20, 2018, at 4:50 PM, Jason Brown wrote:
> Thanks to Hannu and others pointing out that the OracleJDK is a
> *commercial* LTS, and thus not an option. mea culpa for missing the
> "commercial" and just focusing on the "LTS" bit. OpenJDK is is, then.
> 
> Stefan's elastic search link is rather interesting. Looks like they are
> compiling for both a LTS version as well as the current OpenJDK. They
> assume some of their users will stick to a LTS version and some will run
> the current version of OpenJDK.
> 
> While it's extra work to add JDK version as yet another matrix variable in
> addition to our branching, is that something we should consider? Or are we
> going to burden maintainers even more? Do we have a choice? Note: I think
> this is similar to what Jeremiah is proposed.
> 
> @Ariel: Going beyond 3 years could be tricky in the worst case because
> bringing in up to 3 years of JDK changes to an older release might mean
> some of our dependencies no longer function and now it's not just minor
> fixes it's bringing in who knows what in terms of updated dependencies
> 
> I'm not sure we have a choice anymore, as we're basically bound to what the
> JDK developers choose to do (and we're bound to the JDK ...). However, if
> we have the changes necessary for the JDK releases higher than the LTS (if
> we following the elastic search model), perhaps it'll be a reasonably
> smooth transition?
> 
> On Tue, Mar 20, 2018 at 1:31 PM, Jason Brown <jasedbr...@gmail.com> wrote:
> 
> > copied directly from dev channel, just to keep with this ML conversation
> >
> > 08:08:26   Robert Stupp jasobrown: https://www.azul.com/java-stable-secure-free-choose-two-three/ and https://blogs.oracle.com/java-platform-group/faster-and-easier-use-and-redistribution-of-java-se
> > 08:08:38 the 2nd says: "The Oracle JDK will continue as a commercial long
> > term support offering"
> > 08:08:46 also: http://www.oracle.com/technetwork/java/eol-135779.html
> > 08:09:21 the keyword in that cite is "commercial"
> > 08:21:21  Michael Shuler a couple more thoughts.. 1) keep C*
> > support in step with latest Ubuntu LTS OpenJDK major in main, 2) bundle JRE
> > in C* releases? (JDK is not "legal" to bundle)
> > 08:23:44  https://www.elastic.co/blog/elasticsearch-java-9-and-beyond - interesting read on that matter
> > 08:26:04 can't wait for the infra and CI testing implications.. will be
> > lot's of fun ;(
> > 08:42:13  Robert Stupp Not sure whether stepping with Ubuntu is
> > necessary. It's not so difficult to update apt.source ;)
> > 08:42:43 CI ? It just let's your test matrix explode - a litte ;)
> > 08:46:48  Michael Shuler yep, we currently `def jdkLabel = 'JDK 1.8
> > (latest)'` in job DSL and could easily modify that
> >
> > On Tue, Mar 20, 2018 at 9:08 AM, Kant Kodali <k...@peernova.com> wrote:
> >
> >> Java 10 is releasing today!
> >>
> >> On Tue, Mar 20, 2018 at 9:07 AM, Ariel Weisberg <ar...@weisberg.ws>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > +1 to what Jordan is saying.
> >> >
> >> > It seems like if we are cutting a release off of trunk we want to make
> >> > sure we get N years of supported JDK out of it. For a single LTS
> >> release N
> >> > could be at most 3 and historically that isn't long enough and it's very
> >> > likely we will get < 3 after a release is cut.
> >> >
> >> > Going beyond 3 years could be tricky in the worst case because bringing
> >> in
> >> > up to 3 years of JDK changes to an older release might mean some of our
> >> > dependencies no longer function and now it's not just minor fixes it's
> >> > bringing in who knows what in terms of updated dependencies.
> >> >
> >> > I think in some cases we are going to need to take a release we have
> >> > already cut and make it work with an LTS release that didn't exist
> >> > when the release was cut.

Re: Debug logging enabled by default since 2.2

2018-03-20 Thread Ariel Weisberg
Hi,

That's good to hear.

What's the difference between DEBUG and TRACE? Obviously we decide ourselves. I 
don't have a good answer because right now we are in the process of eliminating 
the distinction we used to make which is that DEBUG is safe to turn on in 
production and TRACE is not.

Ariel

On Tue, Mar 20, 2018, at 12:36 PM, Alexander Dejanovski wrote:
> Ariel,
> 
> the current plan that's discussed on the ticket (
> https://issues.apache.org/jira/browse/CASSANDRA-14326) is to maintain the
> separation and keep the debug.log for "real" DEBUG level logging, which
> would be disabled by default.
> A new intermediate marker would be created to have VERBOSE_INFO logging
> (some current debug loggings must be changed to that new level/marker),
> which would be enabled by default, and "standard" INFO logging would go to
> system.log.
> 
> I guess in that configuration, some/most TRACE level loggings would then be
> eligible to graduate to DEBUG...?
> 
> 
> 
> 
> On Tue, Mar 20, 2018 at 4:19 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> > Hi,
> >
> > Signal to noise ratio matters for logs. Things that we log at DEBUG aren't
> > at all bound by constraints of human readability or being particularly
> > relevant most of the time. I don't want to see most of this stuff unless I
> > have already not found what I am looking for at INFO and above.
> >
> > Can we at least maintain the separation of what is effectively debug
> > logging (switching to an annotation aside) from INFO and above? I want to
> > avoid two steps forward one step back.
> >
> > Ariel
> > On Tue, Mar 20, 2018, at 9:23 AM, Paulo Motta wrote:
> > > That sounds like a good plan, Alexander! Thanks!
> > >
> > > Stefan, someone needs to go through all messages being logged at DEBUG
> > > and reclassify important ones as INFO. I suggest continuing discussion
> > > on specifics on CASSANDRA-14326.
> > >
> > > 2018-03-20 6:46 GMT-03:00 Stefan Podkowinski <s...@apache.org>:
> > > > Are you suggesting to move all messages currently logged via debug() to
> > > > info() with the additional marker set, or only particular messages?
> > > >
> > > >
> > > > On 19.03.2018 19:51, Paulo Motta wrote:
> > > >> Thanks for the constructive input and feedback! From this discussion
> > > >> it seems like overloading the DEBUG level to signify
> > > >> async-verbose-INFO on CASSANDRA-10241 is leading to some confusion and
> > > >> we should fix this.
> > > >>
> > > >> However, we cannot simply turn debug.log off as during CASSANDRA-10241
> > > >> some verbose-but-useful-info-logs, such as flush information were
> > > >> changed from INFO to DEBUG, and since the patch has been in for nearly
> > > >> 3 years it's probably non-revertable. Furthermore, the practice of
> > > >> using the DEBUG level for logging non-debug stuff has been in our
> > > >> Logging Guidelines
> > > >> (https://wiki.apache.org/cassandra/LoggingGuidelines) since then, so
> > > >> there is probably useful DEBUG stuff that would need to be turned into
> > > >> INFO if we get rid of debug.log.
> > > >>
> > > >> For this reason I'm more in favor of converting the debug.log into
> > > >> async/verbose_system.log as suggested by Jeremiah and use a marker to
> > > >> direct these logs (former DEBUG level logs) to that log instead.
> > > >> Nevertheless, if the majority prefers to get back to a single
> > > >> system.log file and get rid of debug.log/verbose_system.log altogether
> > > >> then we would need to go through all log usages and readjust them to
> > > >> use the proper logging levels and update our logging guidelines to
> > > >> reflect whatever new policy is decided, not only disabling debug.log
> > > >> and call it a day.
> > > >>
> > > >> 2018-03-19 12:02 GMT-03:00 Jeremiah D Jordan <jerem...@datastax.com>:
> > > >>> People seem hung up on DEBUG here.  The goal of CASSANDRA-10241 was
> > > >>> to clean up the system.log so that it a very high “signal” in terms
> > of what was logged
> > > >>> to it synchronously, but without reducing the ability of the logs to
> > allow people to
> > > >>> solve problems and perform post mortem analysis of issues.  We have
> > informational
> > > >> log messages that are very useful to understanding the state of things,
> > > >> like compaction status, repair status, flushing, or the state of gossip
> > > >> in the system.

Re: Debug logging enabled by default since 2.2

2018-03-20 Thread Ariel Weisberg
Hi,

Signal to noise ratio matters for logs. Things that we log at DEBUG aren't at 
all bound by constraints of human readability or being particularly relevant 
most of the time. I don't want to see most of this stuff unless I have already 
not found what I am looking for at INFO and above.

Can we at least maintain the separation of what is effectively debug logging 
(switching to an annotation aside) from INFO and above? I want to avoid two 
steps forward one step back.

Ariel
On Tue, Mar 20, 2018, at 9:23 AM, Paulo Motta wrote:
> That sounds like a good plan, Alexander! Thanks!
> 
> Stefan, someone needs to go through all messages being logged at DEBUG
> and reclassify important ones as INFO. I suggest continuing discussion
> on specifics on CASSANDRA-14326.
> 
> 2018-03-20 6:46 GMT-03:00 Stefan Podkowinski :
> > Are you suggesting to move all messages currently logged via debug() to
> > info() with the additional marker set, or only particular messages?
> >
> >
> > On 19.03.2018 19:51, Paulo Motta wrote:
> >> Thanks for the constructive input and feedback! From this discussion
> >> it seems like overloading the DEBUG level to signify
> >> async-verbose-INFO on CASSANDRA-10241 is leading to some confusion and
> >> we should fix this.
> >>
> >> However, we cannot simply turn debug.log off as during CASSANDRA-10241
> >> some verbose-but-useful-info-logs, such as flush information were
> >> changed from INFO to DEBUG, and since the patch has been in for nearly
> >> 3 years it's probably non-revertable. Furthermore, the practice of
> >> using the DEBUG level for logging non-debug stuff has been in our
> >> Logging Guidelines
> >> (https://wiki.apache.org/cassandra/LoggingGuidelines) since then, so
> >> there is probably useful DEBUG stuff that would need to be turned into
> >> INFO if we get rid of debug.log.
> >>
> >> For this reason I'm more in favor of converting the debug.log into
> >> async/verbose_system.log as suggested by Jeremiah and use a marker to
> >> direct these logs (former DEBUG level logs) to that log instead.
> >> Nevertheless, if the majority prefers to get back to a single
> >> system.log file and get rid of debug.log/verbose_system.log altogether
> >> then we would need to go through all log usages and readjust them to
> >> use the proper logging levels and update our logging guidelines to
> >> reflect whatever new policy is decided, not only disabling debug.log
> >> and call it a day.
> >>
> >> 2018-03-19 12:02 GMT-03:00 Jeremiah D Jordan :
> >>> People seem hung up on DEBUG here.  The goal of CASSANDRA-10241 was
> >>> to clean up the system.log so that it a very high “signal” in terms of 
> >>> what was logged
> >>> to it synchronously, but without reducing the ability of the logs to 
> >>> allow people to
> >>> solve problems and perform post mortem analysis of issues.  We have 
> >>> informational
> >>> log messages that are very useful to understanding the state of things, 
> >>> like compaction
> >>> status, repair status, flushing, or the state of gossip in the system 
> >>> that are very useful to
> >>> operators, but if they are all in the system.log make said log file 
> >>> harder to look over for
> >>> issues.  In 10241 the method chosen for how to keep these log messages 
> >>> around by
> >>> default, but get them out of the system.log was that these messages were 
> >>> changed from
> >>> INFO to DEBUG and the new debug.log was created.
> >>>
> >>> From the discussion here it seems that many would like to change how this 
> >>> works.  Rather
> >>> than just turning off the debug.log I would propose that we switch to 
> >>> using the SLF4J
> >>> MARKER[1] ability to move the messages back to INFO but tag them as 
> >>> belonging to
> >>> the asynchronous_system.log rather than the normal system.log.
> >>>
> >>> [1] https://logback.qos.ch/manual/layouts.html#marker
> >>> https://www.slf4j.org/faq.html#fatal
> >>>
> >>>
> >
> >
> 
> 




Re: [DISCUSS] java 9 and the future of cassandra on the jdk

2018-03-20 Thread Ariel Weisberg
Hi,

+1 to what Jordan is saying.

It seems like if we are cutting a release off of trunk we want to make sure we 
get N years of supported JDK out of it. For a single LTS release N could be at 
most 3 and historically that isn't long enough and it's very likely we will get 
< 3 after a release is cut.

Going beyond 3 years could be tricky in the worst case because bringing in up 
to 3 years of JDK changes to an older release might mean some of our 
dependencies no longer function and now it's not just minor fixes it's bringing 
in who knows what in terms of updated dependencies.

I think in some cases we are going to need to take a release we have already 
cut and make it work with an LTS release that didn't exist when the release was 
cut.

We also need to update how CI works. We should at least build and run a quick 
smoke test with the JDKs we are claiming to support and asynchronously run all 
the tests on the rather large matrix that now exists.

Ariel

On Tue, Mar 20, 2018, at 11:07 AM, Jeremiah Jordan wrote:
> My suggestion would be to keep trunk on the latest LTS by default, but 
> with compatibility with the latest release if possible.  Since Oracle 
> LTS releases are every 3 years, I would not want to tie us to that 
> release cycle?
> So until Java 11 is out that would mean trunk should work under Java 8, 
> with the option of being compiled/run under Java 9 or 10.  Once Java 11 
> is out we could then switch to 11 only.
> 
> -Jeremiah
> 
> On Mar 20, 2018, at 10:48 AM, Jason Brown  wrote:
> 
> >>> Wouldn't that potentially leave us in a situation where we're ready for
> > a C* release but blocked waiting on a new LTS cut?
> > 
> > Agreed, and perhaps if we're close enough to a LTS release (say three
> > months or less), we could choose to delay (probably with community
> > input/vote). If we're a year or two out, then, no, we should not wait. I
> > think this is what I meant to communicate by "Perhaps we can evaluate this
> > over time." (poorly stated, in hindsight)
> > 
> >> On Tue, Mar 20, 2018 at 7:22 AM, Josh McKenzie  
> >> wrote:
> >> 
> >> Need a little clarification on something:
> >> 
> >>> 2) always release cassandra on an LTS version
> >> combined with:
> >>> 3) keep trunk on the latest jdk version, assuming we release a major
> >>> cassandra version close enough to an LTS release.
> >> 
> >> Wouldn't that potentially leave us in a situation where we're ready
> >> for a C* release but blocked waiting on a new LTS cut? For example, if
> >> JDK 9 were the currently supported LTS and trunk was on JDK 11, we'd
> >> either have to get trunk to work with 9 or wait for 11 to resolve
> >> that.
> >> 
> >>> On Tue, Mar 20, 2018 at 9:32 AM, Jason Brown  wrote:
> >>> Hi all,
> >>> 
> >>> 
> >>> TL;DR Oracle has started revving the JDK version much faster, and we need
> >>> an agreed upon plan.
> >>> 
> >>> Well, we probably should have had this discussion already by now, but
> >> here
> >>> we are. Oracle announced plans to release updated JDK version every six
> >>> months, and each new version immediately supersedes the previous in all
> >> ways:
> >>> no updates/security fixes to previous versions is the main thing, and
> >>> previous versions are EOL'd immediately. In addition, Oracle has planned
> >>> parallel LTS versions that will live for three years, and then superceded
> >>> by the next LTS; but not immediately EOL'd from what I can tell. Please
> >> see
> >>> [1, 2] for Oracle's offical comments about this change ([3] was
> >>> particularly useful, imo), [4] and many other postings on the internet
> >> for
> >>> discussion/commentary.
> >>> 
> >>> We have a jira [5] where Robert Stupp did most of the work to get us onto
> >>> Java 9 (thanks, Robert), but then the announcement of the JDK version
> >>> changes happened last fall after Robert had done much of the work on the
> >>> ticket.
> >>> 
> >>> Here's an initial proposal of how to move forward. I don't suspect it's
> >>> complete, but a decent place to start a conversation.
> >>> 
> >>> 1) recommend OracleJDK over OpenJDK. IIUC from [3], the OpenJDK will
> >>> release every six months, and the OracleJDK will release every three
> >> years.
> >>> Thus, the OracleJDK is the LTS version, and it just comes from a snapshot
> >>> of one of those OpenJDK builds.
> >>> 
> >>> 2) always release cassandra on an LTS version. I don't think we can
> >>> reasonably expect operators to update the JDK every six months, on time.
> >>> Further, if there are breaking changes to the JDK, we don't want to have
> >> to
> >>> update established c* versions due to those changes, every six months.
> >>> 
> >>> 3) keep trunk on the latest jdk version, assuming we release a major
> >>> cassandra version close enough to an LTS release. Currently that seems
> >>> reasonable for cassandra 4.0 to be released with java 11 (18.9 LTS)
> >>> support. Perhaps we can evaluate this over time.
> >>> 
> 

Re: Debug logging enabled by default since 2.2

2018-03-18 Thread Ariel Weisberg
Hi,

In a way the real issue might be that we don’t have nightly performance runs 
that would make an accidentally introduced debug statement obvious.

A log statement that runs once or more per read or write should be easy to 
spot. I haven’t measured the impact though. And as a bonus by having this we 
can spot a variety of performance issues introduced by all kinds of changes.

Ariel

> On Mar 18, 2018, at 3:46 PM, Jeff Jirsa  wrote:
> 
> In Cassandra-10241 I said I was torn on this whole ticket, since most people 
> would end up turning it off if it had a negative impact. You said:
> 
> “I'd like to emphasize that we're not talking about turning debug or trace on 
> for client-generated request paths. There's way too much data generated and 
> it's unlikely to be useful.
> What we're proposing is enabling debug logging ONLY for cluster state changes 
> like gossip and schema, and infrequent activities like repair. “
> 
> Clearly there’s a disconnect here - we’ve turned debug logging on for 
> everything and shuffled some stuff to trace, which is a one time action but 
> is hard to protect against regression. In fact, just looking at the read 
> callback shows two instances of debug log in the client request path 
> (exercise for the reader to “git blame”).
> 
> Either we can go clean up all the surprises that leaked through, or we can 
> turn off debug and start backing out some of the changes in 10241. Putting 
> stuff like compaction in the same bucket as digest mismatch and gossip state 
> doesn’t make life materially better for most people.
> 
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Mar 18, 2018, at 11:21 AM, Jonathan Ellis  wrote:
>> 
>> That really depends on whether you're judicious in deciding what to log at
>> debug, doesn't it?
>> 
>> On Sun, Mar 18, 2018 at 12:57 PM, Michael Kjellman 
>> wrote:
>> 
>>> +1. this is how it works.
>>> 
>>> your computer doesn’t run at debug logging by default. your phone doesn’t
>>> either. neither does your smart tv. your database can’t be running at debug
>>> just because it makes our lives as engineers easier.
>>> 
 On Mar 18, 2018, at 5:14 AM, Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
 
 It's a tiny bit unusual to turn on debug logging for all users by default
 though, and there should be occasions to turn it on when facing issues
>>> that
 you want to debug (if they can be easily reproduced).
>>> 
>> 
>> 
>> 
>> -- 
>> Jonathan Ellis
>> co-founder, http://www.datastax.com
>> @spyced





Re: Making RF4 useful aka primary and secondary ranges

2018-03-14 Thread Ariel Weisberg
Hi,

There is a JIRA for decoupling the size of the group size used for consensus 
with level of data redundancy. 
https://issues.apache.org/jira/browse/CASSANDRA-13442

It's been discussed quite a bit offline and I did a presentation on it at NGCC. 
Hopefully we will see some movement on it soon.

Ariel

On Wed, Mar 14, 2018, at 5:40 PM, Carl Mueller wrote:
> Currently there is little use for RF4. You're getting the requirements of
> QUORUM-3 but only one extra backup.
> 
> I'd like to propose something that would make RF4 a sort of more heavily
> backed up RF3.
> 
> A lot of this is probably achievable with strictly driver-level logic, so
> perhaps it would belong more there.
> 
> Basically the idea is to have four replicas of the data, but only have to
> practically do QUORUM with three nodes. We consider the first three
> replicas the "primary replicas". On an ongoing basis for QUORUM reads and
> writes, we would rely on only those three replicas to satisfy
> two-out-of-three QUORUM. Writes are persisted to the fourth replica in the
> normal manner of cassandra, it just doesn't count towards the QUORUM write.
> 
> On reads, with token and node health awareness by the driver, if the
> primaries are all healthy, two-of-three QUORUM is calculated from those.
> 
> If however one of the three primaries is down, read QUORUM is a bit
> different:
> 1) if the first two replies come from the two remaining primaries and
> agree, the result is returned
> 2) if the first two replies are a primary and the "hot spare" and those
> agree, that is returned
> 3) if the primary and hot spare disagree, wait for the next primary to
> return, and then take the agreement (hopefully) that results
> 
> Then once the previous primary comes back online, the read quorum goes back
> to preferring that set, with the assuming hinted handoff and repair will
> get it back up to snuff.
> 
> There could also be some mechanism examining the hinted handoff status of
> the four to determine when to reactivate the primary that was down.
> 
> For mutations, one could prefer a "QUORUM plus" that was a quorum of the
> primaries plus the hot spare.
> 
> Of course one could do multiple hot spares, so RF5 could still be treated
> as RF3 + hot spares.
> 
> The goal here is more data resiliency but not having to rely on as many
> nodes for resiliency.
> 
> Since the data is ring-distributed, the fact there are primary owners of
> ranges should still be evenly distributed and no hot nodes should result

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Release votes

2018-02-16 Thread Ariel Weisberg
Hi,

I created https://issues.apache.org/jira/browse/CASSANDRA-14241 for this issue. 
You are right there is a solid chunk of failing tests on Apache infrastructure 
that don't fail on CircleCI. I'll find someone to get it done.

I think that fix before commit is only going to happen if we go all the way and 
route every single commit through testing infrastructure that runs all the 
tests multiple times and refuses to merge commits unless the tests pass 
somewhat consistently. Short of that flakey (and hard failing) tests are going 
to keep creeping in (and even then). That's not feasible without much better 
infrastructure available to everyone and it's not a short term thing RN I 
think. I mean maybe we move forward with it on the Apache infrastructure we 
have.

I'm not sure flakey infrastructure is what is acutely hurting us, although we 
do have infrastructure that exposes unreliable tests; maybe that's just a 
matter of framing.

Dealing with flakey tests generally devolves into picking victim(s) via some 
process. Blocking releases on failing tests is a way of picking the people who 
want the next release as victims. Blocking commits on flakey tests is a way of 
making people who want to merge stuff the victim. Doing nothing makes victims 
of the random subset of volunteers who fix the tests, of all developers who 
run the tests, and, to a certain extent, of end users. Excluding tests and 
running tests multiple times is picking the end user of releases as the victim.

RE multi-pronged. We are currently using a flaky annotation that reruns tests, 
we have skipped tests with JIRAs, and we re-run tests right now if they 
fail for certain classes of reasons. So we are currently down that road right 
now. I think it's fine but we need a backpressure mechanism because we can't 
keep accruing this kind of thing forever.

In my mind processes for keeping the tests passing need to provide two 
functions, pick victims(s) (task management), and create backpressure (slow new 
development to match defect rate). It seems possible to create backpressure by 
blocking releases, but that fails to pick victims to an extent. Many people 
running C* are so far behind they aren't waiting on that next release. Or they 
are accustomed to running a private fork and backporting. When we were able to 
block commits via informal process I think it helped, but an informal process 
has limitations.

I think blocking commits via automation is going to spread the load out most 
evenly and make it a priority for everyone in the contributor base. We have 16 
apache nodes to work with which I think would handle our current commit load. 
We can fine tune criteria for blocking commits as we go.

I don't have an answer for how we backpressure the utilization of flakey 
annotations and re-running tests. Maybe it's a czar saying no commits until we 
reach some goal done on a period (every 3 months). Maybe we vote on it 
periodically. Czars can be really effective in moving the herd. The Czar does 
need to be able to wield something to motivate some set of contributors to do 
the work. It's not so much about preventing the commits as it is signaling 
unambiguously that this is what we are working on now and if you aren't you are 
working on the wrong thing. It ends up being quite depressing though when you 
end up working through significant amounts of tech debt all at once. It hurts 
less when you have a lot of people working on it.

Ariel

On Thu, Feb 15, 2018, at 6:48 PM, kurt greaves wrote:
> It seems there has been a bit of a slip in testing as of recently, mostly
> due to the fact that there's no canonical testing environment that isn't
> flaky. We probably need to come up with some ideas and a plan on how we're
> going to do testing in the future, and how we're going to make testing
> accessible for all contributors. I think this is the only way we're really
> going to change behaviour. Having an incredibly tedious process and then
> being aggressive about it only leads to resentment and workarounds.
> 
> I'm completely unsure of where dtests are at since the conversion to
> pytest, and there's a lot of failing dtests on the ASF jenkins jobs (which
> appear to be running pytest). As there's currently not a lot of visibility
> into what people are doing with CircleCI for this it's hard to say if
> things are better over there. I'd like to help here if anyone wants to fill
> me in.
> 
> On 15 February 2018 at 21:14, Josh McKenzie  wrote:
> 
> > >
> > > We’ve said in the past that we don’t release without green tests. The PMC
> > > gets to vote and enforce it. If you don’t vote yes without seeing the
> > test
> > > results, that enforces it.
> >
> > I think this is noble and ideal in theory. In practice, the tests take long
> > enough, hardware infra has proven flaky enough, and the tests *themselves*
> > flaky enough, that there's been a consistent low-level of test failure
> > noise 

Re: CASSANDRA-14183 review request -> logback upgrade to fix CVE

2018-02-13 Thread Ariel Weisberg
Hi,

Option 4, upgrade trunk, update NEWS.TXT in prior versions warning about the 
vulnerability.

Ariel

On Tue, Feb 13, 2018, at 12:28 PM, Ariel Weisberg wrote:
> Hi,
> 
> So our options are:
> 
> 1. Ignore it.
> Most people aren't using this functionality.
> Most people aren't and shouldn't be exposing the logging port to 
> untrusted networks
> But everyone loses at defense in depth (or is it breadth) if they use 
> this functionality and someone might expose the port
> 
> 2. Remove the offending classes from the 1.1.10 jar
> My crazy idea, break it, but only for the people using the vulnerable 
> functionality. Possibly no one, but probably someone. Maybe they can 
> upgrade it manually for their usage?
> This also has an issue when working with maven.
> 
> 3. Upgrade it
> Definitely going to break some apps according to Michael Shuler. 
> Happened when he tried it.
> 
> Certainly we can upgrade in trunk? While we are at it come up to the 
> latest version.
> 
> Ariel
> 
> On Tue, Feb 13, 2018, at 12:03 PM, Ariel Weisberg wrote:
> > Hi,
> > 
> > I don't think the fix is in 1.1.11 looking at the diff between 1.1.11 
> > and 1.2.0 https://github.com/qos-ch/logback/compare/v_1.1.11...v_1.2.0
> > 
> > I looked at 1.1.11 and 1.1.10 and didn't see it there either.
> > 
> > When you say stuff broke do you mean stuff not in the dtests or utests?
> > 
> > Ariel
> > 
> > On Tue, Feb 13, 2018, at 11:57 AM, Michael Shuler wrote:
> > > I tried a logback 1.2.x jar update a number of months ago to fix the
> > > broken log rotation (try setting rotation to a large number - you'll
> > > find you only get I think it was 10 files, regardless of setting).
> > > 
> > > Like we've found updating other jars in the past, this seemingly
> > > "simple" update broke a number of application components, so we rolled
> > > it back and worked out another log rotation method.
> > > 
> > > Looking at the logback changelog, I cannot tell if version 1.1.11 is
> > > fixed for this, or if that might be less breakage? There are a pretty
> > > significant number of API-looking changes from 1.1.3 to 1.2.3, so I do
> > > not wish to break other user's applications, as I have experienced.
> > > 
> > > I do not think this should block the current releases, unless someone
> > > wants to do some significant testing and user outreach for tentatively
> > > breaking their applications.
> > > 
> > > -- 
> > > Michael
> > > 
> > > On 02/13/2018 10:48 AM, Jason Brown wrote:
> > > > Ariel,
> > > > 
> > > > If this is a legit CVE, then we would want to patch all the current
> > > > versions we support - which is 2.1 and higher.
> > > > 
> > > > Also, is this worth stopping the current open vote for this patch? (Not 
> > > > in
> > > > a place to look at the patch and affects to impacted branches right 
> > > > now).
> > > > 
> > > > Jason
> > > > 
> > > > On Tue, Feb 13, 2018 at 08:43 Ariel Weisberg <ar...@weisberg.ws> wrote:
> > > > 
> > > >> Hi,
> > > >>
> > > >> Seems like users could conceivably be using the vulnerable component. 
> > > >> Also
> > > >> seems like like we need potentially need to do this as far back as 2.1?
> > > >>
> > > >> Anyone else have an opinion before I commit this? What version to start
> > > >> from?
> > > >>
> > > >> Ariel
> > > >>
> > > >> On Tue, Feb 13, 2018, at 5:59 AM, Thiago Veronezi wrote:
> > > >>> Hi dev team,
> > > >>>
> > > >>> Sorry to keep bothering you.
> > > >>>
> > > >>> This is just a friendly reminder that I would like to contribute to 
> > > >>> this
> > > >>> project starting with a fix for CASSANDRA-14183
> > > >>> <https://issues.apache.org/jira/browse/CASSANDRA-14183>.
> > > >>>
> > > >>> []s,
> > > >>> Thiago.
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, Jan 30, 2018 at 8:05 AM, Thiago Veronezi <thi...@veronezi.org>
> > > >>> wrote:
> > > >>>
> > > >>>> Hi dev team,
> > > >>>>
> > > >>>> Can one of you guys

Re: CASSANDRA-14183 review request -> logback upgrade to fix CVE

2018-02-13 Thread Ariel Weisberg
Hi,

So our options are:

1. Ignore it.
Most people aren't using this functionality.
Most people aren't and shouldn't be exposing the logging port to untrusted 
networks
But everyone loses at defense in depth (or is it breadth) if they use this 
functionality and someone might expose the port

2. Remove the offending classes from the 1.1.10 jar
My crazy idea, break it, but only for the people using the vulnerable 
functionality. Possibly no one, but probably someone. Maybe they can upgrade it 
manually for their usage?
This also has an issue when working with maven.

3. Upgrade it
Definitely going to break some apps according to Michael Shuler. Happened when 
he tried it.

Certainly we can upgrade in trunk? While we are at it come up to the latest 
version.

Ariel

On Tue, Feb 13, 2018, at 12:03 PM, Ariel Weisberg wrote:
> Hi,
> 
> I don't think the fix is in 1.1.11 looking at the diff between 1.1.11 
> and 1.2.0 https://github.com/qos-ch/logback/compare/v_1.1.11...v_1.2.0
> 
> I looked at 1.1.11 and 1.1.10 and didn't see it there either.
> 
> When you say stuff broke do you mean stuff not in the dtests or utests?
> 
> Ariel
> 
> On Tue, Feb 13, 2018, at 11:57 AM, Michael Shuler wrote:
> > I tried a logback 1.2.x jar update a number of months ago to fix the
> > broken log rotation (try setting rotation to a large number - you'll
> > find you only get I think it was 10 files, regardless of setting).
> > 
> > Like we've found updating other jars in the past, this seemingly
> > "simple" update broke a number of application components, so we rolled
> > it back and worked out another log rotation method.
> > 
> > Looking at the logback changelog, I cannot tell if version 1.1.11 is
> > fixed for this, or if that might be less breakage? There are a pretty
> > significant number of API-looking changes from 1.1.3 to 1.2.3, so I do
> > not wish to break other user's applications, as I have experienced.
> > 
> > I do not think this should block the current releases, unless someone
> > wants to do some significant testing and user outreach for tentatively
> > breaking their applications.
> > 
> > -- 
> > Michael
> > 
> > On 02/13/2018 10:48 AM, Jason Brown wrote:
> > > Ariel,
> > > 
> > > If this is a legit CVE, then we would want to patch all the current
> > > versions we support - which is 2.1 and higher.
> > > 
> > > Also, is this worth stopping the current open vote for this patch? (Not in
> > > a place to look at the patch and affects to impacted branches right now).
> > > 
> > > Jason
> > > 
> > > On Tue, Feb 13, 2018 at 08:43 Ariel Weisberg <ar...@weisberg.ws> wrote:
> > > 
> > >> Hi,
> > >>
> > >> Seems like users could conceivably be using the vulnerable component. 
> > >> Also
> > >> seems like like we need potentially need to do this as far back as 2.1?
> > >>
> > >> Anyone else have an opinion before I commit this? What version to start
> > >> from?
> > >>
> > >> Ariel
> > >>
> > >> On Tue, Feb 13, 2018, at 5:59 AM, Thiago Veronezi wrote:
> > >>> Hi dev team,
> > >>>
> > >>> Sorry to keep bothering you.
> > >>>
> > >>> This is just a friendly reminder that I would like to contribute to this
> > >>> project starting with a fix for CASSANDRA-14183
> > >>> <https://issues.apache.org/jira/browse/CASSANDRA-14183>.
> > >>>
> > >>> []s,
> > >>> Thiago.
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jan 30, 2018 at 8:05 AM, Thiago Veronezi <thi...@veronezi.org>
> > >>> wrote:
> > >>>
> > >>>> Hi dev team,
> > >>>>
> > >>>> Can one of you guys take a look on this jira ticket?
> > >>>> https://issues.apache.org/jira/browse/CASSANDRA-14183
> > >>>>
> > >>>> It has an a patch available for a known security issue with one of the
> > >>>> dependencies. It has only with trivial code changes. It should be
> > >>>> straightforward to review it. Any feedback is very welcome.
> > >>>>
> > >>>> Thanks,
> > >>>> Thiago
> > >>>>
> > >>
> > >> -
> > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>
> > >>
> > > 
> > 
> > 
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: CASSANDRA-14183 review request -> logback upgrade to fix CVE

2018-02-13 Thread Ariel Weisberg
Hi,

I don't think the fix is in 1.1.11 looking at the diff between 1.1.11 and 1.2.0 
https://github.com/qos-ch/logback/compare/v_1.1.11...v_1.2.0

I looked at 1.1.11 and 1.1.10 and didn't see it there either.

When you say stuff broke do you mean stuff not in the dtests or utests?

Ariel

On Tue, Feb 13, 2018, at 11:57 AM, Michael Shuler wrote:
> I tried a logback 1.2.x jar update a number of months ago to fix the
> broken log rotation (try setting rotation to a large number - you'll
> find you only get I think it was 10 files, regardless of setting).
> 
> Like we've found updating other jars in the past, this seemingly
> "simple" update broke a number of application components, so we rolled
> it back and worked out another log rotation method.
> 
> Looking at the logback changelog, I cannot tell if version 1.1.11 is
> fixed for this, or if that might be less breakage? There are a pretty
> significant number of API-looking changes from 1.1.3 to 1.2.3, so I do
> not wish to break other user's applications, as I have experienced.
> 
> I do not think this should block the current releases, unless someone
> wants to do some significant testing and user outreach for tentatively
> breaking their applications.
> 
> -- 
> Michael
> 
> On 02/13/2018 10:48 AM, Jason Brown wrote:
> > Ariel,
> > 
> > If this is a legit CVE, then we would want to patch all the current
> > versions we support - which is 2.1 and higher.
> > 
> > Also, is this worth stopping the current open vote for this patch? (Not in
> > a place to look at the patch and affects to impacted branches right now).
> > 
> > Jason
> > 
> > On Tue, Feb 13, 2018 at 08:43 Ariel Weisberg <ar...@weisberg.ws> wrote:
> > 
> >> Hi,
> >>
> >> Seems like users could conceivably be using the vulnerable component. Also
> >> seems like like we need potentially need to do this as far back as 2.1?
> >>
> >> Anyone else have an opinion before I commit this? What version to start
> >> from?
> >>
> >> Ariel
> >>
> >> On Tue, Feb 13, 2018, at 5:59 AM, Thiago Veronezi wrote:
> >>> Hi dev team,
> >>>
> >>> Sorry to keep bothering you.
> >>>
> >>> This is just a friendly reminder that I would like to contribute to this
> >>> project starting with a fix for CASSANDRA-14183
> >>> <https://issues.apache.org/jira/browse/CASSANDRA-14183>.
> >>>
> >>> []s,
> >>> Thiago.
> >>>
> >>>
> >>>
> >>> On Tue, Jan 30, 2018 at 8:05 AM, Thiago Veronezi <thi...@veronezi.org>
> >>> wrote:
> >>>
> >>>> Hi dev team,
> >>>>
> >>>> Can one of you guys take a look on this jira ticket?
> >>>> https://issues.apache.org/jira/browse/CASSANDRA-14183
> >>>>
> >>>> It has an a patch available for a known security issue with one of the
> >>>> dependencies. It has only with trivial code changes. It should be
> >>>> straightforward to review it. Any feedback is very welcome.
> >>>>
> >>>> Thanks,
> >>>> Thiago
> >>>>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >>
> > 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: CASSANDRA-14183 review request -> logback upgrade to fix CVE

2018-02-13 Thread Ariel Weisberg
Hi,

Seems like users could conceivably be using the vulnerable component. Also 
seems like we potentially need to do this as far back as 2.1?

Anyone else have an opinion before I commit this? What version to start from?

Ariel

On Tue, Feb 13, 2018, at 5:59 AM, Thiago Veronezi wrote:
> Hi dev team,
> 
> Sorry to keep bothering you.
> 
> This is just a friendly reminder that I would like to contribute to this
> project starting with a fix for CASSANDRA-14183
> .
> 
> []s,
> Thiago.
> 
> 
> 
> On Tue, Jan 30, 2018 at 8:05 AM, Thiago Veronezi 
> wrote:
> 
> > Hi dev team,
> >
> > Can one of you guys take a look on this jira ticket?
> > https://issues.apache.org/jira/browse/CASSANDRA-14183
> >
> > It has a patch available for a known security issue with one of the
> > dependencies. The code changes are trivial. It should be
> > straightforward to review it. Any feedback is very welcome.
> >
> > Thanks,
> > Thiago
> >

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Cassandra 7544 configurable storage port and you

2018-02-09 Thread Ariel Weisberg

Hi,

I want to bring up some important changes to how addresses should be handled in 
Cassandra after 7544. As of 7544 a Cassandra instance (sometimes referred to as 
a node in the code) can't be identified by an InetAddress. It can't be 
identified for the purposes of the internal storage communication, and also for 
the purpose of the client port used for CQL.

InetAddressAndPort is the class we are using; it is almost always a 
reference to the storage port, and it will supply the port from the yaml file 
if none is provided/available. InetSocketAddress is the type we use to refer 
to the native protocol address in most places, as well as to interface with 
outside libraries like Netty or Java when opening connections, although there 
are a few places where it is an InetAddressAndPort, so you will need to look 
for variables named rpc address or native address to know.

Using just an InetAddress for comparison is almost always wrong. Unwrapping the 
address from an InetAddressAndPort is almost always wrong. If you find a 
comparison where one side is an InetAddressAndPort and the other is an 
InetAddress, it's a bug, and there is probably a port available for you to construct 
an InetAddressAndPort. In limited circumstances there might not be a port 
available such as when talking to a prior version node during rolling upgrade 
in which case you can let InetAddressAndPort supply the default from the YAML.
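
To make the comparison rule concrete, here is a minimal sketch. The
InetAddressAndPort below is a simplified stand-in for the real Cassandra class
(which carries more state); the point is only that instance equality must
include the port, so unwrapping to a bare InetAddress silently conflates
distinct instances on the same host:

```java
import java.net.InetAddress;
import java.util.Objects;

// Simplified stand-in for Cassandra's InetAddressAndPort; illustrative only.
final class InetAddressAndPort
{
    final InetAddress address;
    final int port;

    InetAddressAndPort(InetAddress address, int port)
    {
        this.address = address;
        this.port = port;
    }

    @Override
    public boolean equals(Object o)
    {
        if (!(o instanceof InetAddressAndPort)) return false;
        InetAddressAndPort that = (InetAddressAndPort) o;
        return port == that.port && address.equals(that.address);
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(address, port);
    }
}

public class AddressComparison
{
    public static void main(String[] args) throws Exception
    {
        InetAddress host = InetAddress.getByName("127.0.0.1");
        // Two instances on the same host, distinguished only by storage port.
        InetAddressAndPort a = new InetAddressAndPort(host, 7000);
        InetAddressAndPort b = new InetAddressAndPort(host, 7001);

        // Correct: compares host and port, sees two distinct instances.
        System.out.println(a.equals(b));                 // false
        // Bug pattern: unwrapping to InetAddress conflates the two instances.
        System.out.println(a.address.equals(b.address)); // true
    }
}
```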

In FBUtilities you should pretty much never use getJustLocalAddress(), 
getJustBroadcastAddress() or getJustBroadcastNativeAddress(). You want the 
AndPort version as that will have the port specific to the instance.

I created https://issues.apache.org/jira/browse/CASSANDRA-14226 to add more 
documentation to the code to cover this a little better.

Regards,
Ariel

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Soliciting volunteers for flaky dtests on trunk

2017-05-17 Thread Ariel Weisberg
Hi,

Thank you Blake, Lerh Chuan Low, Jason, and Kurt, and anyone else who
volunteered.

I'm going to look at repair_test.TestRepair, which is not quite the same
as incremental_repair_test, which Blake is looking at. 

The one remaining somewhat high pole in the tent is
cqlsh_tests.CqlshSmokeTest.

Thanks,
Ariel

On Thu, May 11, 2017, at 01:12 PM, Jason Brown wrote:
> I've taken
> CASSANDRA-13507
> CASSANDRA-13517
> 
> -Jason
> 
> 
> On Wed, May 10, 2017 at 9:45 PM, Lerh Chuan Low <l...@instaclustr.com>
> wrote:
> 
> > I'll try my hand on https://issues.apache.org/jira/browse/CASSANDRA-13182.
> >
> > On 11 May 2017 at 05:59, Blake Eggleston <beggles...@apple.com> wrote:
> >
> > > I've taken CASSANDRA-13194, CASSANDRA-13506, CASSANDRA-13515,
> > > and CASSANDRA-13372 to start
> > >
> > > On May 10, 2017 at 12:44:47 PM, Ariel Weisberg (ar...@weisberg.ws)
> > wrote:
> > >
> > > Hi,
> > >
> > > The dev list murdered my rich text formatted email. Here it is
> > > reformatted as plain text.
> > >
> > > The unit tests are looking pretty reliable right now. There is a long
> > > tail of infrequently failing tests but it's not bad and almost all
> > > builds succeed in the current build environment. In CircleCI it seems
> > > like unit tests might be a little less reliable, but still usable.
> > >
> > > The dtests on the other hand aren't producing clean builds yet. There
> > > is also a pretty diverse set of failing tests.
> > >
> > > I did a bit of triaging of the flakey dtests. I started by cataloging
> > > everything, but what I found is that the long tail of flakey dtests is
> > > very long indeed so I narrowed focus to just the top frequently failing
> > > tests for now. See https://goo.gl/b96CdO
> > >
> > > I created a spreadsheet with some of the failing tests. Links to JIRA,
> > > last time the test was seen failing, and how many failures I found in
> > > Apache Jenkins across the 3 dtest builds. There are a lot of failures
> > > not listed. There would be 50+ entries if I cataloged each one.
> > >
> > > There are two hard failing tests, but both are already moving along:
> > > CASSANDRA-13229 (Ready to commit, assigned Alex Petrov, Paulo Motta
> > > reviewing, last updated April 2017) dtest failure in
> > > topology_test.TestTopology.size_estimates_multidc_test
> > > CASSANDRA-13113 (Ready to commit, assigned Alex Petrov, Sam T Reviewing,
> > > last updated March 2017) test failure in
> > > auth_test.TestAuth.system_auth_ks_is_alterable_test
> > >
> > > I think the tests we should tackle first are on this sheet in priority
> > > order https://goo.gl/S3khv1
> > >
> > > Suite: bootstrap_test
> > > Test: TestBootstrap.simultaneous_bootstrap_test
> > > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13506
> > > Last failure: 5/5/2017
> > > Counted failures: 45
> > >
> > > Suite: repair_test
> > > Test: incremental_repair_test.TestIncRepair.compaction_test
> > > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13194
> > > Last failure: 5/4/2017
> > > Counted failures: 44
> > >
> > > Suite: sstableutil_test
> > > Test: SSTableUtilTest.compaction_test
> > > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13182
> > > Last failure: 5/4/2017
> > > Counted failures: 35
> > >
> > > Suite: paging_test
> > > Test: TestPagingWithDeletions.test_ttl_deletions
> > > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13507
> > > Last failure: 4/25/2017
> > > Counted failures: 31
> > >
> > > Suite: repair_test
> > > Test: incremental_repair_test.TestIncRepair.multiple_repair_test
> > > JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13515
> > > Last failed: 5/4/2017
> > > Counted failures: 18
> > >
> > > Suite: cqlsh_tests
> > > Test: cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_*
> > > JIRA:
> > > https://issues.apache.org/jira/issues/?jql=project%20%
> > > 3D%20CASSANDRA%20AND%20status%20in%20(Open%2C%20%22In%
> > > 20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%
> > > 2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%
> > > 20Feedback%22)%20AND%20text%20~%20%22CqlshCopyTest%22
> > > Last failed: 5/8/2017
> > > Counted failures: 23
> > >
> > > Suite: paxos_tests
> > > T

Re: Soliciting volunteers for flaky dtests on trunk

2017-05-10 Thread Ariel Weisberg
Hi,

The dev list murdered my rich text formatted email. Here it is
reformatted as plain text.

The unit tests are looking pretty reliable right now. There is a long
tail of infrequently failing tests but it's not bad and almost all
builds succeed in the current build environment. In CircleCI it seems
like unit tests might be a little less reliable, but still usable.

The dtests on the other hand aren't producing clean builds yet. There
is also a pretty diverse set of failing tests.

I did a bit of triaging of the flakey dtests. I started by cataloging
everything, but what I found is that the long tail of flakey dtests is
very long indeed so I narrowed focus to just the top frequently failing
tests for now. See https://goo.gl/b96CdO

I created a spreadsheet with some of the failing tests. Links to JIRA,
last time the test was seen failing, and how many failures I found in
Apache Jenkins across the 3 dtest builds. There are a lot of failures
not listed. There would be 50+ entries if I cataloged each one.

There are two hard failing tests, but both are already moving along:
CASSANDRA-13229 (Ready to commit, assigned Alex Petrov, Paulo Motta
reviewing, last updated April 2017) dtest failure in
topology_test.TestTopology.size_estimates_multidc_test
CASSANDRA-13113 (Ready to commit, assigned Alex Petrov, Sam T Reviewing,
last updated March 2017)   test failure in
auth_test.TestAuth.system_auth_ks_is_alterable_test

I think the tests we should tackle first are on this sheet in priority
order https://goo.gl/S3khv1

Suite: bootstrap_test
Test: TestBootstrap.simultaneous_bootstrap_test
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13506
Last failure: 5/5/2017
Counted failures: 45

Suite: repair_test
Test: incremental_repair_test.TestIncRepair.compaction_test
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13194
Last failure: 5/4/2017
Counted failures: 44

Suite: sstableutil_test
Test: SSTableUtilTest.compaction_test
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13182
Last failure: 5/4/2017
Counted failures: 35

Suite: paging_test
Test: TestPagingWithDeletions.test_ttl_deletions
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13507
Last failure: 4/25/2017
Counted failures: 31

Suite: repair_test
Test: incremental_repair_test.TestIncRepair.multiple_repair_test
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13515
Last failed: 5/4/2017
Counted failures: 18

Suite: cqlsh_tests
Test: cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_*
JIRA:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%20text%20~%20%22CqlshCopyTest%22
Last failed: 5/8/2017
Counted failures: 23

Suite: paxos_tests
Test: TestPaxos.contention_test_many_threads
JIRA: https://issues.apache.org/jira/browse/CASSANDRA-13517
Last failed: 5/8/2017
Counted failures: 15

Suite: repair_test
Test: TestRepair
JIRA:
https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20text%20~%20%22dtest%20failure%20repair_test%22
Last failure: 5/4/2017
Comment: No one test fails a lot but the number of failing tests is
substantial

Suite: cqlsh_tests
Test: cqlsh_tests.CqlshSmokeTest.[test_insert | test_truncate |
test_use_keyspace | test_create_keyspace]
JIRA: No JIRA yet
Last failed: 4/22/2017
count: 6

If you have spare cycles you can make a huge difference in test
stability by picking off one of these.

Regards,
Ariel

On Wed, May 10, 2017, at 12:45 PM, Ariel Weisberg wrote:
> Hi all,
> 
> The unit tests are looking pretty reliable right now. There is a long
> tail of infrequently failing tests but it's not bad and almost all
> builds succeed in the current build environment. In CircleCI it seems
> like unit tests might be a little less reliable, but still usable.
> The dtests on the other hand aren't producing clean builds yet. There
> is also a pretty diverse set of failing tests.
> I did a bit of triaging of the flakey dtests. I started by cataloging
> everything, but what I found is that the long tail of flakey dtests is
> very long indeed so I narrowed focus to just the top frequently failing
> tests for now. See https://goo.gl/b96CdO
> I created a spreadsheet with some of the failing tests. Links to JIRA,
> last time the test was seen failing, and how many failures I found in
> Apache Jenkins across the 3 dtest builds. There are a lot of failures
> not listed. There would be  50+ entries if I cataloged each one.
> There are two hard failing tests, but both are already moving along:
> CASSANDRA-13229 (Ready to commit, assigned Alex Petrov, Paulo Motta
> reviewing, last updated April 2017)  dtest failure in
> topology_test.TestTopology.size_estimates_multidc_test
> CASSANDRA-13113
> (Ready to commit, assigned Alex Petrov, Sam T Reviewing,
> last updated March 2017) test fa

Soliciting volunteers for flaky dtests on trunk

2017-05-10 Thread Ariel Weisberg
Hi all,

The unit tests are looking pretty reliable right now. There is a long
tail of infrequently failing tests but it's not bad and almost all
builds succeed in the current build environment. In CircleCI it seems
like unit tests might be a little less reliable, but still usable.
The dtests on the other hand aren't producing clean builds yet. There
is also a pretty diverse set of failing tests.
I did a bit of triaging of the flakey dtests. I started by cataloging
everything, but what I found is that the long tail of flakey dtests is
very long indeed so I narrowed focus to just the top frequently failing
tests for now. See https://goo.gl/b96CdO
I created a spreadsheet with some of the failing tests. Links to JIRA,
last time the test was seen failing, and how many failures I found in
Apache Jenkins across the 3 dtest builds. There are a lot of failures
not listed. There would be  50+ entries if I cataloged each one.
There are two hard failing tests, but both are already moving along:
CASSANDRA-13229 (Ready to commit, assigned Alex Petrov, Paulo Motta
reviewing, last updated April 2017): dtest failure in
topology_test.TestTopology.size_estimates_multidc_test
CASSANDRA-13113 (Ready to commit, assigned Alex Petrov, Sam T reviewing,
last updated March 2017): test failure in
auth_test.TestAuth.system_auth_ks_is_alterable_test
I think the tests we should tackle first are on this sheet in priority
order https://goo.gl/S3khv1
Suite | Test | JIRA | Last failure | Counted failures | Status
bootstrap_test | TestBootstrap.simultaneous_bootstrap_test | https://issues.apache.org/jira/browse/CASSANDRA-13506 | 5/5/2017 | 45 | Open
repair_test | incremental_repair_test.TestIncRepair.compaction_test | https://issues.apache.org/jira/browse/CASSANDRA-13194 | 5/4/2017 | 44 | Open
sstableutil_test | SSTableUtilTest.compaction_test | https://issues.apache.org/jira/browse/CASSANDRA-13182 | 5/4/2017 | 35 | Open
paging_test | TestPagingWithDeletions.test_ttl_deletions | https://issues.apache.org/jira/browse/CASSANDRA-13507 | 4/25/2017 | 31 | Open
repair_test | incremental_repair_test.TestIncRepair.multiple_repair_test | https://issues.apache.org/jira/browse/CASSANDRA-13515 | 5/4/2017 | 18 | Open
cqlsh_tests | cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_* | https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%2C%20%22Ready%20to%20Commit%22%2C%20%22Awaiting%20Feedback%22)%20AND%20text%20~%20%22CqlshCopyTest%22 | 5/8/2017 | 23 |
paxos_tests | TestPaxos.contention_test_many_threads | https://issues.apache.org/jira/browse/CASSANDRA-13517 | 5/8/2017 | 15 | Open
repair_test | TestRepair | https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20text%20~%20%22dtest%20failure%20repair_test%22 | 5/4/2017 | |

No one test fails a lot but the number of failing tests is substantial:
cqlsh_tests | cqlsh_tests.CqlshSmokeTest.[test_insert, test_truncate, test_use_keyspace, test_create_keyspace] | | 4/22/2017 | 6 |
If you have spare cycles you can make a huge difference in test
stability by picking off one of these.
Regards,
Ariel

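
For illustration, the tally behind a spreadsheet like this can be
produced with a few lines of Python. This is a hypothetical sketch: the
failure records below are made-up stand-ins for what one would scrape
out of Jenkins build reports, not real data.

```python
from collections import Counter

# Hypothetical failure log: (suite, test, date) records as they might be
# cataloged by hand from Jenkins build reports.
failures = [
    ("bootstrap_test", "simultaneous_bootstrap_test", "2017-05-05"),
    ("repair_test", "compaction_test", "2017-05-04"),
    ("repair_test", "compaction_test", "2017-04-30"),
    ("paging_test", "test_ttl_deletions", "2017-04-25"),
    ("repair_test", "compaction_test", "2017-04-12"),
]

def top_flaky(failures, n=3):
    """Tally failures per (suite, test) and return the n most frequent."""
    counts = Counter((suite, test) for suite, test, _ in failures)
    return counts.most_common(n)

for (suite, test), count in top_flaky(failures):
    print(f"{suite}.{test}: {count} failures")
```

Sorting by count surfaces the "top frequently failing tests" first,
which is exactly the prioritization the spreadsheet encodes.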


Re: Proposal: push TravisCI and CircleCI yaml files into the active branches

2017-03-29 Thread Ariel Weisberg
Hi,

I think we should support something that we can scale out and that can
produce consistent reproducible results.

One of the longer term issues is going to be different CI systems
producing different sets of failures. It's not going to happen often but
it will and it's one way flakiness will sneak in.

There is an argument for settling on a single platform in that respect.
If queue times are an issue and we can get more resources by running
multiple platforms, then a different tradeoff will make sense.

I have a desire to run my own CI box that can parallelize with
containers, but I admit it's pretty niche use case and not worth much
investment or overhead.

Ariel

On Tue, Mar 28, 2017, at 08:52 PM, Jeff Jirsa wrote:
> To further our recent conversations on testing, I've opened the following
> JIRA:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-13388
> 
> This should allow us to do SOME amount of automated testing for
> developers
> without jumping through a ton of hoops. If the free plans prove to be
> insufficient, we can look into using the ASF's paid plan, or find other
> solutions.
> 
> If anyone has a reason that they feel this is inappropriate, speak up.
> Otherwise, I'll try to put together a few yaml files this week to commit
> soon'ish.
> 
> Here's an example circle run that just kicks off unit tests on one of my
> personal branches - https://circleci.com/gh/jeffjirsa/cassandra/35 -
> nothing fancy, but at least it gives us the junit/testall run to show us
> that we had one failure ( testDroppedMessages -
> org.apache.cassandra.net.MessagingServiceTest ). It appears that we can
> parallelize dtests and run dtests as a series of small tasks as well - we
> may be able to get jacoco to run within the time limit, if we split up
> unit
> tests a bit.
> 
> We can have this conversation on the list or on the JIRA, but does anyone
> have a great reason why we SHOULDN'T support one or both of these
> vendors,
> or alternatively, is there another vendor we should consider instead?
> 
> - Jeff


Re: [DISCUSS] Implementing code quality principles, and rules (was: Code quality, principles and rules)

2017-03-28 Thread Ariel Weisberg
Hi,

Code coverage:

I value code coverage as part of the review and development process. As
a project wide target I think it's not as high value, but having a
standard encourages people to take the time to use the tools and that's
a healthy side effect. Code coverage is a measure of code execution not
whether that code returned the correct answer or generated the right
behavior. We should also document what kind of code coverage is valuable
and how to distinguish between coverage that tests vs merely exercises
functionality.

Tests always passing:

> 2. If a patch has a failing test (test touching the same code path), the code 
> or test should be fixed prior to being accepted.

The reality of tests is that they will degrade over time unless you have
very strict policies on committing that mean no one can commit unless
the tests are consistently passing. We have to do away with, well it's
failing on trunk, but my branch doesn't make it worse, or that test
failure looks unrelated.

I think it's all about picking the poor sod who is going to have to drop
what they are doing and commit to fixing a test.

If you give people any kind of out other than doing the work that is
unrelated to their main objective it's going to happen an unfortunate
amount of the time. That's not a negative statement about our particular
group it's just the economics of the process. What we have so far is a
good high level statement about the desired outcome, but what we don't
have is an incentive structure to make that outcome a reality.

I'm not particularly attached to any specific incentive structure as
long as it's convincing that some people will not be able to avoid
working on technical debt associated with tests and it spreads the load
out in some reasonable way.

I think we should have automated indication of whether all branches are
open to commit. Just an indicator. It doesn't have to (shouldn't?)
physically enforce it. That encourages us to express the policy in a
clear unambiguous way, and it encourages us to build tests and CI to
that policy.
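
The "automated indication" could be as simple as folding each branch's
latest suite results into a single open/blocked flag. A minimal sketch,
assuming the build results have already been fetched from the CI system;
the branch names and statuses below are illustrative, not a real CI API:

```python
# Sketch of a commit-gate indicator. In practice build_results would come
# from the CI system; here it is a plain dict of branch -> suite statuses.

def open_to_commit(build_results):
    """A branch is open for commits only if every tracked suite is green."""
    return {
        branch: all(status == "green" for status in suites.values())
        for branch, suites in build_results.items()
    }

results = {
    "trunk":         {"testall": "green", "dtest": "red"},
    "cassandra-3.0": {"testall": "green", "dtest": "green"},
}

for branch, ok in open_to_commit(results).items():
    print(f"{branch}: {'OPEN' if ok else 'BLOCKED'}")
```

As argued above, this only indicates; it doesn't physically enforce
anything, but it states the policy unambiguously.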

It's not perfect. It's not going to distribute the work fairly, but it
will at least get it done. It's also going to be super annoying and
blocking as you have to artificially delay committing while you wait on
someone to fix a test. You wish you could just commit once fixing the
test has been assigned, but then some of the time the test will remain
broken indefinitely and you are back where you started. A certain amount
of pain spread wide is what makes things happen sadly.

This is one incentive structure that has people stop what they are doing
and what they care about to work on the thing that they don't really
care about at the moment. There are other strategies for picking a
victim and making it someone's problem so they get fixed, but as an OSS
project it's really going to come down to who wants to commit the most
that will fix the tests. Telling people to do stuff doesn't work so hot.

Alternatively we can do what we have always done which is somewhat
ignore the tests failing (shout out to those working diligently to fix
them in the background), and then just block releases on the tests. I
don't like this approach because it makes it hard to hit release dates
consistently and the longer a test remains broken the harder it is to
suss out who broke it and knows how to fix it. It also introduces
overhead for ongoing development intra-release as every developer has to
parse the current set of failing tests. Overall I see it as inefficient.

 > A recurring failing test carries no signal and is better deleted.

Writing tests isn't free. They are just as much a part of the end
product as the features. Just because a test doesn't pass doesn't mean it should be
deleted unless it's a particularly bad test or doesn't succeed at
testing what it is supposed to test. Fixing should be cheaper than
writing from scratch. If we delete tests we should have a policy for
justifying why rather than giving people carte blanche (or close enough)
to avoid failing tests by deleting them.

If the cheapest path available is to delete tests it's going to happen
more than it should. It should be expensive enough to delete a test that
it's balanced with the value of fixing it. I just want to make sure
reviewers have enough ammo to push back when they should.

Regards,
Ariel


On Tue, Mar 28, 2017, at 11:53 AM, sankalp kohli wrote:
> If the code coverage goes down or do not go above the required cutoff due
> to adding toString or getter setter like functionality, can we make it a
> process to explain the reason in the JIRA before committing it?
> 
> Regarding changes in hard to unit tests parts of the code, can we make it
> a
> process to link the JIRA which talks about rewriting that component of
> C*.
> Linking JIRAs will help us learn how many changes have gone in with
> little
> or no testing due to this component.
> 
> On Tue, Mar 28, 2017 at 8:38 AM, Jason Brown 
> wrote:
> 
> > 

Re: committing performance patches without performance numbers

2017-03-09 Thread Ariel Weisberg
Hi,

I should clarify. Should in the sense of it was fine for the process we
had at the time (ugh!) but it's not what we should do in the future.

Ariel

On Thu, Mar 9, 2017, at 04:55 PM, Ariel Weisberg wrote:
> Hi,
> 
> I agree that patch could have used it and it was amenable to
> micro-benchmarking. Just to be pedantic about process which is something
> I approve of to the chagrin of so many.
> 
> On a completely related note that change also randomly boxes a boolean.
> 
> Ariel
> 
> On Thu, Mar 9, 2017, at 03:45 PM, Jonathan Haddad wrote:
> > I don't expect everyone to run a 500 node cluster off to the side to test
> > their patches, but at least some indication that the contributor started
> > Cassandra on their laptop would be a good sign.  The JIRA I referenced
> > was
> > an optimization around List, Set and Map serialization.  Would it really
> > have been that crazy to run a handful of benchmarks locally and post
> > those
> > results?
> > 
> > On Thu, Mar 9, 2017 at 12:26 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
> > 
> > > Hi,
> > >
> > > I think there are issues around the availability of hardware sufficient
> > > to demonstrate the performance concerns under test. It's an open source
> > > project without centralized infrastructure. A lot of performance
> > > contributions come from people running C* in production. They are
> > > already running these changes and have seen the improvement, but
> > > communicating that to the outside world in a convincing way can be hard.
> > >
> > > Right now we don't even have performance in continuous integration. I
> > > think we are putting the cart before the horse in that respect. What
> > > about all the commits that don't intend to have a performance impact but
> > > do? Even if we had performance metrics in CI who is going to triage the
> > > results religiously?
> > >
> > > We also to my knowledge don't have benchmarks for key functionality in
> > > cassandra-stress. Can cassandra-stress benchmark CAS? My recollection is
> > > every time I looked is that it wasn't there.
> > >
> > > We can only set the bar as high as contributors are able to meet.
> > > Certainly if they can't justify why they can't benchmark the thing they
> > > want to contribute then reviewers should make them go and benchmark it.
> > >
> > > Regards,
> > > Ariel
> > >
> > > On Thu, Mar 9, 2017, at 03:11 PM, Jeff Jirsa wrote:
> > > > Agree. Anything that's meant to increase performance should demonstrate
> > > > it
> > > > actually does that. We have microbench available in recent versions -
> > > > writing a new microbenchmark isn't all that onerous. Would be great if 
> > > > we
> > > > had perf tests included in the normal testall/dtest workflow for ALL
> > > > patches so we could quickly spot regressions, but that gets pretty
> > > > expensive in terms of running long enough tests to actually see most
> > > > common
> > > > code paths.
> > > >
> > > >
> > > > On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddad <j...@jonhaddad.com>
> > > > wrote:
> > > >
> > > > > I'd like to discuss what I consider to be a pretty important matter -
> > > > > patches which are written for the sole purpose of improving 
> > > > > performance
> > > > > without including a single performance benchmark in the JIRA.
> > > > >
> > > > > My original email was in "Testing and Jira Tickets", i'll copy it here
> > > > > for posterity:
> > > > >
> > > > > If you don't mind, I'd like to broaden the discussion a little bit to
> > > also
> > > > > discuss performance related patches.  For instance, CASSANDRA-13271
> > > was a
> > > > > performance / optimization related patch that included *zero*
> > > information
> > > > > on if there was any perf improvement or a regression as a result of 
> > > > > the
> > > > > change, even though I've asked twice for that information.
> > > > >
> > > > > In addition to "does this thing break anything" we should be asking
> > > "how
> > > > > does this patch affect performance?" (and were the appropriate docs
> > > > > included, but that's another topic altogether)
> > > > >
> > > > > There's a minor note about perf related stuff here:
> > > > > http://cassandra.apache.org/doc/latest/development/
> > > > > testing.html#performance-testing
> > > > >
> > > > >
> > > > > "Performance tests for Cassandra are a special breed of tests that are
> > > not
> > > > > part of the usual patch contribution process. In fact you can
> > > contribute
> > > > > tons of patches to Cassandra without ever running performance tests.
> > > They
> > > > > are important however when working on performance improvements, as 
> > > > > such
> > > > > improvements must be measurable."
> > > > >
> > > > > I think performance numbers aren't just important, but should be a
> > > > > *requirement* to merge a performance patch.
> > > > >
> > > > > Thoughts?
> > > > > Jon
> > > > >
> > >
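
Jonathan's "handful of benchmarks locally" can be as lightweight as a
timeit run. A hedged sketch, not Cassandra's actual serialization code:
it compares per-element versus batched packing of a list of ints, the
shape of the List/Set/Map serialization question raised above.

```python
import struct
import timeit

# Illustrative workload: serialize 10k 32-bit big-endian ints two ways.
data = list(range(10_000))

def serialize_per_element(values):
    # One struct.pack call per element, appended to a growing buffer.
    out = bytearray()
    for v in values:
        out += struct.pack(">i", v)
    return bytes(out)

def serialize_batched(values):
    # A single pack call for the whole list.
    return struct.pack(f">{len(values)}i", *values)

# Sanity check: both strategies must produce identical bytes.
assert serialize_per_element(data) == serialize_batched(data)

for fn in (serialize_per_element, serialize_batched):
    t = timeit.timeit(lambda: fn(data), number=100)
    print(f"{fn.__name__}: {t:.3f}s for 100 rounds")
```

Numbers like these are not a 500-node cluster test, but posting even
this much to a JIRA answers "did anyone measure it at all?".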


Re: committing performance patches without performance numbers

2017-03-09 Thread Ariel Weisberg
Hi,

I agree that patch could have used it and it was amenable to
micro-benchmarking. Just to be pedantic about process which is something
I approve of to the chagrin of so many.

On a completely related note that change also randomly boxes a boolean.

Ariel

On Thu, Mar 9, 2017, at 03:45 PM, Jonathan Haddad wrote:
> I don't expect everyone to run a 500 node cluster off to the side to test
> their patches, but at least some indication that the contributor started
> Cassandra on their laptop would be a good sign.  The JIRA I referenced
> was
> an optimization around List, Set and Map serialization.  Would it really
> have been that crazy to run a handful of benchmarks locally and post
> those
> results?
> 
> On Thu, Mar 9, 2017 at 12:26 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
> 
> > Hi,
> >
> > I think there are issues around the availability of hardware sufficient
> > to demonstrate the performance concerns under test. It's an open source
> > project without centralized infrastructure. A lot of performance
> > contributions come from people running C* in production. They are
> > already running these changes and have seen the improvement, but
> > communicating that to the outside world in a convincing way can be hard.
> >
> > Right now we don't even have performance in continuous integration. I
> > think we are putting the cart before the horse in that respect. What
> > about all the commits that don't intend to have a performance impact but
> > do? Even if we had performance metrics in CI who is going to triage the
> > results religiously?
> >
> > We also to my knowledge don't have benchmarks for key functionality in
> > cassandra-stress. Can cassandra-stress benchmark CAS? My recollection is
> > every time I looked is that it wasn't there.
> >
> > We can only set the bar as high as contributors are able to meet.
> > Certainly if they can't justify why they can't benchmark the thing they
> > want to contribute then reviewers should make them go and benchmark it.
> >
> > Regards,
> > Ariel
> >
> > On Thu, Mar 9, 2017, at 03:11 PM, Jeff Jirsa wrote:
> > > Agree. Anything that's meant to increase performance should demonstrate
> > > it
> > > actually does that. We have microbench available in recent versions -
> > > writing a new microbenchmark isn't all that onerous. Would be great if we
> > > had perf tests included in the normal testall/dtest workflow for ALL
> > > patches so we could quickly spot regressions, but that gets pretty
> > > expensive in terms of running long enough tests to actually see most
> > > common
> > > code paths.
> > >
> > >
> > > On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddad <j...@jonhaddad.com>
> > > wrote:
> > >
> > > > I'd like to discuss what I consider to be a pretty important matter -
> > > > patches which are written for the sole purpose of improving performance
> > > > without including a single performance benchmark in the JIRA.
> > > >
> > > > My original email was in "Testing and Jira Tickets", i'll copy it here
> > > > for posterity:
> > > >
> > > > If you don't mind, I'd like to broaden the discussion a little bit to
> > also
> > > > discuss performance related patches.  For instance, CASSANDRA-13271
> > was a
> > > > performance / optimization related patch that included *zero*
> > information
> > > > on if there was any perf improvement or a regression as a result of the
> > > > change, even though I've asked twice for that information.
> > > >
> > > > In addition to "does this thing break anything" we should be asking
> > "how
> > > > does this patch affect performance?" (and were the appropriate docs
> > > > included, but that's another topic altogether)
> > > >
> > > > There's a minor note about perf related stuff here:
> > > > http://cassandra.apache.org/doc/latest/development/
> > > > testing.html#performance-testing
> > > >
> > > >
> > > > "Performance tests for Cassandra are a special breed of tests that are
> > not
> > > > part of the usual patch contribution process. In fact you can
> > contribute
> > > > tons of patches to Cassandra without ever running performance tests.
> > They
> > > > are important however when working on performance improvements, as such
> > > > improvements must be measurable."
> > > >
> > > > I think performance numbers aren't just important, but should be a
> > > > *requirement* to merge a performance patch.
> > > >
> > > > Thoughts?
> > > > Jon
> > > >
> >


Re: Testing and jira tickets

2017-03-09 Thread Ariel Weisberg
Hi,

Before this change I had already been queuing the jobs myself as a
reviewer. It also happens to be that many reviewers are committers. I
wouldn't ask contributors to run the dtests/utests for any purpose other
than so that they know the submission is done.

Even if they did and they pass it doesn't matter. It only matters if it
passes in CI. If it fails in CI but passes on their desktop it's not
good enough so we have to run in CI anyways.

If a reviewer is not a committer. Well they can ask someone else to do
it? I know we have issues with responsiveness, but I would make myself
available for that. It shouldn't be a big problem because if someone is
doing a lot of reviews they should be a committer right?

Regards,
Ariel

On Thu, Mar 9, 2017, at 01:51 PM, Jason Brown wrote:
> Hey all,
> 
> A nice convention we've stumbled into wrt to patches submitted via Jira
> is
> to post the results of unit test and dtest runs to the ticket (to show
> the
> patch doesn't break things). Many contributors have used the
> DataStax-provided cassci system, but that's not the best long term
> solution. To that end, I'd like to start a conversation about what is the
> best way to proceed going forward, and then add it to the "How to
> contribute" docs.
> 
> As an example, should contributors/committers run dtests and unit tests
> on
> *some* machine (publicly available or otherwise), and then post those
> results to the ticket? This could be a link to a build system, like what
> we
> have with cassci, or just  upload the output of the test run(s).
> 
> I don't have any fixed notions, and am looking forward to hearing other's
> ideas.
> 
> Thanks,
> 
> -Jason
> 
> p.s. a big thank you to DataStax for providing the cassci system


Re: committing performance patches without performance numbers

2017-03-09 Thread Ariel Weisberg
Hi,

I think there are issues around the availability of hardware sufficient
to demonstrate the performance concerns under test. It's an open source
project without centralized infrastructure. A lot of performance
contributions come from people running C* in production. They are
already running these changes and have seen the improvement, but
communicating that to the outside world in a convincing way can be hard.

Right now we don't even have performance in continuous integration. I
think we are putting the cart before the horse in that respect. What
about all the commits that don't intend to have a performance impact but
do? Even if we had performance metrics in CI who is going to triage the
results religiously?

We also to my knowledge don't have benchmarks for key functionality in
cassandra-stress. Can cassandra-stress benchmark CAS? My recollection
from every time I looked is that it wasn't there.

We can only set the bar as high as contributors are able to meet.
Certainly if they can't justify why they can't benchmark the thing they
want to contribute then reviewers should make them go and benchmark it.

Regards,
Ariel

On Thu, Mar 9, 2017, at 03:11 PM, Jeff Jirsa wrote:
> Agree. Anything that's meant to increase performance should demonstrate
> it
> actually does that. We have microbench available in recent versions -
> writing a new microbenchmark isn't all that onerous. Would be great if we
> had perf tests included in the normal testall/dtest workflow for ALL
> patches so we could quickly spot regressions, but that gets pretty
> expensive in terms of running long enough tests to actually see most
> common
> code paths.
> 
> 
> On Thu, Mar 9, 2017 at 12:00 PM, Jonathan Haddad 
> wrote:
> 
> > I'd like to discuss what I consider to be a pretty important matter -
> > patches which are written for the sole purpose of improving performance
> > without including a single performance benchmark in the JIRA.
> >
> > My original email was in "Testing and Jira Tickets", i'll copy it here
> > for posterity:
> >
> > If you don't mind, I'd like to broaden the discussion a little bit to also
> > discuss performance related patches.  For instance, CASSANDRA-13271 was a
> > performance / optimization related patch that included *zero* information
> > on if there was any perf improvement or a regression as a result of the
> > change, even though I've asked twice for that information.
> >
> > In addition to "does this thing break anything" we should be asking "how
> > does this patch affect performance?" (and were the appropriate docs
> > included, but that's another topic altogether)
> >
> > There's a minor note about perf related stuff here:
> > http://cassandra.apache.org/doc/latest/development/
> > testing.html#performance-testing
> >
> >
> > "Performance tests for Cassandra are a special breed of tests that are not
> > part of the usual patch contribution process. In fact you can contribute
> > tons of patches to Cassandra without ever running performance tests. They
> > are important however when working on performance improvements, as such
> > improvements must be measurable."
> >
> > I think performance numbers aren't just important, but should be a
> > *requirement* to merge a performance patch.
> >
> > Thoughts?
> > Jon
> >


Re: Per blocking release on dtest

2017-01-10 Thread Ariel Weisberg
Hi,

The upgrade tests are tricky because they upgrade from an existing
release to a current release. The bug is in 3.9 and won't be fixed until
3.11 because the test checks out and builds 3.9 right now. 3.10 doesn't
include the commit that fixes the issue so it will fail after 3.10 is
released and the test is updated to check out 3.10.

We claim to support upgrade from any 3.x version to 4.0. If someone
tries to upgrade 3.10 to whatever 4.0 ends up being I think they will
hit the wrong answer bug. So I would advocate for having the fix brought
into 3.10, but it was broken in 3.9 as well.

Some of the tests fail because trunk complains of unreadable sstables
and I suspect that isn't a bug, it's just something that is no longer
supported due to thrift removal, but I haven't fixed those yet. Those
are probably issues with trunk or the tests.

Others fail for reasons I haven't triaged yet. I'm struggling with my
own issues getting the tests to run locally.

Ariel

On Tue, Jan 10, 2017, at 11:49 AM, Nate McCall wrote:

> > I concede it would be fine to do it gradually. Once the pace of
> > issues introduced by new development is beaten by the pace at which
> > they are addressed I think things will go well.
>
> So from Michael's JIRA query:
> https://issues.apache.org/jira/browse/CASSANDRA-12617?jql=project%20%3D%20CASSANDRA%20AND%20fixVersion%20%3D%203.10%20AND%20resolution%20%3D%20Unresolved
>
> Are we good for 3.10 after we get those cleaned up?
>
> Ariel, you made reference to:
> https://github.com/apache/cassandra/commit/c612cd8d7dbd24888c216ad53f974686b88dd601
>
> Do we need to re-open an issue to have this applied to 3.10 and add it
> to the above list?
>
> > On Tue, Jan 10, 2017, at 11:17 AM, Josh McKenzie wrote:
> >> Sankalp's proposal of us progressively tightening up our standards
> >> allows us to get code out the door and regain some lost momentum on
> >> the 3.10 release failures and blocking, and gives us time as a
> >> community to adjust our behavior without the burden of an ever-later
> >> slipped release hanging over our heads. There's plenty of bugfixes
> >> in the 3.X line; the more time people can have to kick the tires on
> >> that code, the more things we can find and the better future
> >> releases will be.
>
> +1 On gradually moving to this. Dropping releases with huge change
> lists has never gone well for us in the past.




Re: Per blocking release on dtest

2017-01-10 Thread Ariel Weisberg
Hi,

I concede it would be fine to do it gradually. Once the pace of issues
introduced by new development is beaten by the pace at which they are
addressed I think things will go well.

Ariel

On Tue, Jan 10, 2017, at 11:17 AM, Josh McKenzie wrote:
> @ariel: you're letting the perfect be the enemy of the good here. We (as
> a
> project) have been releasing with a smattering of test failures and
> upgrade
> edge-cases back into perpetuity. While that doesn't make it ideal or
> justify continuing the behavior, getting a green testall + dtest for 3.10
> is a strong incremental improvement. Integrating other tests in the
> "block
> if not green" on subsequent releases is likewise an improvement.
> 
> I strongly advocate for incremental change in expectations of the
> community's behavior rather than a black-and-white, "this has to be
> perfect
> or we block" mentality.
> 
> Sankalp's proposal of us progressively tightening up our standards allows
> us to get code out the door and regain some lost momentum on the 3.10
> release failures and blocking, and gives us time as a community to adjust
> our behavior without the burden of an ever-later slipped release hanging
> over our heads. There's plenty of bugfixes in the 3.X line; the more time
> people can have to kick the tires on that code, the more things we can
> find
> and the better future releases will be.
> 
> 
> 
> 
> 
> On Tue, Jan 10, 2017 at 10:33 AM, Ariel Weisberg <ar...@weisberg.ws>
> wrote:
> 
> > Hi,
> >
> > At least some of those failures are real. I don't think we should
> > release 3.10 until the real failures are addressed. As I said earlier
> > one of them is a wrong answer bug that is not going to be fixed in 3.10.
> >
> > Can we just ignore failures because we think they don't mean anything?
> > Who is going to check which of the 60 failures is real?
> >
> > These tests were passing just fine at the beginning of December and then
> > commits happened and now the tests are failing. That is exactly what
> > they're for. They are good tests. I don't think it matters if the failures
> > are "real" today because those are valid tests and they don't test
> > anything if they fail for spurious reasons. They are a critical part of
> > the Cassandra infrastructure as much as the storage engine or network
> > code.
> >
> > In my opinion the tests need to be fixed and people need to fix them as
> > they break them and we need to figure out how to get from people
> > breaking them and it going unnoticed to they break it and then fix it in
> > a time frame that fits the release schedule.
> >
> > My personal opinion is that releases are a reward for finishing the job.
> > Releasing without finishing the job creates the wrong incentive
> > structure for the community. If you break something you are no longer
> > the person that blocked the release you are just one of several people
> > breaking things without consequence.
> >
> > I think that rapid feedback and triaging combined with releases blocked
> > by the stuff individual contributors have broken is the way to more
> > consistent releases both schedule wise and quality wise.
> >
> > Regarding delaying 3.10? Who exactly is the consumer that is chomping at
> > the bit to get another release? One that doesn't reliably upgrade from a
> > previous version?
> >
> > Ariel
> >
> > On Tue, Jan 10, 2017, at 08:13 AM, Josh McKenzie wrote:
> > > First, I think we need to clarify if we're blocking on just testall +
> > > dtest
> > > or blocking on *all test jobs*.
> > >
> > > If the latter, upgrade tests are the elephant in the room:
> > > http://cassci.datastax.com/view/cassandra-3.11/job/
> > cassandra-3.11_dtest_upgrade/lastCompletedBuild/testReport/
> > >
> > > Do we have confidence that the reported failures are all test problems
> > > and
> > > not w/Cassandra itself? If so, is that documented somewhere?
> > >
> > > On Mon, Jan 9, 2017 at 7:33 PM, Nate McCall <zznat...@gmail.com> wrote:
> > >
> > > > I'm not sure I understand the culmination of the past couple of
> > threads on
> > > > this.
> > > >
> > > > With a situation like:
> > > > http://cassci.datastax.com/view/cassandra-3.11/job/
> > cassandra-3.11_dtest/
> > > > lastCompletedBuild/testReport/
> > > >
> > > > We have some sense of stability on what might be flaky tests(?).
> > > > Again, I'm not sure what our criteria is specifically.
> > > >
> > > > Basically, it feels like we are in a stalemate right now. How do we
> > > > move forward?
> > > >
> > > > -Nate
> > > >
> >


Re: Wrapping up tick-tock

2017-01-10 Thread Ariel Weisberg
Hi,

With yearly releases trunk is going to be a mess when it comes time to
cut a release. Cutting releases is when people start caring whether all
the things in the release are in a finished state. It's when the state
of CI finally becomes relevant.

If we wait a year we are going to accumulate a years worth of unfinished
stuff in a single release. It's more expensive to context switch back
and then address those issues. If we put out large unstable releases it
means time until the features in the release are usable is pushed back
even further since it takes another 6-12 months for the release to
stabilize. Features introduced at the beginning of the cycle will have
to wait 18-24 months before anyone can benefit from them.

Is the biggest pain point with tick-tock just the elimination of long
term support releases? What is the pain point around release frequency?
Right now people should be using 3.0 unless they need a bleeding edge
feature from 3.X and those people will have to give up something to get
something.

Ariel

On Tue, Jan 10, 2017, at 10:29 AM, Jonathan Haddad wrote:
> I don't see why it has to be one extreme (yearly) or another (monthly).
> When you had originally proposed Tick Tock, you wrote:
> 
> "The primary goal is to improve release quality.  Our current major “dot
> zero” releases require another five or six months to make them stable
> enough for production.  This is directly related to how we pile features
> in
> for 9 to 12 months and release all at once.  The interactions between the
> new features are complex and not always obvious.  2.1 was no exception,
> despite DataStax hiring a full time test engineering team specifically for
> Apache Cassandra."
> 
> I agreed with you at the time that the yearly cycle was too long to be
> adding features before cutting a release, and still do now.  Instead of
> elastic banding all the way back to a process which wasn't working
> before,
> why not try somewhere in the middle?  A release every 6 months (with
> monthly bug fixes for a year) gives:
> 
> 1. long enough time to stabilize (1 year vs 1 month)
> 2. not so long things sit around untested forever
> 3. only 2 releases (current and previous) to do bug fix support at any
> given time.
> 
> Jon
> 
> On Tue, Jan 10, 2017 at 6:56 AM Jonathan Ellis  wrote:
> 
> > Hi all,
> >
> > We’ve had a few threads now about the successes and failures of the
> > tick-tock release process and what to do to replace it, but they all died
> > out without reaching a robust consensus.
> >
> > In those threads we saw several reasonable options proposed, but from my
> > perspective they all operated in a kind of theoretical fantasy land of
> > testing and development resources.  In particular, it takes around a
> > person-week of effort to verify that a release is ready.  That is, going
> > through all the test suites, inspecting and re-running failing tests to see
> > if there is a product problem or a flaky test.
> >
> > (I agree that in a perfect world this wouldn’t be necessary because your
> > test ci is always green, but see my previous framing of the perfect world
> > as a fantasy land.  It’s also worth noting that this is a common problem
> > for large OSS projects, not necessarily something to beat ourselves up
> > over, but in any case, that's our reality right now.)
> >
> > I submit that any process that assumes a monthly release cadence is not
> > realistic from a resourcing standpoint for this validation.  Notably, we
> > have struggled to marshal this for 3.10 for two months now.
> >
> > Therefore, I suggest first that we collectively roll up our sleeves to vet
> > 3.10 as the last tick-tock release.  Stick a fork in it, it’s done.  No
> > more tick-tock.
> >
> > I further suggest that in place of tick tock we go back to our old model of
> > yearly-ish releases with as-needed bug fix releases on stable branches,
> > probably bi-monthly.  This amortizes the release validation problem over a
> > longer development period.  And of course we remain free to ramp back up to
> > the more rapid cadence envisioned by the other proposals if we increase our
> > pool of QA effort or we are able to eliminate flaky tests to the point
> > that a long validation process becomes unnecessary.
> >
> > (While a longer dev period could mean a correspondingly more painful test
> > validation process at the end, my experience is that most of the validation
> > cost is “fixed” in the form of flaky tests and thus does not increase
> > proportionally to development time.)
> >
> > Thoughts?
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
> >


Re: Per blocking release on dtest

2017-01-10 Thread Ariel Weisberg
Hi,

At least some of those failures are real. I don't think we should
release 3.10 until the real failures are addressed. As I said earlier
one of them is a wrong answer bug that is not going to be fixed in 3.10.

Can we just ignore failures because we think they don't mean anything?
Who is going to check which of the 60 failures is real?

These tests were passing just fine at the beginning of December and then
commits happened and now the tests are failing. That is exactly what
they're for. They are good tests. I don't think it matters if the failures
are "real" today because those are valid tests and they don't test
anything if they fail for spurious reasons. They are a critical part of
the Cassandra infrastructure as much as the storage engine or network
code.

In my opinion the tests need to be fixed and people need to fix them as
they break them, and we need to figure out how to get from people
breaking them unnoticed to people breaking them and then fixing them in
a time frame that fits the release schedule.

My personal opinion is that releases are a reward for finishing the job.
Releasing without finishing the job creates the wrong incentive
structure for the community. If you break something you are no longer
the person who blocked the release; you are just one of several people
breaking things without consequence.

I think that rapid feedback and triaging combined with releases blocked
by the stuff individual contributors have broken is the way to more
consistent releases both schedule wise and quality wise.

Regarding delaying 3.10? Who exactly is the consumer that is chomping at
the bit to get another release? One that doesn't reliably upgrade from a
previous version?
 
Ariel

On Tue, Jan 10, 2017, at 08:13 AM, Josh McKenzie wrote:
> First, I think we need to clarify if we're blocking on just testall +
> dtest
> or blocking on *all test jobs*.
> 
> If the latter, upgrade tests are the elephant in the room:
> http://cassci.datastax.com/view/cassandra-3.11/job/cassandra-3.11_dtest_upgrade/lastCompletedBuild/testReport/
> 
> Do we have confidence that the reported failures are all test problems
> and
> not w/Cassandra itself? If so, is that documented somewhere?
> 
> On Mon, Jan 9, 2017 at 7:33 PM, Nate McCall  wrote:
> 
> > I'm not sure I understand the culmination of the past couple of threads on
> > this.
> >
> > With a situation like:
> > http://cassci.datastax.com/view/cassandra-3.11/job/cassandra-3.11_dtest/
> > lastCompletedBuild/testReport/
> >
> > We have some sense of stability on what might be flaky tests(?).
> > Again, I'm not sure what our criteria is specifically.
> >
> > Basically, it feels like we are in a stalemate right now. How do we
> > move forward?
> >
> > -Nate
> >


Re: 3.10 release status: blocked on dtest

2017-01-07 Thread Ariel Weisberg
Hi,

When we say all tests are passing, it does seem like we are including the
upgrade tests, but there are some failures that don't seem to have
tickets blocking the release. It seems like we are also excluding any
tests decorated as resource-intensive? There are also large_dtest,
novnode_dtest, and offheap_dtest which all have a few failing tests. I
think we should consider those as blockers as well.

The upgrade tests have this chicken and egg issue where they test the
previous current release against the in development release and if there
is a bug preventing upgrade you end up with a lot of failing tests that
continue to fail even after the bug is fixed.

There is also a real bug fixed by
https://github.com/apache/cassandra/commit/c612cd8d7dbd24888c216ad53f974686b88dd601
that near as I can tell isn't included in 3.10. It will continue to fail
until we release a version that addresses the issue. It kind of makes
you think that if the current version fails but the in-development
version passes, we want a quick way of filtering out the failure.

I also found an issue with max version in the since decorator that
causes some of the upgrade tests to fail. I gave the fix for that to
Philip.

I haven't managed to get the upgrade tests passing after addressing the
above two issues, but it's always hard to tell what is my environment
and what is the tests.

Ariel


On Wed, Jan 4, 2017, at 01:46 PM, Michael Shuler wrote:
> Thanks! I think I was looking at a wrong JIRA, sorry 'bout that.
> 
> -- 
> Michael
> 
> On 01/04/2017 12:31 PM, Oleksandr Petrov wrote:
> > #13025 was updated yesterday. It just needs some feedback, but we know what
> > the problem is there.
> > 
> > On Wed, Jan 4, 2017 at 5:32 PM Michael Shuler 
> > wrote:
> > 
> >> On 12/20/2016 03:48 PM, Michael Shuler wrote:
> >>> Current release blockers in JIRA on the cassandra-3.11 branch are:
> >>>
> >>> https://issues.apache.org/jira/browse/CASSANDRA-12617
> >>> https://issues.apache.org/jira/browse/CASSANDRA-13058
> >>
> >> and https://issues.apache.org/jira/browse/CASSANDRA-13025
> >>
> >> CASSANDRA-13058 is unassigned, but was just updated (thanks Stefan!).
> >> The other tickets are assigned, but have not been updated in a while.
> >>
> >> JQL for 3.10:
> >>
> >> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20fixVersion%20%3D%203.10%20AND%20resolution%20%3D%20Unresolved
> >>
> >> --
> >> Kind regards,
> >> Michael
> >>
> 


Re: Use of posix_fadvise

2016-10-18 Thread Ariel Weisberg
Hi,

Compaction can merge some very large files together with data that may
be completely cold. So yeah caching the whole file just creates pressure
to evict useful stuff. In some theories.

In other theories the page cache is flush and scan resistant and should
just eat this stuff up without intervention. Sure it might hurt a bit,
but it's a bounded amount before the cache stops discarding useful stuff
in favor of new stuff that is unproven.

If there is a benchmark with this enabled/disabled I haven't seen it.
Doesn't mean it doesn't exist though.

Ariel
On Tue, Oct 18, 2016, at 12:05 PM, Michael Kjellman wrote:
> Within a single SegmentedFile?
> 
> On Oct 18, 2016, at 9:02 AM, Ariel Weisberg
> <ariel.weisb...@datastax.com<mailto:ariel.weisb...@datastax.com>> wrote:
> 
> With compaction there can be hot and cold data mixed together.
> 


Re: Use of posix_fadvise

2016-10-18 Thread Ariel Weisberg
Hi,

With compaction there can be hot and cold data mixed together. So we want
to drop the data and then warm it via early opening so only the hot data is
in the cache.

Some of those cases are for the old sstable that have been rewritten or
discarded so the data is entirely defunct. The files might not get deleted
though so they do add pressure to the cache until they are evicted.

In the instance you are looking at in a tidier won't there always be a
reference held in the current view for the column family? I don't think it
would constantly be evicting them nor closing/reopening and remapping the
file.


Specifically regarding the behavior in different kernels, from `man
> posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
> this was interpreted literally as "zero bytes", rather than as meaning "all
> bytes through to the end of the file"."

Not ideal, but at least not actively harmful right? The cache is supposed
to be scan/flush resistant.

Ariel

On Tue, Oct 18, 2016 at 11:57 AM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> Right, so in SSTableReader#GlobalTidy$tidy it does:
> // don't ideally want to dropPageCache for the file until all instances
> have been released
> CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
> CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);
>
> It seems to me every time the reference is released on a new sstable we
> would immediately tidy() it and then call posix_fadvise with
> POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
> thinking is doing so in respect to the API behavior in modern Linux kernel
> builds?). Am I reading things correctly here? Sorta hard as there are many
> different code paths the reference could have tidy() called.
>
> Why would we want to drop the segment we just wrote from the page cache --
> wouldn't that most likely be the most hot data, and even if it turned out
> not to be wouldn't it be better in this case to have kernel be smart at
> what it's best at?
>
> best,
> kjellman
>
> > On Oct 18, 2016, at 8:50 AM, Jake Luciani  wrote:
> >
> > The main point is to avoid keeping things in the page cache that are no
> > longer needed like compacted data that has been early opened elsewhere.
> >
> > On Oct 18, 2016 11:29 AM, "Michael Kjellman" <
> mkjell...@internalcircle.com>
> > wrote:
> >
> >> We use posix_fadvise in a bunch of places, and in stereotypical
> Cassandra
> >> fashion no comments were provided.
> >>
> >> There is a check the OS is Linux (okay, a start) but it turns out the
> >> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> >> kernels. We don't check the kernel version -- or even note it.
> >>
> >> What is the *expected* outcome of our use of posix_fadvise -- not what
> >> does it do or not do today -- but what problem was it added to solve and
> >> what's the expected behavior regardless of kernel versions.
> >>
> >> best,
> >> kjellman
> >>
> >> Sent from my iPhone
>
>


End June retrospective, July retrospective will start @ C* summit

2015-09-08 Thread Ariel Weisberg
Hi all,

I am closing out July retrospective now. The retrospective doc has a single
author (me) which kind of says that doing this asynchronously by email
isn't working. At least not as a starting point.

I am not super surprised nor am I disappointed. Trying and failing is part
of eventually trying and succeeding. Process and iterating on process is a
skill you have to develop and it's hard to do when you don't have dedicated
time in your schedule for it. I'm an advocate for retrospectives and I
still don't send out the emails out on time.

We are going to have a butts in seats retrospective at summit and I will
take notes and make that available. We'll have a chance to discuss where to
go from there.

Regards,
Ariel


Re: Should we make everything blue in Jenkins?

2015-08-16 Thread Ariel Weisberg
Hi,

Thanks for bringing this up, Michael.

I want to elaborate on the impetus for this (or at least my take on it). When 
8099 merged, a thing happened that must never happen if our process is to work. We 
introduced a large enough number of test failures that it became difficult to tell 
if you had introduced a regression.

At the time we thought we could exclude the test failures prior to 8099 and 
that the test failures introduced by 8099 would get addressed promptly. What 
has happened instead is that the number of failures has snowballed to the 
point that you can hardly tell if you broke anything even if you compare test 
by test with trunk. You have to go into the history on trunk for each test and 
go back several pages to really be sure.

If you don’t have consistently passing CI you can’t avoid the addition of test 
failures by ongoing work that slip in masked by known failures.

The artery is severed, we’re bleeding out, and we’re going to have to lose the 
leg. I’m sure the prosthetic when it comes will be just as good, but the rehab 
is going to suck. There that’s my analogy.

I think the utests are in pretty good shape but the pig tests are a problem. 
They extend the job time a lot, cause aborts, and fail randomly.

Ariel

 On Aug 14, 2015, at 3:16 PM, Michael Shuler mich...@pbandjelly.org wrote:
 
 This is a prompt for Cassandra developers to discuss the alternatives and let 
 Test Engineering know what you desire.
 
 As discussed a few times in person, on irc, etc., there are a couple 
 different ways we can run tests in Jenkins, particularly cassandra-dtest. The 
 Cassandra developers are the committers to unit tests, so Test Engineering 
 runs whatever is in the branch. If you'd like to make changes to unit tests 
 to make things blue, just commit those!
 
 Currently, we run dtests as 1), but we could do 2):
 
 1) Run all dtests that don't catastrophically hang a server, pass or fail, 
 and report the results.
 2) Run only known passing dtests, skipping anything that fails - make it all 
 blue on the main branch builds.
 
 The biggest benefit is that dev branch builds should be easily recognizable 
 as able to merge, if the dtest run is passing and blue. There is no 
 comparison with the main branch build needing interpretation.
 
 Test Eng has recently added the ability to run *only* the skipped tests and has 
 a prototype job, trunk_dtest-skipped-with-require, to dig through. This 
 could be set up for all main branch builds, moving anything that doesn't pass 
 100% to the -skipped job. This is perhaps the drawback with 2) above: we're 
 simply not going to run all the dtests on your dev branch. I don't think it 
 makes sense to set up a -skipped dtest job on your dev branches. In addition, 
 there's another job result set to go look at to properly evaluate the true 
 state of a Cassandra branch or release. There may be other side effects - 
 feel free to chime in.
 
 I'm on a disconnected holiday until Monday Aug 24, so I won't have a chance 
 to check in until then - the Test Eng team can field questions or 
 clarifications, if needed.
 
 -- 
 Warm regards,
 Michael



Re: Proposal, add Epic to the set of issue types available in ASF Jira for Cassandra

2015-08-06 Thread Ariel Weisberg
Hi,

It's not instead of it's in addition too. The presence of Epics doesn't
prevent the use of labels and you can filter with labels on the agile
board it's just really gross. You have to create quick filters or type
in JQL. Epics get's you drag and drop as well as some other UI niceness
like tracking how close an Epic is to completion.

There is also no process change where we stop using labels and start
using Epics.

Ariel

On Thu, Aug 6, 2015, at 10:04 AM, Jake Luciani wrote:
 Is the reason to use epics over labels simply because the agile board
 doesn't support it?
 
 On Wed, Aug 5, 2015 at 12:42 PM, Ariel Weisberg
 ariel.weisb...@datastax.com
  wrote:
 
  Hi,
 
  At this stage I wasn't going to propose a process change. My goal is to
  observe and report mall cop style so I can present what happens the way we
  currently operate. Right now Epics are just a way for me to bucket and then
  rank things inside a release based on what they are, enhancement, core to
  the release (Materialized Views, 8099), bugs or failing tests.
 
  Regards,
  Ariel
 
  On Wed, Aug 5, 2015 at 11:43 AM, Gary Dusbabek gdusba...@gmail.com
  wrote:
 
   Who would have the burden of assigning and managing epics?
  
   Thanks,
  
   Gary.
  
  
  
   On Tue, Aug 4, 2015 at 3:08 PM, Ariel Weisberg 
   ariel.weisb...@datastax.com
   wrote:
  
Hi all,
   
I am playing with using an Agile board to track what goes into each
Cassandra release. What slips from release to release, as well as what
  is
added after the initial set of tasks for a release is started.
   
You can see the SCRUM agile board I created here

   
  
  https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=83view=planning.nodetailselectedIssue=CASSANDRA-9908epics=visible

.
   
The board has two ways to bucket issues. One is the release in which
  the
issue is supposed to be fixed. The other is Epics. Epics are not
   associated
with a release so a feature like 8099 might have an epic. Epics can be
   used
to bucket issues within a release or across releases.
   
I would characterize Epics as being a lot like labels that integrate
  with
the Agile board. You can use labels with Agile boards by adding quick
filters to select on labels.
   
The current set of issues types associated with the C* project in ASF
   JIRA
doesn't include Epic or Story. I don't use Story, but Epic would be
   useful
for further categorizing things.
   
When I asked ASF Infra about it they said that this needs discussion
  and
approval by the PMC.
   
Thanks,
Ariel
   
  
 
 
 
 
 -- 
 http://twitter.com/tjake


Re: Proposal, add Epic to the set of issue types available in ASF Jira for Cassandra

2015-08-05 Thread Ariel Weisberg
Hi,

At this stage I wasn't going to propose a process change. My goal is to
observe and report mall cop style so I can present what happens the way we
currently operate. Right now Epics are just a way for me to bucket and then
rank things inside a release based on what they are: enhancements, core to
the release (Materialized Views, 8099), bugs, or failing tests.

Regards,
Ariel

On Wed, Aug 5, 2015 at 11:43 AM, Gary Dusbabek gdusba...@gmail.com wrote:

 Who would have the burden of assigning and managing epics?

 Thanks,

 Gary.



 On Tue, Aug 4, 2015 at 3:08 PM, Ariel Weisberg 
 ariel.weisb...@datastax.com
 wrote:

  Hi all,
 
  I am playing with using an Agile board to track what goes into each
  Cassandra release. What slips from release to release, as well as what is
  added after the initial set of tasks for a release is started.
 
  You can see the SCRUM agile board I created here
  
 
 https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=83view=planning.nodetailselectedIssue=CASSANDRA-9908epics=visible
  
  .
 
  The board has two ways to bucket issues. One is the release in which the
  issue is supposed to be fixed. The other is Epics. Epics are not
 associated
  with a release so a feature like 8099 might have an epic. Epics can be
 used
  to bucket issues within a release or across releases.
 
  I would characterize Epics as being a lot like labels that integrate with
  the Agile board. You can use labels with Agile boards by adding quick
  filters to select on labels.
 
  The current set of issues types associated with the C* project in ASF
 JIRA
  doesn't include Epic or Story. I don't use Story, but Epic would be
 useful
  for further categorizing things.
 
  When I asked ASF Infra about it they said that this needs discussion and
  approval by the PMC.
 
  Thanks,
  Ariel
 



Proposal, add Epic to the set of issue types available in ASF Jira for Cassandra

2015-08-04 Thread Ariel Weisberg
Hi all,

I am playing with using an Agile board to track what goes into each
Cassandra release. What slips from release to release, as well as what is
added after the initial set of tasks for a release is started.

You can see the SCRUM agile board I created here
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=83view=planning.nodetailselectedIssue=CASSANDRA-9908epics=visible
.

The board has two ways to bucket issues. One is the release in which the
issue is supposed to be fixed. The other is Epics. Epics are not associated
with a release so a feature like 8099 might have an epic. Epics can be used
to bucket issues within a release or across releases.

I would characterize Epics as being a lot like labels that integrate with
the Agile board. You can use labels with Agile boards by adding quick
filters to select on labels.

The current set of issues types associated with the C* project in ASF JIRA
doesn't include Epic or Story. I don't use Story, but Epic would be useful
for further categorizing things.

When I asked ASF Infra about it they said that this needs discussion and
approval by the PMC.

Thanks,
Ariel


End June retrospective, begin July retrospective

2015-08-03 Thread Ariel Weisberg
Hi all,

Thanks for the participation in the June retrospective. I linked in a few
issues that people brought up, but didn't push into the CVH doc. Some of
you did your own filing which is great.

The July 2015 retrospective doc
https://docs.google.com/document/d/1SPGhHMgX_BrBkoUstcI-V149TtInRGy3pockOav7FRU/edit?usp=sharing
is now available.

There were some issues with color coding comments. Since this is a doc, we
can't easily tell who is writing something unless it's labelled, so pick a
color.

In July we released 2.2 and 2.1.8 and here is a link to the bugs you worked
on for those releases
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20issuetype%20%3D%20Bug%20AND%20fixVersion%20in%20(2.1.8%2C%202.2.0%2C%20%222.2.0%20beta%201%22%2C%20%222.2.0%20rc1%22%2C%20%222.2.0%20rc2%22)%20AND%20assignee%20in%20(currentUser())
.

Here is the performance harness doc
https://docs.google.com/document/d/1TMdJ7-y-hKQwhPRFYL0VXf0R53MsF4QmhZmwbT8wpE0/edit
and
the Cassandra validation harness
https://docs.google.com/document/d/1kccPqxEAoYQpT0gXnp20MYQUDmjOrakAeQhf6vkqjGo/edit#heading=h.zd5nw0kl2ypi
 doc.

Regards,
Ariel

