Re: Harry in-tree (Forked from "Long tests, Burn tests, Simulator tests, Fuzz tests - can we clarify the diffs?")

2023-12-21 Thread Joseph Lynch
+1

Sounds like a great change that will help us unify around a common testing
paradigm, and even pave the path to in-tree load testing plus integrated
correctness checking which would be extremely valuable!

-Joey

On Thu, Dec 21, 2023 at 1:35 PM Caleb Rackliffe 
wrote:

> +1
>
> Agree w/ all the justifications mentioned above.
>
> As a reviewer on CASSANDRA-19210
> , my goals were to
> a.) look at the directory, naming, and package structure of the ported
> code, b.) make sure IDE integration was working, and c.) make sure any
> modifications to existing code (rather than direct code movements from
> cassandra-harry) were straightforward.
>
> On Thu, Dec 21, 2023 at 3:23 PM Alex Petrov  wrote:
>
>> Hey folks,
>>
>> I am mostly done with a patch that brings Harry in-tree [1]. I will
>> trigger one more CI run overnight, and my intention was to merge it some
>> time soon, but I wanted to give a fair warning here, since this is a
>> relatively large patch.
>>
>> Good news for everyone that it:
>>   a) touches no production code whatsoever. Only test (in-jvm dtest
>> namely) code that was using Harry already.
>>   b) the only tests that are changed are ones that used a duplicate
>> version of placement simulator we had both for testing TCM, and in Harry
>>   c) in addition, I have converted 3 existing TCM tests to a new API to
>> have some base for examples/usage.
>>
>> Since we have effectively been relying on this code for a while now, and the
>> intention now is to converge on:
>>   a) fewer different generators, with a shareable version of
>> generators for everyone to use across the codebase
>>   b) a testing tool that can be useful for both trivial cases and
>> complex scenarios,
>> myself and many other Cassandra contributors have expressed the opinion
>> that bringing Harry in-tree will be highly beneficial.
>>
>> I strongly believe that bringing Harry in-tree will help to lower the
>> barrier for fuzz testing and simplify co-development of Cassandra and Harry.
>> Previously, it has been rather difficult to debug edge cases because I had
>> to either re-compile an in-jvm dtest jar and bring it to Harry, or
>> re-compile a Harry jar and bring it to Cassandra, which is both tedious and
>> time-consuming. Moreover, I believe we have missed at the very least one RT
>> regression [2] because Harry was not in-tree, as its tests would've caught
>> the issue even with the model that existed.
>>
>> For other recently found issues, I think having Harry in-tree would have
>> substantially lowered the turnaround time, and allowed me to share repros
>> with developers of the corresponding features much more quickly.
>>
>> I do expect a slight learning curve for Harry, but my intention is to
>> build a web of simple tests (I worked on some of them yesterday after a
>> conversation with David already), which can follow the in-jvm-dtest pattern
>> of find-similar-test / copy / modify. There's already copious
>> documentation for Harry, so I do not believe a lack of docs was ever an
>> issue.
>>
>> You all are aware of my dedication to testing and quality of Apache
>> Cassandra, and I hope you also see the benefits of having a model checker
>> in-tree.
>>
>> Thank you and happy upcoming holidays,
>> --Alex
>>
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-19210
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-18932
>>
>>


Re: [VOTE] Accept java-driver

2023-10-03 Thread Joseph Lynch
+1 (nb)

I am so grateful for all the hard work that went into getting the java
driver accepted into the project, well done to all involved!

-Joey

On Tue, Oct 3, 2023 at 7:38 AM C. Scott Andreas 
wrote:

> +1 (nb)
>
> Accepting this donation would mark a huge milestone for the project.
>
> On Oct 3, 2023, at 4:25 AM, Josh McKenzie  wrote:
>
>
> I see now this will likely be instead apache/cassandra-java-driver
>
> I was wondering about that. apache/java-driver seemed pretty broad. :)
>
> From the linked page:
> Check that all active committers have a signed CLA on record. TODO –
> attach list
> I've been part of these discussions and work so am familiar with the
> status of it (as well as guidance and clearance from the foundation re:
> folks we couldn't reach) - but might be worthwhile to link to the sheet or
> perhaps instead provide a summary of the 49 java contributors, their CLA
> signing status, attempts to reach out, etc for other PMC members that
> weren't actively involved back when we were working through it.
>
> As for my vote: +1
>
> Thanks everyone for the hard work getting to this point. This really is a
> significant contribution to the project.
>
> On Tue, Oct 3, 2023, at 6:48 AM, Brandon Williams wrote:
>
> +1
>
> Kind Regards,
> Brandon
>
> On Mon, Oct 2, 2023 at 11:53 PM Mick Semb Wever  wrote:
> >
> > The donation of the java-driver is ready for its IP Clearance vote.
> > https://incubator.apache.org/ip-clearance/cassandra-java-driver.html
> >
> > The SGA has been sent to the ASF.  This does not require acknowledgement
> before the vote.
> >
> > Once the vote passes, and the SGA has been filed by the ASF Secretary,
> we will request ASF Infra to move the datastax/java-driver as-is to
> apache/java-driver
> >
> > This means all branches and tags, with all their history, will be kept.
> A cleaning effort has already cleaned up anything deemed not needed.
> >
> > Background for the donation is found in CEP-8:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
> >
> > PMC members, please take note of (and check) the IP Clearance
> requirements when voting.
> >
> > The vote will be open for 72 hours (or longer). Votes by PMC members are
> considered binding. A vote passes if there are at least three binding +1s
> and no -1's.
> >
> > regards,
> > Mick
>
>
>
>


Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-20 Thread Joseph Lynch
Having native dependencies shouldn't make the project x86 only; it
should just accelerate performance on x86 when available. Can't we
just try to load the fastest available provider (so ARM will use
native Java but x86 will use proper hardware acceleration) and, failing
that, fall back to the default? If I recall correctly from the
messaging service patches (and zstd/lz4), it's reasonably
straightforward to try to load native code and then fall back if the
load fails.
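
A minimal sketch of that "try native, fall back" pattern (class names here
are illustrative of the idea, not code from the messaging service or
compression patches):

    import java.security.Provider;
    import java.security.Security;

    public final class CryptoProviderLoader
    {
        public static void installBestAvailable()
        {
            try
            {
                // Succeeds only when a platform-matching native library is on the
                // classpath (e.g. the linux-x86_64 classifier); throws otherwise.
                Class<?> clazz = Class.forName(
                    "com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider");
                Security.insertProviderAt((Provider) clazz.getField("INSTANCE").get(null), 1);
            }
            catch (Throwable t)
            {
                // Native provider unavailable (e.g. ARM without a matching build):
                // keep the default JRE providers and carry on.
            }
        }
    }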

-Joey

On Thu, Jul 20, 2023 at 10:27 AM J. D. Jordan  wrote:
>
> Maybe we could start providing Dockerfiles and/or make arch-specific rpm/deb 
> packages that have everything setup correctly per architecture?
> We could also download them all and have the startup scripts put stuff in the 
> right places depending on the arch of the machine running them?
> I feel like there are probably multiple ways we could solve this without 
> requiring users to jump through a bunch of hoops?
> But I do agree we can’t make the project x86 only.
>
> -Jeremiah
>
> > On Jul 20, 2023, at 2:01 AM, Miklosovic, Stefan 
> >  wrote:
> >
> > Hi,
> >
> > as I was reviewing the patch for this feature (1), we realized that it is 
> > not quite easy to bundle this directly into Cassandra.
> >
> > The problem is that this was supposed to be introduced as a new dependency:
> >
> > <dependency>
> >   <groupId>software.amazon.cryptools</groupId>
> >   <artifactId>AmazonCorrettoCryptoProvider</artifactId>
> >   <version>2.2.0</version>
> >   <classifier>linux-x86_64</classifier>
> > </dependency>
> >
> > Notice the "classifier". That means that if we introduced this dependency into 
> > the project, what about ARM users? (There is a corresponding aarch classifier 
> > as well.) ACCP is platform-specific but we have to ship Cassandra 
> > platform-agnostic. It just needs to run OOTB everywhere. If we shipped the 
> > x86 artifact and a user runs Cassandra on ARM, I guess that would break things, 
> > right?
> >
> > We also cannot just add both dependencies (both x86 and aarch) because how 
> > would we differentiate between them at runtime? That is all just too tricky / 
> > error-prone.
> >
> > So, the approach we want to take is this:
> >
> > 1) nothing will be bundled in Cassandra by default
> > 2) a user is supposed to download the library and put it on the class path
> > 3) a user is supposed to put an implementation of the ICryptoProvider 
> > interface Cassandra exposes on the class path
> > 4) a user is supposed to configure cassandra.yaml and its section 
> > "crypto_provider" to reference the implementation they want
> >
> > That way, we avoid the situation when somebody runs x86 lib on ARM or vice 
> > versa.
> >
> > By default, NoOpProvider will be used, which means that the default crypto 
> > provider from the JRE will be used.
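
For illustration, the cassandra.yaml section described above might end up
looking roughly like this (a sketch only; the exact class names and layout
are assumptions, not the final patch):

    # default: fall through to the JRE's own providers
    crypto_provider:
      - class_name: org.apache.cassandra.security.NoOpProvider

    # or, with an ACCP-based ICryptoProvider implementation and the matching
    # native jar placed on the class path by the operator:
    # crypto_provider:
    #   - class_name: com.example.AccpCryptoProvider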
> >
> > It can seem like we have not made much progress here but hey ... we have 
> > opened the project up to custom crypto provider implementations the 
> > community can create, e.g. as 3rd-party extensions etc ...
> >
> > I want to be sure that everybody is aware of this change (that we plan to 
> > do it in such a way that it will not be "bundled") and that everybody is 
> > on board with this. Otherwise I am all ears on how to do it 
> > differently.
> >
> > (1) https://issues.apache.org/jira/browse/CASSANDRA-18624
> >
> > 
> > From: German Eichberger via dev 
> > Sent: Friday, June 23, 2023 22:43
> > To: dev
> > Subject: Re: [DISCUSS] Using ACCP or tc-native by default
> >
> > NetApp Security WARNING: This is an external email. Do not click links or 
> > open attachments unless you recognize the sender and know the content is 
> > safe.
> >
> >
> >
> > +1 to ACCP - we love performance.
> > 
> > From: David Capwell 
> > Sent: Thursday, June 22, 2023 4:21 PM
> > To: dev 
> > Subject: [EXTERNAL] Re: [DISCUSS] Using ACCP or tc-native by default
> >
> > +1 to ACCP
> >
> > On Jun 22, 2023, at 3:05 PM, C. Scott Andreas  wrote:
> >
> > +1 for ACCP and can attest to its results. ACCP also optimizes for a range 
> > of hash functions and other cryptographic primitives beyond TLS 
> > acceleration for Netty.
> >
> > On Jun 22, 2023, at 2:07 PM, Jeff Jirsa  wrote:
> >
> >
> > Either would be better than today.
> >
> > On Thu, Jun 22, 2023 at 1:57 PM Jordan West  wrote:
> > Hi,
> >
> > I’m wondering if there is appetite to change the default SSL provider for 
> > Cassandra going forward to either ACCP [1] or tc-native in Netty? Our 
> > deployment as well as others I’m aware of make this change in their fork 
> > and it can lead to significant performance improvement. When recently 
> > qualifying 4.1 without using ACCP (by accident) we noticed p99 latencies 
> > were 2x higher than 3.0 w/ ACCP. Wiring up ACCP can be a bit of a pain and 
> > also requires some amount of customization. I think it could be great for 
> > the wider community to adopt it.
> >
> > The biggest hurdle I foresee is licensing, but ACCP is Apache 2.0 licensed. 
> > 

Re: [VOTE] CEP-26: Unified Compaction Strategy

2023-04-06 Thread Joseph Lynch
+1

This proposal looks really exciting!

-Joey

On Wed, Apr 5, 2023 at 2:13 AM Aleksey Yeshchenko  wrote:
>
> +1
>
> On 4 Apr 2023, at 16:56, Ekaterina Dimitrova  wrote:
>
> +1
>
> On Tue, 4 Apr 2023 at 11:44, Benjamin Lerer  wrote:
>>
>> +1
>>
>> On Tue, 4 Apr 2023 at 17:17, Andrés de la Peña  wrote:
>>>
>>> +1
>>>
>>> On Tue, 4 Apr 2023 at 15:09, Jeremy Hanna  
>>> wrote:

 +1 nb, will be great to have this in the codebase - it will make nearly 
 every table's compaction work more efficiently.  The only possible 
 exception is tables that are well suited for TWCS.

 On Apr 4, 2023, at 8:00 AM, Berenguer Blasi  
 wrote:

 +1

 On 4/4/23 14:36, J. D. Jordan wrote:

 +1

 On Apr 4, 2023, at 7:29 AM, Brandon Williams  wrote:

 
 +1

 On Tue, Apr 4, 2023, 7:24 AM Branimir Lambov  wrote:
>
> Hi everyone,
>
> I would like to put CEP-26 to a vote.
>
> Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy
>
> JIRA and draft implementation:
> https://issues.apache.org/jira/browse/CASSANDRA-18397
>
> Up-to-date documentation:
> https://github.com/blambov/cassandra/blob/CASSANDRA-18397/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md
>
> Discussion:
> https://lists.apache.org/thread/8xf5245tclf1mb18055px47b982rdg4b
>
> The vote will be open for 72 hours.
> A vote passes if there are at least three binding +1s and no binding 
> vetoes.
>
> Thanks,
> Branimir


>


Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-28 Thread Joseph Lynch
> If we want to bring groups/containers/etc into the default deployment 
> mechanisms of C*, great.  I am all for dividing it up into micro services 
> given we solve all the problems I listed in the complexity section.
>
> I am actually all for dividing C* up into multiple micro services, but the 
> project needs to buy in to containers as the default mechanism for running it 
> for that to be viable in my mind.

I was under the impression that with CEP-1 the project did buy into
the direction of moving the workloads that are non-latency sensitive
out of the main process? At the time of the discussion folks mentioned
repair, bulk workloads, backup, restore, compaction etc ... as all
possible things we would like to extract over time to the sidecar.

I don't think we want to go full on micro services, with like 12
processes all handling one thing, but 2 seems like a good step? One
for latency sensitive requests (reads/writes - the current process),
and one for non latency sensitive requests (control plane, bulk work,
etc ... - the sidecar).

-Joey


Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-28 Thread Joseph Lynch
One of the explicit goals of making an official sidecar project was to
try to make it something the project does not break compatibility with
as one of the main issues the third-party sidecars (that handle
distributed control, backup, repair, etc ...) have is they break
constantly because C* breaks the control interfaces (JMX and config
files in particular) constantly. If it helps with the mental model,
maybe think of the Cassandra sidecar as part of the Cassandra
distribution and we try not to break the distribution? Just like we
can't break CQL and break the CQL client ecosystem, we hopefully don't
break control interfaces of the sidecar either.

On Tue, Mar 28, 2023 at 10:30 AM Jeremiah D Jordan
 wrote:
>
> - Resources isolation. Having the said service running within the same JVM 
> may negatively impact Cassandra storage's performance. It could be more 
> beneficial to have them in Sidecar, which offers strong resource isolation 
> guarantees.
>
> How does having this in a side car change the impact on “storage 
> performance”?  The side car reading sstables will have the same impact on 
> storage IO as the main process reading sstables.  Given the sidecar is 
> running on the same node as the main C* process, the only real resource 
> isolation you have is in heap/GC?  CPU/Memory/IO are all still shared between 
> the main C* process and the side car, and coordinating those across processes 
> is harder than coordinating them within a single process.  For example if we 
> wanted to have the compaction throughput, streaming throughput, and analytics 
> read throughput all tied back to a single disk IO cap, that is harder with an 
> external process.

I think we might be underselling how valuable JVM isolation is,
especially for analytics queries that are going to pass the entire
dataset through heap somewhat constantly. In addition to that, having
this in a separate process gives us access to easy-to-use OS level
protections over CPU time, memory, network, and disk via cgroups; as
well as taking advantage of the existing isolation techniques kernels
already offer to protect processes from each other, e.g. CPU schedulers
like CFS [1], network qdiscs like tc-fq/tc-prio [2, 3], and I/O
schedulers like kyber/bfq [4].

Mixing latency-sensitive point queries with throughput-sensitive ones
in the same JVM just seems fraught with peril, and I don't buy that we will
build the same level of performance isolation that the kernel has.
Note you do not need containers to do this: the kernel by default uses
these isolation mechanisms to enforce fair access to resources, and cgroups
just make it better (and can be used regardless of containerization).
This was the thinking behind backup/restore, repair, bulk operations,
etc ... living in a separate process.
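
As a rough illustration of the kind of OS-level isolation described above,
cgroup v2 can bound a hypothetical sidecar process without any containers
(the paths, device numbers, PID variable, and limits below are made-up
examples, and the cpu/memory/io controllers are assumed to be enabled in
the parent's cgroup.subtree_control):

    # create a cgroup for the sidecar and cap it at ~2 CPUs, 8 GiB of RAM,
    # and ~100 MiB/s of reads/writes against block device 8:0
    sudo mkdir /sys/fs/cgroup/cassandra-sidecar
    echo "200000 100000" | sudo tee /sys/fs/cgroup/cassandra-sidecar/cpu.max
    echo "8G"            | sudo tee /sys/fs/cgroup/cassandra-sidecar/memory.max
    echo "8:0 rbps=104857600 wbps=104857600" | sudo tee /sys/fs/cgroup/cassandra-sidecar/io.max

    # move the (hypothetical) sidecar pid into the cgroup
    echo "$SIDECAR_PID" | sudo tee /sys/fs/cgroup/cassandra-sidecar/cgroup.procs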

As has been mentioned elsewhere, being able to run that workload on
different physical machines is even better to isolate, and I could
totally see a wonderful architecture in the future where you have
sidecar doing incremental backups from source nodes and restores every
~10 minutes to the "analytics" nodes where spark bulk readers are
pointed. For isolation the best would be a separate process on a
separate machine, followed by a separate process on the same machine,
followed by a separate thread on the same machine (historically what
C* does) ... now that's not to say we need to go straight to the best, but
we probably shouldn't do the worst thing?

-Joey

[1] https://man7.org/linux/man-pages/man7/sched.7.html
[2] https://man7.org/linux/man-pages/man8/tc-fq.8.html
[3] https://man7.org/linux/man-pages/man8/tc-prio.8.html
[4] https://docs.kernel.org/block/index.html


Re: Welcome our next PMC Chair Josh McKenzie

2023-03-23 Thread Joseph Lynch
Congratulations Josh! Thank you Mick!

-Joey

On Thu, Mar 23, 2023 at 10:56 AM Molly Monroy  wrote:

> Congrats Josh - looking forward to working with you more closely! It's
> been a pleasure, Mick!
>
> On Thu, Mar 23, 2023 at 8:32 AM Josh McKenzie 
> wrote:
>
>> Definitely want to +1 the appreciation for all the work Mick's put into
>> the role.
>>
>> Looking forward to continuing to help out where I can!
>>
>> On Thu, Mar 23, 2023, at 9:27 AM, J. D. Jordan wrote:
>>
>>
>> Congrats Josh!
>>
>> And thanks Mick for your time spent as Chair!
>>
>> On Mar 23, 2023, at 8:21 AM, Aaron Ploetz  wrote:
>>
>> 
>> Congratulations, Josh!
>>
>> And of course, thank you Mick for all you've done for the project while
>> in the PMC Chair role!
>>
>> On Thu, Mar 23, 2023 at 7:44 AM Derek Chen-Becker 
>> wrote:
>>
>> Congratulations, Josh!
>>
>> On Thu, Mar 23, 2023, 4:23 AM Mick Semb Wever  wrote:
>>
>> It is time to pass the baton on, and on behalf of the Apache Cassandra
>> Project Management Committee (PMC) I would like to welcome and congratulate
>> our next PMC Chair Josh McKenzie (jmckenzie).
>>
>> Most of you already know Josh, especially through his regular and
>> valuable project oversight and status emails, always presenting a balance
>> and understanding to the various views and concerns incoming.
>>
>> Repeating Paulo's words from last year: The chair is an administrative
>> position that interfaces with the Apache Software Foundation Board, by
>> submitting regular reports about project status and health. Read more about
>> the PMC chair role on Apache projects:
>> - https://www.apache.org/foundation/how-it-works.html#pmc
>> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
>> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>>
>> The PMC as a whole is the entity that oversees and leads the project and
>> any PMC member can be approached as a representative of the committee. A
>> list of Apache Cassandra PMC members can be found on:
>> https://cassandra.apache.org/_/community.html
>>
>>
>>


Re: Welcome Patrick McFadin as Cassandra Committer

2023-02-02 Thread Joseph Lynch
W! Congratulations Patrick!!

-Joey

On Thu, Feb 2, 2023 at 9:58 AM Benjamin Lerer  wrote:

> The PMC members are pleased to announce that Patrick McFadin has accepted
> the invitation to become committer today.
>
> Thanks a lot, Patrick, for everything you have done for this project and
> its community through the years.
>
> Congratulations and welcome!
>
> The Apache Cassandra PMC members
>


Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-17 Thread Joseph Lynch
It seems like this is a choice most users might not know how to make?

On Thu, Nov 17, 2022 at 7:06 AM Josh McKenzie  wrote:
>
> Have we ever discussed including multiple profiles that are simple to swap 
> between and documented for their tested / intended use cases?
>
> Then the burden of having a “sane” default for the wild variance of workloads 
> people use it for would be somewhat mitigated. Sure, there’s always going to 
> be folks that run the default and never think to change it but the UX could 
> be as simple as a one line config change to swap between GC profiles and we 
> could add and deprecate / remove over time.
>
> Concretely, having config files such as:
>
> jvm11-CMS-write.options
> jvm11-CMS-mixed.options
> jvm11-CMS-read.options
> jvm11-G1.options
> jvm11-ZGC.options
> jvm11-Shen.options
>
>
> Arguably we could take it a step further and not actually allow a C* node to 
> startup without pointing to one of the config files from your primary config, 
> and provide a clean mechanism to integrate that selection on headless 
> installs.
>
> Notably, this could be a terrible idea. But it does seem like we keep butting 
> up against the complexity and mixed pressures of having the One True Way to 
> GC via the default config and the lift to change that.
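
To make the profile idea concrete, one of those files might contain something
roughly like the following (a sketch only; the flag values are illustrative
assumptions, not tuned or tested recommendations):

    # jvm11-G1.options (sketch)
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=500
    -XX:G1RSetUpdatingPauseTimePercent=5
    -XX:InitiatingHeapOccupancyPercent=70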
>
> On Wed, Nov 16, 2022, at 9:49 PM, Derek Chen-Becker wrote:
>
> I'm fine with not including G1 in 4.1, but would we consider inclusion
> for 4.1.X down the road once validation has been done?
>
> Derek
>
>
> On Wed, Nov 16, 2022 at 4:39 PM David Capwell  wrote:
> >
> > Getting poked in Slack to be more explicit in this thread…
> >
> > Switching to G1 on trunk, +1
> > Switching to G1 on 4.1, -1.  4.1 is about to be released and this isn’t a 
> > bug fix but a perf improvement ticket and as such should go through 
> > validation that the perf improvements are seen, there is not enough time 
> > left for that added performance work burden so strongly feel it should be 
> > pushed to 4.2/5.0 where it has plenty of time to be validated against.  The 
> > ticket even asks to avoid validating the claims; saying 'Hoping we can skip 
> > due diligence on this ticket because the data is "in the past” already”'.  
> > Others have attempted both shenandoah and ZGC and found mixed results, so 
> > nothing leads me to believe that won’t be true here either.
> >
> > > On Nov 16, 2022, at 9:15 AM, J. D. Jordan  
> > > wrote:
> > >
> > > Heap -
> > > +1 for G1 in trunk
> > > +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I 
> > > understand pushback against changing this so late in the game.
> > >
> > > Memtable -
> > > -1 for off heap in 4.1. I think this needs more testing and isn’t 
> > > something to change at the last minute.
> > > +1 for running performance/fuzz tests against the alternate memtable 
> > > choices in trunk and switching if they don’t show regressions.
> > >
> > >> On Nov 16, 2022, at 10:48 AM, Josh McKenzie  wrote:
> > >>
> > >> 
> > >> To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to 
> > >> prioritize digging into G1's behavior on small heaps vs. CMS w/our 
> > >> default tuning sooner rather than later. With that info I'd likely be a 
> > >> strong +1 on the shift.
> > >>
> > >> -1 on switching to offheap_objects for 4.1 RC; again, think this is just 
> > >> a small step away from being a +1 w/some more rigor around seeing the 
> > >> current state of the technology's intersections.
> > >>
> > >> On Wed, Nov 16, 2022, at 7:47 AM, Aleksey Yeshchenko wrote:
> > >>> All right. I’ll clarify then.
> > >>>
> > >>> -0 on switching the default to G1 *this late* just before RC1.
> > >>> -1 on switching the default offheap_objects *for 4.1 RC1*, but all for 
> > >>> it in principle, for 4.2, after we run some more test and resolve the 
> > >>> concerns raised by Jeff.
> > >>>
> > >>> Let’s please try to avoid this kind of super late defaults switch going 
> > >>> forward?
> > >>>
> > >>> —
> > >>> AY
> > >>>
> > >>> > On 16 Nov 2022, at 03:27, Derek Chen-Becker  
> > >>> > wrote:
> > >>> >
> > >>> > For the record, I'm +100 on G1. Take it with whatever sized grain of
> > >>> > salt you think appropriate for a relative newcomer to the list, but
> > >>> > I've spent my last 7-8 years dealing with the intersection of
> > >>> > high-throughput, low latency systems and their interaction with GC and
> > >>> > in my personal experience G1 outperforms CMS in all cases and with
> > >>> > significantly less work (zero work, in many cases). The only things
> > >>> > I've seen perform better *with a similar heap footprint* are GenShen
> > >>> > (currently experimental) and Rust (beyond the scope of this topic).
> > >>> >
> > >>> > Derek
> > >>> >
> > >>> > On Tue, Nov 15, 2022 at 4:51 PM Jon Haddad 
> > >>> >  wrote:
> > >>> >>
> > >>> >> I'm curious what it would take for folks to be OK with merging this 
> > >>> >> into 4.1?  How much additional time would you want to feel 
> > >>> >> comfortable?
> > >>> 

Re: Should we change 4.1 to G1 and offheap_objects ?

2022-11-17 Thread Joseph Lynch
I'm surprised we released 4.0 without changing the default to G1 given
that many Cassandra deployments have changed the project's default
because it is incorrect. I know that 7486 broke a user 7 years ago,
but I think we have had a ton of testing since then in the community
to build our confidence. Not to mention that Java 9+ (released 2017)
made G1 the default and Java 14 (2020) removes CMS entirely.

I have personally done targeted AB testing of G1GC vs CMS in a
controlled fashion using NDBench and our team had enough confidence in
~2019 to roll it to Netflix's entire fleet of O(1k) clusters and
O(10k) instances running Java 8. We found it vastly superior to CMS in
practically every way (no more 10s+ compacting STW phases after heap
fragmentation, better tail latency at a coordinator/replica level,
better average throughput, etc ...), and only identified a single very
minor p99 regression on one cluster (~5%) which we didn't consider
severe enough to roll back.

Right now our project defaults are hurting 99 users to help 1; let
that one user change the defaults? 4.1 seems like a great place to fix
the bug, absent being able to do that let's at least fix it in trunk?

-Joey

On Thu, Nov 17, 2022 at 8:27 AM Jon Haddad  wrote:
>
> I noticed nobody answered my actual question - what would it take for you to 
> be comfortable?
>
> It seems that the need to do a release is now more important than the best 
> interests of the new user's experience - despite having plenty of 
> *production* experience showing that what we ship isn't even remotely close 
> to usable.
>
> I tried to offer a compromise, and it's not cool with me that it was ignored 
> by everyone objecting.
>
> Jon
>
> On 2022/11/17 08:34:53 Mick Semb Wever wrote:
> > Ok, wrt G1 default, this is won't go ahead for 4.1-rc1
> >
> > We can revisit it for 4.1.x
> >
> > We have a lot of voices here adamantly positive for it, and those of us
> > that have done the performance testing over the years know why. But being
> > called to prove it is totally valid, if you have data to any such tests
> > please add them to the ticket 18027
> >


Re: Thanks to Nate for his service as PMC Chair

2022-07-14 Thread Joseph Lynch
Thank you for all your work and dedication Nate, it has been greatly
appreciated.

Congratulations Mick, we are in good hands with you as chair!

-Joey

On Mon, Jul 11, 2022 at 5:54 AM Paulo Motta  wrote:
>
> Hi,
>
> I wanted to announce on behalf of the Apache Cassandra Project Management 
> Committee (PMC) that Nate McCall (zznate) has stepped down from the PMC chair 
> role. Thank you Nate for all the work you did as the PMC chair!
>
> The Apache Cassandra PMC has nominated Mick Semb Wever (mck) as the new PMC 
> chair. Congratulations and good luck on the new role Mick!
>
> The chair is an administrative position that interfaces with the Apache 
> Software Foundation Board, by submitting regular reports about project status 
> and health. Read more about the PMC chair role on Apache projects:
> - https://www.apache.org/foundation/how-it-works.html#pmc
> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>
> The PMC as a whole is the entity that oversees and leads the project and any 
> PMC member can be approached as a representative of the committee. A list of 
> Apache Cassandra PMC members can be found on: 
> https://cassandra.apache.org/_/community.html
>
> Kind regards,
>
> Paulo


Re: [VOTE] CEP-19: Trie memtable implementation

2022-02-16 Thread Joseph Lynch
+1 nb

Really excited for this, Thank you Branimir!

-Joey

On Wed, Feb 16, 2022 at 12:58 AM Branimir Lambov  wrote:
>
> Hi everyone,
>
> I'd like to propose CEP-19 for approval.
>
> Proposal: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-19%3A+Trie+memtable+implementation
> Discussion: https://lists.apache.org/thread/fdvf1wmxwnv5jod59jznbnql23nqosty
>
> The vote will be open for 72 hours.
> Votes by committers are considered binding.
> A vote passes if there are at least three binding +1s and no binding vetoes.
>
> Thank you,
> Branimir


Re: Welcome Anthony Grasso, Erick Ramirez and Lorina Poland as Cassandra committers

2022-02-16 Thread Joseph Lynch
Woo

Congratulations to the new committers and I am so excited to see the
project recognizing these contributions!

-Joey


On Tue, Feb 15, 2022 at 10:13 AM Benjamin Lerer  wrote:
>
> The PMC members are pleased to announce that Anthony Grasso, Erick Ramirez 
> and Lorina Poland have accepted the invitation to become committers.
>
> Thanks a lot, Anthony, Erick and Lorina for all the work you have done on the 
> website and documentation.
>
> Congratulations and welcome
>
> The Apache Cassandra PMC members


Re: [GSOC] Call for Mentors

2022-02-14 Thread Joseph Lynch
Hi Paulo!

Thanks for organizing this. I would like to propose CASSANDRA-17381
[1] which will implement/verify BoundedReadCompactionStrategy for this
year's GSOC and I can mentor (although I think we may need a
co-mentor?). Please let me know if there is any further context I need
to provide or jira tagging I need to do (I labeled it gsoc and
gsoc2022).

[1] https://issues.apache.org/jira/browse/CASSANDRA-17381

-Joey


On Fri, Feb 11, 2022 at 1:54 PM Paulo Motta  wrote:
>
> Unfortunately we didn't, so far.
>
> On Fri, 11 Feb 2022 at 15:32, Henrik Ingo  wrote:
>>
>> Hi Paulo
>>
>> Just checking, am I using Jira right: 
>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20labels%20%3D%20gsoc%20and%20statusCategory%20!%3D%20Done%20
>>
>> It looks like we ended up with no gsoc projects submitted? Or am I querying 
>> wrong?
>>
>> henrik
>>
>> On Thu, Feb 3, 2022 at 12:26 AM Paulo Motta  wrote:
>>>
>>> Hi Henrik,
>>>
>>> I am happy to give feedback to project ideas - but they ultimately need to 
>>> be registered by prospective mentors on JIRA with the "gsoc" tag to be 
>>> considered a "subscribed idea".
>>>
>>> The project idea JIRA should have a "high level" overview of what the 
>>> project is:
>>> - What is the problem statement?
>>> - Rough plan on how to approach the problem.
>>> - What are the main milestones/deliverables? (ie. 
>>> code/benchmark/framework/blog post etc)
>>> - What prior knowledge is required to complete the task?
>>> - What warm-up tasks can the candidate do to ramp up for the project?
>>>
>>> The mentor will work with potential participants to refine the high level 
>>> description into smaller subtasks at a later stage (during candidate 
>>> application period).
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>> On Wed, 2 Feb 2022 at 19:02, Henrik Ingo  wrote:

 Hi Paulo

 I think Shaunak and Aleks V already pinged you on Slack about their ideas. 
 When you say we don't have any subscribed ideas, what is missing?

 henrik

 On Wed, Feb 2, 2022 at 4:03 PM Paulo Motta  
 wrote:
>
> Hi everyone,
>
> We need to tell ASF how many slots we will need for GSoC (if any) by 
> February 20. So far we don't have any subscribed project ideas.
>
> If you are interested in being a GSoC mentor, just ping me on slack and I 
> will be happy to give you feedback on the project idea proposal. Please 
> do so by no later than February 10 to allow sufficient time for 
> follow-ups.
>
> Cheers,
>
> Paulo
>
> On Wed, 19 Jan 2022 at 10:54, Paulo Motta  wrote:
>>
>> Hi everyone,
>>
>> Following up from the initial GSoC Kick-off thread [1] I would like to 
>> invite contributors to submit GSoC project ideas. In order to submit a 
>> project idea, just tag a JIRA ticket with the "gsoc" label and add 
>> yourself to the "Mentor" field to indicate you're willing to mentor this 
>> project.
>>
>> Existing JIRA tickets can be repurposed as GSoC projects or new tickets 
>> can be created with new features or improvements specifically for GSoC. 
>> The best GSoC project ideas are those which are self-contained: have a 
>> well defined scope, discrete milestones and definition of done. 
>> Generally the areas which are easier for GSoC contributors to get 
>> started are:
>> - UX improvements
>> - Tools
>> - Benchmarking
>> - Refactoring and Modularization
>>
>> Non-committers are more than welcome to submit project ideas and mentor 
>> projects, as long as a committer is willing to co-mentor the project. As 
>> a matter of fact I was a GSoC mentor before becoming a committer, so I 
>> can say this is a great way to pave your way to committership. ;)
>>
>> Mentor tasks involve having 1 or 2 weekly meetings with the GSoC 
>> participant to track the project status and give guidance to the 
>> participant towards the completion of the project, as well as reviewing 
>> code submissions.
>>
>> This year, GSoC is open to any participant over 18 years of age, no 
>> longer focusing solely on university students. GSoC projects can be of 
>> ~175 hour (medium) and 350 hour (large), and can range from 12 to 22 
>> weeks starting in July.
>>
>> We have little less than 2 months until the start of the GSoC 
>> application period on March 7, but ideally we want to have an "Ideas 
>> List" ready before that so prospective participants can start engaging 
>> with the project and working with mentors to refine the project before 
>> submitting an application.
>>
>> This year I will not be able to participate as a primary mentor but I 
>> would be happy to co-mentor other projects as well as help with 
>> questions and guidance.
>>
>> Kind regards,
>>
>> Paulo
>>
>> [1] 

Re: [VOTE] Formalizing our CI process

2022-01-12 Thread Joseph Lynch
> "All releases by default are expected to have a green test run on 
> ci-cassandra Jenkins. In exceptional circumstances (security incidents, data 
> loss, etc requiring hotfix), members with binding votes on a release may 
> choose to approve a release with known failing tests."

+1 with amendment, thank you for driving this!

-Joey


Re: [VOTE] Formalizing our CI process

2022-01-12 Thread Joseph Lynch
On Wed, Jan 12, 2022 at 11:43 AM Joshua McKenzie  wrote:
>
> I fully concede your point and concern Joey but I propose we phrase that 
> differently to emphasize the importance of clean tests.
>
> "All releases by default are expected to have a green test run on 
> ci-cassandra Jenkins. In exceptional circumstances (security incidents, data 
> loss, etc requiring hotfix), members with binding votes on a release may 
> choose to approve a release with known failing tests."

I like the balance that strikes. Should we re-vote or should I propose
that text as an amendment after this vote (since a simple majority
will likely be reached)?

-Joey


Re: [VOTE] Formalizing our CI process

2022-01-12 Thread Joseph Lynch
On Wed, Jan 12, 2022 at 3:25 AM Berenguer Blasi
 wrote:
>
> jenkins CI was at 2/3 flakies consistently post 4.0 release.

That is really impressive and I absolutely don't mean to downplay that
achievement.

> Then things broke and we've been working hard to get back to the 2/3 flakies. 
> Most
> current failures imo are timeuuid C17133 or early termination of process
> C17140 related afaik. So getting back to the 2/3 'impossible' flakies
> should be doable and a reasonable target (famous last words...). My 2cts.

I really appreciate all the work folks have been doing to get the
project to green, and I support the parts of the proposal that try to
formalize methods to try to keep us there. I am only objecting to #2
in the proposal where we have a non-negotiable gate on tests before a
release.

-Joey


Re: [VOTE] Formalizing our CI process

2022-01-12 Thread Joseph Lynch
I've witnessed PMCs -1 releases due to failing tests or bugs reported
by users before, but prior to everyone's awesome work on CI I think a
number of times folks might have been voting without knowing what the
results of the full test runs were. One of the amazing contributions
of this group (and others working on the CI/CD solutions over the
years) is now we have an authoritative "which tests are failing" tool
and I do hope we use it as context during the next release vote as
suggested by this proposal.

I just think it should serve as context, and not a requirement. I also
vote -1 on this specific proposal and will happily change it to +1 if
the language on the release criteria is softened slightly, e.g. "When
a release is proposed, links to the associated test runs on
ci-cassandra.apache.org MUST be provided and members MAY use failing
tests as a valid reason to -1 a release".

-Joey

On Wed, Jan 12, 2022 at 8:11 AM Ekaterina Dimitrova
 wrote:
>
> “I particularly like the suggestion PMCs can use failing tests as a reason to 
> -1, but we do have critical patch releases now and again and common sense in 
> getting such releases out quickly needs to be applied. ”
>
> For some reason I assumed this would always be the case in case of emergency, 
> to consider it on a per-case basis. Good catch on the wording! Thank you 
> Joey! I think it doesn’t hurt to elaborate a bit more on this to be sure we 
> are all aligned that there will be special cases. (Hopefully not many)
>
> On Wed, 12 Jan 2022 at 3:25, Berenguer Blasi  wrote:
>>
>> Hi Joseph
>>
>> jenkins CI was at 2/3 flakies consistently post 4.0 release. Then things
>> broke and we've been working hard to get back to the 2/3 flakies. Most
>> current failures imo are timeuuid C17133 or early termination of process
>> C17140 related afaik. So getting back to the 2/3 'impossible' flakies
>> should be doable and a reasonable target (famous last words...). My 2cts.
>>
>> Regards
>>
>> On 12/1/22 7:21, Joseph Lynch wrote:
>> > On Wed, Jan 12, 2022 at 12:47 AM Berenguer Blasi
>> >  wrote:
>> >> We shouldn't be at 15-20 failures but at 2 or 3. The problem is that 
>> >> those 2 or 3 have already been hammered for over a year by 2 or 3 
>> >> different committers and they didn't crack.
>> >>
>> > Last I checked circleci was almost fully green on trunk only, and the
>> > asf builds all had around 15-20 failures. For example, as of the last
>> > build I checked, trunk had 22 failures [1], 4.0 had 12 [2], 3.11 had
>> > 35 [3] and 3.0 had 25 [4].
>> >
>> > [1] https://ci-cassandra.apache.org/job/Cassandra-trunk/901/
>> > [2] https://ci-cassandra.apache.org/job/Cassandra-4.0/308/
>> > [3] https://ci-cassandra.apache.org/job/Cassandra-3.11/300/
>> > [4] https://ci-cassandra.apache.org/job/Cassandra-3.0/234
>> >
>> > Looking at the failures they mostly seem to be consistent failures
>> > although there are some flakes as well. If I understand Josh's
>> > proposal correctly, and I could be mistaken, but if this vote passes
>> > it seems we would be unable to cut any release on any branch on the
>> > project?
>> >
>> > -Joey
>> > .


Re: [VOTE] Formalizing our CI process

2022-01-11 Thread Joseph Lynch
On Tue, Jan 11, 2022 at 4:48 PM Joshua McKenzie  wrote:
>> If this vote passes would that mean we cannot cut any release
>
> We would not cut a release with known failing tests, no. Which for critical 
> infrastructure software _seems_ like it should probably be table stakes, no?
>

While I very much support a "should" statement (or just acknowledging
that the PMCs can use failing tests as a reason to -1 which afaik was
already the case), a "must" statement seems like we could get in a
pickle quite easily in the case of critical bugs or security
vulnerabilities.

I like everything else about the proposal, and I like the idea of
using circle pre-commit to catch obviously broken stuff and the full
asf build for post merge validation. I am not in favor of removing the
project's capability to cut a release.

-Joey


Re: [VOTE] Formalizing our CI process

2022-01-11 Thread Joseph Lynch
On Wed, Jan 12, 2022 at 12:47 AM Berenguer Blasi
 wrote:
>
> We shouldn't be at 15-20 failures but at 2 or 3. The problem is that those 2 
> or 3 have already been hammered for over a year by 2 or 3 different 
> committers and they didn't crack.
>

Last I checked circleci was almost fully green on trunk only, and the
asf builds all had around 15-20 failures. For example, as of the last
build I checked, trunk had 22 failures [1], 4.0 had 12 [2], 3.11 had
35 [3] and 3.0 had 25 [4].

[1] https://ci-cassandra.apache.org/job/Cassandra-trunk/901/
[2] https://ci-cassandra.apache.org/job/Cassandra-4.0/308/
[3] https://ci-cassandra.apache.org/job/Cassandra-3.11/300/
[4] https://ci-cassandra.apache.org/job/Cassandra-3.0/234

Looking at the failures they mostly seem to be consistent failures
although there are some flakes as well. If I understand Josh's
proposal correctly, and I could be mistaken, but if this vote passes
it seems we would be unable to cut any release on any branch on the
project?

-Joey


Re: [VOTE] Formalizing our CI process

2022-01-11 Thread Joseph Lynch
> No release can be cut without a fully green CI run on ci-cassandra.apache.org

I appreciate the goal but this seems problematic given all four
release branches (2.2, 3.0, 3.11, 4.0) + trunk appear to have about
15-20 failures on ci-cassandra.apache.org at the time of this vote. If
this vote passes would that mean we cannot cut any release or does it
just mean that PMCs could -1 with this as a reason?

Perhaps it is possible to try to obtain a high quality bar in normal
times while leaving us some wiggle room for critical bug fix releases
(e.g. security patches) or other extenuating circumstances like
"Releases SHOULD have a fully green CI run but MAY proceed with
failing tests if those tests have a Jira issue tracking their
resolution" to leave the PMC room for allowing votes with
flakey/failing tests in an emergency situation? Then all we'd have to
do to get out of a sticky situation is cut jira tickets ... Or if this
is just a "valid reason" that a PMC could be -1 then that's fine too.

-Joey


On Mon, Jan 10, 2022 at 11:00 AM Joshua McKenzie  wrote:
>
> Wiki draft article here: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199530280
>
> The vote will be open for 72 hours (it's short + early indication on 
> discussion was consensus).
> Committer  / pmc votes binding.
> Simple majority passes.
>
> References:
> Background: original ML thread here: 
> https://lists.apache.org/thread/bq470ml17g106pwxpvwgws2stxc6d7b9
> Project governance guidelines here: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Project+Governance
>
> ~Josh


Re: [DISCUSS] Nested YAML configs for new features

2021-11-29 Thread Joseph Lynch
On Mon, Nov 29, 2021 at 11:51 AM bened...@apache.org
 wrote:
>
> Maybe we can make our query language more expressive 
>
> We might anyway want to introduce e.g. a LIKE filtering option to 
> find/discover flattened config parameters?

This sounds more complicated than just having the settings virtual
table return text (dot encoded) -> text (json) and probably not even
that much more useful. A full table scan on the settings table could
return all top level keys (strings before the first dot) and if we
just return a valid json string then users can bring their own
querying capabilities via jq [1], or one line of code in almost any
programming language (especially python, perl, etc ...).
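
For instance, with the example value used below, the jq route is just this (a
sketch of the idea, not a committed interface):

    $ echo '{"b": [{"c": {"d": 4}}]}' | jq '.b[0].c.d'
    4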

Alternatively, if we want to modify the grammar, supporting structured
data querying on text fields seems preferable to LIKE, since you could
get what you want without a grammar change, and if we could generalize
it to any text column it would be amazingly useful elsewhere to users.
For example, we could emulate jq's query syntax in the select, which is,
imo, best-in-class for quickly querying into nested structures. Assuming
a key (text) -> value (json) schema:

'a' -> "{'b': [{'c': {'d': 4}}]}",

SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';

To have exactly jq syntax (but harder to parse) it would be:

SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';

Since we're not indexing the structured data in any way, filtering
before selection probably doesn't give us much performance improvement
as we'd still have to parse the whole text field in most cases.

-Joey

[1] https://stedolan.github.io/jq/

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Joseph Lynch
On Wed, Nov 24, 2021 at 9:00 AM Bowen Song  wrote:
> Structured / nested config is easier for human eyes to read but very
> hard for simple scripts to handle. Flat config is harder for human eyes
> but easy for simple scripts. I can see user may prefer one over another
> depending on their own use case. If the structured / nested config must
> be introduced, I would like to see both syntaxes supported to allow the
> user to make their own choice.

To be clear, structured configuration was already adopted by Cassandra
a long time ago and is already used successfully in the status quo
(for example server/client encryption options, all of the pluggable
class configurations). I believe the question was "when we are adding
a number of related options should we structure them?". I think the
answer is clearly yes because it makes the configuration code in the
database a lot cleaner and allows us to leverage strongly typed
configuration. Related configuration should continue to be grouped as
if you were using a prefix of a dot encoded property (so {"a": {"b":
4}} is equivalent to "a.b: 4").
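
As a concrete illustration (using the track_warnings options discussed
elsewhere in this thread; the value here is made up):

    # nested (structured) form
    track_warnings:
      coordinator_read_size:
        warn_threshold_kb: 1024

    # equivalent flat, dot-encoded form
    track_warnings.coordinator_read_size.warn_threshold_kb: 1024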

There is the separate question of "how can an operator tell what
configuration a node is running with", and for obvious reasons grepping
cassandra.yaml is not a good public interface; we can do better via
either virtual tables (JSON over CQL) or the sidecar (JSON over REST)
that preserve the structured configuration rather than trying to
flatten it.

-Joey




Re: [DISCUSS] Nested YAML configs for new features

2021-11-24 Thread Joseph Lynch
On Wed, Nov 24, 2021 at 5:55 AM Jacek Lewandowski
 wrote:
>
> I am just wondering how to represent in properties things like lists of
> non-scalar values?
>

In my experience properties are not sufficient for complex
configuration sorta for this reason, that's why using structured YAML
(or any structured configuration language) is so much more powerful
than a properties file. I think if we leaned into structured
configuration we'd have mostly maps of maps pointing to scalars which
are well addressed by dot encoding.

Dot encoding only works down to the first non-scalar/object leaf node,
and past that the value needs to stay structured. So a list of maps, for
example, would live in the value: in {"a": {"b": 4, "c":
[{"d": 3}, {"d": 2}]}} you'd be able to query for 'a.b' -> 4 or
'a.c' -> [{"d": 3}, {"d": 2}]. Single scalar values are valid JSON,
so if we have to have a text -> text encoding I'd make the key
the dot-encoded key and the value the JSON-encoded value; that's
maybe the easiest way to generically represent complex structured
configuration in a flat key->value mapping.
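
A minimal sketch of that flattening rule, assuming plain java.util maps for
the parsed YAML (illustrative only):

    import java.util.Map;

    public final class DotEncoder
    {
        // Descend through nested maps; the first non-map value becomes the
        // (JSON-encodable) leaf, so scalars and lists stay structured.
        @SuppressWarnings("unchecked")
        public static void flatten(String prefix, Map<String, Object> node, Map<String, Object> out)
        {
            for (Map.Entry<String, Object> e : node.entrySet())
            {
                String key = prefix.isEmpty() ? e.getKey() : prefix + '.' + e.getKey();
                if (e.getValue() instanceof Map)
                    flatten(key, (Map<String, Object>) e.getValue(), out);
                else
                    out.put(key, e.getValue());
            }
        }
    }

    // {"a": {"b": 4, "c": [{"d": 3}, {"d": 2}]}}
    //   -> {"a.b": 4, "a.c": [{"d": 3}, {"d": 2}]}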

I think Elasticsearch's live reconfiguration API [1], which accepts
dot-encoded JSON and merges it with the on-disk YAML, and Puppet's Hiera
configuration language [2], which allows you to index into YAML using
dot encoding, are some great interfaces for us to study. The latter
even allows the user to query into lists by using a number as the key
(similar to jq [3] except without the square brackets), so you could ask
for 'a.c.0' and get back {"d": 3}.

-Joey

[1] 
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-update-settings.html
[2] https://puppet.com/docs/puppet/6/function.html#get
[3] https://stedolan.github.io/jq/manual/




Re: [DISCUSS] Nested YAML configs for new features

2021-11-22 Thread Joseph Lynch
Isn't one of the primary reasons to have a YAML configuration instead
of a properties file to allow typed and structured (which implies nested)
configuration? I think it makes a lot of sense to group related
configuration options (e.g. for a feature) into a typed class when we're
talking about more than one or two related options.

It's pretty standard elsewhere in the JVM ecosystem to encode YAML to
period-encoded key->value pairs when required (usually when providing
a property or override layer); Spring and Elasticsearch YAMLs both
come to mind. It seems pretty reasonable to support dot encoding and
decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.

Regarding quickly telling what configuration a node is running, I think
we should lean on virtual tables for "what is the current
configuration" now that we have them; as others have said, the written
cassandra.yaml is not necessarily the current configuration ... and
also grep -C / -A exist for this reason.

-Joey

On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer  wrote:
>
> I do not have a strong opinion for one or the other but wanted to raise the
> issue I see with the "Settings" virtual table.
>
> Currently the "Settings" virtual table converts nested options into flat
> options using a "_" separator. For those options it allows a user to query
> the all set of options through some hack.
> If we decide to move to more nesting (more than one level), it seems to me
> that we need to change the way this table is behaving and how we can query
> its data.
>
> We would need to start using "." as a nesting separator to ensure that
> things are consistent between the configuration and the table and add
> support for LIKE restrictions for filtering queries to allow operators to
> be able to select the precise set of settings that the operator is looking
> for.
>
> Doing so is not really complicated in itself but might impact some users.
>
> Le ven. 19 nov. 2021 à 22:39, David Capwell  a
> écrit :
>
> > > it is really handy to grep
> > > cassandra.yaml on some config key and you know the value instantly.
> >
> > You can still do that
> >
> > $ grep -A2 coordinator_read_size conf/cassandra.yaml
> > # coordinator_read_size:
> > # warn_threshold_kb: 0
> > # abort_threshold_kb: 0
> >
> > I was also arguing we should support nested and flat, so if your infra
> > works better with flat then you could use
> >
> > track_warnings.coordinator_read_size.warn_threshold_kb: 0
> > track_warnings.coordinator_read_size.abort_threshold_kb: 0
> >
> > > On Nov 19, 2021, at 1:34 PM, David Capwell  wrote:
> > >
> > >> With the flat structure it turns into properties file - would it be
> > >> possible to support both formats - nested yaml and flat properties?
> > >
> > >
> > > For majority of our configs yes, but there are a subset where flat
> > properties is annoying
> > >
> > > hinted_handoff_disabled_datacenters - set type, so you could do
> > hinted_handoff_disabled_datacenters=“a,b,c,d” but we would need to deal
> > with separators as the format doesn’t support
> > > seed_provider.parameters - this is a map type… so would need to do
> > something like seed_provider.parameters=“{\”a\”: \a\”}” …. Maybe we special
> > case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We have
> > ParameterizedClass all over the code
> > >
> > > So, as long as we define how to deal with java collections; we could in
> > theory support properties files (not arguing for that in this thread) as
> > well as system properties.
> > >
> > >
> > >> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <
> > lewandowski.ja...@gmail.com> wrote:
> > >>
> > >> With the flat structure it turns into properties file - would it be
> > >> possible to support both formats - nested yaml and flat properties?
> > >>
> > >>
> > >> - - -- --- -  -
> > >> Jacek Lewandowski
> > >>
> > >>
> > >> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <
> > calebrackli...@gmail.com>
> > >> wrote:
> > >>
> > >>> If it's nested, "track_warnings" would still work if you're grepping
> > around
> > >>> vim or less.
> > >>>
> > >>> I'd have to concede the point about grep output, although there are
> > tools
> > >>> like https://github.com/kislyuk/yq that could probably be bent to do
> > what
> > >>> you want.
> > >>>
> > >>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> > >>> stefan.mikloso...@instaclustr.com> wrote:
> > >>>
> >  Hi David,
> > 
> >  while I do not oppose nested structure, it is really handy to grep
> >  cassandra.yaml on some config key and you know the value instantly.
> >  This is not possible when it is nested (easily & fastly) as it is on
> >  two lines. Or maybe my grepping is just not advanced enough to cover
> >  this case? If it is flat, I can just grep "track_warnings" and I have
> >  them all.
> > 
> >  Can you elaborate on your last bullet point? Parsing layer ... What do
> >  you mean specifically?
> > 
> 

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-19 Thread Joseph Lynch
On Fri, Nov 19, 2021 at 9:52 AM Derek Chen-Becker  wrote:
>
> https://bugs.openjdk.java.net/browse/JDK-7184394 added AES intrinsics in
> Java 8, in 2012. While it's always possible to have a regression, and it's
> important to understand the performance impact, stories of 2-10x sound
> apocryphal. If they're all using the same intrinsics, the performance
> should be roughly the same. I think that the real challenge will be key
> management, not performance.
>
> Derek

> On Fri, Nov 19, 2021 at 7:41 AM Bowen Song  wrote:
>
> > On the performance note, I copy & pasted a small piece of Java code to
> > do AES256-CBC on the stdin and write the result to stdout. I then ran
> > the following two commands on the same machine (with AES-NI) for
> > comparison:
> >
> > $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
> > /usr/lib/jvm/java-11-openjdk/bin/java -jar aes-bench.jar >/dev/null
> > 36.24s user 5.96s system 100% cpu 41.912 total
> > $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time
> > openssl enc -aes-256-cbc -e -K
> > "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
> > -iv "0123456789abcdef0123456789abcdef" >/dev/null
> > 31.09s user 3.92s system 99% cpu 35.043 total
> >
> > This is not an accurate test of the AES performance, as the Java test
> > includes the JVM start up time and the key and IV generation in the Java
> > code. But this gives us a pretty good idea that the total performance
> > regression is definitely far from the 2x to 10x slower claimed in some
> > previous emails.
> >
> >
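For reference, the Java side of a benchmark like the one quoted above might
look roughly like this (a minimal javax.crypto sketch, not the actual code
from the thread):

    import javax.crypto.Cipher;
    import javax.crypto.CipherOutputStream;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.security.SecureRandom;

    public final class AesBench
    {
        public static void main(String[] args) throws Exception
        {
            byte[] key = new byte[32]; // AES-256
            byte[] iv = new byte[16];
            SecureRandom random = new SecureRandom();
            random.nextBytes(key);
            random.nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));

            // Encrypt stdin to stdout in 4 KiB chunks, mirroring the dd pipeline above.
            try (CipherOutputStream out = new CipherOutputStream(System.out, cipher))
            {
                byte[] buf = new byte[4096];
                int n;
                while ((n = System.in.read(buf)) > 0)
                    out.write(buf, 0, n);
            }
        }
    }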

I am aware that Java added AES intrinsics support in Java 8, but it is
still painfully slow doing authenticated AES-GCM and many other forms
of crypto [1]. Native AES-GCM on my laptop running at 4GHz [2]
achieves 3.7 GiB/s while Java 8 can manage a mere 289 MiB/s (13x
slower) and Java 11 manages 768 MiB/s (5x slower) [3]. AWS literally
funded an entire project [4] to speed up slow Java crypto, which has
sped up basic crypto by 2-10x [5, 6] on real world workloads at
scale.

I don't think my claims are apocryphal when I and others have spent so
much time on this project and other JVM projects debugging why they
are so slow, including most recently determining that the root cause of the
initial serious performance regressions in 4.0's networking code was
Java 8's built-in TLS stack, and specifically its AES-GCM
implementation, being painfully slow (the fix we settled on was to use
tcnative with native AES-GCM) [7], as well as speeding up quorum reads
by 2x through using faster MD5 crypto [8, 9, 10].

-Joey

[1] https://gist.github.com/jolynch/a6db4409ddae8d5163894bef77204934#file-summary-txt
[2] https://gist.github.com/jolynch/a6db4409ddae8d5163894bef77204934#file-benchmarkon-sh
[3] https://gist.github.com/jolynch/a6db4409ddae8d5163894bef77204934#file-authenticated_encryption_perf-txt
[4] https://github.com/corretto/amazon-corretto-crypto-provider
[5] https://github.com/corretto/amazon-corretto-crypto-provider/pull/54
[6] https://github.com/corretto/amazon-corretto-crypto-provider/issues/52
[7] https://issues.apache.org/jira/browse/CASSANDRA-15175
[8] https://issues.apache.org/jira/browse/CASSANDRA-14611
[9] https://issues.apache.org/jira/browse/CASSANDRA-15294
[10] https://github.com/corretto/amazon-corretto-crypto-provider/issues/52#issuecomment-531921577




Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-19 Thread Joseph Lynch
> For better or worse, different threat models mean that it’s not strictly 
> better to do FDE and some use cases definitely want this at the db layer 
> instead of file system.

Do you mind elaborating which threat models? The only one I can think
of is users who can log onto the database machine and have read access to
the Cassandra data directory but not read access to wherever the keys
are stored?

-Joey




Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-19 Thread Joseph Lynch
> I think Joey's argument, and correct me if I'm wrong, is that implementing
> a complex feature in Cassandra that we then have to manage that's
> essentially worse in every way compared to a built-in full-disk encryption
> option via LUKS+LVM etc is a poor use of our time and energy.
>
> i.e. we'd be better off investing our time into documenting how to do full
> disk encryption in a variety of scenarios + explaining why that is our
> recommended approach instead of taking the time and energy to design,
> implement, debug, and then maintain an inferior solution.
>

Yes this is my argument. I also worry we're underestimating how hard
this is to do.

-Joey




Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-19 Thread Joseph Lynch
>
> Yes, this needs to be done. The credentials for this stuff should be
> just fetched from wherever one wants. 100% agree with that and that
> maybe next iteration on top of that, should be rather easy. This was
> done in CEP-9 already for SSL context creation so we would just copy
> that approach here, more or less.
>
> I do not think you need to put the key in the yaml file. THE KEY? Why?
> Just a reference to it, so it can be read at startup, no?
>
> What I do find quite ridiculous is to code up some tooling which would
> decrypt credentials in yaml. I hope we will avoid that approach here,
> that does not solve anything in my opinion.


+1 I think key management will be the main correctness challenge with this,
tooling will be the usability challenge, and the JVM will be the
performance challenge ...

-Joey


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-19 Thread Joseph Lynch
>
> Are you for real here? Nobody will ever guarantee you these 1% numbers
> ... come on. I think we are
> super paranoid about performance when we are not paranoid enough about
> security. This is a two-way street.
> People are willing to give up on performance if security is a must.
>

I am for real that we should aspire to test performance (in addition to
correctness) when we implement complex features that impact performance of
the database. Given that the alternatives (e.g. using your cloud providers
out of the box encrypted ephemeral drives) have essentially no performance
penalty I think it's important to document how/if we are worse (we might
not be).

I don't actually think the 1% number is important, and I certainly don't
think we give any kind of guarantee, I'm just trying to say that if we
invest in encryption of the storage engine I hope we have clear metrics we
will measure that implementation by so we can try to gauge whether it is
worth doing and maintaining generally.


> You do not need to use it if you do not want to,
> it is not like we are going to turn it on and you have to stick with
> that. Are you just saying that we are going to
> protect people from using some security features because their db
> might be slow? What if they just don't care?
>

Certainly we can add this to the list of features that Cassandra supports
but few if any users can actually use due to either correctness issues
(e.g. not actually secure), usability (e.g. you have to configure 4
different properties to set up the various encryption options and restart
the database every week to rotate keys) or performance issues (e.g.
compaction slows down by 25x). I am just saying having a bit of design
(either on the ticket or the CEP) for how we might avoid that situation
might help.

-Joey


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-18 Thread Joseph Lynch
>
> I've seen this be a significant obstacle for people who want to adopt
> Apache Cassandra many times and an insurmountable obstacle on multiple
> occasions. From what I've seen, I think this is one of the most watched
> tickets with the most "is this coming soon" comments in the project backlog
> and it's something we pretty regularly get asked whether we know if/when
> it's coming.
>

I agree encrypted data at rest is a very important feature, but in the six
years since the ticket was originally proposed other systems kept getting
better at a faster rate, especially easy to use full disk and filesystem
encryption. LUKS+LVM in Linux is genuinely excellent and is relatively easy
to set up today, while that was _not_ true five years ago.


> That said, I completely agree that we don't want to be engaging in security
> theatre or " introducing something that is either insecure or too slow to
> be useful." and I think there are some really good suggestions in this
> thread to come up with a strong solution for what will undoubtedly be a
> pretty complex and major change.
>

I think it's important to realize that for us to check the "data is
encrypted at rest" box we have to do a lot more than what's currently been
implemented. We have to design a pluggable key management system that
either retrieves the keys from a remote system (e.g. KMS) or gives some way
to load them directly into the process memory (virtual table? or maybe
loads them from a tmpfs mounted directory?). We can't just put the key in
the yaml file. This will also affect debuggability since we have to encrypt
every file that is ever produced by Cassandra including logs (which contain
primary keys) and heap dumps which are vital to debugging so we'll have to
ship custom tools to decrypt those things so humans can actually read them
to debug problems.
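
To make that concrete, here is a purely hypothetical sketch of the sort of
pluggable interface being described; none of these types exist in Cassandra
today and the names are illustrative only. The important property is that
cassandra.yaml would only name an implementation class and a key alias,
never the key material itself:

    import javax.crypto.SecretKey;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public interface SSTableKeyProvider
    {
        /** Fetch the current write key, e.g. from a KMS, a vault, or a tmpfs file. */
        SecretKey currentKey() throws Exception;

        /** Fetch a historical key by the identifier stored in an SSTable's metadata. */
        SecretKey keyFor(String keyAlias) throws Exception;

        /** Example implementation reading raw key bytes from a tmpfs-mounted directory. */
        class TmpfsKeyProvider implements SSTableKeyProvider
        {
            private final Path keyDir;

            public TmpfsKeyProvider(Path keyDir) { this.keyDir = keyDir; }

            public SecretKey currentKey() throws Exception
            {
                return keyFor("current");
            }

            public SecretKey keyFor(String keyAlias) throws Exception
            {
                byte[] raw = Files.readAllBytes(keyDir.resolve(keyAlias + ".key"));
                return new SecretKeySpec(raw, "AES");
            }
        }
    }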

If our primary goal is facilitating our users in being compliant with
encryption at rest policies, I believe it is much easier to check that box
by encrypting the entire disk or filesystem than building partial solutions
into Cassandra.

-Joey


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-18 Thread Joseph Lynch
On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja 
wrote:

> To address Joey's concern, the OpenJDK JVM and its derivatives optimize
> Java crypto based on the underlying HW capabilities. For example, if the
> underlying HW supports AES-NI, JVM intrinsics will use those for crypto
> operations. Likewise, the new vector AES available on the latest Intel
> platform is utilized by the JVM while running on that platform to make
> crypto operations faster.
>

Which JDK version were you running? We have had a number of issues with the
JVM being 2-10x slower than native crypto on Java 8 (especially MD5, SHA1,
and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). Again I
think we could get the JVM crypto penalty down to ~2x native if we linked
in e.g. ACCP by default [1, 2] but even the very best Java crypto I've seen
(fully utilizing hardware instructions) is still ~2x slower than native
code. The operating system has a number of advantages here in that they
don't pay JVM allocation costs or the JNI barrier (in the case of ACCP) and
the kernel also takes advantage of hardware instructions.
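
When comparing numbers like these across environments it helps to pin down
exactly which runtime and which JCE provider actually served the primitives.
A small sanity check along these lines can answer that (it will not tell you
whether the AES intrinsics kicked in, only who answered the getInstance call):

    import javax.crypto.Cipher;
    import java.security.MessageDigest;

    public class CryptoProviderCheck
    {
        public static void main(String[] args) throws Exception
        {
            System.out.println("java.version = " + System.getProperty("java.version"));
            System.out.println("AES/GCM      = " + Cipher.getInstance("AES/GCM/NoPadding").getProvider());
            System.out.println("MD5          = " + MessageDigest.getInstance("MD5").getProvider());
            System.out.println("SHA-1        = " + MessageDigest.getInstance("SHA-1").getProvider());
        }
    }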


> From our internal experiments, we see single digit % regression when
> transparent data encryption is enabled.
>

Which workloads are you testing and how are you measuring the regression? I
suspect that compaction, repair (validation compaction), streaming, and
quorum reads are probably much slower (probably ~10x slower for the
throughput bound operations and ~2x slower on the read path). As
compaction/repair/streaming usually take up between 10-20% of available CPU
cycles making them 2x slower might show up as <10% overall utilization
increase when you've really regressed 100% or more on key metrics
(compaction throughput, streaming throughput, memory allocation rate, etc
...). For example, if compaction was able to achieve 2 MiBps of throughput
before encryption and it was only able to achieve 1MiBps of throughput
afterwards, that would be a huge real world impact to operators as
compactions now take twice as long.

I think a CEP or details on the ticket that indicate the performance tests
and workloads that will be run might be wise? Perhaps something like
"encryption creates no more than a 1% regression of: compaction throughput
(MiBps), streaming throughput (MiBps), repair validation throughput
(duration of full repair on the entire cluster), read throughput at 10ms
p99 tail at quorum consistency (QPS handled while not exceeding P99 SLO of
10ms), etc ... while a sustained load is applied to a multi-node cluster"?
Even a microbenchmark that just sees how long it takes to encrypt and
decrypt a 500MiB dataset using the proposed JVM implementation versus
encrypting it with a native implementation might be enough to confirm/deny.
For example, keypipe (C, [4]) achieves around 2.8 GiBps of symmetric
AES-GCM and age (golang, ChaCha20-Poly1305, [3]) achieves about 1.6 GiBps
encryption and 1.0 GiBps decryption; from my past experience with Java
crypto, it would achieve maybe 200 MiBps of _non-authenticated_ AES.
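
As a starting point, a back-of-the-envelope version of that microbenchmark
could look like the sketch below: time AES-256-GCM encryption and decryption
of a 500 MiB buffer with whichever provider is installed. It is deliberately
naive (single run, no JMH, no warm-up), so treat the output as rough and run
it with a few GiB of heap (e.g. -Xmx4g):

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import java.security.SecureRandom;

    public class GcmThroughput
    {
        public static void main(String[] args) throws Exception
        {
            // Filling 500 MiB from SecureRandom takes a few seconds; it is not timed.
            byte[] plaintext = new byte[500 * 1024 * 1024];
            new SecureRandom().nextBytes(plaintext);

            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(256);
            SecretKey key = keyGen.generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            long t0 = System.nanoTime();
            byte[] ciphertext = enc.doFinal(plaintext);
            long encNanos = System.nanoTime() - t0;

            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
            long t1 = System.nanoTime();
            dec.doFinal(ciphertext);
            long decNanos = System.nanoTime() - t1;

            double mib = plaintext.length / (1024.0 * 1024.0);
            System.out.printf("encrypt: %.0f MiB/s, decrypt: %.0f MiB/s%n",
                              mib / (encNanos / 1e9), mib / (decNanos / 1e9));
        }
    }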

Cheers,
-Joey

[1] https://issues.apache.org/jira/browse/CASSANDRA-15294
[2] https://github.com/corretto/amazon-corretto-crypto-provider
[3] https://github.com/FiloSottile/age
[4] https://github.com/hashbrowncipher/keypipe#encryption


Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-16 Thread Joseph Lynch
For FDE you'd probably have  the key file in a tmpfs pulled from a
remote secret manager and when the machine boots it mounts the
encrypted partition that contains your data files. I'm not aware of
anyone doing FDE with a password in production. If you wanted
selective encryption it would make sense to me to support placing
keyspaces on different data directories (this may already be possible)
but since crypto in the kernel is so cheap I don't know why you'd do
selective encryption. Also I think it's worth noting many hosting
providers (e.g. AWS) just encrypt the disks for you so you can check
the "data is encrypted at rest" box.

I think Cassandra will be pretty handicapped by being in the JVM which
generally has very slow crypto. I'm slightly concerned that we're
already slow at streaming and compaction, and adding slow JVM crypto
will make C* even less competitive. For example, if we have to disable
full sstable streaming (zero copy or otherwise) I think that would be
very unfortunate (although Bowen's approach of sharing one secret
across the cluster and then having files use a key derivation function
may avoid that). Maybe if we did something like CASSANDRA-15294 [1] to
try to offload to native crypto like how internode networking did with
tcnative to fix the perf issues with netty TLS with JVM crypto I'd
feel a little less concerned but ... crypto that is both secure and
performant in the JVM is a hard problem ...
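
For what it's worth, the shared-secret-plus-derivation idea can be sketched
with nothing more than HMAC from the JDK. The class and method names below
are invented for illustration (this is HKDF-SHA256 in the style of RFC 5869,
with the SSTable identifier as the info field); a real design would also have
to pin down salt handling and key rotation:

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public final class PerFileKeys
    {
        private static byte[] hmacSha256(byte[] key, byte[]... chunks) throws Exception
        {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            for (byte[] chunk : chunks)
                mac.update(chunk);
            return mac.doFinal();
        }

        /** Derive a 256-bit per-file key from the cluster secret and an SSTable id. */
        public static byte[] deriveFileKey(byte[] clusterSecret, String sstableId) throws Exception
        {
            byte[] salt = new byte[32];                    // fixed zero salt, for the sketch only
            byte[] prk = hmacSha256(salt, clusterSecret);  // HKDF-Extract
            byte[] info = sstableId.getBytes(StandardCharsets.UTF_8);
            return hmacSha256(prk, info, new byte[]{ 1 }); // HKDF-Expand, first (only) block
        }

        public static void main(String[] args) throws Exception
        {
            byte[] clusterSecret = new byte[32];           // would come from a secret manager
            byte[] k1 = deriveFileKey(clusterSecret, "nb-42-big-Data.db");
            byte[] k2 = deriveFileKey(clusterSecret, "nb-43-big-Data.db");
            System.out.println(Arrays.equals(k1, k2));     // false: every file gets its own key
        }
    }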

I guess I'm just concerned we're going to introduce something that is
either insecure or too slow to be useful.

-Joey

On Tue, Nov 16, 2021 at 8:10 AM Bowen Song  wrote:
>
> I don't like the idea of FDE (Full Disk Encryption) as an alternative to
> application-managed encryption at rest. Each has its own advantages
> and disadvantages.
>
> For example, if the encryption key is the same across nodes in the same
> cluster, and Cassandra can share the key securely between authenticated
> nodes, rolling restart of the servers will be a lot simpler than if the
> servers were using FDE - someone will have to type in the passphrase on
> each reboot, or have a script to mount the encrypted device over SSH and
> then start Cassandra service after a reboot.
>
> Another valid use case of encryption implemented in Cassandra is
> selectively encrypt some tables, but leave others unencrypted. Doing
> this outside Cassandra on the filesystem level is very tedious and
> error-prone - lots of symlinks, and it is pretty hard to handle newly created
> tables or keyspaces.
>
> However, I don't know if there's enough demand to justify the above use
> cases.
>
>
> On 16/11/2021 14:45, Joseph Lynch wrote:
> > I think a CEP is wise (or a more thorough design document on the
> > ticket) given how easy it is to do security incorrectly and key
> > management, rotation and key derivation are not particularly
> > straightforward.
> >
> > I am curious what advantage Cassandra implementing encryption has over
> > asking the user to use an encrypted filesystem or disks instead where
> > the kernel or device will undoubtedly be able to do the crypto more
> > efficiently than we can in the JVM and we wouldn't have to further
> > complicate the storage engine? I think the state of encrypted
> > filesystems (e.g. LUKS on Linux) is significantly more user friendly
> > these days than it was in 2015 when that ticket was created.
> >
> > If the application has existing exfiltration paths (e.g. backups) it's
> > probably better to encrypt/decrypt in the backup/restore process via
> > something extremely fast (and modern) like piping through age [1]
> > isn't it?
> >
> > [1] https://github.com/FiloSottile/age
> >
> > -Joey
> >
> >
> > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
> >  wrote:
> >> Hi list,
> >>
> >> an engineer from Intel - Shylaja Kokoori (who is watching this list
> >> closely) has retrofitted the original code from CASSANDRA-9633 work in
> >> times of 3.4 to the current trunk with my help here and there, mostly
> >> cosmetic.
> >>
> >> I would like to know if there is a general consensus about me going to
> >> create a CEP for this feature or what is your perception on this. I
> >> know we have it a little bit backwards here as we should first discuss
> >> and then code but I am super glad that we have some POC we can
> >> elaborate further on and CEP would just cement  and summarise the
> >> approach / other implementation aspects of this feature.
> >>
> >> I think that having 9633 merged will fill quite a big operational gap
> >> when it comes to security. There are a lot of enterprises who

Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-16 Thread Joseph Lynch
> I find it rather strange to offer commit log and hints
> encryption at rest but for some reason sstable encryption would be
> omitted.

I also think file/disk encryption may be superior in those cases, but
I imagine they were easier to implement in that you don't have to
worry nearly as much about key management since both commit logs and
hints are short lived files that should never leave the box (except
maybe for CDC but I feel like that's similar to backup in terms of
"exfiltration by design").

To be clear, I think in 2015 this feature would have been extremely
useful, but with operating systems and cloud providers often offering
full disk encryption by default now and doing it with really good
(performant and secure) implementations ... I question if it's
something we want to sink cycles into.

-Joey

On Tue, Nov 16, 2021 at 7:01 AM Stefan Miklosovic
 wrote:
>
> I don't object to having the discussion about whether we actually need
> this feature at all :)
>
> Let's hear from people in the field what their perception is on this.
>
> Btw, if we should rely on file system encryption, for what reason is
> there encryption of commit logs and hints already? So this should be
> removed? I find it rather strange to offer commit log and hints
> encryption at rest but for some reason sstable encryption would be
> omitted.
>
> On Tue, 16 Nov 2021 at 15:46, Joseph Lynch  wrote:
> >
> > I think a CEP is wise (or a more thorough design document on the
> > ticket) given how easy it is to do security incorrectly and key
> > management, rotation and key derivation are not particularly
> > straightforward.
> >
> > I am curious what advantage Cassandra implementing encryption has over
> > asking the user to use an encrypted filesystem or disks instead where
> > the kernel or device will undoubtedly be able to do the crypto more
> > efficiently than we can in the JVM and we wouldn't have to further
> > complicate the storage engine? I think the state of encrypted
> > filesystems (e.g. LUKS on Linux) is significantly more user friendly
> > these days than it was in 2015 when that ticket was created.
> >
> > If the application has existing exfiltration paths (e.g. backups) it's
> > probably better to encrypt/decrypt in the backup/restore process via
> > something extremely fast (and modern) like piping through age [1]
> > isn't it?
> >
> > [1] https://github.com/FiloSottile/age
> >
> > -Joey
> >
> >
> > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
> >  wrote:
> > >
> > > Hi list,
> > >
> > > an engineer from Intel - Shylaja Kokoori (who is watching this list
> > > closely) has retrofitted the original code from CASSANDRA-9633 work in
> > > times of 3.4 to the current trunk with my help here and there, mostly
> > > cosmetic.
> > >
> > > I would like to know if there is a general consensus about me going to
> > > create a CEP for this feature or what is your perception on this. I
> > > know we have it a little bit backwards here as we should first discuss
> > > and then code but I am super glad that we have some POC we can
> > > elaborate further on and CEP would just cement  and summarise the
> > > approach / other implementation aspects of this feature.
> > >
> > > I think that having 9633 merged will fill quite a big operational gap
> > > when it comes to security. There are a lot of enterprises who desire
> > > this feature so much. I can not remember when I last saw a ticket with
> > > 50 watchers which was inactive for such a long time.
> > >
> > > Regards
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>




Re: Resurrection of CASSANDRA-9633 - SSTable encryption

2021-11-16 Thread Joseph Lynch
I think a CEP is wise (or a more thorough design document on the
ticket) given how easy it is to do security incorrectly and key
management, rotation and key derivation are not particularly
straightforward.

I am curious what advantage Cassandra implementing encryption has over
asking the user to use an encrypted filesystem or disks instead where
the kernel or device will undoubtedly be able to do the crypto more
efficiently than we can in the JVM and we wouldn't have to further
complicate the storage engine? I think the state of encrypted
filesystems (e.g. LUKS on Linux) is significantly more user friendly
these days than it was in 2015 when that ticket was created.

If the application has existing exfiltration paths (e.g. backups) it's
probably better to encrypt/decrypt in the backup/restore process via
something extremely fast (and modern) like piping through age [1]
isn't it?

[1] https://github.com/FiloSottile/age

-Joey


On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic
 wrote:
>
> Hi list,
>
> an engineer from Intel - Shylaja Kokoori (who is watching this list
> closely) has retrofitted the original code from CASSANDRA-9633 work in
> times of 3.4 to the current trunk with my help here and there, mostly
> cosmetic.
>
> I would like to know if there is a general consensus about me going to
> create a CEP for this feature or what is your perception on this. I
> know we have it a little bit backwards here as we should first discuss
> and then code but I am super glad that we have some POC we can
> elaborate further on and CEP would just cement  and summarise the
> approach / other implementation aspects of this feature.
>
> I think that having 9633 merged will fill quite a big operational gap
> when it comes to security. There are a lot of enterprises who desire
> this feature so much. I can not remember when I last saw a ticket with
> 50 watchers which was inactive for such a long time.
>
> Regards
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>




Re: [DISCUSS] Creating a new slack channel for newcomers

2021-11-09 Thread Joseph Lynch
I also feel that having all the resources to get help in more or less
one place (#cassandra-dev slack / ML) probably helps newcomers on the
whole since they can ask questions and likely engage with someone who
can help. I know that I've asked a few silly questions in
#cassandra-dev and appreciated that there were more experienced
project members to help answer them.

If we wanted to have a set of designated "newcomer mentors" or some
such that seems useful in addition. Perhaps their email/handles on the
website in the contributing section with an encouragement to ask them
first if you're unsure who to ask?

-Joey

On Tue, Nov 9, 2021 at 10:16 AM Sumanth Pasupuleti
 wrote:
>
> +1 that existing channels of communication (cassandra-dev slack and mailing
> lists) should ideally suffice, and I have not seen prohibitive
> communication in those forums thus far that goes against newcomers. I agree
> it can be intimidating, but to Bowen's point, the more traffic we see
> around newcomers in those forums, the more comfortable it gets.
> I agree starting a new channel is a low effort experiment we can do, but
> the success depends on finding mentors and the engagement of mentors vs I
> believe engagement in #cassandra-dev is almost guaranteed given the high
> number of people in the channel.
>
> Thanks,
> Sumanth
>
> On Tue, Nov 9, 2021 at 6:47 AM Bowen Song  wrote:
>
> > As a newcomer (made two commits since October) who has been watching
> > this mailing list since then, I don't like the idea of a separate
> > channel for beginner questions. The volume in this mailing list is
> > fairly low, I can't see any legitimate reason for diverting a portion of
> > that into another channel, further reducing the volume in the existing
> > channel and perhaps not creating much volume in the new channel either.
> >
> > Personally, I think a clearly written and easy to find community
> > guideline highlighting that this mailing list is suitable for beginner
> > questions, and give some suggestions/recommendations on when, where and
> > how to ask beginner questions would be more useful.
> >
> > At the moment because the volume of beginner questions is very very low
> > in this mailing list, newcomers like me don't feel comfortable asking
> > questions here. That's not because there's 600 pair of eyes watching
> > this (TBH, if you didn't mention it, I wouldn't have noticed it), but
> > because the herd mentality. If not many questions are asked here, most
> > people won't start doing that. It's all about creating the environment
> > that makes people feel comfortable asking questions here.
> >
> > On 08/11/2021 16:28, Benjamin Lerer wrote:
> > > Hi everybody,
> > >
> > > Aleksei Zotov mentioned to me that it was a bit intimidating for
> > newcomers
> > > to ask beginner questions in the cassandra-dev channel as it has over 600
> > > followers and that we should probably have a specific channel for
> > > newcomers.
> > > This proposal makes total sense to me.
> > >
> > > What is your opinion on this? Do you have any concerns about it?
> > >
> > > Benjamin
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >




Re: Welcome Sumanth Pasupuleti as Apache Cassandra Committer

2021-11-05 Thread Joseph Lynch
Congratulations Sumanth!

Well deserved!!

-Joey

On Fri, Nov 5, 2021 at 11:17 AM Oleksandr Petrov
 wrote:
>
> The PMC members are pleased to announce that Sumanth Pasupuleti has
> recently accepted the invitation to become committer.
>
> Sumanth, thank you for all your contributions to the project over the years.
>
> Congratulations and welcome!
>
> The Apache Cassandra PMC members




Re: [VOTE] CEP-15: General Purpose Transactions

2021-10-14 Thread Joseph Lynch
1. +1 nb
2. +1 nb
3. +1 nb

I am excited to see a real proposal backed by a number of competent
engineers that will meaningfully improve our ability to deliver
important and complex features for Cassandra.

To be frank, I'm somewhat confused as to the dissent on the CEP
strategy itself (tactical implementation questions aside). The text
seems rather uncontroversial (~= "let's make fast general purpose
transactions") and I feel like it's rather odd to say we don't want to
at least try out an actual solution that has actual engineers with
time to work on it versus any other option where the code doesn't even
begin to exist much less full time engineers willing to spend time on
it.

Certainly this CEP meets the standard for support? It is well thought
out, well researched, a prototype exists, that prototype appears to be
well tested, and the authors significantly engaged with the community
incorporating feedback.

-Joey


On Thu, Oct 14, 2021 at 9:31 AM bened...@apache.org  wrote:
>
> Hi everyone,
>
> I would like to start a vote on this CEP, split into three sub-decisions, as 
> discussion has been circular for some time.
>
> 1. Do you support adopting this CEP?
> 2. Do you support the transaction semantics proposed by the CEP for Cassandra?
> 3. Do you support an incremental approach to developing transactions in 
> Cassandra, leaving scope for future development?
>
> The first vote is a consensus vote of all committers, the second and third 
> however are about project direction and therefore are simple majority votes 
> of the PMC.
>
> Recall that all -1 votes must be accompanied by an explanation. If you reject 
> the CEP only on grounds (2) or (3) you should not veto the proposal. If a 
> majority reject grounds (2) or (3) then transaction developments will halt 
> for the time being.
>
> This vote will be open for 72 hours.




Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-10-09 Thread Joseph Lynch
> With the proposal hitting the one-month mark, the contributors are interested 
> in gauging the developer community's response to the proposal.

I support this proposal. From what I can understand, this proposal
moves us towards having the building blocks we need to correctly
deliver some of the most often requested features in Cassandra. For
example it seems to unlock: batches that actually work, registers that
offer fast compare and swap, global secondary indices that can be
correctly maintained, and more. Therefore, given the benefit to the
community, I support working towards that foundation that will allow
us to build solutions in Cassandra that pay the cost of consensus closer to
mutation time instead of lazily at read/repair time.

I think the feedback in this thread around interface (what statements
will this facilitate and how will the library integrate with Cassandra
itself), performance (how fast will these transactions be, will we
offer bounded stale reads, etc ...), and implementation (how does this
compare/contrast with other consensus approaches) has been
informative, but at this point I think it makes sense to start trying
to make incremental progress towards a functional integration to
discover any remaining areas for improvement.

Cheers and thank you!
-Joey



On Thu, Oct 7, 2021 at 10:51 AM C. Scott Andreas  wrote:
>
> Hi Jonathan,
>
> Following up on my message yesterday as it looks like our replies may have 
> crossed en route.
>
> Thanks for bumping your message from earlier in our discussion. I believe we 
> have addressed most of these questions on the thread, in addition to offering 
> a presentation on this and related work at ApacheCon, a discussion hosted 
> following that presentation at ApacheCon, and in ASF Slack. Contributors have 
> further offered an opportunity to discuss specific questions via
> videoconference if it helps to speak live. I'd be happy to do so as well.
>
> Since your original message, discussion has covered a lot of ground on the 
> related databases you've mentioned:
> – Henrik has shared expertise related to MongoDB and its implementation.
> – You've shared an overview of Calvin.
> – Alex Miller has helped us review the work relative to other Paxos 
> algorithms and identified a few great enhancements to incorporate.
> – The paper discusses related approaches in FoundationDB, CockroachDB, and 
> Yugabyte.
> – Subsequent discussion has contrasted the implementation to DynamoDB, Google 
> Cloud BigTable, and Google Cloud Spanner (noting specifically that the 
> protocol achieves Spanner's 1x round-trip without requiring specialized 
> hardware).
>
> In my reply yesterday, I've attempted to crystallize what becomes possible 
> via CQL: one-shot multi-partition transactions in the first implementation 
> and a 4x latency reduction on writes / 2x latency reduction on reads relative 
> to today; along with the ability to build upon this work to enable 
> interactive transactions in the future.
>
> I believe we've exercised the questions you've raised and am grateful for the 
> ground we've covered. If you have further questions that are difficult to 
> exercise via email, please let me know if you'd like to arrange a call 
> (open-invite); we'd be happy to discuss live as well.
>
> With the proposal hitting the one-month mark, the contributors are interested 
> in gauging the developer community's response to the proposal. We warrant our 
> ability to focus durably on the project; execute this development on ASF JIRA 
> in collaboration with other contributors; engage with members of the 
> developer and user community on feedback, enhancements, and bugs; and intend 
> to deliver it to completion at a standard of readiness suitable for production 
> transactional systems of record.
>
> Thanks,
>
> – Scott
>
> On Oct 6, 2021, at 8:25 AM, C. Scott Andreas  wrote:
>
>
>
> Hi folks,
>
> Thanks for discussion on this proposal, and also to Benedict who’s been 
> fielding questions on the list!
>
> I’d like to restate the goals and problem statement captured by this proposal 
> and frame context.
>
> Today, lightweight transactions limit users to transacting over a single 
> partition. This unit of atomicity has a very low upper limit in terms of the 
> amount of data that can be CAS’d over; and doing so leads many to design 
> contorted data models to cram different types of data into one partition for 
> the purposes of being able to CAS over it. We propose that Cassandra can and 
> should be extended to remove this limit, enabling users to issue one-shot 
> transactions that CAS over multiple keys – including CAS batches, which may 
> modify multiple keys.
>
> To enable this, the CEP authors have designed a novel, leaderless paxos-based 
> protocol unique to Cassandra, offered a proof of its correctness, a 
> whitepaper outlining it in detail, along with a prototype implementation to 
> incubate development, and integrated it with Maelstrom from jepsen.io to 
> validate 

Re: [DISCUSS] CEP-15: General Purpose Transactions

2021-09-20 Thread Joseph Lynch
Benedict,

Thank you very much for advancing this proposal, I'm extremely excited
to see flexible quorums used in this way and am looking forward to the
integration of Accord into Cassandra! I read the whitepaper and have a
few questions, but I was wondering what do you think about having some
extended Q after your ApacheCon talk Wednesday (maybe at the end of
the C* track)? It might be higher bandwidth than going back and forth
on email/slack (also given you're presenting on it that might be a
good time to discuss it)?

Briefly
* It might help to have a diagram (perhaps I can collaborate with you
on this?) showing the happy path delay waiting in the reorder buffer
and the messages that are sent in a 2 and 3 datacenter deployment
during the PreAccept, Accept, Commit, Execute, Apply phases. In
particular it was hard for me to follow where exactly I was paying WAN
latency and where we could achieve progress with LAN only (I think
that WAN is always paid during the Consensus Protocol, and then in
most cases execution can remain LAN except in 3+ datacenters where I
think you'd have to include at least one replica in a neighboring
datacenter). In particular, it seems that Accord always pays clock
skew + WAN latency during the reorder buffer (as part of consensus) +
2x LAN latency during execution (to read and then write).
* Relatedly I'm curious if there is any way that the client can
acquire the timestamp used by the transaction before sending the data
so we can make the operations idempotent and unrelated to the
coordinator that was executing them as the storage nodes are
vulnerable to disk and heap failure modes which makes them much more
likely to enter grey failure (slow). Alternatively, perhaps it would
make sense to introduce a set of optional dedicated C* nodes for
reaching consensus that do not act as storage nodes so we don't have
to worry about hanging coordinators (join_ring=false?)?
* Should Algorithm 1 line 12 be PreAcceptOK from Et (not Qt) or should
line 2 read Qt instead of Et?
* I think your claim that clock skew is <1ms in general is
accurate, at least for AWS, except when machines boot for the first
time (I can send you some data shortly). It might make sense for
participating members to wait for a minimum detected clock skew before
becoming eligible for electorate?
* I don't really understand how temporarily down replicas will learn
of mutations they missed, did I miss the part where a read replica
would recover all transactions between its last accepted time and
another replica's last accepted time? Or are we just leveraging some
external repair?
* Relatedly since non-transactional reads wouldn't flow through
consensus (I hope) would it make sense for a restarting node to learn
the latest accepted time once and then be deprioritized for all reads
until it has accepted what it missed? Or is the idea that you would
_always_ read transactionally (and since it's a read only transaction
you can skip the WAN consensus and just go straight to fast path
reads)?
* I know the paper says that we elide details of how the shards (aka
replica sets?) are chosen, but it seems that this system would have a
hard dependency on a strongly consistent shard selection system (aka
token metadata?) wouldn't it? In particular if the simple quorums
(which I interpreted to be replica sets in current C*, not sure if
that's correct) can change in non linearizable ways I don't think
Property 3.3 can hold. I think you hint at a solution to this in
section 5 but I'm not sure I grok it.

Super interesting proposal and I am looking forward to all the
improvements this will bring to the project!

Cheers,
-Joey

On Mon, Sep 20, 2021 at 1:34 AM Miles Garnsey
 wrote:
>
> If Accord can fulfil its aims it sounds like a huge improvement to the state 
> of the art in distributed transaction processing. Congrats to all involved in 
> pulling the proposal together.
>
> I was holding off on feedback since this is quite in depth and I don’t want 
> to bike shed, I still haven’t spent as much time understanding this as I’d 
> like.
>
> Regardless, I’ll make the following notes in case they’re helpful. My 
> feedback is more to satisfy my own curiosity and stimulate discussion than to 
> suggest that there are any flaws here. I applaud the proposed testing 
> approach and think it is the only way to be certain that the proposed 
> consistency guarantees will be upheld.
>
> General
>
> I’m curious if/how this proposal addresses issues we have seen when scaling; 
> I see reference to simple majorities of nodes - is there any plan to ensure 
> safety under scaling operations or DC (de)commissioning?
>
> What consistency levels will be supported under Accord? Will it simply be a 
> single CL representing a majority of nodes across the whole cluster? (This at 
> least would mitigate the issues I’ve seen when folks want to switch from 
> EACH_SERIAL to SERIAL).
>
> Accord
>
> > Accord instead assembles an inconsistent set of 

Re: Welcome Adam Holmberg as Cassandra committer

2021-08-17 Thread Joseph Lynch
Congratulations Adam!

On Tue, Aug 17, 2021 at 10:25 AM Jordan West  wrote:
>
> Congrats Adam!
>
> On Tue, Aug 17, 2021 at 5:51 AM Paulo Motta 
> wrote:
>
> > Congratulations and well deserved Adam!
> >
> > Em ter., 17 de ago. de 2021 às 03:58, Sumanth Pasupuleti <
> > sumanth.pasupuleti...@gmail.com> escreveu:
> >
> > > Congratulations Adam!!
> > >
> > > On Mon, Aug 16, 2021 at 10:32 PM Berenguer Blasi <
> > berenguerbl...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Well done Adam, congrats!
> > > >
> > > > On 16/8/21 18:27, Andrés de la Peña wrote:
> > > > > Congrats Adam, well deserved!
> > > > >
> > > > > On Mon, 16 Aug 2021 at 17:14, Patrick McFadin 
> > > > wrote:
> > > > >
> > > > >> Great to see you on the committer list Adam!
> > > > >>
> > > > >> On Mon, Aug 16, 2021 at 7:06 AM Jonathan Ellis 
> > > > wrote:
> > > > >>
> > > > >>> Well deserved.  Congratulations!
> > > > >>>
> > > > >>> On Mon, Aug 16, 2021 at 5:57 AM Benjamin Lerer 
> > > > >> wrote:
> > > >   The PMC members are pleased to announce that Adam Holmberg has
> > > > >> accepted
> > > >  the invitation to become committer.
> > > > 
> > > >  Thanks a lot, Adam, for everything you have done for the project
> > all
> > > > >>> these
> > > >  years.
> > > > 
> > > >  Congratulations and welcome
> > > > 
> > > >  The Apache Cassandra PMC members
> > > > 
> > > > >>>
> > > > >>> --
> > > > >>> Jonathan Ellis
> > > > >>> co-founder, http://www.datastax.com
> > > > >>> @spyced
> > > > >>>
> > > >
> > > > -
> > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > > >
> > > >
> > >
> >




Re: Welcome Dinesh Joshi as Cassandra PMC member

2021-06-02 Thread Joseph Lynch
Congratulations Dinesh! Well deserved!

-Joey

On Wed, Jun 2, 2021 at 12:23 PM Benjamin Lerer  wrote:
>
>  The PMC's members are pleased to announce that Dinesh Joshi has accepted
> the invitation to become a PMC member.
>
> Thanks a lot, Dinesh, for everything you have done for the project all
> these years.
>
> Congratulations and welcome
>
> The Apache Cassandra PMC members




Re: Welcome Stefan Miklosovic as Cassandra committer

2021-05-04 Thread Joseph Lynch
Congratulations, Stefan!

On Tue, May 4, 2021 at 9:07 AM Andrés de la Peña
 wrote:
>
> Congrats!
>
> On Tue, 4 May 2021 at 05:47, Berenguer Blasi 
> wrote:
>
> > Congrats Stefan!
> >
> > On 3/5/21 22:24, Yifan Cai wrote:
> > > Congrats!
> > >
> > > On Mon, May 3, 2021 at 1:23 PM Paulo Motta 
> > wrote:
> > >
> > >> Congrats, Stefan! Happy to see you onboard! :)
> > >>
> > >> Em seg., 3 de mai. de 2021 às 17:17, Ben Bromhead 
> > >> escreveu:
> > >>
> > >>> Congrats mate!
> > >>>
> > >>> On Tue, May 4, 2021 at 4:20 AM Scott Andreas 
> > >> wrote:
> >  Congratulations, Štefan!
> > 
> >  
> >  From: David Capwell 
> >  Sent: Monday, May 3, 2021 10:53 AM
> >  To: dev@cassandra.apache.org
> >  Subject: Re: Welcome Stefan Miklosovic as Cassandra committer
> > 
> >  Congrats!
> > 
> > > On May 3, 2021, at 9:47 AM, Ekaterina Dimitrova <
> > >> e.dimitr...@gmail.com
> >  wrote:
> > > Congrat Stefan! Well done!!
> > >
> > > On Mon, 3 May 2021 at 11:49, J. D. Jordan  >  wrote:
> > >> Well deserved!  Congrats Stefan.
> > >>
> > >>> On May 3, 2021, at 10:46 AM, Sumanth Pasupuleti <
> > >> sumanth.pasupuleti...@gmail.com> wrote:
> > >>> Congratulations Stefan!!
> > >>>
> >  On Mon, May 3, 2021 at 8:41 AM Brandon Williams  > >> wrote:
> >  Congratulations, Stefan!
> > 
> > > On Mon, May 3, 2021 at 10:38 AM Benjamin Lerer <
> > >> b.le...@gmail.com>
> > >> wrote:
> > > The PMC's members are pleased to announce that Stefan Miklosovic
> > >>> has
> > > accepted the invitation to become committer last Wednesday.
> > >
> > > Thanks a lot, Stefan,  for all your contributions!
> > >
> > > Congratulations and welcome
> > >
> > > The Apache Cassandra PMC members
> > 
> > >>> -
> >  To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >  For additional commands, e-mail: dev-h...@cassandra.apache.org
> > 
> > 
> > >>
> > >> -
> > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>
> > >>
> > 
> >  -
> >  To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >  For additional commands, e-mail: dev-h...@cassandra.apache.org
> > 
> >  --
> > >>> Ben Bromhead
> > >>>
> > >>> Instaclustr | www.instaclustr.com | @instaclustr
> > >>>  | +64 27 383 8975
> > >>>
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >




Re: [VOTE] Release Apache Cassandra 4.0-rc1

2021-03-31 Thread Joseph Lynch
We have been testing the release at various scales and workloads and
for the most part everything has been working really well (great
performance, zero copy streaming is amazing, compaction is fast, etc
...). However, upon testing incremental repair (currently the default
in 4.0) we hit a potential issue [1]. If confirmed, the bug would
indicate that after an incremental repair run via "nodetool repair"
that performed anticompaction, nodes may be unable to make any forward
progress on compaction.

Is it possible to extend the vote by 24 hours for us to triage the
issue and confirm if it is user error or a legitimate bug? My concern
is that a release candidate may plausibly be put in production and if
nodes might stop compacting that seems to be a potentially serious
stability issue.

[1] https://issues.apache.org/jira/browse/CASSANDRA-16552

-Joey


On Tue, Mar 30, 2021 at 8:54 PM Ben Bromhead  wrote:
>
> https://issues.apache.org/jira/browse/CASSANDRA-16550 :)
>
> On Wed, Mar 31, 2021 at 10:08 AM Mick Semb Wever  wrote:
>
> > >
> > > If we could tidy up the others quickly (I'm happy to submit a PR for
> > > anything that is outstanding) I'm ready to jump on board the train!
> > >
> >
> >
> > The LICENSE and NOTICE issues remain unassigned, if you are keen!
> >
>
>
> --
>
> Ben Bromhead
>
> Instaclustr | www.instaclustr.com | @instaclustr
>  | +64 27 383 8975




Re: [DISCUSS] Releases after 4.0

2021-03-29 Thread Joseph Lynch
I am slightly concerned about removing support for critical bug fixes
in 3.0 on a short time-frame (<1 year). I know of at least a few major
installations, including ours, who are just now able to finish
upgrades to 3.0 in production due to the number of correctness and
performance bugs introduced in that release which have only been
debugged and fixed in the past ~2 years.

I like the idea of the 3-year support cycles, but I think since
3.0/3.11/4.0 took so long to stabilize to a point folks could upgrade
to, we should reset the clock somewhat. What about the following
assuming an April 2021 4.0 cut:

4.0: Fully supported until April 2023 and high severity bugs until
April 2024 (2 year full, 1 year bugfix)
3.11: Fully supported until April 2022 and high severity bugs until
April 2023 (1 year full, 1 year bugfix).
3.0: Supported for high severity correctness/performance bugs until
April 2022 (1 year bugfix)
2.2+2.1: EOL immediately.

Then going forward we could have this nice pattern when we cut the
yearly release:
Y(n-0): Support for 3 years from now (2 full, 1 bugfix)
Y(n-1): Fully supported for 1 more year and supported for high
severity correctness/perf bugs 1 year after that (1 full, 1 bugfix)
Y(n-2): Supported for high severity correctness/bugs for 1 more year (1 bugfix)

What do you think?
-Joey

On Mon, Mar 29, 2021 at 9:39 AM Benjamin Lerer
 wrote:
>
> Thanks to everybody and sorry for not finalizing that email thread sooner.
>
> For the release cadence the agreement is: *one release every year +
> periodic trunk snapshots*.
> For the number of releases being supported the agreement is 3.  *Every
> incoming release should be supported for 3 years.*
>
> We did not reach a clear agreement on several points :
> * The naming of versions: semver versus another approach and the name of
> snapshot versions
> * How long will we support 3.11. Taking into account that it has been
> released 4 years ago does it make sense to support it for the next 3 years?
>
> I am planning to open some follow up discussions for those points in the
> coming weeks.
>
> When there is an agreement we should document the changes on the webpage
> > and also highlight it as part of the 4.0 release material as it's an
> > important change to the release cycle and LTS support.
> >
>
> It is a valid point. Do you mind if I update the documentation when we have
> clarified the version names and that we have a more precise idea of when
> 4.0 GA will be released? That will allow us to make a clear message on when
> to expect the next supported version.
>
> On Mon, Feb 8, 2021 at 10:05 PM Paulo Motta 
> wrote:
>
> > +1 to the yearly release cadence + periodic trunk snapshots + support to 3
> > previous release branches.. I think this will give some nice predictability
> > to the project.
> >
> > When there is an agreement we should document the changes on the webpage
> > and also highlight it as part of the 4.0 release material as it's an
> > important change to the release cycle and LTS support.
> >
> > Em sex., 5 de fev. de 2021 às 18:08, Brandon Williams 
> > escreveu:
> >
> > > Perhaps on my third try...  keep three branches total, including 3.11:
> > > 3.11, 4, next. Support for 3.11 begins ending after next+1, is what
> > > I'm trying to convey.
> > >
> > > On Fri, Feb 5, 2021 at 2:58 PM Brandon Williams 
> > wrote:
> > > >
> > > > Err, to be clear: keep 3.11 until we have 3 other branches.
> > > >
> > > > On Fri, Feb 5, 2021 at 2:57 PM Brandon Williams 
> > > wrote:
> > > > >
> > > > > I'm +1 on 3 branches, and thus ~3 years of support.  So in the
> > > > > transition, would we aim to keep 3.11 until after 4.0 and a successor
> > > > > are released?
> > > > >
> > > > > On Fri, Feb 5, 2021 at 11:44 AM Benjamin Lerer
> > > > >  wrote:
> > > > > >
> > > > > > >
> > > > > > > Are we also trying to reach a consensus here that a release
> > branch
> > > should
> > > > > > > be supported for ~3 years (i.e. that we are aiming to limit
> > > ourselves to 3
> > > > > > > release branches plus trunk)?
> > > > > >
> > > > > >
> > > > > > 3 release branches make sense to me +1
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 5, 2021 at 6:15 PM Michael Semb Wever 
> > > wrote:
> > > > > >
> > > > > > >
> > > > > > > > I believe that there is an appetite for the bleeding edge
> > > snapshots where
> > > > > > > > we do not guarantee stability and that the semver discussion is
> > > not
> > > > > > > > finished yet but I would like us to let those discussions go
> > for
> > > some
> > > > > > > > follow up threads.
> > > > > > > > My goal with this thread was to reach an agreement on a release
> > > cadence
> > > > > > > for
> > > > > > > > the version we will officially support after 4.0.
> > > > > > > >
> > > > > > > > My impression is that most people agree with *one release every
> > > year* so
> > > > > > > I
> > > > > > > > would like to propose it as our future release cadence.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > 

Re: [VOTE] Accept the Harry donation

2020-09-16 Thread Joseph Lynch
+1 (non-binding)

On Wed, Sep 16, 2020 at 11:10 AM Jordan West  wrote:
>
> +1
>
> On Wed, Sep 16, 2020 at 10:29 AM sankalp kohli 
> wrote:
>
> > +1
> >
> > On Wed, Sep 16, 2020 at 10:07 AM Ekaterina Dimitrova <
> > e.dimitr...@gmail.com>
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > > +1
> > > >
> > > > Dinesh
> > > >
> > > > > On Sep 16, 2020, at 9:30 AM, Joshua McKenzie  wrote:
> > > > >
> > > > > +1
> > > > >
> > > > >> On Wed, Sep 16, 2020 at 11:22 AM, Aleksey Yeshchenko <
> > > > >> alek...@apple.com.invalid> wrote:
> > > > >>
> > > > >> +1
> > > > >>
> > > > >> On 16 Sep 2020, at 16:09, Sumanth Pasupuleti  wrote:
> > > > >>
> > > > >> +1 (non-binding)
> > > > >>
> > > > >> On Wed, Sep 16, 2020 at 7:41 AM Jon Meredith  wrote:
> > > > >>
> > > > >> +1 (non-binding)
> > > > >>
> > > > >> On Wed, Sep 16, 2020 at 8:28 AM David Capwell  wrote:
> > > > >>
> > > > >> +1
> > > > >>
> > > > >> Sent from my iPhone
> > > > >>
> > > > >> On Sep 16, 2020, at 6:34 AM, Brandon Williams  wrote:
> > > > >>
> > > > >> +1
> > > > >>
> > > > >> On Wed, Sep 16, 2020, 4:45 AM Mick Semb Wever  wrote:
> > > > >>
> > > > >> This vote is about officially accepting the Harry donation from Alex
> > > > >> Petrov and Benedict Elliott Smith, that was worked on in CASSANDRA-15348.
> > > > >>
> > > > >> The Incubator IP Clearance has been filled out at
> > > > >> http://incubator.apache.org/ip-clearance/apache-cassandra-harry.html
> > > > >>
> > > > >> This vote is a required part of the IP Clearance process. It follows
> > > > >> the same voting rules as releases, i.e. from the PMC a minimum of three
> > > > >> +1s and no -1s.
> > > > >>
> > > > >> Please cast your votes:
> > > > >> [ ] +1 Accept the contribution into Cassandra
> > > > >> [ ] -1 Do not




Re: [VOTE] Project governance wiki doc (take 2)

2020-06-22 Thread Joseph Lynch
On Mon, Jun 22, 2020 at 3:23 AM Benedict Elliott Smith
 wrote:
>
> If you read the clauses literally there's no conflict - not all committers 
> that +1 the change need to review the work.  It just means that two 
> committers have indicated they are comfortable with the patch being merged.  
> One of the +1s could be based on another pre-existing review and trust in 
> both the contributor's and reviewer's knowledge of the area; and/or by 
> skimming the patch.  Though they should make it clear that they did not 
> review the patch when +1ing, so there's no ambiguity.

Ah, I understand now, thank you Benedict for explaining. If I
understand correctly the intention is that all patches must be
~"deeply understood" by at least two contributors (author + reviewer)
and one of those contributors must be a comitter. In addition, at
least two committers must support the patch being merged not
necessarily having done a detailed review.

I like the phrase "+1. I support this patch" vs a "+1 I have reviewed
this patch and support it". I suppose that if the +1 is coming from a
person in the reviewer field the "I have reviewed it" is perhaps
implicit.

> Perhaps we should elaborate on the document to avoid this confusion, as this 
> has come up multiple times.

I was confused but now I think I understand it and agree with you that
the wording is not in conflict. After the document is finalized I can
add a FAQ section and, if people think it reasonable, add it to
https://cassandra.apache.org/doc/latest/development/how_to_commit.html.

-Joey




Re: [VOTE] Project governance wiki doc (take 2)

2020-06-21 Thread Joseph Lynch
+1 (nb).

Thank you Josh for advocating for these changes!

I am curious about how Code Contribution Guideline #2 reading "Code
modifications must have been reviewed by at least one other
contributor" and Guideline #3 reading "Code modifications require two
+1 committer votes (can be author + reviewer)" will work in practice.
Specifically, if a contributor submits a ticket reporting a bug with a
patch attached, and then it is reviewed by a committer and committed,
that would appear sufficient under Code Contribution Guideline #2 but
insufficient under Code Contribution Guideline #3? I'm sorry if this
was discussed before I just want to make sure going forward I properly
follow the to be adopted guidelines.

Thanks again!
-Joey


On Sun, Jun 21, 2020 at 8:34 AM Jon Haddad  wrote:
>
> +1 binding
>
> On Sat, Jun 20, 2020, 11:24 AM Jordan West  wrote:
>
> > +1 (nb)
> >
> > On Sat, Jun 20, 2020 at 11:13 AM Jonathan Ellis  wrote:
> >
> > > +1
> > >
> > > On Sat, Jun 20, 2020 at 10:12 AM Joshua McKenzie 
> > > wrote:
> > >
> > > > Link to doc:
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Project+Governance
> > > >
> > > > Change since previous cancelled vote:
> > > > "A simple majority of this electorate becomes the low-watermark for
> > votes
> > > > in favour necessary to pass a motion, with new PMC members added to the
> > > > calculation."
> > > >
> > > > This previously read "super majority". We have lowered the low water
> > mark
> > > > to "simple majority" to balance strong consensus against risk of stall
> > > due
> > > > to low participation.
> > > >
> > > >
> > > >- Vote will run through 6/24/20
> > > >- pmc votes considered binding
> > > >- simple majority of binding participants passes the vote
> > > >- committer and community votes considered advisory
> > > >
> > > > Lastly, I propose we take the count of pmc votes in this thread as our
> > > > initial roll call count for electorate numbers and low watermark
> > > > calculation on subsequent votes.
> > > >
> > > > Thanks again everyone (and specifically Benedict and Jon) for the time
> > > and
> > > > collaboration on this.
> > > >
> > > > ~Josh
> > > >
> > >
> > >
> > > --
> > > Jonathan Ellis
> > > co-founder, http://www.datastax.com
> > > @spyced
> > >
> >

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Google Season of Docs 2020 participation

2020-04-29 Thread Joseph Lynch
Given the DataStax donation, I agree it makes sense for us to propose
projects that are unlikely to overlap. What do people think about
documentation that is slightly more instructional as opposed to
informational?

Perhaps some kind of tutorial series on "Practical examples of
building useful systems with Cassandra":
* How to use Cassandra to store sensor data (maybe from a fictional
IoT device). This could include data models, functioning Java/Python
code running an HTTP service, and benchmarking examples (see the small
sketch below).
* How to use Cassandra as a global record store for storing strongly
typed configuration for services. This could include architecture,
data models, exploring the use of different consistency levels, etc ...
* How to safely use Cassandra's CRDT data models (or just LWT for low
scale) to implement a state machine (aka locking) service.

Or we could go the operations direction:
* How to get a small test cluster running on Kubernetes (or in an ASG
on AWS / Azure / ...). This could include setup, scaling, repairing,
monitoring, etc ...

Or something along those lines...
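
To make the first idea concrete, here is a tiny, hypothetical sketch of the
kind of starter code such a tutorial could include (the keyspace, table, and
contact point are made up, and it assumes the 3.x Java driver; treat it as a
sketch, not working tutorial material):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    // Hypothetical starting point for the sensor-data tutorial idea above;
    // schema and names are illustrative only.
    public class SensorDataExample {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                session.execute("CREATE KEYSPACE IF NOT EXISTS iot WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
                // One partition per device, rows clustered by reading time.
                session.execute("CREATE TABLE IF NOT EXISTS iot.sensor_data ("
                    + "device_id uuid, ts timestamp, reading double, "
                    + "PRIMARY KEY (device_id, ts)) WITH CLUSTERING ORDER BY (ts DESC)");
                session.execute("INSERT INTO iot.sensor_data (device_id, ts, reading) "
                    + "VALUES (uuid(), toTimestamp(now()), 21.5)");
            }
        }
    }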

Cheers,
-Joey

> On Monday, April 27, 2020, 10:37:08 p.m. UTC, Dinesh Joshi 
>  wrote:
>
>  Folks,
>
> GSoD 2020 is upon us. The organizational applications are due soon (May 4th,
> 2020) and I'd like us to participate in it again. GSoD 2019 brought in a great
> deal of improvements to the C* docs and I believe GSoD 2020 will be able to
> bring in more enhancements. I realize we are also talking about docs
> donations from DataStax, but the docs project can be focused on 4.0, which
> would not overlap with the donation. If you have opinions, please let me know.
>
> Cheers,
>
> Dinesh
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Idea: Marking required scope for 4.0 GA vs. optional

2020-03-31 Thread Joseph Lynch
On Tue, Mar 31, 2020 at 1:27 PM Jake Luciani  wrote:
>
> Can we agree to move the improvements out to 4.0.x?

Generally I've been asked to file performance issues as improvements,
e.g. CASSANDRA-15379. To be frank though, we can't run ZstdCompressor
on real clusters without that patch, and therefore I wouldn't feel
great releasing ZstdCompressor in 4.0 without it.

I'm fine with starting to call all performance issues "bugs", since in
our deployments (and I think in many others) performance regressions
are P0 bugs that cost a lot of $$. Alternatively, we can keep calling
them improvements and just tag them with the right target fix version:
4.0-alpha if the change impacts any public interface in a
non-backwards-compatible way (yaml, properties, CQL, JMX, etc.),
4.0-beta or later if it does not require changes to public interfaces.

-Joey

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Joseph Lynch
I think that we might be bikeshedding this number a bit because it is easy
to debate and there is not yet one right answer. I hope we recognize either
choice (4 or 16) is fine in that users can always override us and we can
always change our minds later or better yet improve allocation so users
don't have to care. Either choice is an improvement on the status quo. I
only truly care that when we change this default let's make sure that:
1. Users can still launch a cluster all at once. Last I checked even with
allocate_for_rf you need to bootstrap one node at a time for even
allocation to work properly; please someone correct me if I'm wrong, and if
I'm not let's get this fixed before the beta.
2. We get good documentation about this choice into our docs.
[documentation team and I are on it!]

I don't like phrasing this as a "small user" vs "large user" discussion.
Everybody using Cassandra wants it to be easy to operate with high
availability and consistent performance. Being able to "oops a few
nodes and not have an outage" is an important thing to optimize for
regardless of scale. It seems we have a lot of input on this thread that
we're frequently seeing users override this to 4 (apparently even with
random allocation? I am personally surprised by this if true). Some people
have indicated that they like a higher number like 16 or 32. Some (most?)
of our largest users by footprint are still using 1.

The only significant advantage I'm aware of for 16 over 4 is that users can
scale up and down in increments of N/16 (1 node at a time for a 12-node
cluster) instead of N/4 (3 nodes at a time) without further token allocation
improvements in
Cassandra. Practically speaking I think people are often spreading nodes
out over RF=3 "racks" (e.g. GCP, Azure, and AWS) so they'll want to scale
by increments of 3 anyways. I agree with Jon that optimizing for
scale-downs is odd; it's a pretty infrequent operation and all the users I
know doing autoscaling are doing it vertically using networked attached
storage (~EBS). Let's also remember repairing clusters with 16 tokens per
node is slower (probably about 2-4x slower) than repairing clusters with 4
tokens.

With zero-copy streaming there should be no benefit to more tokens for data
transfer; if there is, it is a bug in streaming performance and we should
fix it.
Honestly, in my opinion, if we have balancing issues with a small number of
tokens, that is a bug and we should just fix it; token moves are safe, and
it is definitely possible for Cassandra to self-balance.
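
To make "balanced" concrete, here is a tiny sketch (mine, purely
illustrative, not from any patch) of what perfectly even Murmur3 token
spacing looks like; the allocation algorithm is essentially trying to keep
per-node ownership close to this ideal:

    import java.math.BigInteger;

    // Illustrative only: evenly spaced tokens over Murmur3Partitioner's
    // range of [-2^63, 2^63 - 1] for `nodes` nodes with `tokensPerNode`
    // tokens each, handed out round-robin across nodes.
    public class EvenTokens {
        public static void main(String[] args) {
            int nodes = 12, tokensPerNode = 4;
            int total = nodes * tokensPerNode;
            BigInteger range = BigInteger.valueOf(2).pow(64);
            BigInteger start = BigInteger.valueOf(2).pow(63).negate();
            for (int i = 0; i < total; i++) {
                BigInteger token = start.add(
                    range.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(total)));
                System.out.printf("node %2d  token %s%n", i % nodes, token);
            }
        }
    }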

Let's not worry about scaring off users with this choice; choosing 4 will
not scare off users any more than 256 random tokens has scared off users
once they realized that they can't have any combination of two nodes down
in different racks.

-Joey

On Fri, Jan 31, 2020 at 10:16 AM Carl Mueller
 wrote:

> edit: 4 is bad at small cluster sizes and could scare off adoption
>
> On Fri, Jan 31, 2020 at 12:15 PM Carl Mueller <
> carl.muel...@smartthings.com>
> wrote:
>
> > "large/giant clusters and admins are the target audience for the value we
> > select"
> >
> > There are reasons aside from massive scale to pick Cassandra, but the
> > primary reason Cassandra is selected technically is to support horizontally
> > scaling to large clusters.
> >
> > Why pick a value that once you reach scale you need to switch token
> count?
> > It's still a ticking time bomb, although 16 won't be what 256 is.
> >
> > Hmmm. But 4 is bad and could scare off adoption.
> >
> > Ultimately a well-written article on operations and how to transition
> from
> > 16 --> 4 and at what point that is a good idea (aka not when your cluster
> > is too big) should be a critical part of this.
> >
> > On Fri, Jan 31, 2020 at 11:45 AM Michael Shuler 
> > wrote:
> >
> >> On 1/31/20 9:58 AM, Dimitar Dimitrov wrote:
> >> > one corollary of the way the algorithm works (or more
> >> > precisely might not work) with multiple seeds or simultaneous
> >> > multi-node bootstraps or decommissions, is that a lot of dtests
> >> > start failing due to deterministic token conflicts. I wasn't
> >> > able to fix that by changing solely ccm and the dtests
> >> I appreciate all the detailed discussion. For a little historic context,
> >> since I brought up this topic in the contributors zoom meeting, unstable
> >> dtests was precisely the reason we moved the dtest configurations to
> >> 'num_tokens: 32'. That value has been used in CI dtest since something
> >> like 2014, when we found that this helped stabilize a large segment of
> >> flaky dtest failures. No real science there, other than "this hurts
> less."
> >>
> >> I have no real opinion on the suggestions of using 4 or 16, other than I
> >> believe most "default config using" new users are starting with smaller
> >> numbers of nodes. The small-but-growing users and veteran large cluster
> >> admins should be gaining more operational knowledge and be able to
> >> adjust their own config choices according to their needs 

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-30 Thread Joseph Lynch
Any objections to the compromise of 16 as proposed in Chris's original
patch?

-Joey

On Thu, Jan 30, 2020, 3:47 PM Anthony Grasso 
wrote:

> I think lowering the number of tokens is a great idea! Similar to Jon, when
> I have reduced the number of tokens for clients it has been improvement in
> repair performance.
>
> I am concerned that the proposed default value for num_tokens is too low.
> If you set up a cluster using the proposed defaults, you will get a
> balanced cluster. However, if you decommission nodes you will start to see
> large imbalances especially for small clusters (< 20 nodes). This is
> because the allocate_tokens_for_local_replication_factor setting is only
> applied during the bootstrap process.
>
> I have recommended very low values for num_tokens to clients. This was
> because it was very unlikely that they would reduce their cluster size and
> I warned them of the caveats with using a small value for num_tokens.
>
> The proposed num_token default value is fine for devs and operators that
> know what they are doing. However, the general Cassandra community will be
> unaware of the potential issue with such a low value. We should consider
> setting num_tokens to 16 - 32 as the default. This will at least help
> reduce the severity of the imbalance when decommissioning a node whilst
> still providing the benefits of having a low number of tokens. In addition,
> we can add a comment to num_tokens that clusters over 100 nodes (per
> datacenter) should consider reducing it down to 4.
>
> Cheers,
> Anthony
>
> On Fri, 31 Jan 2020 at 01:58, Jon Haddad  wrote:
>
> > Larger clusters is where high token counts do the most damage. That's why
> > it's such a problem. You start out with a small cluster using 256, as you
> > grow into the hundreds it becomes more and more unstable.
> >
> >
> > On Thu, Jan 30, 2020, 8:19 AM onmstester onmstester
> >  wrote:
> >
> > > Shouldn't we consider the cluster size to configure num_tokens?
> > >
> > > For example is it OK to use num_tokens=4 for a cluster of more than 100
> > of
> > > nodes?
> > >
> > >
> > >
> > > Another question that is not so much relevant to this :
> > >
> > > When we use the token assignment algorithm (the new/non-random one)
> for a
> > > specific keyspace, why should we use initial token for all the seeds,
> > isn't
> > > one seed enough and then just set the keyspace for all other nodes?
> > >
> > >
> > >
> > > Also i do not understand why should we consider rack topology and
> number
> > > of racks for configuration of num_tokens?
> > >
> > >
> > >
> > > Sent using https://www.zoho.com/mail/
> > >
> > >
> > >
> > >
> > >  On Thu, 30 Jan 2020 04:33:57 +0330 Jeremy Hanna <
> > > jeremy.hanna1...@gmail.com> wrote 
> > >
> > >
> > > The new default wouldn't be retroactively set for 3.x, but the same
> > > principles apply.  The new algorithm is in 3.x as well as the
> > > simplification of the configuration.  So no reason not to use the same
> > > configuration on 3.x.
> > >
> > > > On Jan 30, 2020, at 4:34 AM, Chen-Becker, Derek  > > dchen...@amazon.com.INVALID> wrote:
> > > >
> > > > Does the same guidance apply to 3.x clusters? I read through the JIRA
> > > ticket linked below, along with tickets that it links to, but it's not
> > > clear that the new allocation algorithm is available in 3.x or if there
> > are
> > > other reasons that this would be problematic.
> > > >
> > > > Thanks,
> > > >
> > > > Derek
> > > >
> > > > On 1/29/20, 9:54 AM, "Jon Haddad"  wrote:
> > > >
> > > >Ive put a lot of my previous clients on 4 tokens, all of which
> have
> > > >resulted in a major improvement.
> > > >
> > > >I wouldn't use any more than 4 except under some pretty unusual
> > > >circumstances.
> > > >
> > > >Jon
> > > >
> > > >On Wed, Jan 29, 2020, 11:18 AM Ben Bromhead  > > b...@instaclustr.com> wrote:
> > > >
> > > >> +1 to reducing the number of tokens as low as possible for
> > availability
> > > >> issues. 4 lgtm
> > > >>
> > > >> On Wed, Jan 29, 2020 at 1:14 AM Dinesh Joshi  > djo...@apache.org>
> > > wrote:
> > > >>
> > > >>> Thanks for restarting this discussion Jeremy. I personally think 4
> is
> > > a
> > > >>> good number as a default. I think whatever we pick, we should have
> > > enough
> > > >>> documentation for operators to make sense of the new defaults in
> 4.0.
> > > >>>
> > > >>> Dinesh
> > > >>>
> > >  On Jan 28, 2020, at 9:25 PM, Jeremy Hanna  > > jeremy.hanna1...@gmail.com>
> > > >>> wrote:
> > > 
> > >  I wanted to start a discussion about the default for num_tokens
> that
> > > >>> we'd like for people starting in Cassandra 4.0.  This is for ticket
> > > >>> CASSANDRA-13701 <
> > https://issues.apache.org/jira/browse/CASSANDRA-13701>
> > >
> > > >>> (which has been duplicated a number of times, most recently by me).
> > > 
> > >  TLDR, based on availability concerns, skew concerns, operational
> > > >>> concerns, and based on the fact 

Re: [VOTE] Cassandra Enhancement Proposal (CEP) documentation

2019-11-01 Thread Joseph Lynch
+1

-Joey

On Fri, Nov 1, 2019 at 5:33 AM Mick Semb Wever  wrote:

> Please vote on accepting the Cassandra Enhancement Proposal (CEP) document
> as a starting point and guide towards improving collaboration on, and
> success of, new features and significant improvements. In combination with
> the recently accepted Cassandra Release Lifecycle documentation this should
> help us moving forward as a project and community.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652201
>
> Past discussions on the document/process have been touched on in a number
> of threads in the dev ML.  The most recent thread was
> https://lists.apache.org/thread.html/b5d1b1ca99324f84e4a40b9cba879e8f858f5f6e18447775fcf32155@%3Cdev.cassandra.apache.org%3E
>
> regards,
> Mick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Cassandra image for Kubernetes

2019-09-20 Thread Joseph Lynch
On Fri, Sep 20, 2019 at 8:09 AM Ben Bromhead  wrote:
>
> Providing an official docker image is a little tricky, as despite what
> container marketing would tell you, containers need to make assumptions
> about outside orchestration/management methods. Folks in this thread have
> already identified differences in kubernetes distro's let alone other
> container schedulers.

Just to clarify the proposal Vinay and I made at the summit: I don't
think that we can provide a single image that works for every single
use case, just as Cassandra's out-of-the-box Debian package does not
work perfectly for many configuration management / orchestration
systems. Nor did I intend to propose that we provide an official image
for Kubernetes, Marathon, DC/OS, or whichever new scheduler is popular
these days.

I do think we can offer two generally configurable and reasonable base
Cassandra containers, one for testing and one for production. Both
containers must provide the requisite pluggability seams for common
use cases, just like the Debian and RPM packages do. For example: a
pluggable configuration file (I usually offer overrides via
environment variables in my images), the ability to call a user-provided
bash script before starting the daemon, a way to change JVM options,
etc. These seams would then be documented so that people can easily
plug in their needed functionality. The testing image would optimize
for fast startup and single-node clusters (e.g. turning off vnodes,
skipping integrity checks, etc.). The production image would
naturally not turn these things off.

If a user cannot plug in their functionality, they can raise a bug
report explaining the difficulty and we can either add the needed seam
or say "sounds like you need a custom image". It would be nice if the
community contributed documentation for "here is how you take the
production image and run it on kubernetes" but I don't think the
Cassandra developers need to maintain this integration.

Yes, these will not satisfy every use case, but they are still very
valuable even if only as a starting point for the community.

-Joey

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 4.0-alpha1 (24 hour vote)

2019-09-06 Thread Joseph Lynch
On Fri, Sep 6, 2019 at 12:57 AM Sankalp Kohli  wrote:
>
> Can we have a vote once the tests pass? I know we all, including me, are
> excited about cutting this alpha, but we cannot cut a release with tests
> failing or not being run due to some Java home issue.
>
> If people have already started using the alpha artifacts, then I suggest we
> make tests passing a blocker for the next alpha

Test follow-up tickets, all tagged with the 4.0-alpha fix version to hit the
next alpha release:
https://issues.apache.org/jira/browse/CASSANDRA-15309 (probably most
important to make the upgrade tests run on trunk, I believe Vinay
already has a patch)
https://issues.apache.org/jira/browse/CASSANDRA-15311
https://issues.apache.org/jira/browse/CASSANDRA-15310
https://issues.apache.org/jira/browse/CASSANDRA-15309
https://issues.apache.org/jira/browse/CASSANDRA-15308
https://issues.apache.org/jira/browse/CASSANDRA-15307

Also found two more due to manual testing; neither appears alpha1-blocking
IMO, although 15305 might be nice to merge since Chris already has a patch
and I think it's a minor annoyance that testing users will hit quickly:
https://issues.apache.org/jira/browse/CASSANDRA-15306
https://issues.apache.org/jira/browse/CASSANDRA-15305 (already patch available)

-Joey

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [VOTE] Release Apache Cassandra 4.0-alpha1 (24 hour vote)

2019-09-05 Thread Joseph Lynch
Running all tests at
https://circleci.com/workflow-run/79918e2a-ea8e-48a6-a38d-96cf85de27ff

Will report back with results shortly,
-Joey

On Thu, Sep 5, 2019 at 3:55 PM Jon Haddad  wrote:

> +1
>
> On Thu, Sep 5, 2019 at 3:44 PM Michael Shuler 
> wrote:
>
> > I propose the following artifacts for release as 4.0-alpha1.
> >
> > sha1: fc4381ca89ab39a82c9018e5171975285cc3bfe7
> > Git:
> >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-alpha1-tentative
> > Artifacts:
> >
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1177/org/apache/cassandra/apache-cassandra/4.0-alpha1/
> > Staging repository:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1177/
> >
> > The Debian and RPM packages are available here:
> > http://people.apache.org/~mshuler
> >
> > The vote will be open for 24 hours (longer if needed).
> >
> > [1]: CHANGES.txt:
> >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-alpha1-tentative
> > [2]: NEWS.txt:
> >
> >
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-alpha1-tentative
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>


Re: 4.0 alpha before apachecon?

2019-08-29 Thread Joseph Lynch
We hashed this out a bit in Slack and I think the current rough
consensus (please speak up if you don't agree) is that we update our
release guidelines [1] to allow API changes between alpha and beta,
since a common artifact is useful for testing and we will probably end
up finding API breakage while testing that must be fixed. Benedict
helped out by creating 4.0-alpha [2] and 4.0-beta [3] fix versions so
we can track what tickets are (roughly) blocking the next alpha/beta
release. If you feel that something you're working on should block the
alpha or the beta, please help out and tag it with the proper fix
version. The idea is that we know which outstanding tickets exist and
are impacting the next alpha/beta; even if we ignore them and cut
anyway, at least we can separate "this has to happen before beta" from
"this has to happen before release candidates".

I think the next decision is: should we just cut 4.0-alpha1 now, given
that Michael has some cycles, regardless of the known issues, and start
using the new fix versions for the 4.0-alpha2 release? I personally
feel we should cut 4.0-alpha1 with every imaginable "expect this
release to break" disclaimer and start working towards 4.0-alpha2.

[1] 
https://docs.google.com/document/d/1bS6sr-HSrHFjZb0welife6Qx7u3ZDgRiAoENMLYlfz8/edit
[2] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20fixVersion%20%3D%204.0-alpha
[3] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20fixVersion%20%3D%204.0-beta

What do people think?
-Joey

On Wed, Aug 28, 2019 at 10:58 AM Michael Shuler  wrote:
>
> Thanks for the reminder :) I have a few days of availability to prep a
> 4.0 alpha release. It's an alpha, so I don't have a problem with known
> issues needing work.
>
> I will have an internet-less period of time starting roughly Tuesday 9/3
> through about Friday 9/13. I might get lucky and have a little network
> access in the middle of that time, but I'm not counting on it.
>
> --
> Michael
>
> On 8/28/19 10:51 AM, Jon Haddad wrote:
> > Hey folks,
> >
> > I think it's time we cut a 4.0 alpha release.  Before I put up a vote
> > thread, is there a reason not to have a 4.0 alpha before ApacheCon /
> > Cassandra Summit?
> >
> > There's a handful of small issues that should be done for 4.0 (client
> > list in virtual tables, dynamic snitch improvements, fixing token counts).
> > I'm not trying to suggest we don't include them, but they're small enough I
> > think it's OK to merge them in following the first alpha.
> >
> > Jon
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Stabilising Internode Messaging in 4.0

2019-04-09 Thread Joseph Lynch
Let's try this again, apparently email is hard ...

I am relatively new to these code paths—especially compared to the
committers that have been working on these issues for years such as
the 15066 authors as well as Jason Brown—but like many Cassandra users
I am familiar with many of the classes of issues Aleksey and Benedict
have identified with this patchset (especially related to messaging
correctness, performance and the lack of message backpressure). We
believe that every single fix and feature in this patch is valuable
and we desire that we are able to get them all merged in and
validated. We don’t think it’s even a question if we want to merge
these: we should want these excellent changes. The only questions—in
my opinion—are how do we safely merge them and when do we merge them?

Due to my and Vinay’s relative lack of knowledge of these code paths,
we hope that we can get as many experienced eyes as we can to review
the patch and evaluate the risk-reward tradeoffs of some of the deeper
changes. We don’t feel qualified to make assertions about risk vs
reward in this patchset, but I know there are a number of people on
this mailing list who are qualified and I think we would all
appreciate their insight and help.

I completely understand that we don’t live in an ideal world, but I do
personally feel that in an ideal world it would be possible to pull
the bug fixes (bugs specific to the 4.0 netty refactor) out from the
semantic changes (e.g. droppability, checksumming, back pressure,
handshake changes), code refactors (e.g. verb handler,
MessageIn/MessageOut) and performance changes (various
re-implementations of Netty internals, some optimizations around
dropping dead messages earlier). Then we can review, validate, and
benchmark each change independently and iteratively move towards
better messaging. At the same time, I recognize that it may be hard to
pull these changes apart, but I worry that review and validation of
the patch, as is, may take the testing community many months to
properly vet and will either mean that we cut 4.0 many, many months
from now or we cut 4.0 before we can properly test the patchset.

I think we are all agreed we don’t want an unstable 4.0, so the main
decision point here is: what set of changes from this valuable and
important patch set do we put in 4.0, and which do we try to put in
4.next? Once we determine that, the community can hopefully start
allocating the necessary review, testing, and benchmarking resources
to ensure that 4.0 is our first ever rock solid “.0” release.

-Joey


On Thu, Apr 4, 2019 at 5:56 PM Jon Haddad  wrote:
>
> Given the number of issues that are addressed, I definitely think it's
> worth strongly considering merging this in.  I think it might be a
> little unrealistic to cut the first alpha after the merge though.
> Being realistic, any 20K+ LOC change is going to introduce its own
> bugs, and we should be honest with ourselves about that.  It seems
> likely the issues the patch addressed would have affected the 4.0
> release in some form *anyways* so the question might be do we fix them
> now or after someone's cluster burns down because there's no inbound /
> outbound message load shedding.
>
> Giving it a quick code review and going through the JIRA comments
> (well written, thanks guys) there seem to be some pretty important bug
> fixes in here as well as paying off a bit of technical debt.
>
> Jon
>
> On Thu, Apr 4, 2019 at 1:37 PM Pavel Yaskevich  wrote:
> >
> > Great to see such a significant progress made in the area!
> >
> > On Thu, Apr 4, 2019 at 1:13 PM Aleksey Yeschenko  wrote:
> >
> > > I would like to propose CASSANDRA-15066 [1] - an important set of bug 
> > > fixes
> > > and stability improvements to internode messaging code that Benedict, I,
> > > and others have been working on for the past couple of months.
> > >
> > > First, some context.   This work started off as a review of 
> > > CASSANDRA-14503
> > > (Internode connection management is race-prone [2]), CASSANDRA-13630
> > > (Support large internode messages with netty) [3], and a pre-4.0
> > > confirmatory review of such a major new feature.
> > >
> > > However, as we dug in, we realized this was insufficient. With more than 
> > > 50
> > > bugs uncovered [4] - dozens of them critical to correctness and/or
> > > stability of the system - a substantial rework was necessary to guarantee 
> > > a
> > > solid internode messaging subsystem for the 4.0 release.
> > >
> > > In addition to addressing all of the uncovered bugs [4] that were unique 
> > > to
> > > trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> > > long-existing, pre-4.0 bugs and stability issues. For the complete list of
> > > notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
> > > to highlight a few.
> > >
> > > # Lack of message integrity checks
> > >
> > > It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot be
> > > relied upon [6] for 

Re: Stabilising Internode Messaging in 4.0

2019-04-09 Thread Joseph Lynch
I am relatively new to these code paths—especially compared to the
committers that have been working on these issues for years such as the
15066 authors as well as Jason Brown—but like many Cassandra users I am
familiar with many of the classes of issues Aleksey and Benedict have
identified with this patchset (especially related to messaging correctness,
performance and the lack of message backpressure). We believe that every
single fix and feature in this patch is valuable and we desire that we are
able to get them all merged in and validated. We don’t think it’s even a
question if we want to merge these: we should want these excellent changes.
The only questions—in my opinion—are how do we safely merge them and when
do we merge them?

Due to my and Vinay’s relative lack of knowledge of these code paths, we
hope that we can get as many experienced eyes as we can to review the patch
and evaluate the risk-reward tradeoffs of some of the deeper changes. We
don’t feel qualified to make assertions about risk vs reward in this
patchset, but I know there are a number of people on this mailing list who
are qualified and I think we would all appreciate their insight and help.

I completely understand that we don’t live in an ideal world, but I do
personally feel that in an ideal world it would be possible to pull the bug
fixes (bugs specific to the 4.0 netty refactor) out from the semantic
changes (e.g. droppability, checksumming, back pressure, handshake changes),
code refactors (e.g. verb handler, MessageIn/MessageOut) and performance
changes (various re-implementations of Netty internals, some optimizations
around dropping dead messages earlier). Then we can review, validate, and
benchmark each change independently and iteratively move towards better
messaging. At the same time, I recognize that it may be hard to pull these
changes apart, but I worry that review and validation of the patch, as is,
may take the testing community many months to properly vet and will either
mean that we cut 4.0 many, many months from now or we cut 4.0 before we can
properly test the patchset.

I think we are all agreed we don’t want an unstable 4.0, so the main
decision point here is: what set of changes from this valuable and important
patch set do we put in 4.0, and which do we try to put in 4.next? Once we
determine that, the community can hopefully start allocating the necessary
review, testing, and benchmarking resources to ensure that 4.0 is our first
ever rock solid “.0” release.

-Joey

On Thu, Apr 4, 2019 at 5:56 PM Jon Haddad  wrote:

> Given the number of issues that are addressed, I definitely think it's
> worth strongly considering merging this in.  I think it might be a
> little unrealistic to cut the first alpha after the merge though.
> Being realistic, any 20K+ LOC change is going to introduce its own
> bugs, and we should be honest with ourselves about that.  It seems
> likely the issues the patch addressed would have affected the 4.0
> release in some form *anyways* so the question might be do we fix them
> now or after someone's cluster burns down because there's no inbound /
> outbound message load shedding.
>
> Giving it a quick code review and going through the JIRA comments
> (well written, thanks guys) there seem to be some pretty important bug
> fixes in here as well as paying off a bit of technical debt.
>
> Jon
>
> On Thu, Apr 4, 2019 at 1:37 PM Pavel Yaskevich  wrote:
> >
> > Great to see such a significant progress made in the area!
> >
> > On Thu, Apr 4, 2019 at 1:13 PM Aleksey Yeschenko 
> wrote:
> >
> > > I would like to propose CASSANDRA-15066 [1] - an important set of bug
> fixes
> > > and stability improvements to internode messaging code that Benedict,
> I,
> > > and others have been working on for the past couple of months.
> > >
> > > First, some context.   This work started off as a review of
> CASSANDRA-14503
> > > (Internode connection management is race-prone [2]), CASSANDRA-13630
> > > (Support large internode messages with netty) [3], and a pre-4.0
> > > confirmatory review of such a major new feature.
> > >
> > > However, as we dug in, we realized this was insufficient. With more
> than 50
> > > bugs uncovered [4] - dozens of them critical to correctness and/or
> > > stability of the system - a substantial rework was necessary to
> guarantee a
> > > solid internode messaging subsystem for the 4.0 release.
> > >
> > > In addition to addressing all of the uncovered bugs [4] that were
> unique to
> > > trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> > > long-existing, pre-4.0 bugs and stability issues. For the complete
> list of
> > > notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d
> like
> > > to highlight a few.
> > >
> > > # Lack of message integrity checks
> > >
> > > It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot
> be
> > > relied upon [6] for integrity. With sufficient scale or time, you will
> hit
> > > bit flips. 

Re: Choosing a supported Python 3 major version for cqlsh

2019-03-19 Thread Joseph Lynch
Since we'll be maintaining backwards compatibility with Python 2.7, we
can't really use Python 3-only language features or reserved keywords
anyway, so we should probably just target the lowest common denominator
(3.4 or 3.5, probably). Then, after Python 2 is officially EOL in 2020,
perhaps we can work on replacing 2.7 support with a newer Python 3 version?

Regarding common distros, I believe that these are the default py2 and py3
versions on CentOS and Ubuntu LTS:

CentOS 7: python = python 2, python 2.7.5, python 3.5/3.6 available via SCL
CentOS 6: python = python 2, python 2.6.6, python 3.5/3.6 available via SCL
Ubuntu 16.04 (xenial): python = python 2, python 2.7.12, python 3.5.2
Ubuntu 18.04 (bionic): python = python 2, python 2.7.15rc1, python 3.6.7

^^ based on variants of "docker run -it ubuntu:16.04 bin/bash -c 'apt
update && apt install -y python3 && python3 --version' | tail" and "docker
run -it centos:7 python --version" and such.

Cheers,
-Joey

On Tue, Mar 19, 2019 at 11:47 AM Jordan West  wrote:

> On Mon, Mar 18, 2019 at 7:52 PM Michael Shuler 
> wrote:
>
> > On 3/18/19 9:06 PM, Patrick Bannister wrote:
> > > I recommend we pick the longest supported stable release available.
> That
> > > would be Python 3.7, which is planned to get its last release in 2023,
> > four
> > > years from now.
> > > - Python 3.5 was planned to get its last major release yesterday
> > > - Python 3.6 is planned to get its last major release in December 2021,
> > > about three years from now
> > >
> > > Any feedback on picking a tested Python version for cqlshlib? I'm
> > inclined
> > > to focus on Python 3.7 as we push toward finishing the ticket.
> >
> > The correct method of choosing this would be to target runtime
> > functionality on the version in the latest LTS release of the likely
> > most-used OS. Ubuntu 18.04 LTS comes with python-3.6.5. I would think it
> > highly likely that if it runs properly on 3.6, it should run on 3.7
>
>
> In my experience working with a different python project recently this
> isn’t the case. There are reserved keywords that were added between 3.6 and
> 3.7:
> https://docs.python.org/3/whatsnew/3.7.html
>
> Jordan
>
>
>
> > fine. Using some 3.7-only feature/syntax and making it difficult on
> > people to install/use on Ubuntu LTS would be user-unfriendly.
> >
> > https://packages.ubuntu.com/bionic/python3
> >
> > There is not a similar CentOS package search, but I see a couple docs
> > say that python-3.6 is available via the SCL repository for this OS. I
> > do not see a 3.7 installation noted.
> >
> > Shoot for the lowest common denominator in real world usage, not the
> > latest release from upstream. Super strong opinion, here.
> >
> > --
> > Michael
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>


Re: Audit logging to tables.

2019-02-27 Thread Joseph Lynch
Hi Sagar,

Vinay can confirm, but as far as I am aware we have no current plans to
implement audit logging to a table directly; the implementation is
fully pluggable, though (like compaction, compression, etc.). Check out the
blog post [1] and documentation [2] Vinay wrote for more details, but the
short version is that you can implement a custom IAuditLogger class that
lives in the "org.apache.cassandra.audit" package, put the jar in the
Cassandra classpath, and then load it by changing the "logger" configuration
option [3]. For example, if you implemented an
"org.apache.cassandra.audit.TableAuditLogger" class, you would build a jar,
put it in the libs directory, and then configure the "logger" property to
be "TableAuditLogger".

The interface is meant to be simple, so you should more or less only have to
define the log(AuditLogEntry) method. If you're interested in making a table
audit logger and are having difficulty with the documentation/implementation,
please feel free to reach out to Vinay (vinaykumar...@gmail.com) or myself.
We'd love to get the docs in a place where it's easy for users to get exactly
the audit logging behavior they need.
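
For illustration, here is roughly what such a logger could look like. This
is only a sketch: the class name and sink are hypothetical, and anything
beyond the log(AuditLogEntry) method mentioned above (isEnabled(), stop(),
getLogString()) is an assumption, so check the IAuditLogger javadoc for the
exact surface in your version:

    package org.apache.cassandra.audit;

    // Hypothetical sketch of a custom audit logger; not a tested implementation.
    public class TableAuditLogger implements IAuditLogger
    {
        @Override
        public boolean isEnabled()
        {
            return true; // assumed lifecycle method
        }

        @Override
        public void log(AuditLogEntry entry)
        {
            // AuditLogEntry carries the operation details (user, keyspace,
            // statement, ...); hand them to whatever sink you want -- a table
            // writer, a queue, etc. getLogString() is assumed to format the entry.
            writeToSink(entry.getLogString());
        }

        @Override
        public void stop()
        {
            // assumed lifecycle method: release any resources held by the sink
        }

        // Placeholder sink; replace with your own persistence logic.
        private void writeToSink(String line)
        {
            System.out.println(line);
        }
    }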

Cheers,
-Joey

[1] http://cassandra.apache.org/blog/2018/10/29/audit_logging_cassandra.html
[2] http://cassandra.apache.org/doc/latest/operating/audit_logging.html
[3]
http://cassandra.apache.org/doc/latest/operating/audit_logging.html#cassandra-yaml-configurations-for-auditlog

On Wed, Feb 27, 2019 at 9:44 AM Rahul Singh 
wrote:

> I understand why you’d want it, but it would add more data management to
> the database. Generally for logging you could consider putting it into ELK,
> and then it can be queried more arbitrarily.
> On Feb 27, 2019, 12:42 PM -0500, Dinesh Joshi ,
> wrote:
> > I don’t believe there is a plan to do it. If it were available in a
> table how would that help you?
> >
> > Dinesh
> >
> > > On Feb 27, 2019, at 9:32 AM, Sagar  wrote:
> > >
> > > Hey All,
> > >
> > > While following some of the recent developments on Cassandra, I found
> the
> > > new feature on Audit logging quite useful.
> > >
> > > I wanted to understand is there any plan of pushing the audit logs to a
> > > table?
> > >
> > > Thanks!
> > > Sagar.
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>


Re: [VOTE] Release Apache Cassandra 2.2.14

2019-02-05 Thread Joseph Lynch
2.2.14-tentative unit and dtest run:
https://circleci.com/gh/jolynch/cassandra/tree/2.2.14-tentative

unit tests: 0 failures
dtests: 5 failures
* test_closing_connections - thrift_hsha_test.TestThriftHSHA (
https://issues.apache.org/jira/browse/CASSANDRA-14595)
* test_multi_dc_tokens_default - token_generator_test.TestTokenGenerator
* test_multi_dc_tokens_murmur3 - token_generator_test.TestTokenGenerator
* test_multi_dc_tokens_random - token_generator_test.TestTokenGenerator
* test_multiple_repair - repair_tests.incremental_repair_test.TestIncRepair
(flake?)

I've cut https://issues.apache.org/jira/browse/CASSANDRA-15012 for fixing
the TestTokenGenerator tests, it looks straightforward.

+1 non binding

-Joey

On Sat, Feb 2, 2019 at 4:32 PM Michael Shuler 
wrote:

> I propose the following artifacts for release as 2.2.14.
>
> sha1: af91658353ba601fc8cd08627e8d36bac62e936a
> Git:
>
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.2.14-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1172/org/apache/cassandra/apache-cassandra/2.2.14/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1172/
>
> The Debian and RPM packages are available here:
> http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: CHANGES.txt:
>
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
> [2]: NEWS.txt:
>
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.14-tentative
>
>


Re: [VOTE] Release Apache Cassandra 3.11.4

2019-02-03 Thread Joseph Lynch
3.11.4-tentative unit and dtest run:
https://circleci.com/gh/jolynch/cassandra/tree/3.11.4-tentative

unit tests: 0 failures
dtests: 1 failure
* test_closing_connections - thrift_hsha_test.TestThriftHSHA (
https://issues.apache.org/jira/browse/CASSANDRA-14595)

+1 non binding

-Joey

On Sat, Feb 2, 2019 at 4:38 PM Michael Shuler 
wrote:

> I propose the following artifacts for release as 3.11.4.
>
> sha1: fd47391aae13bcf4ee995abcde1b0e180372d193
> Git:
>
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.11.4-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1170/org/apache/cassandra/apache-cassandra/3.11.4/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1170/
>
> The Debian and RPM packages are available here:
> http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: CHANGES.txt:
>
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
> [2]: NEWS.txt:
>
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.4-tentative
>
>


Re: [VOTE] Release Apache Cassandra 3.0.18

2019-02-03 Thread Joseph Lynch
3.0.18-tentative unit and dtest run:
https://circleci.com/gh/jolynch/cassandra/tree/3.0.18-tentative

unit tests: 0 failures
dtests: 1 failure
* test_closing_connections - thrift_hsha_test.TestThriftHSHA (
https://issues.apache.org/jira/browse/CASSANDRA-14595)

+1 non binding

-Joey

On Sat, Feb 2, 2019 at 4:32 PM Michael Shuler 
wrote:

> I propose the following artifacts for release as 3.0.18.
>
> sha1: edd52cef50a6242609a20d0d84c8eb74c580035e
> Git:
>
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.18-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1171/org/apache/cassandra/apache-cassandra/3.0.18/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1171/
>
> The Debian and RPM packages are available here:
> http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: CHANGES.txt:
>
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.18-tentative
> [2]: NEWS.txt:
>
> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.18-tentative
>
>


Re: [VOTE] Change Jira Workflow

2018-12-18 Thread Joseph Lynch
+1 non-binding

On Tue, Dec 18, 2018 at 1:15 AM Sylvain Lebresne  wrote:

> +1
> --
> Sylvain
>
>
> On Tue, Dec 18, 2018 at 9:34 AM Oleksandr Petrov <
> oleksandr.pet...@gmail.com>
> wrote:
>
> > +1
> >
> > On Mon, Dec 17, 2018 at 7:12 PM Nate McCall  wrote:
> > >
> > > On Tue, Dec 18, 2018 at 4:19 AM Benedict Elliott Smith
> > >  wrote:
> > > >
> > > > I propose these changes <
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/JIRA+Workflow+Proposals
> >*
> > to the Jira Workflow for the project.  The vote will be open for 72
> hours**.
> > > >
> > >
> > >
> > > +1
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> >
> >
> > --
> > alex p
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>


Re: JIRA Workflow Proposals

2018-12-11 Thread Joseph Lynch
Just my 2c

1. D C B E A
2. B, C, A
3. A
4. +0.5

-Joey

On Tue, Dec 11, 2018 at 8:28 AM Benedict Elliott Smith 
wrote:

> Just to re-summarise the questions for people:
>
> 1. (A) Only contributors may edit or transition issues; (B) Only
> contributors may transition issues; (C) Only Jira-users may transition
> issues*; (D) (C)+Remove contributor role entirely; (E) Leave permission as
> they are today
> 2. Priority on Bug issue type: (A) remove it; (B) auto-populate it; (C)
> leave it.  Please rank.
> 3. Top priority: (A) Urgent; (B) Blocker.  See here for my explanation of
> why I chose Urgent <
> https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E
> <
> https://lists.apache.org/thread.html/c7b95b827d8da4efc5c017df80029676a032b150ec00bf11ca9c7fa7@%3Cdev.cassandra.apache.org%3E
> >>.
> 4. Priority keep ‘Wish’ (to replace issue type): +1/-1
>
> With my answers (again, sorry):
>
> 1: A B C D E
> 2: B C A
> 3: A
> 4: +0.5
>
> > On 11 Dec 2018, at 16:25, Benedict Elliott Smith 
> wrote:
> >
> > It looks like we have a multiplicity of views on permissions, so perhaps
> we should complicate this particular vote with all of the possible options
> that have been raised so far (including one off-list).  Sorry everyone for
> the confusion.
> >
> > (A) Only contributors may edit or transition issues; (B) Only
> contributors may transition issues; (C) Only Jira-users may transition
> issues*; (D) (C)+Remove contributor role entirely; (E) Leave permission as
> they are today
> >
> > * Today apparently ‘Anyone’ can perform this operation
> >
> > A ranked vote, please.  This makes my votes:
> >
> > 1: A B C D E
> > 2: B C A
> > 3: A
> > 4: +0.5
> >
> >
> >> On 11 Dec 2018, at 05:51, Dinesh Joshi 
> wrote:
> >>
> >> I agree with this. I generally am on the side of freedom and
> responsibility. Limiting edits to certain fields turns people off.
> Something I want to very much avoid if we can.
> >>
> >> Dinesh
> >>
> >>> On Dec 10, 2018, at 6:14 PM, Murukesh Mohanan <
> murukesh.moha...@gmail.com> wrote:
> >>>
> >>> On Tue, 11 Dec 2018 at 10:51, Benedict Elliott Smith
> >>>  wrote:
> 
> > On 10 Dec 2018, at 16:21, Ariel Weisberg  wrote:
> >
> > Hi,
> >
> > RE #1, does this mean if you submit a ticket and you are not a
> contributor you can't modify any of the fields including description or
> adding/removing attachments?
> 
>  Attachment operations have their own permissions, like comments.
> Description would be prohibited though.  I don’t see this as a major
> problem, really; it is generally much more useful to add comments.  If we
> particularly want to make a subset of fields editable there is a
> workaround, though I’m not sure anybody would have the patience to
> implement it:
> https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html
> <
> https://confluence.atlassian.com/jira/how-can-i-control-the-editing-of-issue-fields-via-workflow-149834.html
> >
> 
> >>>
> >>> That would be disappointing. Descriptions with broken or no formatting
> >>> aren't rare (say, command output without code formatting). And every
> >>> now and then the description might need to be updated. For example, in
> >>> https://issues.apache.org/jira/browse/CASSANDRA-10989, the link to the
> >>> paper had rotted, but I managed to find a new one, so I could edit it
> >>> in. If such a change had to be posted as a comment, it might easily
> >>> get lost in some of the more active issues.
> >>>
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>
>


Re: JIRA Workflow Proposals

2018-11-26 Thread Joseph Lynch
Benedict,

Thank you for putting this document together, I think something like this
will really improve the quality and usefulness of the Jira tickets!

A few pieces of overall feedback on the proposal:
* I agree with Jeremy and Joshua on keeping labels. Labels are the only way
that contributors without access to the Jira project configuration can use
to try to group related tickets together and while I agree stronger and
well curated components are valuable, labels are a valuable supplement (and
they natively work with search so you can link to a group of tickets really
easily)
* Throughout the text there are various elements only available for bugs or
features or both (e.g. Bug Category vs Feature Category), which I find hard
to keep track of in the body of the text. Maybe we can separate the
document into "Bug Fields" and "Feature Fields" ("Improvement Fields"?)?
* I think it's pretty odd that only "jira contributors" can assign tickets
(even to themselves), and this proposal seems to make that go further in
that contributors are the only ones who can move tickets out of triage into
the open state. I'm somewhat concerned that tickets will just languish in
triage state and unassigned because the person who cut the issue can't move
it to open and can't assign themselves to fix it... If we had an SLO on
triage like "an expert in the tagged category will triage new tickets
within three days" I'd feel different but I'm not sure how/if we can offer
that. To be clear I like the Triage/Awaiting Feedback state to indicate
that we need more details from the reporter, but I think if a ticket stays
in Triage for more than some amount of time it should be because a
contributor has triaged it and has asked for more information and not
because nobody is looking (maybe we should even auto-close triage state
tickets after some period of inactivity).
* Can we clarify how users can receive emails on Jiras awaiting triage
(perhaps a mailing list they can join that gets emails when Jiras are
cut)? I know this would really help me know when new Jiras have been cut
so that I can help triage them, and AFAIK it isn't currently
possible/documented.

I'm still reading through the entire document, but so far I have the
following specific feedback:
* Instead of "Since Version" I'd recommend just having "Versions" for all
issues since bugs can apply in earlier version but not later ones. "Fix
Versions" then refers to the versions that have any fix or commit related
to that bug/feature.
* For "Platform" should we include an "Other" field that allows users to
provide additional free-form context? I know that having a free form text
has helped me provide additional context like "NVMe SSDs" so that reviewers
don't start with "have you checked your drives are not slow". Worst case
this can be included in the description but personally I like separation of
environment description from bug/improvement description.
* Huge +1 to the "Review in Progress" vs "Change Requested", that will
really help new contributors know when they need to make changes.
* For the workflow, how can we provide some guidance for contributors to get
high-level feedback before they get to "Patch Available"? Maybe we
explicitly indicate that before transitioning to "In Progress" the reviewer
should be found on IRC or the mailing list and should have signed off
on the high-level idea/issue? Maybe this should be part of the new "Triage"
step? I feel one of the more frustrating experiences for new contributors is
to do the work and then get the "well, what if you did it a completely
different way" feedback.

I'll try to finish internalizing the rest of the document later today and
provide more specific feedback. Thanks again for starting this discussion
and I look forward to the resulting updates!

-Joey

On Mon, Nov 26, 2018 at 7:06 AM Jeremy Hanna 
wrote:

> Regarding labels, I am personally a fan of both - the mapping of commonly
> used labels to things like components, features, tools, etc. as well as
> keeping labels for newer and more arbitrary groupings.  I’ve tried to
> maintain certain labels like virtual-tables, lcs, lwt, fqltool, etc because
> there are new things (e.g. fqltool and virtual tables) that we don’t
> immediately make into components and it's really nice to group them to see
> where there might be stability or feature specific (thinking virtual
> tables) items.  I agree that arbitrary and misspelled labels make things a
> bit noisy but as long as we strive to use the components/features and do
> some periodic upkeep of labels.  By periodic upkeep I mean, converting new
> labels into components or what have you.  Beyond new features or arbitrary
> groupings, it might have been nice to have had ngcc labeled tickets to see
> how that’s contributed to the project over time or some other similar event.
>
> In summary, I really like the mapping but I also really like the way that
> labels can still be of value.  Also, if we strive 

Re: 4.0 Testing Signup

2018-11-08 Thread Joseph Lynch
On Thu, Nov 8, 2018 at 1:42 PM kurt greaves  wrote:

> Been thinking about this for a while and agree it's how we should approach
> it. BIkeshedding but seems like a nice big table would be suitable here,
> and I think rather than a separate confluence page per component we just
> create separate JIRA tickets that detail what's being tested and the
> approach, and discussion can be kept in JIRA.
>
Can we let each component group figure out how they want to do project
management with the one caveat that they list the component on the page and
have a tracking ticket with the right label? I think that's the lightest
touch process that will work.


> I'm happy to volunteer for testing repair. I can also add lots of different
> components to the list if you're happy for me to attack the page.
>
Go for it! I just jotted down some seed topics to get it started. Please do
edit and refactor to make it better.

-Joey


Re: 4.0 Testing Signup

2018-11-08 Thread Joseph Lynch
On Thu, Nov 8, 2018 at 11:04 AM Romain Hardouin 
wrote:

>
> Hi,
> I'm volunteer to be contributor on Metrics or Tooling component. Are we
> supposed/allowed to edit Confluence page directly?Btw I think that tooling
> should be split, maybe one ticket per tool?
>

Awesome! Yes, feel free to add yourself as a contributor to whichever
component you can contribute testing to (I think you need to make an Apache
Confluence account to do so); if it isn't working, let me know and I'll add
your contact information. Right now we don't have a shepherd for either
component yet, but I think it's pretty reasonable to have a tracking ticket
that either has subtasks for each tool (e.g. CASSANDRA-14746) or just use
linking (e.g. CASSANDRA-14697). Just try to describe in the tickets what
kinds of tests you're running and make sure they're tagged with 4.0-QA
label if possible.

Thanks!
- Joey


4.0 Testing Signup

2018-11-07 Thread Joseph Lynch
Following up on Jon's call for QA, I put together the start of a confluence
page for people to list out important components that they think should be
tested before 4.0 releases, and hopefully committers and contributors can
sign up and present their progress to the community. I've certainly missed a
ton of components that need testing, but I figured that it may be good to get
the conversation started and moving forward.

What do people think? Is there a more effective way to list these out? Or,
if people like this, maybe folks can start contributing sections and
volunteering to shepherd or test them?

Let me know,
-Joey


Re: MD5 in the read path

2018-09-26 Thread Joseph Lynch
>
> Thank you all for the response.
> For RandomPartitioner, MD5 is used to avoid collision. However, why is it
> necessary for comparing data between different replicas? Is it not feasible
> to use CRC for data comparison?
>
My understanding is that it is not necessary to use MD5 and we can switch
out the message digest function as long as we have an upgrade path. I
believe this is the goal of
https://issues.apache.org/jira/browse/CASSANDRA-13292.

-Joey


Re: MD5 in the read path

2018-09-26 Thread Joseph Lynch
Michael Kjellman and others (Jason, Sam, et al.) have already done a lot of
work in 4.0 to help change the use of MD5 to something more modern [1][2].
Also, I cut a ticket a little while back about the significant performance
penalty of using MD5 for digests when doing quorum reads of wide partitions
[3]. Given the profiling that Michael has done and the production profiling
we did, I think it's fair to say that changing the digest from MD5 to
murmur3 or xxHash would lead to a noticeable performance improvement for
quorum reads, perhaps even something like a 2x throughput increase for, e.g.,
wide partition workloads.
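
If you want a rough feel for the tradeoff being discussed, here is a small
JDK-only sketch (purely illustrative, no JIT warmup, and not how Cassandra's
digest path is actually structured) that computes an MD5 digest and a CRC32
checksum over the same bytes:

    import java.security.MessageDigest;
    import java.util.zip.CRC32;

    // Illustrative only: MD5 gives a 128-bit digest (accidental collisions are
    // effectively impossible), while CRC32 is much cheaper but only 32 bits,
    // which is why a middle ground like murmur3/xxHash is attractive for digests.
    public class DigestSketch {
        public static void main(String[] args) throws Exception {
            byte[] partitionBytes = new byte[4 * 1024 * 1024]; // pretend wide-partition payload

            MessageDigest md5 = MessageDigest.getInstance("MD5");
            long t0 = System.nanoTime();
            byte[] digest = md5.digest(partitionBytes);
            long md5Nanos = System.nanoTime() - t0;

            CRC32 crc = new CRC32();
            long t1 = System.nanoTime();
            crc.update(partitionBytes, 0, partitionBytes.length);
            long crcNanos = System.nanoTime() - t1;

            System.out.printf("md5:   %d-bit digest in %.2f ms%n", digest.length * 8, md5Nanos / 1e6);
            System.out.printf("crc32: 32-bit checksum %d in %.2f ms%n", crc.getValue(), crcNanos / 1e6);
        }
    }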

The hard part is changing the digest hash without breaking older versions:
during a rolling restart you can't have one node give an MD5 hash and the
other give an xxHash hash, as you'll end up with lots of mismatches and
read repairs. I believe that we just need to do what was done during the 3.0
storage engine refactor (I can't remember the ticket, but I'm pretty sure
Sylvain did the work), which checked the messaging version of the destination
node and sent back the appropriate hash.
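To make the mixed-version concern concrete, here is a minimal sketch of what
version-gated digest selection could look like. This is purely illustrative
and not the actual patch: the version constant is made up, and Guava's
murmur3_128 stands in for whatever non-cryptographic 128-bit hash gets
chosen.

    import com.google.common.hash.Hashing;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public final class ReadDigestSketch
    {
        // Made-up constant for illustration; the real version numbers live
        // in MessagingService.
        static final int VERSION_WITH_FAST_DIGEST = 12;

        // Compute the read-response digest with whichever hash the
        // *destination* node understands, so a mixed-version cluster never
        // compares an MD5 digest against a murmur3 digest mid-upgrade.
        static byte[] digestFor(int destinationVersion, byte[] partitionBytes)
        {
            if (destinationVersion >= VERSION_WITH_FAST_DIGEST)
            {
                // Non-cryptographic 128-bit hash; fine for comparing
                // replicas we trust, and much cheaper than MD5.
                return Hashing.murmur3_128().hashBytes(partitionBytes).asBytes();
            }
            try
            {
                return MessageDigest.getInstance("MD5").digest(partitionBytes);
            }
            catch (NoSuchAlgorithmException e)
            {
                throw new AssertionError("MD5 is a required JDK algorithm", e);
            }
        }
    }

Once every node in the cluster is on the new messaging version, the MD5
branch simply stops being exercised.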

-Joey

[1] https://issues.apache.org/jira/browse/CASSANDRA-13291
[2] https://issues.apache.org/jira/browse/CASSANDRA-13292
[3] https://issues.apache.org/jira/browse/CASSANDRA-14611


On Wed, Sep 26, 2018 at 5:00 PM Elliott Sims  wrote:

> They also don't matter for digests, as long as we're assuming all nodes in
> the cluster are non-malicious (which is a pretty reasonable and probably
> necessary assumption).  Or at least, deliberate collisions don't.
> Accidental collisions do, but 128 bits is sufficient to make that
> sufficiently unlikely (as in, chances are nobody will ever see a single
> collision)
>
> On Wed, Sep 26, 2018 at 7:58 PM Brandon Williams  wrote:
>
> > Collisions don't matter in the partitioner.
> >
> > On Wed, Sep 26, 2018, 6:53 PM Anirudh Kubatoor <
> anirudh.kubat...@gmail.com
> > >
> > wrote:
> >
> > > Isn't MD5 broken from a security standpoint? From wikipedia:
> > > *"One basic requirement of any cryptographic hash function is that it
> > > should be computationally infeasible
> > > <https://en.wikipedia.org/wiki/Computational_complexity_theory#Intractability>
> > > to find two non-identical messages which hash to the same value. MD5 fails
> > > this requirement catastrophically; such collisions can be found in
> > > seconds on an ordinary home computer"*
> > >
> > > Regards,
> > > Anirudh
> > >
> > > On Wed, Sep 26, 2018 at 7:14 PM Jeff Jirsa  wrote:
> > >
> > > > In some installations, it's used for hashing the partition key to
> find
> > > the
> > > > host ( RandomPartitioner )
> > > > It's used for prepared statement IDs
> > > > It's used for hashing the data for reads to know if the data matches
> on
> > > all
> > > > different replicas.
> > > >
> > > > We don't use CRC because conflicts would be really bad. There's
> > probably
> > > > something in the middle that's slightly faster than md5 without the
> > > > drawbacks of crc32
> > > >
> > > >
> > > > On Wed, Sep 26, 2018 at 3:47 PM Tyagi, Preetika <
> > > preetika.ty...@intel.com>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have a question about MD5 being used in the read path in
> Cassandra.
> > > > > I wanted to understand what exactly it is being used for and why
> not
> > > > > something like CRC is used which is less complex in comparison to
> > MD5.
> > > > >
> > > > > Thanks,
> > > > > Preetika
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] changing default token behavior for 4.0

2018-09-24 Thread Joseph Lynch
I am a big fan of lowering the default number of tokens for many
reasons (availability, repair, etc...). I agree there are some
usability blockers to "just lowering the number today", but I very
much agree that the current default of 256 random tokens is a huge bug
that I hope we fix by the 4.0 release.

It sounds like Kurt and Jon have done a lot of work already on this
problem, and internally I've worked on this as well (Netflix's
internal token allocation as well as evaluating vnodes that resulted
in the paper I sent out) so I would be excited to help fix this for
4.0. Maybe the three of us (plus any others that are interested) can
put together a short proposal over the next few days including the
following:

1. What precisely should we change the defaults to
2. Given the new defaults how would a user bootstrap a new cluster
3. Given the new defaults how would a user add capacity to an existing cluster
4. Concrete jiras that would implement #1 with minimal possible scope

Then we could send the proposal to the dev list for feedback, and if
there is consensus that the scope is not too large/dangerous and a
committer (Jon perhaps) can commit to reviewing/merging, we can work
on the tickets and be accountable for merging them before the 4.0 release.

-Joey
On Sun, Sep 23, 2018 at 1:42 PM Nate McCall  wrote:
>
> Let's pick a default setup that works for most people (IME clusters <
> 30 nodes, but TLP and Instaclustr peeps probably have the most insight
> here).
>
> Then we just explain the heck out of it in the comments. I would also
> like to see this include some details add/remove a DC to change the
> values (perhaps we sub-task a doc creation for that?).
>
> Good discussion though - thanks folks.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>




Re: [VOTE] Development Approach for Apache Cassandra Management process

2018-09-12 Thread Joseph Lynch
> I'd like to ask those of you that are +1'ing, are you willing to contribute
> or are you just voting we start an admin tool from scratch because you
> think it'll somehow produce a perfect codebase?

Roopa, Vinay, Sumanth and I are voting as community members (and a
sizeable user), and our willingness to contribute should be clear from
the close to six months of engineering work we've invested presenting,
prototyping, deploying, refactoring, designing, discussing, and
producing the patch on CASSANDRA-14346, which then happened to revive
the April maintenance process discussion as we needed something to put
the scheduler in. Dinesh and other Apple contributors were the ones
who floated the idea in the first place, and we just tried to help that
proposal actually happen. Clearly we have to re-work that patch, as it
seems we have turned the management process discussion into a
discussion about repair, which was not the point; we are already in
the process of converting the patch into a pluggable maintenance
execution engine for any scheduled task.

I didn't understand this vote as a vote on releasing the yet-to-exist
maintenance process ... I thought we're trying to get consensus
on whether we should invest in a fresh repo and build to the design (when
we have achieved the design there can be an actual release vote) or
start from an existing project which has made existing choices and
try to refactor it towards the scope/desires.

-Joey




Re: [VOTE] Development Approach for Apache Cassandra Management process

2018-09-12 Thread Joseph Lynch
+1 for piecemeal (option b).

I think I've explained my opinion on all the various threads and tickets.

-Joey
On Wed, Sep 12, 2018 at 10:48 AM Vinay Chella  wrote:
>
> +1 for option b, considering the advantages mentioned in dev email thread
> that Sankalp linked.
>
> ~Vinay
>
>
> On Wed, Sep 12, 2018 at 10:36 AM Dinesh Joshi
>  wrote:
>
> > +1 for piecemeal (option b)
> >
> > Dinesh
> >
> > > On Sep 12, 2018, at 8:18 AM, sankalp kohli 
> > wrote:
> > >
> > > Hi,
> > > The community has been discussing the Apache Cassandra Management
> > > process since April and we have had a lot of discussion about which
> > > approach to take to get started. Several contributors have been
> > > interested in doing this and we need to make a decision on which
> > > approach to take.
> > >
> > > The current approaches being evaluated are
> > > a. Donate an existing project to Apache Cassandra like Reaper. If this
> > > option is selected, we will evaluate various projects and see which one
> > > fits best.
> > > b. Take a piecemeal approach and use the features from different OSS
> > > projects and build a new project.
> > >
> > > Available options to vote:
> > > a. +1 to use an existing project.
> > > b. +1 to take the piecemeal approach.
> > > c. -1 to both.
> > > d. +0 I don't mind either option.
> > >
> > > You can also just type a, b, c, or d to choose an option.
> > >
> > > Dev threads with discussions
> > >
> > >
> > https://lists.apache.org/thread.html/4eace8cb258aab83fc3a220ff2203a281ea59f4d6557ebeb1af7b7f1@%3Cdev.cassandra.apache.org%3E
> > >
> > >
> > https://lists.apache.org/thread.html/4a7e608c46aa2256e8bcb696104a4e6d6aaa1f302834d211018ec96e@%3Cdev.cassandra.apache.org%3E
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >




Re: Proposing an Apache Cassandra Management process

2018-09-08 Thread Joseph Lynch
On Fri, Sep 7, 2018 at 10:00 PM Blake Eggleston  wrote:
>
> Right, I understand the arguments for starting a new project. I’m not saying 
> reaper is, technically speaking, the best place to start. The point I’m 
> trying to make is that the non-technical advantages of using an existing 
> project as a starting point may outweigh the technical benefits of a clean 
> slate. Whether that’s the case or not, it’s not a strictly technical 
> decision, and the non-technical advantages of starting with reaper need to be 
> weighed.
>

Technical debt doesn't just refer to the technical solutions; having
an existing user base means that a project has made promises in the
past (in the case of Priam, 5+ years ago) that the new project would
have to keep if we make keeping users of those sidecars a goal (which,
for the record, I don't think should be a goal; I think the goal is to
make Cassandra the database work out of the box in the 4.x series).

For example, Priam has to continue supporting the following as users
actively use them (including Netflix):
* Parallel token assignment and creation allowing parallel bootstrap
and parallel doubling of hundred node clusters at once (as long as you
use single tokens and run in AWS with asgs).
* 3+ backup solutions, as well as assumptions about where in e.g. S3
backups are present and what format they're present in.
* Numerous configuration options and UI elements (mostly 5 year old JSON APIs)
* Support for Cassandra 2.0, 2.1, 2.2, 3.0, 3.11 and soon 4.0

Reaper has to continue supporting the following as users actively use them:
* Postgres and h2 storage backends. These were the original storage
engines and users may not have (probably haven't?) migrated to the
relatively recently added Cassandra backend (which is probably the
only one an official sidecar should support imo).
* The three historical modes of running Reaper [1], all of which
involve remote JMX (disallowed by many companies' security policies,
including Netflix's) and none of which are a sidecar design (although
Mick says we can add that back in, at which point it has to support
four).
* Numerous configuration options and UI elements (e.g. a UI around a
single Reaper instance controlling many Cassandra clusters instead of
each cluster having a separate UI more consistent with how Cassandra
architecture works)
* Support for Cassandra 2.2, 3.0, 3.11, and soon 4.0

[1] http://cassandra-reaper.io/docs/usage/multi_dc/
[2] https://github.com/hashbrowncipher/cassandra-mirror

We can't "get the community" of these sidecars and drop support for
90+% of the supported configurations and features at the same time ...
These projects have made promises to users, and as the name "technical
debt" implies, the choices made over the years have explicit costs that we
have to take into account if we accept a project as-is with the goal of
taking the community with us. If we don't have the goal of taking the
existing community with us and are instead aiming to simply make
Cassandra work out of the box without external tools, then we don't
have to continue fulfilling these promises.

Instead I think the organizations that made those promises (TLP and
Netflix in these specific cases) should continue keeping them, and the
Cassandra management process should be incrementally built by the
Cassandra community, with decisions made as a community, and we only GA
it when the PMC/community is happy to make a promise of support
for the features that we've merged (and since we're starting from a
fresh slate, if people have strong opinions about fundamental
architecture we can try to take those into account, like we did with
the months of feedback on repair scheduling
runtime/architecture/design). If we can't prove value over other
community tools for running 4.x, which is definitely a possibility,
then we don't mark the management process GA, we re-focus on
individual community tools, and imo failure is a reasonable result of
attempted innovation.

-Joey




Re: Proposing an Apache Cassandra Management process

2018-09-07 Thread Joseph Lynch
> What’s the benefit of doing it that way vs starting with reaper and 
> integrating the netflix scheduler? If reaper was just a really inappropriate 
> choice for the cassandra management process, I could see that being a better 
> approach, but I don’t think that’s the case.
>
The benefit, as Dinesh and I argued earlier, is starting without
technical debt (especially architectural technical debt) and taking
only the best components from the multiple community sidecars for the
Cassandra management sidecar. To be clear, I think Priam is much
closer to the proposed management sidecar than Reaper is (and Priam +
the repair scheduler has basically all proposed scope), but like I
said earlier in the other thread I don't think we should take Priam as
is due to technical debt and I don't think we should take Reaper
either. The community should learn from the many sidecars we've built
and solve the problem once inside the Cassandra sidecar.

> If our management process isn’t a drop in replacement for reaper, then reaper 
> will continue to exist, which will split the user and developers base between 
> the 2 projects. That won't be good for either project.
I think Reaper is a great repair tool for some infrastructures, but I
don't think the management sidecar is about running repairs. It's
about building a general purpose tool that may happen to run repairs
if someone chooses to use that particular plugin. To be honest I think
it's great that there are competing community repair tools ... this is
how we learn from all of them and build the simplest and most narrowly
tailored solution into the database itself...

-Joey




Re: Proposing an Apache Cassandra Management process

2018-09-07 Thread Joseph Lynch
On Fri, Sep 7, 2018 at 5:03 PM Jonathan Haddad  wrote:
>
> We haven’t even defined any requirements for an admin tool. It’s hard to
> make a case for anything without agreement on what we’re trying to build.
>
We were/are trying to sketch out scope/requirements in the #14395 and
#14346 tickets as well as their associated design documents. I think
the general proposed direction is a distributed 1:1 management sidecar
process similar in architecture to Netflix's Priam except explicitly
built to be general and pluggable by anyone rather than tightly
coupled to AWS.

Dinesh, Vinay and I were aiming for a small amount of scope at first and
to take an iterative approach, with just enough upfront design
but not so much that we are unable to make any progress at all. For example,
maybe something like:

1. Get a super simple and non-controversial sidecar process that ships
with Cassandra and exposes a lightweight HTTP interface to e.g. some
basic JMX endpoints (a rough sketch of what this could look like follows
this list)
2a. Add a pluggable execution engine for cron/oneshot/scheduled jobs
with the basic interfaces and state store and such
2b. Start scoping and implementing the full HTTP interface, e.g.
backup status, cluster health status, etc ...
3a. Start integrating implementations of the jobs from 2a such as
snapshot, backup, cluster restart, daemon + sstable upgrade, repair,
etc
3b. Start integrating UI components that pair with the HTTP interface from 2b
4. ?? Perhaps start unlocking next-generation operations like moving
"background" activities such as compaction, streaming, and repair into
one or more sidecar-contained processes to ensure the main daemon only
handles read+write requests
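Step 1 is small enough to sketch. The following is purely hypothetical (not
from any patch): a sidecar that proxies one basic JMX attribute over HTTP
using only JDK classes; the HTTP port (8778) and URL path are arbitrary
choices for illustration.

    import com.sun.net.httpserver.HttpServer;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public final class SidecarHttpSketch
    {
        public static void main(String[] args) throws Exception
        {
            // Connect to the local daemon's JMX endpoint (7199 by default).
            JMXServiceURL jmxUrl = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            MBeanServerConnection jmx =
                JMXConnectorFactory.connect(jmxUrl).getMBeanServerConnection();

            // Expose one read-only HTTP endpoint that proxies a JMX attribute.
            HttpServer http = HttpServer.create(new InetSocketAddress(8778), 0);
            http.createContext("/v1/version", exchange -> {
                byte[] body;
                int status;
                try
                {
                    Object version = jmx.getAttribute(
                        new ObjectName("org.apache.cassandra.db:type=StorageService"),
                        "ReleaseVersion");
                    body = String.valueOf(version).getBytes(StandardCharsets.UTF_8);
                    status = 200;
                }
                catch (Exception e)
                {
                    body = e.toString().getBytes(StandardCharsets.UTF_8);
                    status = 500;
                }
                exchange.sendResponseHeaders(status, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            http.start();
        }
    }

Everything else (auth, packaging, a real HTTP framework, more endpoints, the
job engine from 2a) layers on top of something roughly this small.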

There are going to be a lot of questions to answer, and I think trying
to answer them all up front will mean that we get nowhere or make
unfortunate compromises that cripple the project from the start. If
people think we need to do more design and discussion than we have
been doing then we can spend more time on the design, but personally
I'd rather start iterating on code and prove value incrementally. If
it doesn't work out we won't release it GA to the community ...

-Joey




Re: QA signup

2018-09-07 Thread Joseph Lynch
I don't think anyone has mentioned this yet but we probably want to
consider releasing 4.0 alpha jars to maven central soon so the open
source ecosystem can start testing a consistent Cassandra 4.0; for
example I had to hack 4.0 into Priam's build [1] by manually building
a jar and checking it in, which is ... not particularly good or
reproducible for others. I'm not sure how hard it would be, but
supporting periodic SNAPSHOT releases would at least allow
building against trunk and would be great too. It also might be a good
idea to have a document (confluence page?) of breaking changes that
are most likely to require a change from users. For example the
SeedProvider interface change is probably going to break almost
everyone's deployment (but is easy to fix), and having a central list
of removed yaml options might be helpful past the NEWs file.

Regarding testing areas, we deployed trunk in the Netflix testing
environment on Wednesday with the aim to test the netty internode
messaging subsystem on 200+ node clusters. We are working with Jason,
Dinesh, and Jordan and have already found some interesting results; we
would like to write them down as well as work on establishing good
baselines and a testing methodology for stressing that subsystem. Is the
consensus here to create Jira epics tagged as 4.0 blockers for each
subsystem, or confluence pages (if confluence, I think we need to give
people permissions to add pages)?

Other areas we can help test and are looking for collaborators on are
audit/full query logging. We are also potentially interested in helping
to test repair, but our internal implementation doesn't support
Cassandra 4.x ... we can re-work the CASSANDRA-14346 patch without too
much effort, I think, to thoroughly test full/incremental repair at any
cluster scale (or maybe the Reaper folks can test repair).

[1] 
https://github.com/Netflix/Priam/pull/713/files#diff-3c33bef9f0334cf724470d50eae8dd8b

-Joey

On Fri, Sep 7, 2018 at 9:57 AM Jonathan Haddad  wrote:
>
> Really good idea JD. Keeping all the tests under an umbrella ticket for the
> feature with everything linked back makes a lot of sense.
>
> On Thu, Sep 6, 2018 at 11:09 PM J. D. Jordan 
> wrote:
>
> > I would suggest that JIRA’s tagged as 4.0 blockers be created for the list
> > once it is fleshed out.  Test plans and results could be posted to said
> > JIRAs, to be closed once a given test passes. Any bugs found can also then
> > be related back to such a ticket for tracking them as well.
> >
> > -Jeremiah
> >
> > > On Sep 6, 2018, at 12:27 PM, Jonathan Haddad  wrote:
> > >
> > > I completely agree with you, Sankalp.  I didn't want to dig too deep into
> > > the underlying testing methodology (and I still think we shouldn't just
> > > yet) but if the goal is to have confidence in the release, our QA process
> > > needs to be comprehensive.
> > >
> > > I believe that having focused teams for each component with a team leader
> > > with support from committers & contributors gives us the best shot at
> > > defining large scale functional tests that can be used to form both
> > > progress and bug reports.  (A person could / hopefully will be on more
> > than
> > > one team).  Coming up with those comprehensive tests will be the jobs of
> > > the teams, getting frequent bidirectional feedback on the dev ML.  Bugs
> > go
> > > in JIRA as per usual.
> > >
> > > Hopefully we can continue this process after the release, giving the
> > > project more structure, and folding more people in over time as
> > > contributors and ideally committers / PMC.
> > >
> > > Jon
> > >
> > >
> > >> On Thu, Sep 6, 2018 at 1:15 PM sankalp kohli 
> > wrote:
> > >>
> > >> Thanks for starting this Jon.
> > >> Instead of saying "I tested streaming", we should define what all was
> > >> tested like was all data transferred, what happened when stream failed,
> > >> etc.
> > >> Based on talking to a few users, looks like most testing is done by
> > doing
> > >> an operation or running a load and seeing if it "worked" and no errors
> > in
> > >> logs.
> > >>
> > >> Another important thing will be to fix bugs asap ahead of testing,  as
> > >> fixes can lead to more bugs :)
> > >>
> >  On Thu, Sep 6, 2018 at 7:52 AM Jonathan Haddad 
> > wrote:
> > >>>
> > >>> I was thinking along the same lines.  For this to be successful I think
> > >>> either weekly or bi-weekly summary reports back to the mailing list by
> > >> the
> > >>> team lead for each subsection on what's been tested and how it's been
> > >>> tested will help keep things moving along.
> > >>>
> > >>> In my opinion the lead for each team should *not* be the contributor
> > that
> > >>> wrote the feature, but someone who's very interested in it and can use
> > >> the
> > >>> contributor as a resource.  I think it would be difficult for the
> > >>> contributor to poke holes in their own work - if they could do that it
> > >>> would have been done already.  This should be a verification process
> > >> that's
> 

Re: Yet another repair solution

2018-08-28 Thread Joseph Lynch
I'm pretty interested in seeing and understanding your solution! When we
started on CASSANDRA-14346, reading your design documents and the plan you
sketched out in CASSANDRA-10070 was really helpful in improving our
design. I'm particularly interested in how the Scheduler/Job/Task APIs
turned out (we're working on something similar internally and would love to
compare notes and figure out the best way to implement that kind of
abstraction).

-Joey


On Tue, Aug 28, 2018 at 6:34 AM Marcus Olsson 
wrote:

> Hi,
>
> At the risk of stirring the repair/side-car topic even further, I'd just
> like to mention that we have recently gotten approval to contribute our
> repair management side-car solution.
> It's based on the proposal in
> https://issues.apache.org/jira/browse/CASSANDRA-10070 as a standalone
> application sitting next to each instance.
> With the recent discussions in mind I'd just like to hear the thoughts
> from the community on this before we put in the effort of bringing our
> solution into open source.
>
> Would there be an interest of having yet another repair solution in the
> discussion?
>
> Best Regards
> Marcus Olsson
>


Re: Reaper as cassandra-admin

2018-08-28 Thread Joseph Lynch
I and the rest of the Netflix Cassandra team share Dinesh's concerns. I was
excited to work on this project precisely because we were taking only the
best designs, techniques, and functionality out of the community sidecars
such as Priam, Reaper, and any other community tool and building the
simplest possible tool into Cassandra that could deliver the maximum value
to our users with the minimal amount of technical debt. For example, a
distributed, shared-nothing architecture that communicates only through
state transitions in Cassandra data itself seems to be the most robust and
secure architecture (and indeed Reaper appears to be refactoring towards
that). Fundamental architecture is, in my experience,
very hard to refactor, and often starting fresh with the lessons learned
from the N previous iterations is the faster way to build real value. For
example, Reaper was built to be a repair tool, and that is baked into the core
abstractions. It sounds like the community needs something more like a
distributed task execution engine which is fully pluggable (plugin whatever
ops task you want) and operates scheduled, oneshot, and daemon tasks.

What if we started with a basic framework as proposed in CASSANDRA-14395,
maybe add a pluggable execution engine as the first few commits and then
various community members can contribute plugins/modules that add various
functionality such as repair, backup, distributed restarts, upgrades,
etc..?  We would be striving very hard not to reinvent the wheel, rather we
would want to learn from previous iterations, keep what works well and
leave the rest.
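To be concrete about what "fully pluggable" could mean, here is a
hypothetical plugin interface; every name in it is invented for illustration
and is not taken from any existing patch. The engine would only know how to
schedule, run, and persist the state of MaintenanceTasks, while repair,
backup, rolling restarts, upgradesstables, etc. would each arrive as a
plugin implementing it:

    import java.time.Duration;
    import java.util.Optional;

    public interface MaintenanceTask
    {
        /** Stable identifier, e.g. "repair" or "snapshot-backup". */
        String name();

        /** How the engine should run this task. */
        Kind kind();

        /** For SCHEDULED tasks, how often to run; empty otherwise. */
        Optional<Duration> interval();

        /**
         * Do one unit of work. The engine persists progress (e.g. in a
         * Cassandra table) so a restarted sidecar can resume where it
         * left off instead of starting over.
         */
        void run(TaskContext context) throws Exception;

        enum Kind { ONESHOT, SCHEDULED, DAEMON }

        /** Whatever the engine hands tasks: state store, metrics, topology. */
        interface TaskContext
        {
            void saveProgress(String key, String value);
            Optional<String> loadProgress(String key);
        }
    }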

Regarding Priam, we could offer to donate it but I think that the community
shouldn't accept it because it is full of years of technical debt and
decisions made by Netflix for Netflix. For example Priam currently has four
different backup solutions (three used in production, the latest not used
in production) that we have implemented over the years, and only the latest
one that is not yet in production should be contributed to the official
sidecar. The latest iteration is similar to the architecture of
https://github.com/hashbrowncipher/cassandra-mirror which is capable of per
minute, point in time backups; no previous iteration is capable of this.
Yes the earlier versions are "battle hardened" but we know those
architectures have fundamental flaws, are overly expensive, or simply won't
scale to the next 10x requirement. We have learned from those previous
iterations and are creating the next iteration that will scale another
order of magnitude. I also wouldn’t want to burden reviewers with looking
at the first three implementations or building the mental model all at once
of how Priam works end to end.

Practically speaking, I think it's much more logistically difficult to
accept one of the sidecar projects as is than building a new one
incrementally. The existing sidecars have dependencies that have to be
vetted, technical debt that must be trimmed, tens of thousands of lines of
code that have to be reviewed, and even if the community wants to make
changes those changes might be prohibitively difficult as the underlying
architecture has solidified.

Furthermore, all of these tools were designed without the notion that they
were shipping with Cassandra, which precluded them from being capable of
next-generation features like moving compaction entirely out of the live
request-response path and into a separate process that can be limited with
e.g. cgroups to ensure isolation.
Cassandra over the years and therefore have layers of indirection and
abstraction added simply for dealing with various different APIs and
versions (I personally think the official sidecar should branch with
Cassandra and support current plus previous versions of Cassandra just like
the server does).

I hope that we decide as a community to put all the options on the table in
the open, learn from all of them, and pursue a solution that takes the best
from all the solutions and is unencumbered by historical decisions.

-Joey


Re: JIRAs in Review

2018-08-22 Thread Joseph Lynch
Just want to bump this up if any reviewers have time before the 9/1 window.
I think these are all patch available and ready for review at this point.

Useful improvements for 4.0:
>
> https://issues.apache.org/jira/browse/CASSANDRA-14303 and
> https://issues.apache.org/jira/browse/CASSANDRA-14557 - Makes the user
> interface for creating keyspaces easier to use and less error prone. For
> example "CREATE KEYSPACE foo WITH replication = {'class':
> 'NetworkTopologyStrategy'}" would just do the right thing after both
> patches.
>
> https://issues.apache.org/jira/browse/CASSANDRA-14459 - Limits the
> DynamicEndpointSnitch's use of latent replicas. This should help with slow
> queries on startup and every 10 minutes during DES reset (when the snitch's
> latency data is empty). There is also a riskier potential follow up patch
> which improves the performance of the latency tracking mechanism by ~10x
> and reduces garbage to near zero. I believe that Jason is looking at this.
>
> https://issues.apache.org/jira/browse/CASSANDRA-14297 - Improves the
> startup check functionality to make it so that operators can be more
> confident that they won't throw unavailable or timeouts during node
> restarts. The patch has merge conflicts right now but I was waiting on
> someone to confirm it's worth doing before I spend more time on it.
>


> Minor cleanup patches:
> https://issues.apache.org/jira/browse/CASSANDRA-9452 - Cleanup of some
> old configuration, probably an easy thing to commit.
>

Since then we have a few more:

https://issues.apache.org/jira/browse/CASSANDRA-14358 - Improves the
handling of network partitions, especially those caused by statefull
firewalls (e.g. AWS security groups)
https://issues.apache.org/jira/browse/CASSANDRA-14319 - Doesn't let
nodetool rebuild pass invalid datacenters

Cheers,
-Joey


Re: Side Car New Repo vs not

2018-08-20 Thread Joseph Lynch
I think that the pros of incubating the sidecar in tree as a tool first
outweigh the alternatives at this point in time. Rough tradeoffs that I see:

Unique pros of in tree sidecar:
* Faster iteration speed in general. For example when we need to add a new
JMX endpoint that the sidecar needs, or change something from JMX to a
virtual table (e.g. for repair, or monitoring) we can do all changes
including tests as one commit within the main repository and don't have to
commit to main repo, sidecar repo, and dtest repo (juggling version
compatibility along the way).
* We can in the future more easily move serious background functionality
like compaction or repair itself (not repair scheduling, actual repairing)
into the sidecar with a single atomic commit, we don't have to do two phase
commits where we add some IPC mechanism to allow us to support it in both,
then turn it on in the sidecar, then turn it off in the server, etc...
* I think that the verification is much easier (sounds like Jonathan
disagreed on the other thread, I could certainly be wrong), and we don't
have to worry about testing matrices to assure that the sidecar works with
various versions as the version of the sidecar that is released with that
version of Cassandra is the only one we have to certify works. If people
want to pull in new versions or maintain backports they can do that at
their discretion/testing.
* We can iterate and prove value before committing to a choice. Since it
will be a separate artifact from the start we can always move the artifact
to a separate repo later (but moving the other way is harder).
* Users will get the sidecar "for free" when they install the daemon, they
don't need to take affirmative action to e.g. be able to restart their
cluster, run repair, or back their data up; it just comes out of the box
for free.

Unique pros of a separate repository sidecar:
* We can use a more modern build system like gradle instead of ant
* Merging changes is less "scary" I guess (I feel like if you're not
touching the daemon this is already true but I could see this being less
worrisome for some).
* Releasing a separate artifact is somewhat easier from a separate repo
(especially if we have gradle which makes e.g. building debs and rpms
trivial).
* We could backport to previous versions without getting into arguments
about bug fixes vs features.
* Committers could be different from the main repo, which ... may be a
useful thing

Non unique pros of a sidecar (could be achieved in the main repo or in a
separate repo):
* A separate build artifact .jar/.deb/.rpm that can be installed
separately. It's slightly easier with a separate repo but certainly not out
of reach within a single repo (indeed the current patch already creates a
separate jar, and we could create a separate .deb reasonably easily).
Personally I think having a separate .deb/.rpm is premature at this point
(for companies that really want it they can build their own packages using
the .jars), but I think it really is a distracting issue from where the
patch should go as we can always choose to remove experimental .jar files
that the main daemon doesn't touch.
* A separate process lifecycle. No matter where the sidecar goes, we get
the benefit of restarting it being less dangerous for availability than
restarting the main daemon.

That all being said, these are strong opinions weakly held and I would
rather get something actually committed so that we can prove value one way
or the other and am therefore, of course, happy to put sidecar patches
wherever someone can review and commit it.

-Joey

On Mon, Aug 20, 2018 at 1:52 PM sankalp kohli 
wrote:

> Hi,
> I am starting a new thread to get consensus on where the side car
> should be contributed.
>
> Please send your responses with pro/cons of each approach or any other
> approach. Please be clear which approach you will pick while still giving
> pros/cons of both approaches.
>
> Thanks.
> Sankalp
>


Re: Proposing an Apache Cassandra Management process

2018-08-20 Thread Joseph Lynch
> We are looking to contribute Reaper to the Cassandra project.
>
Just to clarify: are you proposing contributing Reaper as a project via
donation, or are you planning on contributing the features of Reaper as
patches to Cassandra? If the former, how far along are you in the donation
process? If the latter, when do you think you would have patches ready for
consideration / review?


> Looking at the patch it's very similar in its base design already, but
> Reaper does have a lot more to offer. We have all been working hard to move
> it to also being a side-car so it can be contributed. This raises a number
> of relevant questions to this thread: would we then accept both works in
> the Cassandra project, and what burden would it put on the current PMC to
> maintain both works.
>
I would hope that we would collaborate on merging the best parts of all
into the official Cassandra sidecar, taking the always on, shared nothing,
highly available system that we've contributed a patchset for and adding in
many of the repair features (e.g. schedules, a nice web UI) that Reaper has.


> I share Stefan's concern that consensus had not been met around a
> side-car, and that it was somehow default accepted before a patch landed.


I feel this is not correct or fair. The sidecar and repair discussions have
been anything _but_ "default accepted". The timeline of consensus building
involving the management sidecar and repair scheduling plans:

Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper to
come up with design goals for a repair scheduler that could work at Netflix
scale.

~Feb 2017: Netflix concludes that fundamental design gaps prevent us
from using Reaper, as it relies heavily on remote JMX connections and
central coordination.

Sep. 2017: Vinay gives a lightning talk at NGCC about a highly available
and distributed repair scheduling sidecar/tool. He is encouraged by
multiple committers to build repair scheduling into the daemon itself and
not as a sidecar so the database is truly eventually consistent.

~Jun. 2017 - Feb. 2018: Based on internal need and the positive feedback at
NGCC, Vinay and I prototype the distributed repair scheduler within
Priam and roll it out at Netflix scale.

Mar. 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20 page
design document for adding repair scheduling to the daemon itself and open
the design up for feedback from the community. We get feedback from Alex,
Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals
to contribute Reaper at this point. We hear the consensus that the
community would prefer repair scheduling in a separate distributed sidecar
rather than in the daemon itself and we re-work the design to match this
consensus, re-aligning with our original proposal at NGCC.

Apr 2018: Blake brings the discussion of repair scheduling to the dev list (
https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E).
Many community members give positive feedback that we should solve it as
part of Cassandra and there is still no mention of contributing Reaper at
this point. The last message is my attempted summary giving context on how
we want to take the best of all the sidecars (OpsCenter, Priam, Reaper) and
ship them with Cassandra.

Apr. 2018: Dinesh opens CASSANDRA-14395 along with a public design document
for gathering feedback on a general management sidecar. Sankalp and Dinesh
encourage Vinay and me to kickstart that sidecar using the repair
scheduler patch.

Apr 2018: Dinesh reaches out to the dev list (
https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E)
about the general management process to gain further feedback. All feedback
remains positive as it is a potential place for multiple community members
to contribute their various sidecar functionality.

May-Jul 2018: Vinay and I work on creating a basic sidecar for running the
repair scheduler based on the feedback from the community in
CASSANDRA-14346 and CASSANDRA-14395

Jun 2018: I bump CASSANDRA-14346 indicating we're still working on this,
nobody objects

Jul 2018: Sankalp asks on the dev list if anyone has feature JIRAs that
need review before 4.0; I mention again that we've nearly got the
basic sidecar and repair scheduling work done and will need help with
review. No one responds.

Aug 2018: We submit a patch that brings a basic distributed sidecar and
robust distributed repair to Cassandra itself. Dinesh mentions that he will
try to review. Now folks appear concerned about it being in tree and
suggest that instead it should maybe go in a different repo altogether. I don't think
we have consensus on the repo choice yet.

> This seems at odds when we're already struggling to keep up with the
> incoming patches/contributions, and there could be other git repos in the
> project we will need to support in the future too. But 

Re: Proposing an Apache Cassandra Management process

2018-08-17 Thread Joseph Lynch
While I would love to use a different build system (e.g. gradle) for the
sidecar, I agree with Dinesh that a separate repo would make sidecar
development much harder to verify, especially on the testing and
compatibility front. As Jeremiah mentioned, we can always choose later to
release the sidecar artifact separately or more frequently than the main
server regardless of repo choice, and per Roopa's point, having a separate
release artifact (jar or deb/rpm) is probably a good idea at least as long
as the default Cassandra packages automatically stop and start Cassandra on
install.

While we were developing the repair scheduler in a separate repo we had a
number of annoying issues that only started surfacing once we started
merging it directly into the trunk tree:
1. It is hard to compile/test against unreleased versions of Cassandra
(e.g. the JMX interfaces changed a lot with 4.x, and it was pretty tricky
to compile against that as the main project doesn't release nightly
snapshots or anything like that, so we had to build local trunk jars and
then manually depend on them).
2. Integration testing and cross version compatibility is extremely hard.
The sidecar is going to be involved in multi node coordination (e.g.
monitoring, metrics, maintenance) and will be tightly coupled to JMX
interface choices in the server and trying to make sure that it all works
with multiple versions of Cassandra is much harder if it's a separate repo
that has to have a mirroring release cycle to Cassandra. It seems much
easier to have it in tree and just be like "the in tree sidecar is tested
against that version of Cassandra". Every time we cut a Cassandra server
branch the sidecar branches with it.

We experience these pains all the time with Priam being in a separate repo,
where every time we support a new Cassandra version a bunch of JMX
endpoints break and we have to refactor the code to either call JMX methods
by string or cut a new Priam branch. A separate artifact is pretty
important, but a separate repo just allows drifts. Furthermore from the
Priam experience I also don't think it's realistic to say one version of a
sidecar artifact can actually support multiple versions.
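For anyone who hasn't had to do it, "call JMX methods by string" means
giving up compile-time checking and invoking operations by name at runtime.
A rough sketch of the pattern (the operation name and signature here are just
an example and vary by Cassandra version, which is exactly the problem
described above):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public final class JmxByStringSketch
    {
        public static void main(String[] args) throws Exception
        {
            MBeanServerConnection jmx = JMXConnectorFactory
                .connect(new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi"))
                .getMBeanServerConnection();

            // Instead of compiling against StorageServiceMBean (whose shape
            // changes between Cassandra versions), invoke by operation name.
            // Typos and renamed/removed operations only fail at runtime.
            jmx.invoke(
                new ObjectName("org.apache.cassandra.db:type=StorageService"),
                "takeSnapshot",
                new Object[] { "pre-upgrade", new String[0] },
                new String[] { String.class.getName(), String[].class.getName() });
        }
    }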

-Joey

On Fri, Aug 17, 2018 at 12:00 PM Jeremiah D Jordan 
wrote:

> Not sure why the two things being in the same repo means they need the
> same release process.  You can always do interim releases of the management
> artifact between server releases, or even have completely decoupled
> releases.
>
> -Jeremiah
>
> > On Aug 17, 2018, at 10:52 AM, Blake Eggleston 
> wrote:
> >
> > I'd be more in favor of making it a separate project, basically for all
> the reasons listed below. I'm assuming we'd want a management process to
> work across different versions, which will be more awkward if it's in tree.
> Even if that's not the case, keeping it in a different repo at this point
> will make iteration easier than if it were in tree. I'd imagine (or at
> least hope) that validating the management process for release would be
> less difficult than the main project, so tying them to the Cassandra
> release cycle seems unnecessarily restrictive.
> >
> >
> > On August 17, 2018 at 12:07:18 AM, Dinesh Joshi 
> > (dinesh.jo...@yahoo.com.invalid)
> wrote:
> >
> >> On Aug 16, 2018, at 9:27 PM, Sankalp Kohli 
> wrote:
> >>
> >> I am bumping this thread because patch has landed for this with repair
> functionality.
> >>
> >> I have a following proposal for this which I can put in the JIRA or doc
> >>
> >> 1. We should see if we can keep this in a separate repo like Dtest.
> >
> > This would imply a looser coupling between the two. Keeping things
> in-tree is my preferred approach. It makes testing, dependency management
> and code sharing easier.
> >
> >> 2. It should have its own release process.
> >
> > This means now there would be two releases that need to be managed and
> coordinated.
> >
> >> 3. It should have integration tests for different versions of Cassandra
> it will support.
> >
> > Given the lack of test infrastructure - this will be hard especially if
> you have to qualify a matrix of builds.
> >
> > Dinesh
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: JIRAs in Review

2018-07-20 Thread Joseph Lynch
We have a few improvements and bug fixes that could use reviewer feedback.

Useful improvements for 4.0:

https://issues.apache.org/jira/browse/CASSANDRA-14303 and
https://issues.apache.org/jira/browse/CASSANDRA-14557 - Makes the user
interface for creating keyspaces easier to use and less error prone. For
example "CREATE KEYSPACE foo WITH replication = {'class':
'NetworkTopologyStrategy'}" would just do the right thing after both
patches.

https://issues.apache.org/jira/browse/CASSANDRA-14459 - Limits the
DynamicEndpointSnitch's use of latent replicas. This should help with slow
queries on startup and every 10 minutes during DES reset (when the snitch's
latency data is empty). There is also a riskier potential follow up patch
which improves the performance of the latency tracking mechanism by ~10x
and reduces garbage to near zero. I believe that Jason is looking at this.

https://issues.apache.org/jira/browse/CASSANDRA-14297 - Improves the
startup check functionality to make it so that operators can be more
confident that they won't throw unavailable or timeouts during node
restarts. The patch has merge conflicts right now but I was waiting on
someone to confirm it's worth doing before I spend more time on it.

In progress patches for 4.0 that will probably need reviewers soon:

https://issues.apache.org/jira/browse/CASSANDRA-14358 - Improves the
handling of network partitions, especially those caused by statefull
firewalls (e.g. AWS security groups)
https://issues.apache.org/jira/browse/CASSANDRA-14346 - Initial sidecar and
distributed scheduler for repair. We're working on this port more or less
fulltime to try to get a patch before August 1st so someone can review.

Minor cleanup patches:
https://issues.apache.org/jira/browse/CASSANDRA-9452 - Cleanup of some old
configuration, probably an easy thing to commit.

Unrelated to this, how do I get the ability to assign myself to issues?
Right now someone else has to assign me/Sumanth whenever we work on a
ticket and that's slightly odd.

Thanks Sankalp for sending this request out!
-Joey

On Tue, Jul 17, 2018 at 12:35 PM sankalp kohli 
wrote:

> Hi,
> We are 7 weeks away from 4.0 freeze and there are ~150 JIRAs waiting
> for review. It is hard to know which ones should be prioritized as some of
> them could be not valid(fixes 2.0 bug), some of them will have the assignee
> who no longer is active, etc.
>
> If anyone is *not* getting traction on the JIRA to get it reviewed, please
> use this thread to send your JIRA number and optionally why it is
> important.
>
> Thanks,
> Sankalp
>


Re: reroll the builds?

2018-07-17 Thread Joseph Lynch
We ran the tests against 3.0, 2.2 and 3.11 using circleci and there are
various failing dtests but all three have green unit tests.

3.11.3 tentative (31d5d87, test branch): unit tests pass, 5 and 6 dtest failures
3.0.17 tentative (d52c7b8, test branch): unit tests pass, 14 and 15 dtest failures
2.2.13 tentative (3482370, test branch): unit tests pass, 9 and 10 dtest failures

It looks like many (~6) of the failures in 3.0.x are related to
snapshot_test.TestArchiveCommitlog. I'm not sure if this is abnormal.

I don't see a good historical record to know if these are just flakes, but
if we only want to go on green builds perhaps we can either disable the
flaky tests or fix them up? If someone feels strongly we should fix
particular tests up please link a jira and I can take a whack at some of
them.

-Joey

On Tue, Jul 17, 2018 at 9:35 AM Michael Shuler 
wrote:

> On 07/16/2018 11:27 PM, Jason Brown wrote:
> > Hey all,
> >
> > The recent builds were -1'd, but it appears the issues have been resolved
> > (2.2.13 with CASSANDRA-14423, and 3.0.17 / 3.11.3 reverting
> > CASSANDRA-14252). Can we go ahead and reroll now?
>
> Could someone run through the tests on 2.2, 3.0, 3.11 branches and link
> them?  Thanks!
>
> Michael
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: [VOTE] Release Apache Cassandra 2.2.13

2018-07-03 Thread Joseph Lynch
+1 nb

Tests look reasonable, with passing unit tests and about 13 failing dtests.

On Tue, Jul 3, 2018 at 1:55 PM kurt greaves  wrote:

> +1 nb
>
> On Wed., 4 Jul. 2018, 03:26 Brandon Williams,  wrote:
>
> > +1
> >
> > On Mon, Jul 2, 2018 at 3:10 PM, Michael Shuler 
> > wrote:
> >
> > > I propose the following artifacts for release as 2.2.13.
> > >
> > > sha1: 9ff78249a0a5e87bd04bf9804ef1a3b29b5e1645
> > > Git: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=sho
> > > rtlog;h=refs/tags/2.2.13-tentative
> > > Artifacts:
> https://repository.apache.org/content/repositories/orgapache
> > > cassandra-1159/org/apache/cassandra/apache-cassandra/2.2.13/
> > > Staging repository: https://repository.apache.org/
> > > content/repositories/orgapachecassandra-1159/
> > >
> > > The Debian and RPM packages are available here:
> > > http://people.apache.org/~mshuler/
> > >
> > > The vote will be open for 72 hours (longer if needed).
> > >
> > > [1]: (CHANGES.txt) http://git-wip-us.apache.org/r
> > > epos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/
> > > tags/2.2.13-tentative
> > > [2]: (NEWS.txt) http://git-wip-us.apache.org/r
> > > epos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tag
> > > s/2.2.13-tentative
> > >
> > > --
> > > Michael
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > >
> >
>


Re: Difference between heartbeat and generation on a Gossip packet

2018-06-28 Thread Joseph Lynch
Hi Abdelkarim,

Other people on this list are much more knowledgeable than me and can
correct me if I'm wrong, but my understanding is that generation and
version (aka heartbeat) together form a logical clock tuple, (generation,
version), and that combination is the HeartBeatState.

The generation is the really important part and roughly corresponds to the
last start time of that particular Cassandra process in seconds since epoch
plus any forced increments due to e.g. the gossiper stopping or starting
(nodetool disable/enable gossip). The generation is further stored on disk
in the system.local table so that during a crash or restart, even if the
system's clock moves backwards, the Cassandra node's generation should
never move backwards. Whenever a node's generation number changes it's
considered a major gossip state update by other nodes because they have to
do things like ensure they are speaking the right protocol version, compare
schema, etc ... in addition to all the versioned state changes seen below.
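As a rough illustration of that "never move backwards" property (this is
just the idea, not the actual Cassandra code): on startup a node can pick a
generation that is both roughly "now" and strictly larger than whatever it
persisted last time, so even a backwards clock jump can't lower the
generation.

    final class GenerationSketch
    {
        // Sketch only: choose the next generation from the one persisted in
        // system.local and the current wall clock, whichever is larger.
        static int nextGeneration(int storedGeneration)
        {
            int nowSeconds = (int) (System.currentTimeMillis() / 1000);
            return Math.max(storedGeneration + 1, nowSeconds);
        }
    }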

The version is a counter used to show the passage of time within a
generation and is used to signal versioned gossip state changes. It starts
at zero on process start and increases by one roughly every second. There
are various pieces of metadata like a node's status, schema, rack, dc, host
id, tokens, etc... which are all versioned using this version counter when
they change (whatever shows up in nodetool gossipinfo is a good example of
these states). When the gossiper is enabled, every second, each node increments
its local version by one, picks another peer to gossip with, and sends
out its map of versioned items to that peer; other nodes know to pick up
any new data if the version has increased. Since nodes are all gossiping
with each other, any update to one node's versioned data gets propagated
out quickly even if that node may not have directly gossiped with everyone.
Naturally, the version number only increases within a given generation, but
if the generation changes the version moves backwards (it resets to zero).

So yea, think of (generation, version) as forming a logical clock which
roughly corresponds to (~last process start in seconds since the epoch,
~seconds since last process start) for each node. This logical clock is
used to create ordering in gossip state changes.
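If it helps to see the ordering rule as code, here is a tiny sketch of the
idea (not Cassandra's actual HeartBeatState class): compare by generation
first, and only fall back to the version within a generation.

    public final class HeartBeatClock implements Comparable<HeartBeatClock>
    {
        final int generation; // ~ process start, seconds since the epoch
        final int version;    // resets to 0 on restart, +1 roughly every second

        HeartBeatClock(int generation, int version)
        {
            this.generation = generation;
            this.version = version;
        }

        @Override
        public int compareTo(HeartBeatClock other)
        {
            if (generation != other.generation)
                return Integer.compare(generation, other.generation);
            return Integer.compare(version, other.version);
        }

        // A peer applies a gossiped state only if it is strictly newer
        // than the state it already knows about.
        boolean isNewerThan(HeartBeatClock known)
        {
            return compareTo(known) > 0;
        }
    }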

Hope that was helpful,
-Joey Lynch

On Tue, Jun 26, 2018 at 3:09 PM Abdelkrim Fitouri 
wrote:

> Hello,
>
> I am studying the gossip part of Cassandra and wondering about the
> difference between the heartbeat and generation data exchanged for
> autodiscovery.
>
> many thanks for any help.
>
> --
>
> Best Regards.
>
> Abdelkarim.
>


Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Joseph Lynch
> > > > availability, etc.
> > > >
> > > > I also wonder if in vnodes (and manually managed tokens... I'll
> return
> > to
> > > > this) the node recovery scenarios are being hampered by sstables
> having
> > > the
> > > > hash ranges of the vnodes intermingled in the same set of sstables. I
> > > > wondered in another thread in vnodes why sstables are separated into
> > sets
> > > > by the vnode ranges they represent. For a manually managed contiguous
> > > token
> > > > range, you could separate the sstables into a fixed number of sets,
> > kind
> > > of
> > > > vnode-light.
> > > >
> > > > So if there was rebalancing or reconstruction, you could sneakernet
> or
> > > > reliably send entire sstable sets that would belong in a range.
> > > >
> > > > I also thing this would improve compactions and repairs too.
> > Compactions
> > > > would be naturally parallelizable in all compaction schemes, and
> > repairs
> > > > would have natural subsets to do merkle tree calculations.
> > > >
> > > > Granted sending sstables might result in "overstreaming" due to data
> > > > replication across the sstables, but you wouldn't have CPU and random
> > I/O
> > > > to look up the data. Just sequential transfers.
> > > >
> > > > For manually managed tokens with subdivided sstables, if there was
> > > > rebalancing, you would have the "fringe" edges of the hash range
> > > subdivided
> > > > already, and you would only need to deal with the data in the border
> > > areas
> > > > of the token range, and again could sneakernet / dumb transfer the
> > tables
> > > > and then let the new node remove the unneeded in future repairs.
> > > > (Compaction does not remove data that is not longer managed by a
> node,
> > > only
> > > > repair does? Or does only nodetool clean do that?)
> > > >
> > > > Pre-subdivided sstables for manually maanged tokens would REALLY pay
> > big
> > > > dividends in large-scale cluster expansion. Say you wanted to double
> or
> > > > triple the cluster. Since the sstables are already split by some
> > numeric
> > > > factor that has lots of even divisors (60 for RF 2,3,4,5), you simply
> > > bulk
> > > > copy the already-subdivided sstables for the new nodes' hash ranges
> and
> > > > you'd basically be done. In AWS EBS volumes, that could just be a
> drive
> > > > detach / drive attach.
> > > >
> > > >
> > > >
> > > >
> > > >> On Tue, Apr 17, 2018 at 7:37 AM, kurt greaves <k...@instaclustr.com
> >
> > > wrote:
> > > >>
> > > >> Great write up. Glad someone finally did the math for us. I don't
> > think
> > > >> this will come as a surprise for many of the developers.
> Availability
> > is
> > > >> only one issue raised by vnodes. Load distribution and performance
> are
> > > also
> > > >> pretty big concerns.
> > > >>
> > > >> I'm always a proponent for fixing vnodes, and removing them as a
> > default
> > > >> until we do. Happy to help on this and we have ideas in mind that at
> > > some
> > > >> point I'll create tickets for...
> > > >>
> > > >>> On Tue., 17 Apr. 2018, 06:16 Joseph Lynch, <joe.e.ly...@gmail.com>
> > > wrote:
> > > >>>
> > > >>> If the blob link on github doesn't work for the pdf (looks like
> > mobile
> > > >>> might not like it), try:
> > > >>>
> > > >>>
> > > >>> https://github.com/jolynch/python_performance_toolkit/
> > > >> raw/master/notebooks/cassandra_availability/whitepaper/cassandra-
> > > >> availability-virtual.pdf
> > > >>>
> > > >>> -Joey
> > > >>> <
> > > >>> https://github.com/jolynch/python_performance_toolkit/
> > > >> raw/master/notebooks/cassandra_availability/whitepaper/cassandra-
> > > >> availability-virtual.pdf
> > > >>>>
> > > >>>
> > > >>> On Mon, Apr 16, 2018 at 1:14 PM, Joseph Lynch <
> joe.e.ly...@gmail.com
> > >
> > > >>> wrote:
> > > >>>

Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-17 Thread Joseph Lynch
> > > rebalancing, you would have the "fringe" edges of the hash range
> > subdivided
> > > already, and you would only need to deal with the data in the border
> > areas
> > > of the token range, and again could sneakernet / dumb transfer the
> tables
> > > and then let the new node remove the unneeded in future repairs.
> > > (Compaction does not remove data that is not longer managed by a node,
> > only
> > > repair does? Or does only nodetool clean do that?)
> > >
> > > Pre-subdivided sstables for manually maanged tokens would REALLY pay
> big
> > > dividends in large-scale cluster expansion. Say you wanted to double or
> > > triple the cluster. Since the sstables are already split by some
> numeric
> > > factor that has lots of even divisors (60 for RF 2,3,4,5), you simply
> > bulk
> > > copy the already-subdivided sstables for the new nodes' hash ranges and
> > > you'd basically be done. In AWS EBS volumes, that could just be a drive
> > > detach / drive attach.
> > >
> > >
> > >
> > >
> > >> On Tue, Apr 17, 2018 at 7:37 AM, kurt greaves <k...@instaclustr.com>
> > wrote:
> > >>
> > >> Great write up. Glad someone finally did the math for us. I don't
> think
> > >> this will come as a surprise for many of the developers. Availability
> is
> > >> only one issue raised by vnodes. Load distribution and performance are
> > also
> > >> pretty big concerns.
> > >>
> > >> I'm always a proponent for fixing vnodes, and removing them as a
> default
> > >> until we do. Happy to help on this and we have ideas in mind that at
> > some
> > >> point I'll create tickets for...
> > >>
> > >>> On Tue., 17 Apr. 2018, 06:16 Joseph Lynch, <joe.e.ly...@gmail.com>
> > wrote:
> > >>>
> > >>> If the blob link on github doesn't work for the pdf (looks like
> mobile
> > >>> might not like it), try:
> > >>>
> > >>>
> > >>> https://github.com/jolynch/python_performance_toolkit/
> > >> raw/master/notebooks/cassandra_availability/whitepaper/cassandra-
> > >> availability-virtual.pdf
> > >>>
> > >>> -Joey
> > >>> <
> > >>> https://github.com/jolynch/python_performance_toolkit/
> > >> raw/master/notebooks/cassandra_availability/whitepaper/cassandra-
> > >> availability-virtual.pdf
> > >>>>
> > >>>
> > >>> On Mon, Apr 16, 2018 at 1:14 PM, Joseph Lynch <joe.e.ly...@gmail.com
> >
> > >>> wrote:
> > >>>
> > >>>> Josh Snyder and I have been working on evaluating virtual nodes for
> > >> large
> > >>>> scale deployments and while it seems like there is a lot of
> anecdotal
> > >>>> support for reducing the vnode count [1], we couldn't find any
> > concrete
> > >>>> math on the topic, so we had some fun and took a whack at
> quantifying
> > >> how
> > >>>> different choices of num_tokens impact a Cassandra cluster.
> > >>>>
> > >>>> According to the model we developed [2] it seems that at small
> cluster
> > >>>> sizes there isn't much of a negative impact on availability, but
> when
> > >>>> clusters scale up to hundreds of hosts, vnodes have a major impact
> on
> > >>>> availability. In particular, the probability of outage during short
> > >>>> failures (e.g. process restarts or failures) or permanent failure
> > (e.g.
> > >>>> disk or machine failure) appears to be orders of magnitude higher
> for
> > >>> large
> > >>>> clusters.
> > >>>>
> > >>>> The model attempts to explain why we may care about this and
> advances
> > a
> > >>>> few existing/new ideas for how to fix the scalability problems that
> > >>> vnodes
> > >>>> fix without the availability (and consistency—due to the effects on
> > >>> repair)
> > >>>> problems high num_tokens create. We would of course be very
> interested
> > >> in
> > >>>> any feedback. The model source code is on github [3], PRs are
> welcome
> > >> or
> > >>>> feel free to play around with the jupyter notebook to match your
> > >>>> environment and see what the graphs look like. I didn't attach the
> pdf
> > >>> here
> > >>>> because it's too large apparently (lots of pretty graphs).
> > >>>>
> > >>>> I know that users can always just pick whichever number they prefer,
> > >> but
> > >>> I
> > >>>> think the current default was chosen when token placement was
> random,
> > >>> and I
> > >>>> wonder whether it's still the right default.
> > >>>>
> > >>>> Thank you,
> > >>>> -Joey Lynch
> > >>>>
> > >>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-13701
> > >>>> [2] https://github.com/jolynch/python_performance_toolkit/
> > >>>> raw/master/notebooks/cassandra_availability/whitepaper/cassandra-
> > >>>> availability-virtual.pdf
> > >>>>
> > >>>> <
> > >>> https://github.com/jolynch/python_performance_toolkit/
> > >> blob/master/notebooks/cassandra_availability/whitepaper/cassandra-
> > >> availability-virtual.pdf
> > >>>>
> > >>>> [3] https://github.com/jolynch/python_performance_toolkit/tree/m
> > >>>> aster/notebooks/cassandra_availability
> > >>>>
> > >>>
> > >>
> >
> >
> >
>


Re: Quantifying Virtual Node Impact on Cassandra Availability

2018-04-16 Thread Joseph Lynch
If the blob link on github doesn't work for the pdf (looks like mobile
might not like it), try:

https://github.com/jolynch/python_performance_toolkit/raw/master/notebooks/cassandra_availability/whitepaper/cassandra-availability-virtual.pdf

-Joey
<https://github.com/jolynch/python_performance_toolkit/raw/master/notebooks/cassandra_availability/whitepaper/cassandra-availability-virtual.pdf>

On Mon, Apr 16, 2018 at 1:14 PM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:

> Josh Snyder and I have been working on evaluating virtual nodes for large
> scale deployments and while it seems like there is a lot of anecdotal
> support for reducing the vnode count [1], we couldn't find any concrete
> math on the topic, so we had some fun and took a whack at quantifying how
> different choices of num_tokens impact a Cassandra cluster.
>
> According to the model we developed [2] it seems that at small cluster
> sizes there isn't much of a negative impact on availability, but when
> clusters scale up to hundreds of hosts, vnodes have a major impact on
> availability. In particular, the probability of outage during short
> failures (e.g. process restarts or failures) or permanent failure (e.g.
> disk or machine failure) appears to be orders of magnitude higher for large
> clusters.
>
> The model attempts to explain why we may care about this and advances a
> few existing/new ideas for how to fix the scalability problems that vnodes
> fix without the availability (and consistency—due to the effects on repair)
> problems high num_tokens create. We would of course be very interested in
> any feedback. The model source code is on github [3], PRs are welcome or
> feel free to play around with the jupyter notebook to match your
> environment and see what the graphs look like. I didn't attach the pdf here
> because it's too large apparently (lots of pretty graphs).
>
> I know that users can always just pick whichever number they prefer, but I
> think the current default was chosen when token placement was random, and I
> wonder whether it's still the right default.
>
> Thank you,
> -Joey Lynch
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-13701
> [2] https://github.com/jolynch/python_performance_toolkit/
> raw/master/notebooks/cassandra_availability/whitepaper/cassandra-
> availability-virtual.pdf
>
> <https://github.com/jolynch/python_performance_toolkit/blob/master/notebooks/cassandra_availability/whitepaper/cassandra-availability-virtual.pdf>
> [3] https://github.com/jolynch/python_performance_toolkit/tree/m
> aster/notebooks/cassandra_availability
>


Quantifying Virtual Node Impact on Cassandra Availability

2018-04-16 Thread Joseph Lynch
Josh Snyder and I have been working on evaluating virtual nodes for large
scale deployments and while it seems like there is a lot of anecdotal
support for reducing the vnode count [1], we couldn't find any concrete
math on the topic, so we had some fun and took a whack at quantifying how
different choices of num_tokens impact a Cassandra cluster.

According to the model we developed [2] it seems that at small cluster
sizes there isn't much of a negative impact on availability, but when
clusters scale up to hundreds of hosts, vnodes have a major impact on
availability. In particular, the probability of outage during short
failures (e.g. process restarts or failures) or permanent failure (e.g.
disk or machine failure) appears to be orders of magnitude higher for large
clusters.
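
For intuition only (this is a back-of-the-envelope sketch, not the model in
the whitepaper), a few lines of Python that place random tokens, build
SimpleStrategy-style replica sets, and then sample two-node failures show the
same trend: with one token per node only a handful of node pairs share a
replica set, while with 256 tokens per node almost every pair does, so almost
any second failure drops some range below QUORUM. Racks and DCs are ignored
here.

    import random

    def quorum_loss_probability(nodes=96, num_tokens=256, rf=3, trials=2000):
        # Place num_tokens random tokens per node and sort them into a ring.
        ring = sorted((random.random(), node) for node in range(nodes)
                      for _ in range(num_tokens))
        owners = [node for _, node in ring]
        # Replica set of each range = the next rf distinct nodes clockwise.
        replica_sets = set()
        for i in range(len(owners)):
            reps, j = [], i
            while len(reps) < rf:
                candidate = owners[j % len(owners)]
                if candidate not in reps:
                    reps.append(candidate)
                j += 1
            replica_sets.add(frozenset(reps))
        # Sample "exactly two nodes down" and check if any range loses QUORUM.
        hits = 0
        for _ in range(trials):
            down = set(random.sample(range(nodes), 2))
            if any(len(down & rs) >= 2 for rs in replica_sets):
                hits += 1
        return hits / trials

    for tokens in (1, 4, 16, 256):
        print(tokens, quorum_loss_probability(num_tokens=tokens))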

The model attempts to explain why we may care about this and advances a few
existing/new ideas for how to fix the scalability problems that vnodes fix
without the availability (and consistency—due to the effects on repair)
problems high num_tokens create. We would of course be very interested in
any feedback. The model source code is on github [3], PRs are welcome or
feel free to play around with the jupyter notebook to match your
environment and see what the graphs look like. I didn't attach the pdf here
because it's too large apparently (lots of pretty graphs).

I know that users can always just pick whichever number they prefer, but I
think the current default was chosen when token placement was random, and I
wonder whether it's still the right default.

Thank you,
-Joey Lynch

[1] https://issues.apache.org/jira/browse/CASSANDRA-13701
[2]
https://github.com/jolynch/python_performance_toolkit/raw/master/notebooks/cassandra_availability/whitepaper/cassandra-availability-virtual.pdf

[3] https://github.com/jolynch/python_performance_toolkit/tree/
master/notebooks/cassandra_availability


Re: Repair scheduling tools

2018-04-12 Thread Joseph Lynch
Given the feedback here and on the ticket, I've written up a proposal for a
repair sidecar tool in the ticket's design document. If there are no major
concerns we're going to start working on porting the Priam implementation into
this new tool soon.

-Joey

On Tue, Apr 10, 2018 at 4:17 PM, Elliott Sims  wrote:

> My two cents as a (relatively small) user.  I'm coming at this from the
> ops/user side, so my apologies if some of these don't make sense based on a
> more detailed understanding of the codebase:
>
> Repair is definitely a major missing piece of Cassandra.  Integrated would
> be easier, but a sidecar might be more flexible.  As an intermediate step
> that works towards both options, does it make sense to start with
> finer-grained tracking and reporting for subrange repairs?  That is, expose
> a set of interfaces (both internally and via JMX) that give a scheduler
> enough information to run subrange repairs across multiple keyspaces or
> even non-overlapping ranges at the same time.  That lets people experiment
> with and quickly/safely/easily iterate on different scheduling strategies
> in the short term, and long-term those strategies can be integrated into a
> built-in scheduler
>
> On the subject of scheduling, I think adjusting parallelism/aggression with
> a possible whitelist or blacklist would be a lot more useful than a "time
> between repairs".  That is, if repairs run for a few hours then don't run
> for a few (somewhat hard-to-predict) hours, I still have to size the
> cluster for the load when the repairs are running.   The only reason I can
> think of for an interval between repairs is to allow re-compaction from
> repair anticompactions, and subrange repairs seem to eliminate this.  Even
> if they didn't, a more direct method along the lines of "don't repair when
> the compaction queue is too long" might make more sense.  Blacklisted
> timeslots might be useful for avoiding peak time or batch jobs, but only if
> they can be specified for consistent time-of-day intervals instead of
> unpredictable lulls between repairs.
>
> I really like the idea of automatically adjusting gc_grace_seconds based on
> repair state.  The only_purge_repaired_tombstones option fixes this
> elegantly for sequential/incremental repairs on STCS, but not for subrange
> repairs or LCS (unless a scheduler gains the ability somehow to determine
> that every subrange in an sstable has been repaired and mark it
> accordingly?)
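
As a rough sketch of the kind of external gating described above (not an
existing tool; the pending-task limit and the subrange list are made-up
inputs), a scheduler could simply shell out to nodetool, which already exposes
compactionstats and the -st/-et subrange options on repair:

    import re
    import subprocess
    import time

    PENDING_LIMIT = 20   # made-up threshold: back off while the compaction queue is long

    def pending_compactions():
        # "pending tasks: N" appears in the output of nodetool compactionstats.
        out = subprocess.check_output(["nodetool", "compactionstats"], text=True)
        match = re.search(r"pending tasks:\s*(\d+)", out)
        return int(match.group(1)) if match else 0

    def repair_subrange(keyspace, start_token, end_token):
        # -st / -et restrict nodetool repair to a single token subrange.
        subprocess.check_call(["nodetool", "repair",
                               "-st", str(start_token), "-et", str(end_token),
                               keyspace])

    def run(keyspace, subranges):
        for start_token, end_token in subranges:
            while pending_compactions() > PENDING_LIMIT:
                time.sleep(60)   # let the backlog drain before repairing more
            repair_subrange(keyspace, start_token, end_token)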
>
>
> On 2018/04/03 17:48:14, Blake Eggleston  wrote:
> > Hi dev@,
> >
> > The question of the best way to schedule repairs came up on
> > CASSANDRA-14346, and I thought it would be good to bring up the idea of an
> > external tool on the dev list.
> >
> > Cassandra lacks any sort of tools for automating routine tasks that are
> > required for running clusters, specifically repair. Regular repair is a
> > must for most clusters, like compaction. This means that, especially as far
> > as eventual consistency is concerned, Cassandra isn’t totally functional
> > out of the box. Operators either need to find a 3rd party solution or
> > implement one themselves. Adding this to Cassandra would make it easier to
> > use.
> >
> > Is this something we should be doing? If so, what should it look like?
> >
> > Personally, I feel like this is a pretty big gap in the project and would
> > like to see an out of process tool offered. Ideally, Cassandra would just
> > take care of itself, but writing a distributed repair scheduler that you
> > trust to run in production is a lot harder than writing a single process
> > management application that can failover.
> >
> > Any thoughts on this?
> >
> > Thanks,
> >
> > Blake
>


Re: Repair scheduling tools

2018-04-12 Thread Joseph Lynch
>
> I personally would rather see improvements to reaper and supporting reaper
> so the repair tool improvements aren't tied to Cassandra releases. If we
> get to a place where the repair tools are stable then figuring out how to
> bundle for the best install makes sense to me.
>

I view the design we've proposed as taking many of the core ideas of
DataStax Repair Service and Reaper, adding in production experience from
Netflix (see the resiliency points and e.g. how remote JMX is inherently
insecure and unreliable), and harmonizing them with Cassandra's shared-nothing
design. A few Reaper developers have already made really good
contributions to the design document and we will certainly be taking
Reaper's experience into account as we try to move this forward.


> If we add things that will support reaper other repair solutions could also
> take advantage.
>

I strongly believe that continuous, always-on repair is too important to
leave to an external tool, as it impacts the fundamental correctness of the
database. Without continuous repair you can have data loss, data
resurrection, and violations of quorum-quorum read after write consistency.
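
To make the resurrection point concrete, here is a toy model (plain Python,
not Cassandra code) of what happens when a delete reaches only two of three
replicas and no repair completes before gc_grace_seconds expires:

    # RF=3: the row starts out on all three replicas.
    replicas = [{"row": "old_value"}, {"row": "old_value"}, {"row": "old_value"}]

    # The DELETE reaches only replicas 0 and 1 (replica 2 dropped the mutation),
    # so only they hold a tombstone shadowing the row.
    saw_delete = [True, True, False]
    for i, deleted in enumerate(saw_delete):
        if deleted:
            replicas[i].pop("row")   # tombstone shadows, and later purges, the row

    # gc_grace_seconds passes with no successful repair: compaction purges the
    # tombstones on replicas 0 and 1, so no trace of the delete remains anywhere.

    # A later QUORUM read that includes replica 2 sees "old_value" as the only
    # live version; read repair then copies it back to the other replicas.
    print([r.get("row") for r in replicas])   # [None, None, 'old_value'] -> resurrected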

-Joey


Re: Roadmap for 4.0

2018-04-12 Thread Joseph Lynch
The Netflix team prefers September as well. We don't have time before that
to do a full certification (e2e and performance testing), but can probably
work it into end of Q3 / start of Q4.

I personally hope that the extra time gives us as a community a chance to
come up with a compelling user story for why users would want to upgrade. I
don't feel we have one right now.

-Joey


On Thu, Apr 12, 2018 at 2:51 PM, Ariel Weisberg  wrote:

> Hi,
>
> +1 to September 1st. I know I will have much better availability then.
>
> Ariel
> On Thu, Apr 12, 2018, at 5:15 PM, Sankalp Kohli wrote:
> > +1 with Sept 1st as I am seeing willingness for people to test it after
> it
> >
> > > On Apr 12, 2018, at 13:59, Ben Bromhead  wrote:
> > >
> > > While I would prefer earlier, if Sept 1 gets better buy-in and we can
> have
> > > broader commitment to testing. I'm super happy with that. As Nate said,
> > > having a solid line to work towards is going to help massively.
> > >
> > > On Thu, Apr 12, 2018 at 4:07 PM Nate McCall 
> wrote:
> > >
> > >>> If we push it to Sept 1 freeze, I'll personally spend a lot of time
> > >> testing.
> > >>>
> > >>> What can I do to help convince the Jun1 folks that Sept1 is
> acceptable?
> > >>
> > >> I can come around to that. At this point, I really just want us to
> > >> have a date we can start talking to/planning around.
> > >>
> > >>
> > >> --
> > > Ben Bromhead
> > > CTO | Instaclustr 
> > > +1 650 284 9692
> > > Reliability at Scale
> > > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> >
> >
>
>
>


Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
>
> We see this in larger clusters regularly. Usually folks have just
> 'grown into it' because it was the default.
>

I could understand a few dozen nodes with 256 vnodes, but hundreds is
surprising. I have a whitepaper draft lying around showing how vnodes
decrease availability in large clusters by orders of magnitude; I'll polish
it up and send it out to the list when I get a second.

In the meantime, sorry for de-railing a conversation about repair
scheduling to talk about vnodes, let's chat about that in a different
thread :-)

-Joey


Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
Sorry sent early.

To explain further, the scheduler is entirely decentralized in the proposed
design, and no node holds all the information you're talking about in heap
at once (in fact no one node would ever hold that information). Each node
is responsible only for the token ranges it is the "primary" replica of. Each
of those ranges is then split by table, and each table range is individually
split into subranges, at most a few hundred range splits at a time (typically
one or two; you don't want too many, otherwise you end up with too many small
sstables). This is all at most megabytes of data, and I really do
believe it would not cause significant, if any, heap pressure. The repairs
*themselves* certainly would create heap pressure, but that happens
regardless of the scheduler.
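
A minimal sketch of the splitting described above (the token values and the
per-range cap are made up for illustration; this is not code from the
proposal):

    def split_range(start, end, max_splits):
        # Split one primary (start, end] token range into at most max_splits
        # contiguous subranges; each subrange is repaired (per table) on its own.
        width = end - start
        bounds = [start + (width * i) // max_splits for i in range(max_splits)] + [end]
        return [(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:]) if lo != hi]

    # Made-up numbers: one node's single primary range (a quarter of the Murmur3
    # ring) split into four pieces. The whole plan is a short list of tuples,
    # i.e. a trivial amount of memory next to the repairs themselves.
    primary_range = (-4611686018427387904, 0)
    print(split_range(*primary_range, max_splits=4))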

-Joey

On Thu, Apr 5, 2018 at 7:25 PM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:

> I wouldn't trivialize it; scheduling can end up dealing with more than a
>> single repair. If there's 1000 keyspace/tables, with 400 nodes and 256
>> vnodes on each that's a lot of repairs to plan out and keep track of and can
>> easily cause heap allocation spikes if opted in.
>>
>> Chris
>
> The current proposal never keeps track of more than a few hundred range
> splits for a single table at a time, and nothing ever keeps state for the
> entire 400 node cluster. Compared to the load generated by actually repairing the
> data, I actually do think it is trivial heap pressure.
>
>
> Somewhat beside the point, I wasn't aware there were any 100+ node
> clusters running with vnodes; if my math is correct they would be
> excessively vulnerable to outages with that many vnodes and that many
> nodes. Most of the large clusters I've heard of (100 nodes plus) are
> running with a single token or at most 4 tokens per node.
>


Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
>
> I wouldn't trivialize it; scheduling can end up dealing with more than a
> single repair. If there's 1000 keyspace/tables, with 400 nodes and 256
> vnodes on each that's a lot of repairs to plan out and keep track of and can
> easily cause heap allocation spikes if opted in.
>
> Chris

The current proposal never keeps track of more than a few hundred range
splits for a single table at a time, and nothing ever keeps state for the
entire 400 node cluster. Compared to the load generated by actually repairing the
data, I actually do think it is trivial heap pressure.


Somewhat beside the point, I wasn't aware there were any 100+ node
clusters running with vnodes; if my math is correct they would be
excessively vulnerable to outages with that many vnodes and that many
nodes. Most of the large clusters I've heard of (100 nodes plus) are
running with a single token or at most 4 tokens per node.


Re: Repair scheduling tools

2018-04-05 Thread Joseph Lynch
vantage for repair to embedded in the core is that
> there
> > > is
> > > > no
> > > > > >> need to expose
> > > > > >> internal state to the repair logic. So an external program
> doesn't
> > > > need
> > > > > to
> > > > > >> deal with different
> > > > > >> version of Cassandra, different repair capabilities of the core
> > > (such
> > > > as
> > > > > >> incremental on/off)
> > > > > >> and so forth. A good database should schedule its own repair, it
> > > knows
> > > > > >> whether the threshold
> > > > > >> of hinted handoff was crossed or not, it knows whether nodes were
> > > > > replaced,
> > > > > >> etc,
> > > > > >>
> > > > > >> My 2 cents. Dor
> > > > > >>
> > > > > >> On Tue, Apr 3, 2018 at 11:13 PM, Dinesh Joshi <
> > > > > >> dinesh.jo...@yahoo.com.invalid> wrote:
> > > > > >>
> > > > > >>> Simon,
> > > > > >>> You could still do load aware repair outside of the main
> process
> > by
> > > > > >>> reading Cassandra's metrics.
> > > > > >>> In general, I don't think the maintenance tasks necessarily
> need
> > to
> > > > > live
> > > > > >>> in the main process. They could negatively impact the read /
> > write
> > > > > path.
> > > > > >>> Unless strictly required by the serving path, it could live in
> a
> > > > > sidecar
> > > > > >>> process. There are multiple benefits including isolation,
> faster
> > > > > iteration,
> > > > > >>> loose coupling. For example - this would mean that the
> > maintenance
> > > > > tasks
> > > > > >>> can have a different gc profile than the main process and it
> > would
> > > be
> > > > > ok.
> > > > > >>> Today that is not the case.
> > > > > >>> The only issue I see is that the project does not provide an
> > > official
> > > > > >>> sidecar. Perhaps there should be one. We probably would've not
> > had
> > > to
> > > > > have
> > > > > >>> this discussion ;)
> > > > > >>> Dinesh
> > > > > >>>
> > > > > >>> On Tuesday, April 3, 2018, 10:12:56 PM PDT, Qingcun Zhou <
> > > > > >>> zhouqing...@gmail.com> wrote:
> > > > > >>>
> > > > > >>> Repair has been a problem for us at Uber. In general I'm in
> favor
> > > of
> > > > > >>> including the scheduling logic in Cassandra daemon. It has the
> > > > benefit
> > > > > of
> > > > > >>> introducing something like load-aware repair, e.g., only schedule
> > > > repair
> > > > > >>> while there is no ongoing compaction or traffic is low, etc. As proposed
> > by
> > > > > others,
> > > > > >>> we can expose keyspace/table-level configurations so that users
> > can
> > > > > opt-in.
> > > > > >>> Regarding the risk, yes there will be problems at the beginning
> > but
> > > > in
> > > > > the
> > > > > >>> long run, users will appreciate that repair works out of the
> box,
> > > > just
> > > > > like
> > > > > >>> compaction. We have large Cassandra deployments and can work
> with
> > > > > Netflix
> > > > > >>> folks for intensive testing to boost user confidence.
> > > > > >>>
> > > > > >>> On the other hand, have we looked into how other NoSQL
> databases
> > do
> > > > > repair?
> > > > > >>> Is there a side car process?
> > > > > >>>
> > > > > >>>
> > > > > >>> On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli <
> > > > kohlisank...@gmail.com
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> Repair is critical for running C* and I agree with Roopa that
> it
> >
