Re: [DISCUSS] Updating the C* website design

2020-08-21 Thread Rahul Singh
It seems that even Antora uses another SSG, Middleman, for its “marketing”
home page.

https://gitlab.com/antora/antora.org

Whether the convenience of having both content and docs in one SSG for code
maintenance is compatible with the aesthetic / content / taxonomy strategy that
site visitors need, we’ll find out soon enough.




rahul.xavier.si...@gmail.com

http://cassandra.link
The Apache Cassandra Knowledge Base.


Re: [DISCUSS] Updating the C* website design

2020-08-21 Thread Rahul Singh
Folks,

I applaud the choice of Antora for documentation but I’m not sure it is the 
best choice for generating an appealing site.

Antora’s self-professed strength is in technical documentation. Do we want to
stick to a “documentation” / utility look for the front-facing site or for a
blog?

https://gitlab.com/antora/antora/-/issues/444

I don’t want to rehash any conclusion on choosing Antora for docs or whether 
asciidoc is the choice for writing documentation.

Could we think about using something like Gatsby or similar for the
front-facing 5-10 pages + blog? E.g., SkyWalking uses VuePress.

We can use asciidoc as the common format while using Antora for the docs and 
something else for the rest of the content 
(https://www.gatsbyjs.com/plugins/gatsby-transformer-asciidoc/)

Something like Gatsby can use both Markdown and Asciidoc and we can migrate 
from one to the other while still using the same tooling.

Just some thoughts; I would love feedback!

rahul.xavier.si...@gmail.com

http://cassandra.link
The Apache Cassandra Knowledge Base.
On Jul 29, 2020, 1:28 PM -0400, M Brandon Williams , wrote:
>
> web


Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-17 Thread Rahul Singh
+1 on 8

rahul.xavier.si...@gmail.com

http://cassandra.link
The Apache Cassandra Knowledge Base.
On Feb 17, 2020, 5:20 PM -0500, Erick Ramirez , 
wrote:
> +1 on 8 tokens. I'd personally like us to be able to move this along pretty
> quickly as it's confusing for users looking for direction. Cheers!
>
> On Tue, 18 Feb 2020, 9:14 am Jeremy Hanna, 
> wrote:
>
> > I just wanted to close the loop on this if possible. After some discussion
> > in slack about various topics, I would like to see if people are okay with
> > num_tokens=8 by default (as it's not much different operationally than
> > 16). Joey brought up a few small changes that I can put on the ticket. It
> > also requires some documentation for things like decommission order and
> > skew.
> >
> > Are people okay with this change moving forward like this? If so, I'll
> > comment on the ticket and we can move forward.
> >
> > Thanks,
> >
> > Jeremy
> >
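
A rough way to get a feel for the skew mentioned above is to simulate token
ownership for different num_tokens values. The sketch below assumes purely
random token placement on a hypothetical ring of 2**64 positions and a
hypothetical 6-node cluster; the names ownership_skew and RING_SIZE are
illustrative only. A real cluster may allocate tokens differently, so treat
the numbers as indicative rather than as a statement about Cassandra itself.

```python
import random

# Assumption: a ring of 2**64 token positions, tokens picked uniformly at random.
RING_SIZE = 2**64


def ownership_skew(num_nodes, num_tokens, seed=0):
    """Return (largest primary range owned by any node) / (mean ownership)."""
    rng = random.Random(seed)
    # Every node picks num_tokens random positions on the ring.
    tokens = sorted(
        (rng.randrange(RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * num_nodes
    # A token owns the range from the previous token up to itself;
    # index -1 makes the first token wrap around to the last one.
    for i, (token, node) in enumerate(tokens):
        previous = tokens[i - 1][0]
        owned[node] += (token - previous) % RING_SIZE
    mean = RING_SIZE / num_nodes
    return max(owned) / mean


if __name__ == "__main__":
    for num_tokens in (8, 16):
        print(num_tokens, round(ownership_skew(6, num_tokens), 2))
```

A ratio of 1.0 would mean perfectly even ownership; running it with several
seeds gives a sense of how much 8 and 16 tokens differ under this assumption.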


Re: Ideas for Cassandra 2020 - Remote Meetups / Mastermind

2020-02-10 Thread Rahul Singh
Thanks, Michael. I sent that before I read up on the notes.

@nate , @jon , @dinesh I can help with the Documentation.

If you have any specific doc issues you want reviewed, edited, etc,
please let me know.

If there's a specific JQL query on the JIRA board, I can start there.

rahul.xavier.si...@gmail.com

http://cassandra.link



On Mon, Feb 10, 2020 at 8:36 AM Michael Shuler 
wrote:

> This is great, thanks, the project appreciates the effort. 2019 is over,
> don't worry about the past. Moving forward in little or large steps is
> the goal. :)
>
> If you didn't get a chance to attend the first Contributor Meeting,
> there will be more. Patrick sent out a survey last week for feedback, so
> I imagine these will continue at a regular interval with continued
> interest and attendance. Perhaps you could schedule some in-person
> meetup thing to coincide, as an idea, or just get the word out to others
> that might be interested in listening in?
>
> (Next one has not been scheduled yet, but will show up here on dev@ list.)
>
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+Contributor+Meeting
>
> Michael
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Do we need Javadoc in binary distribution? Was: [RELEASE] Apache Cassandra 4.0-alpha3 released

2020-02-09 Thread Rahul Singh
Non-binding +1 for not deploying Javadoc on every node.

Rahul Singh | Business Platform Architect
1.202.390.9200 | rahul.si...@anant.us

Anant
3 Washington Circle NW, #301
Washington , D.C. 20037
https://anant.us
On Feb 9, 2020, 5:26 AM -0500, Alex Ott , wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-15561
>
> Michael Shuler at "Sat, 8 Feb 2020 11:24:01 -0600" wrote:
> MS> I like this idea for keeping binary deployment size down. I'm not sure how
> MS> to handle it for the tarballs, but we could certainly split the docs out
> MS> of the debian and rpm packages to add cassandra-docs_.{deb,rpm} packages,
> MS> so they are installable separately, if the user wants them. This is common
> MS> when docs get large. I suppose the same could be done for
> MS> apache-cassandra-docs-,tar.gz, but I'm not sure about the release policy
> MS> part of things here. Needs research.
>
> MS> Please, open a JIRA on this as a packaging improvement.
>
> MS> Kind regards,
> MS> Michael
>
> MS> On 2/8/20 3:06 AM, Alex Ott wrote:
> > > Hi
> > >
> > > I've unpacked the binary distribution and noticed that we ship many files
> > > in the javadoc directory: more than 5 thousand files that occupy 99 MB on
> > > disk out of 149 MB for the whole unpacked Cassandra.
> > >
> > > If we look at this from a practical standpoint, do we expect that people
> > > who run Cassandra will use javadoc for any purpose? I know that it often
> > > contains useful details about implementation, but for day-to-day work,
> > > imho, these files aren't required, at least not on every machine that has
> > > Cassandra on it.
> > >
> > > Maybe we can generate a separate artifact for Javadoc files?
> > >
> > > Mick Semb Wever at "Fri, 07 Feb 2020 21:02:09 +0100" wrote:
> > > MSW> The Cassandra team is pleased to announce the release of Apache
> > > MSW> Cassandra version 4.0-alpha3.
> > >
> > > MSW> Apache Cassandra is a fully distributed database. It is the right
> > > MSW> choice when you need scalability and high availability without
> > > MSW> compromising performance.
> > >
> > > MSW> http://cassandra.apache.org/
> > >
> > > MSW> Downloads of source and binary distributions are listed in our 
> > > download section:
> > > MSW> http://cassandra.apache.org/download/
> > >
> > >
> > > MSW> Downloads of source and binary distributions:
> > > MSW> 
> > > http://www.apache.org/dyn/closer.lua/cassandra/4.0-alpha3/apache-cassandra-4.0-alpha3-bin.tar.gz
> > > MSW> 
> > > http://www.apache.org/dyn/closer.lua/cassandra/4.0-alpha3/apache-cassandra-4.0-alpha3-src.tar.gz
> > >
> > > MSW> Debian and Redhat configurations.
> > >
> > > MSW> sources.list:
> > > MSW> deb http://www.apache.org/dist/cassandra/debian 40x main
> > >
> > > MSW> yum config:
> > > MSW> baseurl=https://www.apache.org/dist/cassandra/redhat/40x/
> > >
> > > MSW> See http://cassandra.apache.org/download/ for full install
> > > MSW> instructions.
> > >
> > > MSW> This is an ALPHA version! It is not intended for production use,
> > > MSW> however the project would appreciate your testing and feedback to make
> > > MSW> the final release better. As always, please pay attention to the
> > > MSW> release notes[2] and let us know[3] if you encounter any problems.
> > >
> > > MSW> Enjoy!
> > >
> > > MSW> [1]: CHANGES.txt 
> > > ?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.0-alpha3
> > > MSW> [2]: NEWS.txt 
> > > ?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.0-alpha3
> > > MSW> [3]: https://issues.apache.org/jira/browse/CASSANDRA
> > >
> > > MSW> -
> > > MSW> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> > > MSW> For additional commands, e-mail: user-h...@cassandra.apache.org
> > >
> > >
> > >
> --
> With best wishes, Alex Ott
> Principal Architect, DataStax
> http://datastax.com/
>
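
The size figures quoted above (99 out of 149, in the same unit) work out to
roughly two thirds of the unpacked distribution. A minimal sketch for
repeating that measurement on any unpacked tree follows; the function names
dir_size and share_of_tree are illustrative, and the paths you pass in are up
to you (nothing here is specific to the Cassandra layout).

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def share_of_tree(subdir, tree):
    """Fraction of the bytes under tree that live under subdir."""
    whole = dir_size(tree)
    return dir_size(subdir) / whole if whole else 0.0


# The figures from the thread: 99 of 149 is about 66%.
print(f"{99 / 149:.0%}")
```

Calling share_of_tree with the javadoc directory and the root of an unpacked
tarball would give the equivalent ratio for a given release.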


Re: Do we need Javadoc in binary distribution? Was: [RELEASE] Apache Cassandra 4.0-alpha3 released

2020-02-09 Thread Rahul Singh
+1

Nonbinding


Rahul Singh | Business Platform Architect
1.202.390.9200 | rahul.si...@anant.us

Anant
3 Washington Circle NW, #301
Washington , D.C. 20037
https://anant.us


Ideas for Cassandra 2020 - Remote Meetups / Mastermind

2020-02-08 Thread Rahul Singh
Folks, (I initially meant this for the user list, but realized after writing
it that it’s more sausage-making talk, which end users probably don’t care
about.)

I took on a bunch of work and am finally starting to get my head out of the
sand, and I realized I failed to deliver on some promises I made last year to
myself and others to contribute to this community. I wanted to resurface a few
ideas to which I would like to contribute.

We had a conversation on here a while ago about trying a virtual conference,
which I think is a bit too ambitious. I also spoke briefly to Dinesh last year
about holding periodic development meetings focused on development planning
and execution.

I’d like to help this project but I don’t know where to start. I tried getting
some junior members at Anant who had time to make fixes to content and docs,
but their work didn’t get looked at or reviewed, so they lost interest. There’s
only so much they would want to do based on my requests. The failure to deliver
on better documentation organization was mainly mine because I didn’t commit
enough time to it.

I don’t think our community does a good enough job communicating the Cassandra
value proposition to the enterprise community, whether they are developers,
architects, or directors. I’ve been meeting with many folks who haven’t
touched their clusters since installing 2.1 (because it’s pretty damn good for
most people!). When I ask them why, it’s a combination of team member churn and
the knowledge not being as accessible.

This year, as January closes, I am recommitting myself to some ideas and would
LOVE your feedback. If something like this is already in progress, I will help.


1. Cassandra Lunch - I’ve seen a colleague get together with his fellow
practitioners for a weekly “Sitecore Lunch”, and I found it a very easy way to
get people talking who normally wouldn’t be interacting with each other in
real time.
2. Coordinated Remote Meetup - I think this would be way easier to organize and
cross-promote as a quarterly event with the help of local organizers. I’m
currently organizing DC / Chicago and have been cross-promoting virtual talks
to both, and have gotten a good showing of people curious about Cassandra.
3. Documentation - I know I said I’d help last year. I underestimated my free
time and overestimated my capacity to focus. That being said, this is one of
my passions: I help a lot of orgs get their [blank] together on how to manage
their people, process, info and systems, and the first thing is always
knowledge management. If there’s someone I can shadow and apprentice under to
help with Cassandra.Apache.org, I really want to help revitalize our site.


These may still be overestimating my capacity but I’m willing to fail and try 
again. :)


rahul.xavier.si...@gmail.com

http://cassandra.link
The Apache Cassandra Knowledge Base.


Re: Fwd: [CI] What are the troubles projects face with CI and Infra

2020-02-08 Thread Rahul Singh
Related to instances, can we put to use the credits that Amazon promised to
give back to the community as part of its Amazon Managed Cassandra Service
announcement?

Alternatively, if there is an appetite to set something up on Patreon or
GitHub’s donation platform, it may be a good way to get the things we need
funded based on what the community wants: business-driven demand.

Thoughts?

rahul.xavier.si...@gmail.com

http://cassandra.link
The Apache Cassandra Knowledge Base.
On Feb 3, 2020, 9:06 PM -0500, David Capwell , wrote:
> Following Mick's format =)
>
> ** Lack of trust (aka reliability)
>
> Mick said it best, but I should also add that we have slow tests and tests
> which don't do anything. Effort is needed to improve our current tests and
> to make sure future tests are stable (cleaning up works, isolation, etc.);
> this is not a negligible amount of work, nor work which can be done by a
> single person.
>
> ** Lack of resources (throughput and response)
>
> Our slowest unit tests are around 2 minutes (materialized views), our
> slowest dtests (not high resource) are around 30 minutes; given enough
> resources we could run unit in < 10 minutes and dtest in 30-60 minutes.
>
> There is also another thing to point out, testing is also a combinatorics
> problem; we support java 8/11 (more to come), vnode and no-vnode, security
> and no security, and the list goes on. Bugs are more likely to happen when
> two features interact, so it is important to test against many combinations.
>
> There is work going on in the community to add new kinds of tests (harry,
> diff, etc.); these tests require even more resources than normal tests.
>
> ** Difficulty in use
>
> Many people rely on CircleCI as the core CI for the project, but this has a
> few issues called out in other forms: the low resource version (free) is
> even more flaky than high (paid), and people get locked out (I have lost
> access twice so far, others have said the same).
>
> The thing which worries me the most is that new members to the project
> won't have the high resource CircleCI plan, nor do they really have access
> to Jenkins. This puts a burden on new authors where they wait 24+ hours to
> run the tests... or just not run them.
>
> ** Lack of visibility into quality
>
> This is two things for me: commit and pre-commit.
>
> For commit, this is more what Mick was referring to as "post-commit CI".
> There are a few questions I would like answered about our current tests
> (which tests are most flaky, which sections of code cause the most failures,
> etc.); these are hard to answer at the moment.
>
> We don't have a good pre-commit story since it mostly relies on CircleCI.
> I find that some JIRAs link CircleCI and some don't. I find that if I
> follow the CircleCI link months later (to see if the build was stable
> pre-commit) that Circle fails to show the workflow.
>
> On Mon, Feb 3, 2020 at 3:42 PM Michael Shuler 
> wrote:
>
> > Only have a moment to respond, but Mick hit the highlights with
> > containerization, parallelization, these help solve cleanup, speed, and
> > cascading failures. Dynamic disposable slaves would be icing on that
> > cake, which may require a dedicated master.
> >
> > One more note on jobs, or more correctly unnecessary jobs - pipelines
> > have a `changeset` build condition we should tinker with. There is zero
> > reason to run a job with no actual code diff. For instance, I committed
> > to 2.1 this morning and merged `-s ours` nothing to the newer branches -
> > there's really no reason to run and take up valuable resources with no
> > actual diff changes.
> > https://jenkins.io/doc/book/pipeline/syntax/#built-in-conditions
> >
> > Michael
> >
> > On 2/3/20 3:45 PM, Nate McCall wrote:
> > > Mick, this is fantastic!
> > >
> > > I'll wait another day to see if anyone else chimes in. (Would also love to
> > > hear from CassCI folks, anyone else really who has wrestled with this even
> > > for internal forks).
> > >
> > > On Tue, Feb 4, 2020 at 10:37 AM Mick Semb Wever  wrote:
> > >
> > > > Nate, I leave it to you to forward what-you-chose to the board@'s
> > > > thread.
> > > >
> > > >
> > > > > Are there still troubles and what are they?
> > > >
> > > >
> > > > TL;DR
> > > > the ASF could provide the Cassandra community with an isolated Jenkins
> > > > installation, so that we can manage and control the Jenkins master, as
> > > > well as ensure all donated hardware for Jenkins agents is dedicated and
> > > > isolated to us.
> > > >
> > > >
> > > > The long writeup…
> > > >
> > > > For Cassandra's use of ASF's Jenkins I see the following problems.
> > > >
> > > > ** Lack of trust (aka reliability)
> > > >
> > > > The Jenkins agents re-use their workspaces, as opposed to using new
> > > > containers per test run, leading to broken agents, disks, git clones,
> > > > etc. One broken test run, or a broken agent, too easily affects
> > > > subsequent test executions.
> > > >
> > > > The 
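
The "combinatorics problem" described in David's reply above can be made
concrete with a small sketch. It enumerates only the three dimensions named in
the thread (Java 8/11, vnode/no-vnode, security/no security); the value labels
and the helper name build_matrix are illustrative and are not taken from the
project's actual CI configuration.

```python
from itertools import product

# Dimensions named in the thread; the labels themselves are illustrative.
DIMENSIONS = {
    "java": ["8", "11"],
    "vnodes": ["vnode", "no-vnode"],
    "security": ["security", "no-security"],
}


def build_matrix(dimensions):
    """Return one dict per combination of the given dimension values."""
    keys = list(dimensions)
    return [
        dict(zip(keys, combo))
        for combo in product(*(dimensions[key] for key in keys))
    ]


matrix = build_matrix(DIMENSIONS)
print(len(matrix))  # 2 * 2 * 2 = 8 combinations
for entry in matrix:
    print(entry)
```

Each extra two-valued dimension doubles the number of runs, which is why the
thread ties broader coverage so closely to available CI resources.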

Re: Ideas for removing unnecessary friction in contributing to docs/website.

2019-07-30 Thread Rahul Singh
Acknowledged Mick. I'll review that ticket.

rahul.xavier.si...@gmail.com

http://cassandra.link



On Tue, Jul 30, 2019 at 5:09 AM Mick Semb Wever  wrote:

>
> > The website should be the easiest thing for people to work on. Most of the
> > documentation should be easy as well. Not all documentation has a 1-1
> > correlation to code.
> >
> > The Website and Documentation are sibling artifacts in my opinion, but the
> > website shouldn't have a hard dependency on the core binaries.
>
>
> Thanks for raising the issue Rahul. The website generation process
> certainly needs some work.
>
> Note, that the correlation is not 1-1. Today the website can generate and
> link to different versions of the code documentation.
>
> For example:
>  - https://cassandra.apache.org/doc/latest/
>  - https://cassandra.apache.org/doc/4.0/
>  - https://cassandra.apache.org/doc/3.11.3
>  - https://cassandra.apache.org/doc/3.11
>
> A patch to better expose and link to these is in CASSANDRA-14954. It is
> waiting on a reviewer.
>
> But, by the same notion, there's nothing preventing there being a 1-0
> relationship when regenerating the website docs.
>
> Some of the headaches with the website generation were listed by Michael
> here:
> https://issues.apache.org/jira/browse/CASSANDRA-13907?focusedCommentId=16211161=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16211161
>
> Generation via docker-compose, done in CASSANDRA-14972, was meant as an
> incremental improvement on these headaches. Being but an incremental step,
> we were hoping it would continue to get improved. For example, the html
> patch generated for even small website updates is massive and trashes the
> svn history (as Michael also pointed out).
>
> regards,
> Mick
>


Re: Google Season of Docs 2019 for Apache Cassandra

2019-07-23 Thread Rahul Singh
Hey guys, the deadline to pick folks is today. I looked through the 7 that
were in the last spreadsheet that Andrew sent out and liked these
applicants in no particular order.

Some had made an effort to propose a project. Others had shown
experience in other community projects as well as API documentation.


Emmanuel Owusu Sackey :  Wants to make Apache Cassandra 4.0 docs . Helps us
get closer to release.
emmanuelcarefor...@gmail.com

Ajinkya Dubey :  Wants to make the Cassandra operators book.
ajinkya.511...@gmail.com

Deepak Vohra : Wants to help improve CQL/Nodetool docs.
dvohr...@yahoo.com

Saurav Malani :  Also liked this guy but I can only pick 3. ;(
sauravmala...@gmail.com

rahul.xavier.si...@gmail.com

http://cassandra.link



On Sun, Apr 7, 2019 at 3:11 AM Dinesh Joshi 
wrote:

> Hi all,
>
> I have updated the document here with more project ideas:
> https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+GSoD+2019+application
>
> If anybody has more ideas, please let me know. The deadline for the
> proposals is April 23rd.
>
> Thanks,
>
> Dinesh
>
> > On Apr 5, 2019, at 6:01 AM, Rahul Singh 
> wrote:
> >
> > I saw the email from Sharan. Who's lead on getting the application in for
> > the SOD? Thanks,
> >
> > rahul.xavier.si...@gmail.com
> >
> > http://cassandra.link
> >
> > I'm speaking at #DataStaxAccelerate, the world’s premiere
> #ApacheCassandra
> > conference, and I want to see you there! Use my code Singh50 for 50% off
> > your registration. www.datastax.com/accelerate
> >
> >
> > On Wed, Mar 20, 2019 at 1:09 AM Dinesh Joshi wrote:
> >
> >> Thanks, Laxmikant. Those who have Confluence accounts have edit
> >> permissions on the page.
> >>
> >> Dinesh
> >>
> >>> On Mar 19, 2019, at 3:07 AM, Laxmikant Upadhyay <
> laxmikant@gmail.com>
> >> wrote:
> >>>
> >>> Hi Dinesh,
> >>>
> >>> I am willing to help as well. Kindly add me to the group as well, as I
> >>> don't have permission to edit the list.
> >>>
> >>> Regards,
> >>> Laxmikant
> >>>
> >>>
> >>>
> >>> On Thu, Mar 14, 2019 at 12:34 PM Dinesh Joshi wrote:
> >>>
> >>>> Thanks for volunteering Stefan. I have added you.
> >>>>
> >>>> Dinesh
> >>>>
> >>>>> On Mar 13, 2019, at 11:55 PM, Stefan Miklosovic <
> >>>> stefan.mikloso...@instaclustr.com> wrote:
> >>>>>
> >>>>> Hi Dinesh,
> >>>>>
> >>>>> I was a participant in Google Summer of Code 2013 under JBoss, so I
> >>>>> can help with this and with mentoring, as I experienced it from the
> >>>>> other side.
> >>>>>
> >>>>> I wanted to add myself to that list but I cannot edit that "Right
> >>>>> now everyone can view and edit this page if they've got the right space
> >>>>> permissions. "
> >>>>>
> >>>>> On Thu, 14 Mar 2019 at 17:28, Dinesh Joshi wrote:
> >>>>>
> >>>>>> Thank you everyone for offering to help!
> >>>>>>
> >>>>>> I have put together a confluence page detailing the program and other
> >>>>>> information -
> >>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+GSoD+2019+application
> >>>>>>
> >>>>>> If you do not have a confluence account, please create one. I think the
> >>>>>> PMCs have the ability to give you write permissions.
> >>>>>>
> >>>>>> Please feel free to add and modify the page. I think the most important
> >>>>>> thing right now is to get together project ideas.
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Dinesh
> >>>>>>
> >>>&

Re: [DISCUSS] Moving chats to ASF's Slack instance

2019-07-18 Thread Rahul Singh
+1



rahul.xavier.si...@gmail.com

http://cassandra.link



On Tue, May 28, 2019 at 5:55 PM Jon Haddad  wrote:

> +1
>
> On Tue, May 28, 2019, 2:54 PM Joshua McKenzie 
> wrote:
>
> > +1 to switching over. One less comms client + history + searchability is
> > enough to get my vote easy.
> >
> > On Tue, May 28, 2019 at 5:52 PM Jonathan Ellis 
> wrote:
> >
> > > I agree.  This lowers the barrier to entry for new participants.  Slack is
> > > probably two orders of magnitude more commonly used now than irc for sw
> > > devs and three for everyone else.  And then you have the quality-of-life
> > > features that you get out of the box with Slack and only with difficulty in
> > > irc (history, search, file uploads...)
> > >
> > > On Tue, May 28, 2019 at 4:29 PM Nate McCall 
> wrote:
> > >
> > > > Hi Folks,
> > > > While working on ApacheCon last week, I had to get setup on ASF's slack
> > > > workspace. After poking around a bit, on a whim I created #cassandra and
> > > > #cassandra-dev. I then invited a couple of people to come signup and test
> > > > it out - primarily to make sure that the process was seamless for non-ASF
> > > > account holders as well as committers, etc (it was).
> > > >
> > > > If you want to jump in, you can signup here:
> > > > https://s.apache.org/slack-invite
> > > >
> > > > That said, I think it's time we transition from IRC to Slack. Now, I
> > like
> > > > CLI friendly, straight forward tools like IRC as much as anyone, but
> > it's
> > > > been more than once recently where a user I've talked to has said one
> > of
> > > > two things regarding our IRC channels: "What's IRC?" or "Yeah, I
> don't
> > > > really do that anymore."
> > > >
> > > > In short, I think it's time to migrate. I think this will really just
> > > > consist of some communications to our lists and updating the site
> > > (anything
> > > > I'm missing?). The archives of IRC should just kind of persist for
> > > > posterity sake without any additional effort or maintenance. The
> > > > ASF-requirements are all configured already on the Slack workspace,
> so
> > I
> > > > think we are good there.
> > > >
> > > > Thanks,
> > > > -Nate
> > > >
> > >
> > >
> > > --
> > > Jonathan Ellis
> > > co-founder, http://www.datastax.com
> > > @spyced
> > >
> >
>


Re: Google Season of Docs 2019 for Apache Cassandra

2019-04-05 Thread Rahul Singh
I saw the email from Sharan. Who's lead on getting the application in for
the SOD? Thanks,

rahul.xavier.si...@gmail.com

http://cassandra.link


On Wed, Mar 20, 2019 at 1:09 AM Dinesh Joshi 
wrote:

> Thanks, Laxmikant. Those who have Confluence accounts have edit
> permissions on the page.
>
> Dinesh
>
> > On Mar 19, 2019, at 3:07 AM, Laxmikant Upadhyay 
> wrote:
> >
> > Hi Dinesh,
> >
> > I am willing to help as well. Kindly add me to the group, as I don't
> > have permission to edit the list.
> >
> > Regards,
> > Laxmikant
> >
> >
> >
> > On Thu, Mar 14, 2019 at 12:34 PM Dinesh Joshi  >
> > wrote:
> >
> >> Thanks for volunteering Stefan. I have added you.
> >>
> >> Dinesh
> >>
> >>> On Mar 13, 2019, at 11:55 PM, Stefan Miklosovic <
> >> stefan.mikloso...@instaclustr.com> wrote:
> >>>
> >>> Hi Dinesh,
> >>>
> >>> I was a participant in Google Summer of Code 2013 under JBoss, so I can
> >>> help with this and with mentoring, as I experienced it from the other side.
> >>>
> >>> I wanted to add myself to that list but I cannot edit it: "Right
> >>> now everyone can view and edit this page if they've got the right space
> >>> permissions."
> >>>
> >>> On Thu, 14 Mar 2019 at 17:28, Dinesh Joshi  >
> >>> wrote:
> >>>
> >>>> Thank you everyone for offering to help!
> >>>>
> >>>> I have put together a confluence page detailing the program and other
> >>>> information -
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/CASSANDRA/Apache+Cassandra+GSoD+2019+application
> >>>>
> >>>> If you do not have a confluence account, please create one. I think
> the
> >>>> PMCs have the ability to give you write permissions.
> >>>>
> >>>> Please feel free to add and modify the page. I think the most
> important
> >>>> thing right now is to get together project ideas.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Dinesh
> >>>>
> >>>>> On Mar 13, 2019, at 10:19 AM, Horia Mocioi 
> wrote:
> >>>>>
> >>>>> I can also help.
> >>>>>
> >>>>> Regards,
> >>>>> Horia
> >>>>>
> >>>>> On Wed, Mar 13, 2019 at 3:12 PM Aaron Ploetz 
> >>>> wrote:
> >>>>>
> >>>>>> I’m willing to help as well.  Feel free to reach out!
> >>>>>>
> >>>>>> Aaron Ploetz
> >>>>>>
> >>>>>>> On Mar 12, 2019, at 8:37 PM, Rahul Singh <
> >> rahul.xavier.si...@gmail.com
> >>>>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Cool. I’m willing to help by taking sub sections of the overall
> >> effort.
> >>>>>> The docs need a lot of TLC. Thanks ,
> >>>>>>>
> >>>>>>> Rahul Singh
> >>>>>>> Principal Architect | 1.202.390.9200 | rahul.si...@datastax.com
> >>>>>>>> On Mar 12, 2019, 8:58 PM -0400, Ben Slater <
> >>>> ben.sla...@instaclustr.com>,
> >>>>>> wrote:
> >>>>>>>> Hi Dinesh
> >>>>>>>>
> >>>>>>>> Great idea. We should be able to find some Instaclustr people to
> >> help
> >>>>>> with
> >>>>>>>> technical input (Stefan has already put his hand up).
> >>>>>>>>
> >>>>>>>> I’m also happy to help with the application if that’s useful.
> >>>>>>>>
> >>>>>>>> Cheers
> >>>>>>>> Ben
> >>>>>>>>
> >>>>>>>> ---
> >>>>>>>>
> >>>>>>>>
> >>>>>>>

Re: API calls taking time

2019-03-14 Thread Rahul Singh
I don't think this is a dev group related question. This seems to be for
the user group.

-- 
rahul.xavier.si...@gmail.com

http://cassandra.link
On Thu, Mar 14, 2019 at 7:10 PM Sundaramoorthy, Natarajan <
natarajan_sundaramoor...@optum.com> wrote:

> Few api calls interacting with 3 node Cassandra cluster taking lot of time
> what should I look at? Newbie to cassandra. Any db related parameters?
>
> Thanks
>


Re: Google Season of Docs 2019 for Apache Cassandra

2019-03-12 Thread Rahul Singh
Cool. I’m willing to help by taking sub sections of the overall effort. The 
docs need a lot of TLC. Thanks ,

Rahul Singh
Principal Architect | 1.202.390.9200 | rahul.si...@datastax.com
On Mar 12, 2019, 8:58 PM -0400, Ben Slater , wrote:
> Hi Dinesh
>
> Great idea. We should be able to find some Instaclustr people to help with
> technical input (Stefan has already put his hand up).
>
> I’m also happy to help with the application if that’s useful.
>
> Cheers
> Ben
>
> ---
>
>
> *Ben Slater*
> *Chief Product Officer*
>
>
> On Wed, 13 Mar 2019 at 08:12, Dinesh Joshi 
> wrote:
>
> > Hi all,
> >
> > I came across GSoD 2019[1]. This is different from GSoC and focuses on
> > improving documentation for Open Source projects. I think this would be
> > beneficial for Cassandra especially with 4.0 coming up. However, working
> > with a technical writer will require a substantial time commitment from us
> > to bring them up to speed.
> >
> > Are there any volunteers to help guide the technical writer if Cassandra
> > is picked as a project?
> >
> > On a side note, we can put together the application on the Confluence
> > wiki. I will create a page and if anybody is interested in helping out with
> > putting together the application, please feel free to collaborate on it.
> >
> > Thanks,
> >
> > Dinesh
> >
> > [1] https://developers.google.com/season-of-docs/docs/timeline
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >


Re: Audit logging to tables.

2019-02-27 Thread Rahul Singh
I understand why you’d want it but it would add more data management to the 
database. Generally for logging you could consider putting it into ELK, where it 
can be queried more arbitrarily.
On Feb 27, 2019, 12:42 PM -0500, Dinesh Joshi , 
wrote:
> I don’t believe there is a plan to do it. If it were available in a table how 
> would that help you?
>
> Dinesh
>
> > On Feb 27, 2019, at 9:32 AM, Sagar  wrote:
> >
> > Hey All,
> >
> > While following some of the recent developments on Cassandra, I found the
> > new feature on Audit logging quite useful.
> >
> > I wanted to understand is there any plan of pushing the audit logs to a
> > table?
> >
> > Thanks!
> > Sagar.
>


Re: How to identify inserts/deletes/updates from CDC data

2019-02-21 Thread Rahul Singh
Your questions seem more appropriate for the user list because you are trying 
to solve for use cases.
On Feb 11, 2019, 6:48 AM -0500, Sreenivasulu Nallapati 
, wrote:
> Hi,
> I am parsing the commit log files and I am not able to segregate the
> inserts/deletes/updates from the mutations. Is there any way that we can
> identify the event that is executed from commit logs?
>
> Here is the partial code:
>
> public class CustomCommitLogReadHandler implements CommitLogReadHandler {
>
>     private static final Logger LOGGER =
>             LoggerFactory.getLogger(CustomCommitLogReadHandler.class);
>
>     private final String keyspace;
>     private final String table;
>
>     public CustomCommitLogReadHandler(Map configuration) {
>         keyspace = (String) YamlUtils.select(configuration, "cassandra.keyspace");
>         table = (String) YamlUtils.select(configuration, "cassandra.table");
>     }
>
>     @Override
>     public void handleMutation(Mutation mutation, int size, int entryLocation,
>                                CommitLogDescriptor descriptor) {
>         LOGGER.debug("Handle mutation started...");
>         for (PartitionUpdate partitionUpdate : mutation.getPartitionUpdates()) {
>             process(partitionUpdate);
>         }
>         LOGGER.debug("Handle mutation finished...");
>     }
>
>     @SuppressWarnings("unchecked")
>     private void process(Partition partition) {
>         LOGGER.debug("Process method started...");
>         // Skip partitions that belong to a different keyspace.
>         if (!partition.metadata().ksName.equals(keyspace)) {
>             LOGGER.debug("Keyspace should be '{}' but is '{}'.", keyspace,
>                     partition.metadata().ksName);
>             return;
>         }
>         // Skip partitions that belong to a different table.
>         if (!partition.metadata().cfName.equals(table)) {
>             LOGGER.debug("Table should be '{}' but is '{}'.", table,
>                     partition.metadata().cfName);
>             return;
>         }
>         String key = getKey(partition);
>         JSONObject obj = new JSONObject();
>
>
> Thanks
> Sreeni


Re: Capturing all events from a given timestamp

2019-02-21 Thread Rahul Singh
If you streamed CDC into Cassandra via something like Kafka Connect, then yes.
On Feb 10, 2019, 7:00 PM -0500, Sreenivasulu Nallapati 
, wrote:
> Hi,
>
> I have a requirement to capture any changes to tables (inserts, deletes,
> or updates) from a given timestamp.
> However, with CDC enabled, I want to know if there is yet a way to read the
> data inserts,updates or deletes to a table through CQL. I want to know if
> it is possible to read the changes using CQL. If yes, how?
>
>
> Please advise.
>
> Thanks.
> Sreeni


Re: SEDA, queues, and a second lower-priority queue set

2019-01-16 Thread Rahul Singh
I understand the goal.

Thinking in this direction, multiple queues make sense if there is enough 
processing power / multiple cores and memory. There is some overhead involved 
in determining priority and routing to the proper queue.

I would say that having the ability to add queues may bring more potential 
throughput in addition to your priority segregation. Would you think that we 
would need multiple queues for every one of the TPs?

While thinking about another problem, I thought about secondary queues to 
help with this so that the additional computation wouldn’t affect the main 
function. Anytime I think about another queue, it requires more coordination 
and metadata that needs to be managed. May as well have a variable number of 
queues.

There’s both a strength and a weakness to one queue. Adding another 
process makes it “complicated.”

Great thoughts!

Rahul Singh
Principal Architect | 1.202.390.9200 | rahul.si...@datastax.com
On Jan 16, 2019, 3:09 PM -0600, Carl Mueller 
, wrote:
> additionally, a certain number of the threads in each stage could be
> restricted from serving the low-priority queues at all, say 8/32 or 16/32
> threads, to further ensure processing availability to the higher-priority
> tasks.
>
> On Wed, Jan 16, 2019 at 3:04 PM Carl Mueller 
> wrote:
>
> > At a theoretical level assuming it could be implemented with a magic wand,
> > would there be value to having a dual set of queues/threadpools at each of
> > the SEDA stages inside cassandra for a two-tier of priority? Such that you
> > could mark queries that return pages and pages of data as lower-priority
> > while smaller single-partition queries could be marked/defaulted as normal
> > priority, such that the lower-priority queues are only served if the normal
> > priority queues are empty?
> >
> > I suppose rough equivalency to this would be dual-datacenter with an
> > analysis cluster to serve the "slow" queries and a frontline one for the
> > higher priority stuff.
> >
> > However, it has come up several times that I'd like to run a one-off
> > maintenance job/query against production that could not be easily changed
> > (can't just throw up a DC), and while I can do app-level throttling with
> > some pain and sweat, it would seem something like this could do
> > lower-priority work in a somewhat-loaded cluster without impacting the main
> > workload.
> >
> >
> >
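
The two-tier idea in this thread can be sketched in a few lines of Java. This is
a minimal illustration under stated assumptions, not how Cassandra's stages are
implemented: TwoTierStage, submit, and pollFor are hypothetical names, tasks are
plain strings, and a single lock stands in for real concurrent queues. Workers
drain the normal queue first, and workers with an id below reservedWorkers never
touch the low-priority queue (the "8/32 or 16/32 threads" suggestion).

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical illustration only; not a Cassandra class. */
final class TwoTierStage {
    private final Queue<String> normal = new ArrayDeque<>();
    private final Queue<String> low = new ArrayDeque<>();
    // Workers with id in [0, reservedWorkers) never serve the low-priority queue.
    private final int reservedWorkers;

    TwoTierStage(int reservedWorkers) {
        this.reservedWorkers = reservedWorkers;
    }

    /** Enqueue a task on the normal or the low-priority queue. */
    synchronized void submit(String task, boolean lowPriority) {
        (lowPriority ? low : normal).add(task);
    }

    /** Next task this worker may run, or null if there is none for it. */
    synchronized String pollFor(int workerId) {
        String task = normal.poll();
        if (task != null) {
            return task;            // normal-priority work always goes first
        }
        if (workerId < reservedWorkers) {
            return null;            // reserved workers stay free for normal work
        }
        return low.poll();          // only reached when the normal queue is empty
    }
}
```

Even in this toy form the routing decision and the extra bookkeeping show up as
additional state to coordinate, which is the overhead raised in the reply above.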


Re: [DISCUSS] releasing next 3.0 & 3.11

2019-01-16 Thread Rahul Singh
I’m all for it. Apart from general testing with an existing app, do we have 
automation to bring up / run stress / tear down the cluster?

I have a few endurance tests written in Gatling that I can run.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.
On Jan 16, 2019, 1:05 PM -0600, Jonathan Haddad , wrote:
> Ping on this.
>
> On Mon, Jan 7, 2019 at 5:58 PM Michael Shuler 
> wrote:
>
> > No problem, thanks for asking :)
> >
> > Michael
> >
> > On 1/7/19 6:20 PM, Jonathan Haddad wrote:
> > > It's been 5 months and 30+ bug fixes to each branch.
> > >
> > > Here's the two changelogs:
> > >
> > > https://github.com/apache/cassandra/blob/cassandra-3.0/CHANGES.txt
> > > https://github.com/apache/cassandra/blob/cassandra-3.11/CHANGES.txt
> > >
> > > How's everyone feel about getting a release out this week / early next
> > > week? Some of these bugs are show stoppers, causing OOMs and data loss.
> > >
> >
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade


Re: Built in trigger: double-write for app migration

2018-10-18 Thread Rahul Singh
Trigger-based has worked for us in the past to get once-only output of what’s 
happened - pushing this to Kafka and using Kafka Connect allowed us to then direct 
the stream to other endpoints.

CDC based streaming has the issue of duplicates which are technically fine if 
you don’t care that much about repeat changes coming from replicas.

I agree with Ben. If the goal is just to move a keyspace from one cluster to 
another that is active and can’t go down, his method will work for sure.

Also, is there a specific reason you need to split the cluster? Why not just 
have another DC and keep it part of the cluster? Do you have more than a 
hundred tables?


Rahul Singh
On Oct 18, 2018, 4:35 PM -0400, Ben Slater , wrote:
> I might be missing something but we’ve done this operation on a few
> occasions by:
> 1) Commission the new cluster and join it to the existing cluster as a 2nd
> DC
> 2) Replicate just the keyspace that you want to move to the 2nd DC
> 3) Make app changes to read moved tables from 2nd DC
> 4) Change keyspace definition to remove moved keyspace from first DC
> 5) Split the 2DCs into separate clusters (sever network connections, change
> seeds)
>
> If it’s just a table you’re moving and not a whole keyspace then you can skip
> step 4 and drop the unneeded tables from either side after splitting. This
> might mean the new cluster needs to be temporarily bigger than the
> end-state during the migration process.
>
> Cheers
> Ben
>
> On Fri, 19 Oct 2018 at 07:04 Jeff Jirsa  wrote:
>
> > Could be done with CDC
> > Could be done with triggers
> > (Could be done with vtables — double writes or double reads — if they were
> > extended to be user facing)
> >
> > Would be very hard to generalize properly, especially handling failure
> > cases (write succeeds in one cluster/table but not the other) which are
> > often app specific
> >
> >
> > --
> > Jeff Jirsa
> >
> >
> > > On Oct 18, 2018, at 6:47 PM, Jonathan Ellis  wrote:
> > >
> > > Isn't this what CDC was designed for?
> > >
> > > https://issues.apache.org/jira/browse/CASSANDRA-8844
> > >
> > > On Thu, Oct 18, 2018 at 10:54 AM Carl Mueller
> > >  wrote:
> > >
> > > > tl;dr: a generic trigger on TABLES that will mirror all writes to
> > > > facilitate data migrations between clusters or systems. What is
> > necessary
> > > > to ensure full write mirroring/coherency?
> > > >
> > > > When cassandra clusters have several "apps" aka keyspaces serving
> > > > applications colocated on them, but the app/keyspace bandwidth and size
> > > > demands begin impacting other keyspaces/apps, then one strategy is to
> > > > migrate the keyspace to its own dedicated cluster.
> > > >
> > > > With backups/sstableloading, this will entail a delay and therefore a
> > > > "coherency" shortfall between the clusters. So typically one would
> > employ a
> > > > "double write, read once":
> > > >
> > > > - all updates are mirrored to both clusters
> > > > - writes come from the current most coherent.
> > > >
> > > > Often two sstable loads are done:
> > > >
> > > > 1) first load
> > > > 2) turn on double writes/write mirroring
> > > > 3) a second load is done to finalize coherency
> > > > 4) switch the app to point to the new cluster now that it is coherent
> > > >
> > > > The double writes and read is the sticking point. We could do it at the
> > app
> > > > layer, but if the app wasn't written with that, it is a lot of testing
> > and
> > > > customization specific to the framework.
> > > >
> > > > We could theoretically do some sort of proxying of the java-driver
> > somehow,
> > > > but all the async structures and complex interfaces/apis would be
> > difficult
> > > > to proxy. Maybe there is a lower level in the java-driver that is
> > possible.
> > > > This also would only apply to the java-driver, and not
> > > > python/go/javascript/other drivers.
> > > >
> > > > Finally, I suppose we could do a trigger on the tables. It would be
> > really
> > > > nice if we could add to the cassandra toolbox the basics of a write
> > > > mirroring trigger that could be ac
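
For reference, the app-layer "double write, read once" pattern described in the
quoted message can be sketched as below. This is only an illustration under
stated assumptions: Store, InMemoryStore, MirroringStore, and cutOver are
hypothetical names, the in-memory map stands in for a driver session per
cluster, and failure handling (a write succeeding on one side but not the
other) is deliberately left out because, as noted in the thread, it tends to be
app specific.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store abstraction; stands in for one cluster or table. */
interface Store {
    void write(String key, String value);
    String read(String key);
}

/** Trivial in-memory stand-in so the sketch is self-contained. */
final class InMemoryStore implements Store {
    private final Map<String, String> rows = new HashMap<>();

    public void write(String key, String value) {
        rows.put(key, value);
    }

    public String read(String key) {
        return rows.get(key);
    }
}

/** Mirrors every write to both stores and reads from only one of them. */
final class MirroringStore implements Store {
    private final Store current;   // existing cluster, coherent at the start
    private final Store target;    // new cluster being brought up to date
    private boolean readFromTarget = false;

    MirroringStore(Store current, Store target) {
        this.current = current;
        this.target = target;
    }

    public void write(String key, String value) {
        current.write(key, value);
        // Partial-failure handling (retry, queue, reconcile) is omitted here.
        target.write(key, value);
    }

    public String read(String key) {
        return (readFromTarget ? target : current).read(key);
    }

    /** Switch reads to the new cluster once it is considered coherent. */
    void cutOver() {
        readFromTarget = true;
    }
}
```

Data written to the old cluster before mirroring was turned on is not visible
through the new one after cutOver, which is why the quoted steps include a
second load before switching the app.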

Re: Implicit Casts for Arithmetic Operators

2018-10-02 Thread Rahul Singh
+1 on Postgres approach. In the last 5 years I’ve seen people move from Oracle 
and SQL server to some variant of Cassandra or Postgres and other new tech is 
also more likely to support Postgres (Cockroach..)

I don’t care either way. It really depends on what you are storing.

Rahul Singh
On Oct 2, 2018, 11:11 AM -0700, Jonathan Haddad , wrote:
> Thanks for bringing this up, it definitely needs to be discussed.
>
> Least surprise is difficult here, since all major databases have their own
> way of doing things and people will just assume that their way is the right
> way. On that note, some people will be surprised no matter what we do.
>
> I'd rather avoid the pitfalls of returning incorrect results, so either
> option 2 or 3 sound reasonable, but leaning towards the Postgres approach
> of always returning a decimal for those cases.
>
> Jon
>
>
>
> On Tue, Oct 2, 2018 at 10:54 AM Benedict Elliott Smith 
> wrote:
>
> > I agree, in broad strokes at least. Interested to hear others’ positions.
> >
> >
> >
> > > On 2 Oct 2018, at 16:44, Ariel Weisberg  wrote:
> > >
> > > Hi,
> > >
> > > I think overflow and the role of widening conversions are pretty linked
> > so I'll continue to inject that into this discussion. Also overflow is much
> > worse since most applications won't be impacted by a loss of precision when
> > an expression involves an int and float, but will care quite a bit if they
> > get some nonsense wrapped number in an integer only expression.
> > >
> > > For VoltDB in practice we didn't run into issues with applications not
> > making progress due to exceptions with real data due to the widening
> > conversions. The range of double and long are pretty big and that hides
> > wrap around/infinity.
> > >
> > > I think the proposal of having all operations return a decimal is
> > attractive in that these expressions always result in a consistent type.
> > Two pain points might be whether client languages have decimal support and
> > whether there is a performance issue? The nice thing about always returning
> > decimal is we can sidestep the issue of overflow.
> > >
> > > I would start with seeing if that's acceptable, and if it isn't then
> > > look at other approaches like returning a variety of types, such as
> > > returning a bigint for int + int or a double for int + float.
> > >
> > > If we take an approach that allows overflow the ideal end state IMO
> > would be to get all users to run Cassandra in way that overflow results in
> > an error even in the context of aggregation. The road to get there is
> > tricky, but maybe start by having it as an opt in tunable in
> > cassandra.yaml. I don't know how/when we could ever change that as a
> > default and it's unfortunate having an option like this that 99% won't know
> > they should flip.
> > >
> > > It seems like having the default throw on overflow is not as bad as it
> > sounds if you do the widening conversions since most people won't run into
> > them. The change in the column types of results sets actually sounds worse
> > > if we want to also improve aggregations. Many applications won't notice if
> > the client library abstracts that away, but I think there are still cases
> > where people would notice the type changing.
> > >
> > > Ariel
> > >
> > > > On Tue, Oct 2, 2018, at 11:09 AM, Benedict Elliott Smith wrote:
> > > > This (overflow) is an excellent point, but this also affects
> > > > aggregations which were introduced a long time ago. They already
> > > > inherit Java semantics for all of the relevant types (silent wrap
> > > > around). We probably want to be consistent, meaning either changing
> > > > aggregations (which incurs a cost for changing API) or continuing the
> > > > java semantics here.
> > > >
> > > > This is why having these discussions explicitly in the community before
> > > > a release is so critical, in my view. It’s very easy for these
> > semantic
> > > > changes to go unnoticed on a JIRA, and then ossify.
> > > >
> > > >
> > > > > On 2 Oct 2018, at 15:48, Ariel Weisberg  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I think we should decide based on what is least surprising as you
> > mention, but isn't overr
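
The options weighed in this thread map onto plain Java arithmetic, which may
help make the trade-offs concrete. The sketch below is not Cassandra's CQL
implementation; OverflowDemo and its method names are made up for illustration.
It shows the "Java semantics" silent wrap-around, an overflow-checked add, a
widening add (int + int evaluated as long, the analogue of returning a bigint),
and a decimal result.

```java
import java.math.BigDecimal;

/** Hypothetical illustration of the four behaviours discussed; not Cassandra code. */
final class OverflowDemo {
    /** Java int semantics: silently wraps around on overflow. */
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    /** Throws ArithmeticException on overflow instead of wrapping. */
    static int checkedAdd(int a, int b) {
        return Math.addExact(a, b);
    }

    /** Widening conversion: the sum of two ints always fits in a long. */
    static long widenedAdd(int a, int b) {
        return (long) a + b;
    }

    /** Arbitrary-precision result, which sidesteps overflow entirely. */
    static BigDecimal decimalAdd(int a, int b) {
        return BigDecimal.valueOf(a).add(BigDecimal.valueOf(b));
    }
}
```

With Integer.MAX_VALUE and 1 as inputs, wrappingAdd yields Integer.MIN_VALUE,
checkedAdd throws, and the widened and decimal variants both yield 2147483648.
The result type differs in each case, which is the column-type change concern
raised above.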

Re: Proposing an Apache Cassandra Management process

2018-09-30 Thread Rahul Singh
Perfection is the enemy of the good enough. Most if not all informed open 
source users understand that the tar they are downloading is “unsupported.”

Most of the blog posts people read or the documentation they have is in place 
of tools. Open source software, let alone tooling, even in a nascent version, 
comes with a degree of uncertainty.

If the question is for me: I would rather use something that exists than 
reinvent the wheel. The same way I’ll use “contrib” packages in any system just 
to see if it works.

Example: While working on a Solr / Kafka / Cassandra integration project we 
must have used a few different “Kafka Connect” variations before settling on a 
solution which was a combination of an existing sink connector and a source 
connector we created because what was out there was “meh”.

If we had scoffed at “unsupported” packages we would have spent more time 
making something than delivering value. Technology exists to serve business and 
people’s lives not technology itself.

There is a level of discernment that comes with experience as to what works and 
what doesn’t and what is good and what isn’t. The least we can do is help the 
user community know the difference, as Apache does at a higher level.


Rahul Singh
On Sep 29, 2018, 5:29 PM -0400, sankalp kohli , wrote:
> Thanks Dinesh for looking at the tools and thanks Mick for listing them
> down.
> Based on Dinesh response, is it accurate to say that for bulk
> functionality, we have evaluated the tools listed by the community? If
> anything is missed we should discuss it as we want to make sure we looked
> at all aspects before implementation starts.
>
> On Sat, Sep 29, 2018 at 1:19 PM Dinesh Joshi 
> wrote:
>
> > > On Sep 29, 2018, at 12:31 PM, Rahul Singh 
> > wrote:
> > >
> > > All of Apache is a patchwork of tools. All of Open Source is a patchwork
> > of tools. All of Linux is a patchwork of tools.
> > >
> > > What works, works.
> >
> > So there isn't a way to make it any better? You would prefer using an
> > unsupported tool vs something that worked out of the box & was well tested?
> >
> > Dinesh
> >


Re: Proposing an Apache Cassandra Management process

2018-09-29 Thread Rahul Singh
Mick,

I think there is someone who’s maintaining CTOP now.. I added it to my 
admin/monitoring list on https://github.com/Anant/awesome-cassandra

Rahul
On Sep 29, 2018, 3:20 PM -0400, Dinesh Joshi , 
wrote:
> > On Sep 27, 2018, at 7:35 PM, Mick Semb Wever  wrote:
> >
> > Reaper,
>
> I have looked at this already.
>
> > Priam,
>
> I have looked at this already.
>
> > Marcus Olsson's offering,
>
> This isn't OSS.
>
> > CStar,
>
> I have looked at this already.
>
> > OpsCenter.
>
> Latest release is only compatible with DSE and not Apache Cassandra[1]
>
> > Then there's a host of command line tools like:
> > ic-tools,
> > ctop (was awesome, but is it maintained anymore?),
> > tablesnap.
>
> These are interesting tools and I don't think they do what we're interested 
> in doing.
>
> > And maybe it's worth including the diy approach people take… 
> > pssh/dsh/clusterssh/mussh/fabric, etc
>
> What's the point? You can definitely add this to the website as helpful 
> documentation.
>
> The proposal in the original thread was to create something that is supported 
> by the Apache Cassandra project learning from the tooling we've all built 
> over the years. The fact that everyone has a sidecar or their own internal 
> tooling is an indicator that the project has room to grow. It will certainly 
> help this project be more user friendly (at least for operators).
>
> I, as a user and a developer, do not want to use a patchwork of disparate 
> tools. Does anybody oppose this on technical grounds? If you do, please help 
> me understand why would you prefer using a patchwork of tools vs something 
> that is part of the Cassandra project?
>
> Thanks,
>
> Dinesh
>
> [1] https://docs.datastax.com/en/opscenter/6.0/opsc/opscPolicyChanges.html
>


Re: Proposing an Apache Cassandra Management process

2018-09-29 Thread Rahul Singh
All of Apache is a patchwork of tools. All of Open Source is a patchwork of 
tools. All of Linux is a patchwork of tools.

What works, works.

If it wasn’t a patchwork, open source wouldn’t be what it is. Period.

OSS Cassandra can use all the help it can get — in terms of documentation, and 
tooling. Personally I think starting out with documenting it for people is a 
good start.

Then maybe if there’s a thought later, a “distribution” approach can be taken.

When you think about the application developer or administrator, they are 
different from who we are. They don’t give a shit about “patchwork” or 
perfection, they just care to get their job done and make shit happen.

Who cares if one tool is in Java, and another is in Kotlin, and another is a 
shell script that uses PSSH? Something that works is better than nothing.


On Sep 29, 2018, 3:20 PM -0400, Dinesh Joshi , 
wrote:
> > On Sep 27, 2018, at 7:35 PM, Mick Semb Wever  wrote:
> >
> > Reaper,
>
> I have looked at this already.
>
> > Priam,
>
> I have looked at this already.
>
> > Marcus Olsson's offering,
>
> This isn't OSS.
>
> > CStar,
>
> I have looked at this already.
>
> > OpsCenter.
>
> Latest release is only compatible with DSE and not Apache Cassandra[1]
>
> > Then there's a host of command line tools like:
> > ic-tools,
> > ctop (was awesome, but is it maintained anymore?),
> > tablesnap.
>
> These are interesting tools and I don't think they do what we're interested 
> in doing.
>
> > And maybe it's worth including the diy approach people take… 
> > pssh/dsh/clusterssh/mussh/fabric, etc
>
> What's the point? You can definitely add this to the website as helpful 
> documentation.
>
> The proposal in the original thread was to create something that is supported 
> by the Apache Cassandra project learning from the tooling we've all built 
> over the years. The fact that everyone has a sidecar or their own internal 
> tooling is an indicator that the project has room to grow. It will certainly 
> help this project be more user friendly (at least for operators).
>
> I, as a user and a developer, do not want to use a patchwork of disparate 
> tools. Does anybody oppose this on technical grounds? If you do, please help 
> me understand why would you prefer using a patchwork of tools vs something 
> that is part of the Cassandra project?
>
> Thanks,
>
> Dinesh
>
> [1] https://docs.datastax.com/en/opscenter/6.0/opsc/opscPolicyChanges.html
>


Re: [VOTE] Development Approach for Apache Cassandra Management process

2018-09-13 Thread Rahul Singh
A. +1.
Bias for action. Use something that works. Iterate.

Rahul Singh
On Sep 12, 2018, 8:23 PM -0400, Jeff Beck , wrote:
> a. +1 to use some existing project.
>
> But I see no reason not to keep discussion going.
>
> Jeff
>
>
> On Thu, Sep 13, 2018, 7:37 AM Sankalp Kohli  wrote:
>
> > The link to the document is available in the other thread. Comparisons are
> > available in other thread as well.
> >
> > > On Sep 12, 2018, at 16:29, Mick Semb Wever  wrote:
> > >
> > >
> > > > I am hoping all the folks who are saying we should not vote will drive
> > the
> > > > other thread. Also note that there is consensus about doing a side car
> > but
> > > > no consensus on which approach to take. I hope not deciding which
> > approach
> > > > is not a poison pill for side car!!
> > >
> > >
> > > Call me pedantic, but I saw the consensus as favouring a side-car over
> > something in tree. That's not a consensus on doing a side-car, as that was
> > not an option on offer.
> > >
> > > There's certainly interest in a side-car that warrants documenting
> > further on comparisons and investigations, IMHO.
> > >
> > >
> >
> >
> >


Re: Reaper as cassandra-admin

2018-08-27 Thread Rahul Singh
I’d be interested in contributing as well. I’ve been working on a skew review /
diagnostics tool which feeds off of cfstats/tablestats data (from TXT output to
CSV to conditionally formatted Excel) and am starting to store data in C* and
wrap a React-based grid on it.
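
To make the TXT-to-CSV step concrete, here is a rough sketch and not the actual tool. It assumes the text is shaped like `nodetool tablestats` output ("Keyspace : name" lines, then "Table: name" blocks of "field: value" lines). The sample text and the function names are made up for illustration, and real output has many more fields and differs between Cassandra versions.

```python
import csv
import io

# Hypothetical sample in the general shape of `nodetool tablestats` output.
SAMPLE = """\
Keyspace : shop
    Read Count: 10
    Write Count: 20
        Table: orders
        SSTable count: 3
        Space used (live): 1048576

        Table: users
        SSTable count: 1
        Space used (live): 2048
"""


def parse_tablestats(text):
    """Return one dict per table with its keyspace and 'field: value' pairs."""
    rows, keyspace, current = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Keyspace":
            keyspace, current = value, None
        elif key == "Table":
            current = {"keyspace": keyspace, "table": value}
            rows.append(current)
        elif current is not None:
            current[key] = value  # table-level metric
        # Keyspace-level metrics (before the first Table line) are skipped.
    return rows


def to_csv(rows):
    """Render the parsed rows as CSV text, one column per field seen."""
    fields = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_tablestats(SAMPLE)))
```

Once the rows are flat like this, comparing a field such as SSTable count or space used across nodes is a spreadsheet or grid problem, which is where the skew review comes in.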

I have backlogged forking the Reaper core / UI (API / front end). It has a lot
of potential, specifically if the API / services / UI could be modularized and
leverage IoC to add functionality via configuration rather than code.

There are a lot of good conventions in both open source and commercial projects 
out there for web based administration tools. The most successful ones do the 
basics related to their tool well and leave the rest to other systems.

The pitfall I don’t want the valuable talent in this group to fall into is
reinventing the wheel on things that other tools do well instead of focusing on
what admins / architects / developers need. E.g. if Prometheus and Grafana are
good for stats, keep them and just make them easier to set up or compose in
Docker.

Another example: I had ideas including a data browser / interactive query
interface, but Redash or Zeppelin do a good job for the time being and no
matter how much time I spent on it I probably wouldn’t make a better one.

Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.
On Aug 27, 2018, 9:22 PM -0400, Mick Semb Wever , wrote:
>
> > Is there a roadmap or release schedule, so we can get an idea of what
> > the Reaper devs have planned for it?
>
>
> Hi Murukesh,
> there's no roadmap per se, as it's open-source and it's the contributions as 
> they come that make it.
>
> What I know that's in progress or been discussed is:
> - more thorough upgrade tests,
> - support for diagnostic events (C* 4.0),
> - more task/operations: compactions, cleanups, sstableupgrades, etc etc,
> - more metrics (better visualisations, for example see the newly added 
> streaming),
> - making the scheduler repair-agnostic (so any task/operation can be 
> scheduled), and
> - making task/operations not based on jmx calls (preparing for non-jmx type 
> tasks).
>
> regards,
> Mick
>
>


Re: Side Car New Repo vs not

2018-08-21 Thread Rahul Singh
+1 for separate repo. Especially on git. Maybe make it a submodule.

Rahul
On Aug 21, 2018, 3:33 PM -0500, Stefan Podkowinski , wrote:
> I'm also currently -1 on the in-tree option.
>
> Additionally to what Aleksey mentioned, I also don't see how we could
> make this work with the current build and release process. Our scripts
> [0] for creating releases (tarballs and native packages), would need
> significant work to add support for an independent side-car. Our ant
> based build process is also not a great start for adding new tasks, let
> alone integrating other tool chains for web components for a potential UI.
>
> [0] https://git-wip-us.apache.org/repos/asf?p=cassandra-builds.git
>
>
> On 21.08.18 19:20, Aleksey Yeshchenko wrote:
> > Sure, allow me to elaborate - at least a little bit. But before I do, just 
> > let me note that this wasn’t a veto -1, just a shorthand for “I don’t like 
> > this option”.
> >
> > It would be nice to have sidecar and C* version and release cycles fully 
> > decoupled. I know it *can* be done when in-tree, but the way we vote on 
> > releases with tags off current branches would have to change somehow. 
> > Probably painfully. It would be nice to be able to easily enforce freezes, 
> > like the upcoming one, on the whole C* repo, while allowing feature 
> > development on the sidecar. It would be nice to not have sidecar commits in 
> > emails from commits@ mailing list. It would be nice to not have C* CI 
> > trigger necessarily on sidecar commits. Groups of people working on the two 
> > repos will mostly be different too, so what’s the point in sharing the repo?
> >
> > Having an extra repo with its own set of branches is cheap and easy - we 
> > already do that with dtests. I like cleanly separated things when coupling 
> > is avoidable. As such I would prefer the sidecar to live in a separate new 
> > repo, while still being part of the C* project.
> >
> > —
> > AY
> >
> > On 21 August 2018 at 17:06:39, sankalp kohli (kohlisank...@gmail.com) wrote:
> >
> > Hi Aleksey,
> > Can you please elaborate on the reasons for your -1? This
> > way we can make progress towards any one approach.
> > Thanks,
> > Sankalp
> >
> > On Tue, Aug 21, 2018 at 8:39 AM Aleksey Yeshchenko 
> > wrote:
> >
> > > FWIW I’m strongly -1 on in-tree approach, and would much prefer a separate
> > > repo, dtest-style.
> > >
> > > —
> > > AY
> > >
> > > On 21 August 2018 at 16:36:02, Jeremiah D Jordan (
> > > jeremiah.jor...@gmail.com) wrote:
> > >
> > > I think the following is a very big plus of it being in tree:
> > > > > * Faster iteration speed in general. For example when we need to add a
> > > > > new
> > > > > JMX endpoint that the sidecar needs, or change something from JMX to a
> > > > > virtual table (e.g. for repair, or monitoring) we can do all changes
> > > > > including tests as one commit within the main repository and don't
> > > have
> > > > > to
> > > > > commit to main repo, sidecar repo,
> > >
> > > I also don’t see a reason why the sidecar being in tree means it would not
> > > work in a mixed version cluster. The nodes themselves must work in a mixed
> > > version cluster during a rolling upgrade, I would expect any management
> > > side car to operate in the same manner, in tree or not.
> > >
> > > This tool will be pretty tightly coupled with the server, and as someone
> > > with experience developing such tightly coupled tools, it is *much* easier
> > > to make sure you don’t accidentally break them if they are in tree. How
> > > many times has someone updated some JMX interface, updated nodetool, and
> > > then moved on? Breaking all the external tools not in tree, without
> > > realizing it. The above point about being able to modify interfaces and 
> > > the
> > > side car in the same commit is huge in terms of making sure someone 
> > > doesn’t
> > > inadvertently break the side car while fixing something else.
> > >
> > > -Jeremiah
> > >
> > >
> > > > On Aug 21, 2018, at 10:28 AM, Jonathan Haddad 
> > > wrote:
> > > >
> > > > Strongly agree with Blake. In my mind supporting multiple versions is
> > > > mandatory. As I've stated before, we already do it with Reaper, I'd
> > > > consider it a major misstep if we couldn't support multiple with the
> > > > project - provided admin tool. It's the same reason dtests are separate
> > > -
> > > > they work with multiple versions.
> > > >
> > > > The number of repos does not affect distribution - if we want to ship
> > > > Cassandra with the admin / repair tool (we should, imo), that can be
> > > part
> > > > of the build process.
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Aug 20, 2018 at 9:21 PM Blake Eggleston 
> > > > wrote:
> > > >
> > > > > If the sidecar is going to be on a different release cadence, or
> > > support
> > > > > interacting with mixed mode clusters, then it should definitely be in
> > > a
> > > > > separate repo. I don’t even know how branching and merging would work
> > > in a
> > > > > repo that 

Re: Proposing an Apache Cassandra Management process

2018-08-17 Thread Rahul Singh
I understand the issues of managing different versions of two correlated 
components — but it is possible to create unit tests with core components of 
both. It takes more effort but it is possible.

That being said, in my experience using Reaper and, in the DataStax
distribution, OpsCenter, I prefer a separate project that is loosely tied to
the system and not joined at the hip. Whenever there is an update to Reaper or
OpsCenter, I can always pull it down and test it before rolling it out, and
this happens much more frequently than rolling out updates to a C* cluster.


Rahul
On Aug 17, 2018, 9:41 AM -0700, Jonathan Haddad , wrote:
> Speaking from experience (Reaper), I can say that developing a sidecar
> admin / repair tool out of tree that's compatible with multiple versions
> really isn't that big of a problem. We've been doing it for a while now.
>
> https://github.com/thelastpickle/cassandra-reaper/blob/master/.travis.yml
>
> On Fri, Aug 17, 2018 at 9:39 AM Joseph Lynch  wrote:
>
> > While I would love to use a different build system (e.g. gradle) for the
> > sidecar, I agree with Dinesh that a separate repo would make sidecar
> > development much harder to verify, especially on the testing and
> > compatibility front. As Jeremiah mentioned we can always choose later to
> > release the sidecar artifact separately or more frequently than the main
> > server regardless of repo choice and as per Roopa's point having a separate
> > release artifact (jar or deb/rpm) is probably a good idea until the default
> > Cassandra packages don't automatically stop and start Cassandra on install.
> >
> > While we were developing the repair scheduler in a separate repo we had a
> > number of annoying issues that only started surfacing once we started
> > merging it directly into the trunk tree:
> > 1. It is hard to compile/test against unreleased versions of Cassandra
> > (e.g. the JMX interfaces changed a lot with 4.x, and it was pretty tricky
> > to compile against that as the main project doesn't release nightly
> > snapshots or anything like that, so we had to build local trunk jars and
> > then manually dep them).
> > 2. Integration testing and cross version compatibility is extremely hard.
> > The sidecar is going to be involved in multi node coordination (e.g.
> > monitoring, metrics, maintenance) and will be tightly coupled to JMX
> > interface choices in the server and trying to make sure that it all works
> > with multiple versions of Cassandra is much harder if it's a separate repo
> > that has to have a mirroring release cycle to Cassandra. It seems much
> > easier to have it in tree and just be like "the in tree sidecar is tested
> > against that version of Cassandra". Every time we cut a Cassandra server
> > branch the sidecar branches with it.
> >
> > We experience these pains all the time with Priam being in a separate repo,
> > where every time we support a new Cassandra version a bunch of JMX
> > endpoints break and we have to refactor the code to either call JMX methods
> > by string or cut a new Priam branch. A separate artifact is pretty
> > important, but a separate repo just allows drifts. Furthermore from the
> > Priam experience I also don't think it's realistic to say one version of a
> > sidecar artifact can actually support multiple versions.
> >
> > -Joey
> >
> > On Fri, Aug 17, 2018 at 12:00 PM Jeremiah D Jordan 
> > wrote:
> >
> > > Not sure why the two things being in the same repo means they need the
> > > same release process. You can always do interim releases of the
> > management
> > > artifact between server releases, or even have completely decoupled
> > > releases.
> > >
> > > -Jeremiah
> > >
> > > > On Aug 17, 2018, at 10:52 AM, Blake Eggleston 
> > > wrote:
> > > >
> > > > I'd be more in favor of making it a separate project, basically for all
> > > the reasons listed below. I'm assuming we'd want a management process to
> > > work across different versions, which will be more awkward if it's in
> > tree.
> > > Even if that's not the case, keeping it in a different repo at this point
> > > will make iteration easier than if it were in tree. I'd imagine (or at
> > > least hope) that validating the management process for release would be
> > > less difficult than the main project, so tying them to the Cassandra
> > > release cycle seems unnecessarily restrictive.
> > > >
> > > >
> > > > On August 17, 2018 at 12:07:18 AM, Dinesh Joshi (
> > dinesh.jo...@yahoo.com.invalid)
> > > wrote:
> > > >
> > > > > On Aug 16, 2018, at 9:27 PM, Sankalp Kohli 
> > > wrote:
> > > > >
> > > > > I am bumping this thread because patch has landed for this with repair
> > > functionality.
> > > > >
> > > > > I have a following proposal for this which I can put in the JIRA or
> > doc
> > > > >
> > > > > 1. We should see if we can keep this in a separate repo like Dtest.
> > > >
> > > > This would imply a looser coupling between the two. Keeping things
> > > in-tree is my 

Re: Recommendation: running Cassandra in containers

2018-06-28 Thread Rahul Singh
Docker containers work, and so they will work with Kubernetes
PersistentVolumes. Nothing beats bare-metal CPU and RAM in terms of speed, but
containerization / kubernetization makes sense if you want to totally automate
infrastructure as code across clouds.

Rahul
On Jun 27, 2018, 6:23 PM -0500, Nate McCall , wrote:
> On Tue, Jun 26, 2018 at 3:12 AM, Pierre Mavro  wrote:
> > Hi,
> >
> > Regarding the limits in linux cgroups (as used in Kubernetes/Mesos), I
> > was wondering if there are any recommendation (didn't find anything on
> > this topic).
> >
> > In general on Java 8 running instances, it is advised to run those
> > options to take into account cgroup environment:
> >
> > -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
> >
> > Other tuning options for this exists (ex: MaxRAMFraction), I was
> > wondering if there is any information somewhere about it.
> >
>
> Not that I've seen.
>
> I think there is an opportunity for someone to do a thorough
> investigation and writeup about it (particularly given y'alls
> experience with C* over the past few years :)
>
>


Re: Evolving the client protocol

2018-04-19 Thread Rahul Singh
Sounds interesting. Could 80% of what we gain with a “shard” approach be 
achieved via Mesos to wrap a stateful service? Technically it’s “Sharding” the 
whole machine and better utilizing resources.
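
On the queue-hashing question quoted below, a toy illustration of the point that a connection's tuple is hashed to pick a queue and part of that tuple is outside your control. CRC32 here is only a stand-in (real NICs typically use RSS with a Toeplitz hash and an indirection table), and `pick_queue` is a made-up name.

```python
import zlib


def pick_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    """Map a connection 4-tuple to a receive queue index (illustrative hash)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


if __name__ == "__main__":
    # Same client and server, only the ephemeral source port changes.
    for port in range(40000, 40008):
        print(port, "-> queue", pick_queue("10.0.0.5", port, "10.0.0.9", 9042, 8))
```

Every packet of one connection lands on the same queue, but which queue that is depends on the client's ephemeral port, so the server cannot choose the core up front just by accepting the connection.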

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 19, 2018, 1:23 PM -0500, sankalp kohli <kohlisank...@gmail.com>, wrote:
> If you donate Thread per core to C*, I am sure someone can help you review
> it and get it committed.
>
> On Thu, Apr 19, 2018 at 11:15 AM, Ben Bromhead <b...@instaclustr.com> wrote:
>
> > Re #3:
> >
> > Yup I was thinking each shard/port would appear as a discrete server to the
> > client.
> >
> > If the per port suggestion is unacceptable due to hardware requirements,
> > remembering that Cassandra is built with the concept of scaling *commodity*
> > hardware horizontally, you'll have to spend your time and energy convincing
> > the community to support a protocol feature it has no (current) use for or
> > find another interim solution.
> >
> > Another way, would be to build support and consensus around a clear
> > technical need in the Apache Cassandra project as it stands today.
> >
> > One way to build community support might be to contribute an Apache
> > licensed thread per core implementation in Java that matches the protocol
> > change and shard concept you are looking for ;P
> >
> >
> > On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
> >
> > > Hi,
> > >
> > > So at technical level I don't understand this yet.
> > >
> > > So you have a database consisting of single threaded shards and a socket
> > > for accept that is generating TCP connections and in advance you don't
> > know
> > > which connection is going to send messages to which shard.
> > >
> > > What is the mechanism by which you get the packets for a given TCP
> > > connection delivered to a specific core? I know that a given TCP
> > connection
> > > will normally have all of its packets delivered to the same queue from
> > the
> > > NIC because the tuple of source address + port and destination address +
> > > port is typically hashed to pick one of the queues the NIC presents. I
> > > might have the contents of the tuple slightly wrong, but it always
> > includes
> > > a component you don't get to control.
> > >
> > > Since it's hashing how do you manipulate which queue packets for a TCP
> > > connection go to and how is it made worse by having an accept socket per
> > > shard?
> > >
> > > You also mention 160 ports as bad, but it doesn't sound like a big number
> > > resource wise. Is it an operational headache?
> > >
> > > RE tokens distributed amongst shards. The way that would work right now
> > is
> > > that each port number appears to be a discrete instance of the server. So
> > > you could have shards be actual shards that are simply colocated on the
> > > same box, run in the same process, and share resources. I know this
> > pushes
> > > more of the complexity into the server vs the driver as the server
> > expects
> > > all shards to share some client visible like system tables and certain
> > > identifiers.
> > >
> > > Ariel
> > > On Thu, Apr 19, 2018, at 12:59 PM, Avi Kivity wrote:
> > > > Port-per-shard is likely the easiest option but it's too ugly to
> > > > contemplate. We run on machines with 160 shards (IBM POWER 2s20c160t
> > > > IIRC), it will be just horrible to have 160 open ports.
> > > >
> > > >
> > > > It also doesn't fit well with the NIC's ability to automatically
> > > > distribute packets among cores using multiple queues, so the kernel
> > > > would have to shuffle those packets around. Much better to have those
> > > > packets delivered directly to the core that will service them.
> > > >
> > > >
> > > > (also, some protocol changes are needed so the driver knows how tokens
> > > > are distributed among shards)
> > > >
> > > > On 2018-04-19 19:46, Ben Bromhead wrote:
> > > > > WRT to #3
> > > > > To fit in the existing protocol, could you have each shard listen on
> > a
> > > > > different port? Drivers are likely going to support this due to
> > > > > https://issues.apache.org/jira/browse/CASSANDRA-7544 (
> > > > > https://issues.apache.org/jira/browse/CASSANDRA-11596). I'm not
> > sup

Re: Roadmap for 4.0

2018-04-12 Thread Rahul Singh
I can commit some resources on my team - especially as we onboard some of our 
summer apprentices.

I have some proprietary stress tools geared for Cassandra reads / writes that
are a little better than cassandra-stress and create somewhat more realistic
data.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 12, 2018, 3:41 PM -0400, Nate McCall <zznat...@gmail.com>, wrote:
> Ok. So who's willing to test 4.0 on June 2nd? Let's start a sign up.
>
> We (tlp) will put some resources on this via going through some canned
> scenarios we have internally. We aren't in a position to test data validity
> (yet) but we can do a lot around cluster behavior.
>
> Who else has specific stuff they are willing to do? Even if it's just
> tee'ing prod traffic, that would be hugely valuable.
>
> On Fri, Apr 13, 2018, 6:15 AM Jeff Jirsa <jji...@gmail.com> wrote:
>
> > On Thu, Apr 12, 2018 at 9:41 AM, Jonathan Haddad <j...@jonhaddad.com
> > wrote:
> >
> > > It sounds to me (please correct me if I'm wrong) like Jeff is arguing
> > that
> > > releasing 4.0 in 2 months isn't worth the effort of evaluating it,
> > because
> > > it's a big task and there's not enough stuff in 4.0 to make it
> > worthwhile.
> > >
> > >
> > More like "not enough stuff in 4.0 to make it worthwhile for the people I
> > personally know to be willing and able to find the weird bugs".
> >
> >
> > > If that is the case, I'm not quite sure how increasing the surface area
> > of
> > > changed code which needs to be vetted is going to make the process any
> > > easier.
> >
> >
> > It changes the interest level of at least some of the people able to
> > properly test it from "not willing" to "willing".
> >
> > Totally possible that there exist people who are willing and able to find
> > and fix those bugs, who just haven't committed to it in this thread. That's
> > probably why Sankalp keeps asking who's actually willing to do the testing
> > on June 2 - if nobody's going to commit to doing real testing on June 2,
> > all we're doing is adding inconvenience to those of us who'd be willing to
> > do it later in the year.
> >


Re: Repair scheduling tools

2018-04-12 Thread Rahul Singh
Schedule scheme looks good. I believe the in-process and sidecar approaches can
coexist. As an admin, I would love to be able to run one or the other or none.
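
For the subrange repairs discussed further down this thread, here is a minimal sketch of the piece any scheduler (in-process or sidecar) would need: cutting a token range into contiguous subranges. It assumes the Murmur3Partitioner token space of -2^63 to 2^63-1, and `split_range` is a made-up helper, not anything from the proposal.

```python
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_range(start, end, parts):
    """Split the token range (start, end] into contiguous (start, end] pieces."""
    if parts < 1 or start >= end:
        raise ValueError("need parts >= 1 and start < end")
    width = end - start
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    # Drop empty pieces when the range holds fewer tokens than `parts`.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


if __name__ == "__main__":
    for lo, hi in split_range(MIN_TOKEN, MAX_TOKEN, 4):
        print(lo, hi)
```

Each pair could then be handed to something like `nodetool repair -st <start> -et <end>`, with the scheduler tracking which pieces are done and how many run at once.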

Thank you for taking a lead and producing a plan that can actually be executed.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 12, 2018, 6:35 PM -0400, Joseph Lynch <joe.e.ly...@gmail.com>, wrote:
> Given the feedback here and on the ticket, I've written up a proposal
> for a repair
> sidecar tool
> <https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.5f10ng8gzle8
> in the ticket's design document. If there are no major concerns we're going
> to start working on porting the Priam implementation into this new tool
> soon.
>
> -Joey
>
> On Tue, Apr 10, 2018 at 4:17 PM, Elliott Sims <elli...@backblaze.com> wrote:
>
> > My two cents as a (relatively small) user. I'm coming at this from the
> > ops/user side, so my apologies if some of these don't make sense based on a
> > more detailed understanding of the codebase:
> >
> > Repair is definitely a major missing piece of Cassandra. Integrated would
> > be easier, but a sidecar might be more flexible. As an intermediate step
> > that works towards both options, does it make sense to start with
> > finer-grained tracking and reporting for subrange repairs? That is, expose
> > a set of interfaces (both internally and via JMX) that give a scheduler
> > enough information to run subrange repairs across multiple keyspaces or
> > even non-overlapping ranges at the same time. That lets people experiment
> > with and quickly/safely/easily iterate on different scheduling strategies
> > in the short term, and long-term those strategies can be integrated into a
> > built-in scheduler
> >
> > On the subject of scheduling, I think adjusting parallelism/aggression with
> > a possible whitelist or blacklist would be a lot more useful than a "time
> > between repairs". That is, if repairs run for a few hours then don't run
> > for a few (somewhat hard-to-predict) hours, I still have to size the
> > cluster for the load when the repairs are running. The only reason I can
> > think of for an interval between repairs is to allow re-compaction from
> > repair anticompactions, and subrange repairs seem to eliminate this. Even
> > if they didn't, a more direct method along the lines of "don't repair when
> > the compaction queue is too long" might make more sense. Blacklisted
> > timeslots might be useful for avoiding peak time or batch jobs, but only if
> > they can be specified for consistent time-of-day intervals instead of
> > unpredictable lulls between repairs.
> >
> > I really like the idea of automatically adjusting gc_grace_seconds based on
> > repair state. The only_purge_repaired_tombstones option fixes this
> > elegantly for sequential/incremental repairs on STCS, but not for subrange
> > repairs or LCS (unless a scheduler gains the ability somehow to determine
> > that every subrange in an sstable has been repaired and mark it
> > accordingly?)
> >
> >
> > On 2018/04/03 17:48:14, Blake Eggleston <b...@apple.com> wrote:
> > > Hi dev@,
> > >
> > > >
> > >
> > > The question of the best way to schedule repairs came up on
> > CASSANDRA-14346, and I thought it would be good to bring up the idea of an
> > external tool on the dev list.
> > >
> > > >
> > >
> > > Cassandra lacks any sort of tools for automating routine tasks that are
> > required for running clusters, specifically repair. Regular repair is a
> > must for most clusters, like compaction. This means that, especially as far
> > as eventual consistency is concerned, Cassandra isn’t totally functional
> > out of the box. Operators either need to find a 3rd party solution or
> > implement one themselves. Adding this to Cassandra would make it easier to
> > use.
> > >
> > > >
> > >
> > > Is this something we should be doing? If so, what should it look like?
> > >
> > > >
> > >
> > > Personally, I feel like this is a pretty big gap in the project and would
> > like to see an out of process tool offered. Ideally, Cassandra would just
> > take care of itself, but writing a distributed repair scheduler that you
> > trust to run in production is a lot harder than writing a single process
> > management application that can failover.
> > >
> > > >
> > >
> > > Any thoughts on this?
> > >
> > > >
> > >
> > > Thanks,
> > >
> > > >
> > >
> > > Blake
> > >
> > >
> >


Re: Repair scheduling tools

2018-04-05 Thread Rahul Singh
> > > > level. The only thing you need to do is to store state (KS/CF/Last
> > Token)
> > > > in a simple storage like redis
> > > > - It works even pretty well when populating a empty node e.g. when
> > > changing
> > > > RFs / bootstrapping DCs
> > > > - You can easily control the cluster-load by tuning the concurrency of
> > > the
> > > > scrape process
> > > >
> > > > I don't see a reason for us to ever go back to built-in repairs if they
> > > > don't improve immensely. In many cases (especially with MVs) they are
> > > true
> > > > resource killers.
> > > >
> > > > Just my 2 cent and experience.
> > > >
> > > > 2018-04-04 17:00 GMT+02:00 Ben Bromhead <b...@instaclustr.com>:
> > > >
> > > > > +1 to including the implementation in Cassandra itself. Makes managed
> > > > > repair a first-class citizen, it nicely rounds out Cassandra's
> > > > consistency
> > > > > story and makes it 1000x more likely that repairs will get run.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Apr 4, 2018 at 10:45 AM Jon Haddad <j...@jonhaddad.com
> > wrote:
> > > > >
> > > > > > Implementation details aside, I’m firmly in the “it would be nice
> > of
> > > C*
> > > > > > could take care of it” camp. Reaper is pretty damn easy to use and
> > > > > people
> > > > > > *still* don’t put it in prod.
> > > > > >
> > > > > >
> > > > > > > On Apr 4, 2018, at 4:16 AM, Rahul Singh <
> > > > rahul.xavier.si...@gmail.com
> > > > > > wrote:
> > > > > > >
> > > > > > > I understand the merits of both approaches. In working with other
> > > DBs
> > > > > In
> > > > > > the “old country” of SQL, we often had to write indexing sequences
> > > > > manually
> > > > > > for important tables. It was “built into the product” but in order
> > to
> > > > > > leverage the maximum benefits of indices we had to have different
> > > > indices
> > > > > > other than the clustered (physical index). The process still
> > sucked.
> > > > It’s
> > > > > > never perfect.
> > > > > > >
> > > > > > > The JVM is already fraught with GC issues and putting another
> > > process
> > > > > > being managed in the same heapspace is what I’m worried about.
> > > > > Technically
> > > > > > the process could be in the same binary but started as a side Car
> > or
> > > in
> > > > > the
> > > > > > same main process.
> > > > > > >
> > > > > > > Consider a process called “cassandra-agent” that’s sitting around
> > > > with
> > > > > a
> > > > > > scheduler based on config or a Cassandra table. Distributed in the
> > > same
> > > > > > release. Shell / service scripts would start it. The end user knows
> > > it
> > > > > only
> > > > > > by examining the .sh files. This opens possibilities of including a
> > > GUI
> > > > > > hosted in the same process without cluttering the core coolness of
> > > > > > Cassandra.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > --
> > > > > > > Rahul Singh
> > > > > > > rahul.si...@anant.us
> > > > > > >
> > > > > > > Anant Corporation
> > > > > > >
> > > > > > > On Apr 4, 2018, 2:50 AM -0400, Dor Laor <d...@scylladb.com>,
> > wrote:
> > > > > > > > We at Scylla, implemented repair in a similar way to the
> > Cassandra
> > > > > > reaper.
> > > > > > > > We do
> > > > > > > > that using an external application, written in go that manages
> > > > repair
> > > > > > for
> > > > > > > > multiple clusters
> > > > > > > > and saves the data in an external Scylla cluster. The logic
> > > > resembles
> > > > > > the
> > > > >

Re: Repair scheduling tools

2018-04-04 Thread Rahul Singh
I understand the merits of both approaches. In working with other DBs in the
“old country” of SQL, we often had to write indexing sequences manually for
important tables. Indexing was “built into the product”, but to get the
maximum benefit we had to maintain indices beyond the clustered (physical)
index. The process still sucked. It’s never perfect.

The JVM is already fraught with GC issues, and running another managed process
in the same heap space is what I’m worried about. Technically the process
could ship in the same binary but be started either as a sidecar or inside the
main process.

Consider a process called “cassandra-agent” that’s sitting around with a 
scheduler based on config or a Cassandra table. Distributed in the same 
release. Shell / service scripts would start it. The end user knows it only by 
examining the .sh files. This opens possibilities of including a GUI hosted in 
the same process without cluttering the core coolness of Cassandra.
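
A minimal sketch of what "a scheduler based on config" could look like inside such an agent. Everything here is hypothetical: the task names, the intervals, and the `due_tasks` helper. A real agent would load the config from a file or a Cassandra table and persist the last-run times.

```python
# Hypothetical agent config: task name -> minimum seconds between runs.
CONFIG = {"repair": 7 * 24 * 3600, "cleanup": 30 * 24 * 3600}


def due_tasks(config, last_run, now):
    """Return names of tasks whose interval has elapsed, most overdue first."""
    overdue = []
    for name, interval in config.items():
        # A task that has never run counts as infinitely overdue.
        elapsed = now - last_run.get(name, float("-inf"))
        if elapsed >= interval:
            overdue.append((elapsed - interval, name))
    return [name for _, name in sorted(overdue, reverse=True)]


if __name__ == "__main__":
    now = 1_000_000
    last_run = {"repair": now - 8 * 24 * 3600, "cleanup": now - 24 * 3600}
    print(due_tasks(CONFIG, last_run, now))
```

Keeping the decision as a pure function of config, last-run times and the clock makes it easy to test without a cluster, whichever process ends up hosting it.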

Best,

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 4, 2018, 2:50 AM -0400, Dor Laor <d...@scylladb.com>, wrote:
> We at Scylla implemented repair in a similar way to Cassandra Reaper. We do
> that using an external application, written in Go, that manages repair for
> multiple clusters and saves the data in an external Scylla cluster. The logic
> resembles Reaper's, with some specific internal sharding optimizations, and
> uses the Scylla REST API.
>
> However, I have doubts it's the ideal way. After playing a bit with
> CockroachDB, I realized
> it's super nice to have a single binary that repairs itself, provides a GUI
> and is the core DB.
>
> Even while distributed, you can elect a leader node to manage the repair in
> a consistent way so the complexity can be reduced to a minimum. Repair can
> write its status to the system tables and provide an API for progress, rate
> control, etc.
>
> The big advantage of embedding repair in the core is that there is no need
> to expose internal state to the repair logic. So an external program doesn't
> need to deal with different versions of Cassandra, different repair
> capabilities of the core (such as incremental on/off) and so forth. A good
> database should schedule its own repair: it knows whether the threshold of
> hinted handoff was crossed or not, it knows whether nodes were replaced, etc.
>
> My 2 cents. Dor
>
> On Tue, Apr 3, 2018 at 11:13 PM, Dinesh Joshi <
> dinesh.jo...@yahoo.com.invalid> wrote:
>
> > Simon,
> > You could still do load aware repair outside of the main process by
> > reading Cassandra's metrics.
> > In general, I don't think the maintenance tasks necessarily need to live
> > in the main process. They could negatively impact the read / write path.
> > Unless strictly required by the serving path, it could live in a sidecar
> > process. There are multiple benefits including isolation, faster iteration,
> > loose coupling. For example - this would mean that the maintenance tasks
> > can have a different gc profile than the main process and it would be ok.
> > Today that is not the case.
> > The only issue I see is that the project does not provide an official
> > sidecar. Perhaps there should be one. We probably would've not had to have
> > this discussion ;)
> > Dinesh
> >
> > On Tuesday, April 3, 2018, 10:12:56 PM PDT, Qingcun Zhou <
> > zhouqing...@gmail.com> wrote:
> >
> > Repair has been a problem for us at Uber. In general I'm in favor of
> > including the scheduling logic in Cassandra daemon. It has the benefit of
> > introducing something like load-aware repair, e.g., only scheduling repair
> > when there is no ongoing compaction or when traffic is low. As proposed by others,
> > we can expose keyspace/table-level configurations so that users can opt-in.
> > Regarding the risk, yes there will be problems at the beginning but in the
> > long run, users will appreciate that repair works out of the box, just like
> > compaction. We have large Cassandra deployments and can work with Netflix
> > folks for intensive testing to boost user confidence.
> >
> > On the other hand, have we looked into how other NoSQL databases do repair?
> > Is there a side car process?
> >
> >
> > On Tue, Apr 3, 2018 at 9:21 PM, sankalp kohli <kohlisank...@gmail.com
> > wrote:
> >
> > > Repair is critical for running C* and I agree with Roopa that it needs to
> > > be part of the offering. I think we should make it easy for new users to
> > > run C*.
> > >
> > > Can we have a side car process which we can add to Apache Cassandra
> > > off

Re: Repair scheduling tools

2018-04-03 Thread Rahul Singh
Agree on including in the distribution but I think repair can live 
independently and be run / configured separately.
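
To make the "run separately" option concrete, here is a minimal sketch of the load-aware check an out-of-process scheduler might apply before starting a repair. Everything in it is an assumption for illustration: the NodeLoad fields, the thresholds, and the function name are hypothetical, and how the metrics are collected (JMX, a metrics endpoint, etc.) is left out. It takes the conservative reading that both conditions must hold.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """Snapshot of load metrics for one node (hypothetical fields)."""
    pending_compactions: int
    requests_per_second: float


def should_schedule_repair(load: NodeLoad,
                           max_pending_compactions: int = 0,
                           max_requests_per_second: float = 500.0) -> bool:
    """Return True only when the node looks idle enough to repair.

    Assumption: repair is allowed only if compaction is not backed up
    AND traffic is below the (made-up) threshold.
    """
    if load.pending_compactions > max_pending_compactions:
        return False
    return load.requests_per_second <= max_requests_per_second


if __name__ == "__main__":
    quiet = NodeLoad(pending_compactions=0, requests_per_second=120.0)
    busy = NodeLoad(pending_compactions=3, requests_per_second=120.0)
    print(should_schedule_repair(quiet))
    print(should_schedule_repair(busy))
```

The same check would work inside the daemon; the point is only that the decision needs nothing beyond metrics that can be read from outside.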

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Apr 3, 2018, 4:37 PM -0400, Nate McCall <zznat...@gmail.com>, wrote:
> This document does a really good job of listing out some of the issues
> of coordinating scheduling repair. Regardless of which camp you fall
> into, it is certainly worth a read.
>
> On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:
> > I just want to say I think it would be great for our users if we moved
> > repair scheduling into Cassandra itself. The team here at Netflix has
> > opened the ticket <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> > and has written a detailed design document
> > <https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.iasguic42ger>
> > that includes problem discussion and prior art if anyone wants to
> > contribute to that. We tried to fairly discuss existing solutions, what
> > their drawbacks are, and a proposed solution.
> >
> > If we were to put this as part of the main Cassandra daemon, I think it
> > should probably be marked experimental and of course be something that
> > users opt into (table by table or cluster by cluster) with the
> > understanding that it might not fully work out of the box the first time we
> > ship it. We have to be willing to take risks but we also have to be honest
> > with our users. It may help build confidence if a few major deployments use
> > it (such as Netflix) and we are happy of course to provide that QA as best
> > we can.
> >
> > -Joey
> >
> > On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston <beggles...@apple.com
> > wrote:
> >
> > > Hi dev@,
> > >
> > > The question of the best way to schedule repairs came up on
> > > CASSANDRA-14346, and I thought it would be good to bring up the idea of an
> > > external tool on the dev list.
> > >
> > > Cassandra lacks any sort of tools for automating routine tasks that are
> > > required for running clusters, specifically repair. Regular repair is a
> > > must for most clusters, like compaction. This means that, especially as far
> > > as eventual consistency is concerned, Cassandra isn’t totally functional
> > > out of the box. Operators either need to find a 3rd party solution or
> > > implement one themselves. Adding this to Cassandra would make it easier to
> > > use.
> > >
> > > Is this something we should be doing? If so, what should it look like?
> > >
> > > Personally, I feel like this is a pretty big gap in the project and would
> > > like to see an out of process tool offered. Ideally, Cassandra would just
> > > take care of itself, but writing a distributed repair scheduler that you
> > > trust to run in production is a lot harder than writing a single process
> > > management application that can failover.
> > >
> > > Any thoughts on this?
> > >
> > > Thanks,
> > >
> > > Blake
> > >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>


Re: Paying off tech debt and correctly naming things

2018-03-21 Thread Rahul Singh
+1 - can help with sections of the code

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 21, 2018, 4:25 PM -0500, Lerh Chuan Low <l...@instaclustr.com>, wrote:
> For reasons others have mentioned (nightmare to continuously update branch
> and resolve merge conflicts, existing patches/big features..) it will be a
> nightmare. It seems like in software projects (just basing it off personal
> experience) people typically refactor if a ticket they are working on
> touches the part of the code base that needs refactoring, I've not really
> seen a freeze and work off technical debt before (I'll admit upfront I
> don't know much).
>
> Thinking about it, the only ones I could come up with are the same as
> Sylvain had mentioned, which is start with a small subset and just do only
> renaming and cleaning up comments; no logic changes. I would think some
> parts of the code may take ages more before a ticket finds its way to it
> (and a knowledgeable enough person is involved to even guide the refactor).
>
> So definitely, you have my (moral) support if you are going to go with it,
> +1 +1 +1
>
> On 22 March 2018 at 00:31, Eric Evans <john.eric.ev...@gmail.com> wrote:
>
> > On Wed, Mar 21, 2018 at 3:48 AM, Sylvain Lebresne <lebre...@gmail.com
> > wrote:
> >
> > [ ... ]
> >
> > > - pure code renaming is one reasonably simple aspect, but quite a few
> > > renamings may have user-visible impact. Particularly around JMX, where many
> > > things are named based on their class, and to a lesser extent some of our
> > > tools still use "old" naming. We can't and shouldn't ignore those impacts:
> > > such user-visible changes should imo be documented, and we should make sure
> > > we have a reasonably painless (and thus incremental) upgrade path. My hunch
> > > is the latter isn't as simple as it seems.
> >
> > Speaking as someone who has personally been burned by this
> > (repeatedly, and it's on-going), please think very carefully before
> > making such changes. I hate to think about all the hours I wasted
> > shaving this breed of yak.
> >
> > > On Wed, Mar 21, 2018 at 9:06 AM kurt greaves <k...@instaclustr.com
> > wrote:
> >
> > [ ... ]
> >
> > --
> > Eric Evans
> > john.eric.ev...@gmail.com
> >


RE: A JIRA proposing a seperate repository for the online documentation

2018-03-17 Thread Rahul Singh
A static site generator just takes content from flat files or APIs (that can be
managed from a headless CMS) and creates static files or progressive web apps
that are optimized for speed. It has nothing to do with whether the site is
multimedia-rich or dynamic in terms of client-side JavaScript / CSS. It’s just
an old technology with a new name. That’s how we used to generate sites back in
the 1990s. :)

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 17, 2018, 10:03 AM -0400, Kenneth Brotman 
<kenbrot...@yahoo.com.invalid>, wrote:
> How about if we look at the website a little differently. Isn't it an 
> opportunity to showcase Cassandra and related technologies like search! 
> Shouldn't we be an extraordinary advocate and example of the technology 
> ourselves?
>
> Rahul, your comment on the different users with different use cases was very 
> wise.
>
> I've been writing html a long time; since about 1990. You're asking me to 
> learn a weird little program, a static site generator just to change 
> something I can already do without using a program at all.
>
> Another weird thing: Wouldn't we want a website that is dynamic and 
> multi-media rich?
>
> Kenneth Brotman
>

RE: A JIRA proposing a seperate repository for the online documentation

2018-03-17 Thread Rahul Singh
I’ve previously deep dived into Static Site generators and there are numerous 
ones.

http://leaves.anant.us/#!/leaf/7255?tag=static.site

I don’t like changing technology for the sake of change. I think it’s a stupid
waste of time. On one hand I agree: the substance is more important than the
form. On the other hand, I [insert f-bomb] hate writing HTML / CSS, or
reStructuredText. Markdown is much easier. Hugo is one of many that, if set up
right, can save a ton of time and make it more accessible for people to
contribute.

There is a difference, however, between developer documentation for developers
of Cassandra, user documentation for Cassandra users, and documentation for
administrators. They are different users and have different use cases. Some
need reference-style docs, others need guides.

Good documentation (software quality notwithstanding) seems to correlate with
software popularity; WordPress is one example. I am not wild about WordPress,
but their codex.wordpress.org has generally been a good “user doc.”

Envision the outcome even if you have to mimic someone else. I don’t mind
stealing/copying if it gets us one step further than we are. The Reaper docs
look easy to maintain and I couldn’t care less whether it’s the Hugo, Hexo,
Jekyll, Hyde, KafkasMom, EinsteinsDog, or SchrodingersCat static site generator.

I think action should come before decision in open source. Prove something 
before suggesting a change. Jon’s reaper example is good. If anyone has 
something better, show it. Prove it.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 16, 2018, 6:54 PM -0400, Kenneth Brotman <kenbrot...@yahoo.com.invalid>, 
wrote:
> There is no need for another program. Keep the code in html, css and js. 
> People can modify that and show proposed changes in that. No need to convert 
> back and forth from other formats. If someone is doing something more 
> involved, they probably already have a program themselves.
>
> -----Original Message-----
> From: beggles...@apple.com [mailto:beggles...@apple.com]
> Sent: Friday, March 16, 2018 3:16 PM
> To: dev@cassandra.apache.org
> Subject: Re: A JIRA proposing a seperate repository for the online 
> documentation
>
> It would probably be more productive to list some specific concerns you have 
> with Hugo. Then explain why you think they make using it a bad idea. Then 
> offer some alternatives.
>
> On 3/16/18, 1:18 PM, "Kenneth Brotman" <kenbrot...@yahoo.com.INVALID> wrote:
>
> Thanks for that Eric Evans.
>
> I'm not sure Hugo is the way to go. I don't see how I would generate the 
> quality of work I would want with it. It seems like another example of coders 
> learning and using a more complicated program to generate the code they could 
> have already generated - it’s a disease in the I.T. industry right now. But I 
> could be wrong.
>
> Here's the thing. I've been spending a lot of my time for the past three 
> weeks now trying to help with the website. That is a tiny website. I've never 
> worked with a website that tiny. Bear with me.
>
> I'm studying Jeff Carpenter and Eben Hewitt's book, Cassandra: The Definitive
> Guide
> https://www.amazon.com/Cassandra-Definitive-Guide-Distributed-Scale/dp/1491933666
> and already have a terrible itch to start contributing some code. I
> just want to get set up to do that. The book seems to be a good way to get
> familiar with the internals and the code of Cassandra.
>
> I can only do so much for the group at one time just like anyone else. I'll 
> only do top quality work. I'll only be a part of top quality work. It could 
> be that I won't feel comfortable with what the group wants to do for the 
> website.
>
> Please keep working on it as it is really embarrassing, terrible, substandard,
> unacceptable, beneath professional standards...
>
> I will contribute if it's possible for me to do so. Let's see what we decide 
> to do going forward for the website.
>
> Kenneth Brotman
> (Cassandra coder?)
>
> -----Original Message-----
> From: Eric Evans [mailto:john.eric.ev...@gmail.com]
> Sent: Friday, March 16, 2018 7:59 AM
> To: dev@cassandra.apache.org
> Subject: Re: A JIRA proposing a seperate repository for the online 
> documentation
>
> On Thu, Mar 15, 2018 at 11:40 AM, Kenneth Brotman
> <kenbrot...@yahoo.com.invalid> wrote:
> > Well pickle my cucumbers Jon! It's good to know that you have experience 
> > with Hugo, see it as a good fit and that all has been well. I look forward 
> > to the jira epic!
> >
> > How exactly does the group make such a decision: Call for final discussion? 
> > Call for vote? Wait for the PMC to vote?
>
> Good question!
>
> Decisions like this are made by consensus; As 

Re: A JIRA proposing a seperate repository for the online documentation

2018-03-15 Thread Rahul Singh
I don’t understand why it’s so complicated. In-tree docs are as good as any.
All the old docs are there in the version control system.

All we need to do is a) generate docs for old versions, b) improve the user
experience on the site by clearly laying out which docs are latest vs. old, and
c) have some semblance of a search, maybe using something like Algolia or whatever.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 14, 2018, 7:58 PM -0400, Murukesh Mohanan <murukesh.moha...@gmail.com>, 
wrote:
> I think this was how it was in the dark ages, with the wiki and all. I
> believe the reason why they shifted to in-tree docs is that this way,
> people who make changes to the code are more likely to make the
> corresponding doc changes as well, and reviewers have it easier to ensure
> docs are updated with new patches. The wiki was often left behind the code.
> So I doubt they will move back to an out-of-tree doc system again. The way
> Michael put it in CASSANDRA-13907, the main blocker behind having docs for
> multiple versions online is that it's a pain just to get the docs for trunk
> updated. Once the current site update process is improved, multiple
> versions can more easily be added.
>
>
> On Thu, Mar 15, 2018 at 1:22 Kenneth Brotman <kenbrot...@yahoo.com.invalid
> wrote:
>
> > https://issues.apache.org/jira/browse/CASSANDRA-14313
> >
> >
> >
> > For some reason I'm told by many committers that we should not have sets of
> > documentation for other versions than the current version in a tree for
> > that version. This has made it difficult, maybe impossible to have
> > documentation for all the supported versions on the website at one time.
> >
> >
> >
> > As a solution I propose that we maintain the online documentation in a
> > separate repository that is managed as the current repository under the
> > guidance of the Apache Cassandra PMC (Project Management Committee); and
> > that in the new repository . . .
> >
> >
> >
> > Please see the jira. I hope it's a good answer to everyone.
> >
> >
> >
> > Kenneth Brotman
> >
> >
> >
> >
> >
> > --
>
> Murukesh Mohanan,
> Yahoo! Japan


Re: Filling in the blank To Do sections on the Apache Cassandra web site

2018-02-23 Thread Rahul Singh
Ken I can help out with text or with drafting new stuff. I’ll review the list 
you provided and get back to you.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 23, 2018, 7:28 PM -0500, Kenneth Brotman <kenbrot...@yahoo.com.invalid>, 
wrote:
> These nine web pages on the Apache Cassandra web site have blank To Do
> sections. Most of the web pages are completely blank. Mind you there is a
> lot of hard work already done on the documentation. I'll make JIRAs for
> any of the blank sections where there is not already a JIRA. Then it will
> be on to writing up those sections. If you have any text to help me get
> started for any of these sections that would be really cool.
>
> http://cassandra.apache.org/doc/latest/architecture/overview.html
> http://cassandra.apache.org/doc/latest/architecture/dynamo.html
> http://cassandra.apache.org/doc/latest/architecture/guarantees.html
> http://cassandra.apache.org/doc/latest/data_modeling/index.html
> http://cassandra.apache.org/doc/latest/operating/read_repair.html
> http://cassandra.apache.org/doc/latest/operating/hints.html
> http://cassandra.apache.org/doc/latest/operating/backups.html
> http://cassandra.apache.org/doc/latest/operating/bulk_loading.html
> http://cassandra.apache.org/doc/latest/troubleshooting/index.html
>
> Kenneth Brotman
>


Re: Why isn't there a separate JVM per table?

2018-02-23 Thread Rahul Singh
I agree with Jon. The actor-based model would be the logical approach to
becoming more “efficient.” Until then, fault tolerance has to be built into the
driver so it contacts another node if one stalls mid-request, and the commitlog
is reconciled later.
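
A rough sketch of what I mean by building the fault tolerance into the driver: try the next node when one does not answer. This is not any real driver's API; the NodeUnavailable exception, the send callable, and the node names are all hypothetical stand-ins, and reconciling the commitlog afterwards is out of scope here.

```python
from typing import Callable, Optional, Sequence


class NodeUnavailable(Exception):
    """Raised by the hypothetical send function when a node does not answer in time."""


def execute_with_failover(nodes: Sequence[str],
                          send: Callable[[str], str]) -> str:
    """Try each node in order and return the first successful response."""
    last_error: Optional[NodeUnavailable] = None
    for node in nodes:
        try:
            return send(node)
        except NodeUnavailable as exc:
            # Node is paused (e.g. in GC) or down; move on to the next one.
            last_error = exc
    raise NodeUnavailable(f"all {len(nodes)} nodes failed") from last_error


if __name__ == "__main__":
    def fake_send(node: str) -> str:
        # Simulate the first node being stuck in a long pause.
        if node == "node1":
            raise NodeUnavailable("simulated GC pause")
        return f"ok from {node}"

    print(execute_with_failover(["node1", "node2"], fake_send))
```

A real driver would also need timeouts and some notion of idempotency before retrying writes; the sketch ignores both.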

I’ve seen many people combine an external queue to deal with the GC issues by 
adding yet another layer of asynchronicity. (If it’s not a word it is now)

Even in systems like SQL servers there are internal queues that get locked up
due to memory, storage, or CPU pressure. It’s not a GC pause but it may as
well be. Even with all the tweaking, the only way to get beyond that is
distributed asynchronous systems that are self-healing.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 23, 2018, 4:34 AM -0500, Brian Hess <brianmh...@gmail.com>, wrote:
> Something folks haven't raised, but would be another impediment here is that 
> in Cassandra if you submit a batch (logged or unlogged) for two tables in the 
> same keyspace with the same partition then Cassandra collapses them into the 
> same Mutation and the two INSERTs are processed atomically. There are a few 
> (maybe more than a few) things that take advantage of this fact.
>
> If you move each table to its own JVM then you cannot really achieve this 
> atomicity. So, at most you would want to consider a JVM per keyspace (or 
> consider touching a lot of code or changing a pretty fundamental/deep 
> contract in Cassandra).
>
> >Brian
>
> Sent from my iPhone
>
> > On Feb 22, 2018, at 7:10 PM, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
> >
> > I would be careful with anything per table for memory sizing. We used to 
> > have many caches and things that could be tuned per table, but they have 
> > all since changed to being per node, as it was a real PITA to get them 
> > right. Having to do per table heap/gc/memtable/cache tuning just sounds 
> > like a usability nightmare.
> >
> > -Jeremiah
> >
> > On Feb 22, 2018, at 6:59 PM, kurt greaves <k...@instaclustr.com> wrote:
> >
> > > >
> > > > ... compaction on its own jvm was also something I was thinking about,
> > > > but then I realized even more JVM sharding could be done at the table level.
> > >
> > >
> > > Compaction in its own JVM makes sense. At the table level I'm not so sure.
> > > There have got to be some serious overheads from running that many JVMs.
> > > Keyspace might be reasonable purely to isolate bad tables, but for the
> > > most part I'd think isolating every table isn't that beneficial and is
> > > pretty complicated. In most cases people just fix their modelling so that
> > > they don't generate large amounts of GC, and hopefully test enough so they
> > > know how it will behave in production.
> > >
> > > If we did it at the table level we would inevitably have to make each
> > > individual table incredibly tunable, which would be a bit tedious IMO.
> > > There's no way for us to smartly decide how much heap/memtable space/etc
> > > each table should use (not without some decent AI, anyway).
> > > each table should use (not without some decent AI, anyway).
> > >
> >
>