Re: Code quality, principles and rules

2017-03-29 Thread Edward Capriolo
On Sat, Mar 18, 2017 at 9:21 PM, Qingcun Zhou <zhouqing...@gmail.com> wrote:

> I wanted to contribute some unit test cases. However the unit test approach
> in Cassandra seems weird to me after looking into some examples. Not sure
> if anyone else has the same feeling.
>
> Usually, at least for all Java projects I have seen, people use mock
> (mockito, powermock) for dependencies. And then in a particular test case
> you verify the behavior using junit.assert* or mockito.verify. However we
> don't use mockito in Cassandra. Is there any reason for this? Without
> these, how easy do people think about adding unit test cases?
>
>
> Besides that, we have lots of singletons and there are already a handful of
> tickets to eliminate them. Maybe I missed something but I'm not seeing much
> progress. Is anyone actively working on this?
>
> Maybe a related problem. Some unit test cases have method annotated with
> @BeforeClass to do initialization work. However, it not only initializes
> direct dependencies, but also indirect ones, including loading
> cassandra.yaml and initializing indirect dependencies. This seems to me
> more like functional/integration test but not unit test style.
>
>
> On Fri, Mar 17, 2017 at 2:56 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com>
> wrote:
>
> > https://issues.apache.org/jira/browse/CASSANDRA-7837 may be some
> > interesting context regarding what's been worked on to get rid of
> > singletons and static initialization.
> >
> > > On Mar 17, 2017, at 4:47 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
> > >
> > > I'd like to think that if someone refactors existing code, making it
> more
> > > testable (with tests, of course) it should be acceptable on it's own
> > > merit.  In fact, in my opinion it sometimes makes more sense to do
> these
> > > types of refactorings for the sole purpose of improving stability and
> > > testability as opposed to mixing them with features.
> > >
> > > You referenced the issue I fixed in one of the early emails.  The fix
> > > itself was a couple lines of code.  Refactoring the codebase to make it
> > > testable would have been a huge effort.  I wish I had time to do it.  I
> > > created CASSANDRA-13007 as a follow up with the intent of working on
> > > compaction from a purely architectural standpoint.  I think this type
> of
> > > thing should be done throughout the codebase.
> > >
> > > Removing the singletons is a good first step, my vote is we just rip
> off
> > > the bandaid, do it, and move forward.
> > >
> > > On Fri, Mar 17, 2017 at 2:20 PM Edward Capriolo <edlinuxg...@gmail.com
> >
> > > wrote:
> > >
> > >>> On Fri, Mar 17, 2017 at 2:31 PM, Jason Brown <jasedbr...@gmail.com>
> > wrote:
> > >>>
> > >>> To François's point about code coverage for new code, I think this
> > makes
> > >> a
> > >>> lot of sense wrt large features (like the current work on
> > >> 8457/12229/9754).
> > >>> It's much simpler to (mentally, at least) isolate those changed
> > sections
> > >>> and it'll show up better in a code coverage report. With small
> patches,
> > >>> that might be harder to achieve - however, as the patch should come
> > with
> > >>> *some* tests (unless it's a truly trivial patch), it might just work
> > >> itself
> > >>> out.
> > >>>
> > >>> On Fri, Mar 17, 2017 at 11:19 AM, Jason Brown <jasedbr...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> As someone who spent a lot of time looking at the singletons topic
> in
> > >> the
> > >>>> past, Blake brings a great perspective here. Figuring out and
> > >>> communicating
> > >>>> how best to test with the system we have (and of course
> incrementally
> > >>>> making that system easier to work with/test) seems like an
> achievable
> > >>> goal.
> > >>>>
> > >>>> On Fri, Mar 17, 2017 at 10:17 AM, Edward Capriolo <
> > >> edlinuxg...@gmail.com
> > >>>>
> > >>>> wrote:
> > >>>>
> > >>>>> On Fri, Mar 17, 2017 at 12:33 PM, Blake Eggleston <
> > >> beggles...@apple.com
> > >>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> I think we’re getting a little ahead of ourselves talking
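The mocking style described at the top of this message — stub the dependencies, then assert/verify behavior — can be approximated even without Mockito. A minimal hand-rolled sketch (all names here are hypothetical illustrations, not Cassandra classes):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator the unit under test depends on.
interface MessageSink {
    void send(String message);
}

// Unit under test: drops blank messages, forwards the rest.
class MessageFilter {
    private final MessageSink sink;

    MessageFilter(MessageSink sink) {
        this.sink = sink; // dependency injected, so tests can swap it out
    }

    void accept(String message) {
        if (message != null && !message.trim().isEmpty())
            sink.send(message);
    }
}

public class MessageFilterTest {
    public static void main(String[] args) {
        // Hand-written mock: records interactions so we can verify them,
        // much as mockito.verify would.
        List<String> sent = new ArrayList<>();
        MessageFilter filter = new MessageFilter(sent::add);

        filter.accept("hello");
        filter.accept("   ");
        filter.accept(null);

        // Only the non-blank message should have been forwarded.
        if (sent.size() != 1 || !sent.get(0).equals("hello"))
            throw new AssertionError(sent);
        System.out.println("forwarded=" + sent);
    }
}
```

With a framework like Mockito the recording list would be replaced by `mock(MessageSink.class)` and `verify(...)`, but the testing pattern is the same.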

Re: Spam Moderation

2017-03-23 Thread Edward Capriolo
On Thu, Mar 23, 2017 at 12:42 PM, Daryl Hawken 
wrote:

> +1.
>
> On Thu, Mar 23, 2017 at 12:10 PM, Michael Shuler 
> wrote:
>
> > I won't reply to the obvious spam to hilight it any further, so new
> > message..
> >
> > Could the mailing list moderator that approved the "client list" message
> > identify themselves and possibly explain how that was seen as a valid
> > message about the development of Apache Cassandra?
> >
> > --
> > Kind regards,
> > Michael
> >
>
>
>
> --
> *Most people have more than the average number of legs*
>

While the dev list is clearly not the place for it, and the email looks like
spam, it is interesting to know that someone is marketing such a list. I have
spoken at different events and those entities likely have my email, so I am
curious about the list.

I think the situation is much like the "FreeBSD backdoor emails"
(http://marc.info/?l=openbsd-tech=129236621626462=2). I.e., even if you
believe the info is 99.999% untrue, do you pass it along?


Re: DataStax Client List

2017-03-23 Thread Edward Capriolo
Well that is quite unsettling.

On Thu, Mar 23, 2017 at 10:33 AM, Theresa Taylor <
theresa.tay...@onlinedatatech.biz> wrote:

> Hi,
>
> Would you be interested in acquiring a list of DataStax users' information
> in an Excel sheet for unlimited marketing usage?
>
> List includes – First and Last name, Phone number, Email Address, Company
> Name, Job Title, Address, City, State, Zip, SIC code/Industry, Revenue and
> Company Size. The leads can also be further customized as per requirements.
>
> We can provide contact lists from any country/industry/title.
>
> If your target criteria are different kindly get back to us with your
> requirement with geography and job titles to provide you with counts and
> more information.
>
> Let me know your thoughts!
>
> Thanks,
>
>
> Theresa
> Senior Information Analyst
>
>
> If you wish not to receive marketing emails, please reply back “Opt
> Out” In headlines
>


Re: splitting CQL parser & spec into separate repo

2017-03-23 Thread Edward Capriolo
On Thu, Mar 23, 2017 at 10:56 AM, Eric Evans <john.eric.ev...@gmail.com>
wrote:

> On Wed, Mar 22, 2017 at 10:01 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
> > I believe you could accomplish a similar goal by making a multi-module
> > project https://maven.apache.org/guides/mini/guide-multiple-modules.html
> .
> > Probably not as easy thanks to ant, but I think that is a better route.
> One
> > there actually are N dependent projects in the wild you can make the case
> > for overhead which is both technical and in ASF based.
>
> This was my first thought: If we were using Maven, we'd probably
> already have created this as a module[*].
>
>
> [*]: Maybe a surprise to some given how strongly I pushed back against
> it in the Early Days, but we would be so much better off at this point
> with Maven.
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>

Well, the ant/maven bit is a separate issue: it could still be done with
ant, and in a way that makes a later port very easy.
http://ant.apache.org/easyant/history/trunk/ref/anttasks/SubModuletask.html


Re: splitting CQL parser & spec into separate repo

2017-03-22 Thread Edward Capriolo
On Tue, Mar 21, 2017 at 5:45 PM, Anthony Grasso <anthony.gra...@gmail.com>
wrote:

> This is a great idea
>
> +1 (non-binding)
>
> On 22 March 2017 at 07:04, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
> > On Tue, Mar 21, 2017 at 3:24 PM, Mark Dewey <milde...@gmail.com> wrote:
> >
> > > I can immediately think of a project I would use that in. +1
> > >
> > > On Tue, Mar 21, 2017 at 12:18 PM Jonathan Haddad <j...@jonhaddad.com>
> > > wrote:
> > >
> > > > I created CASSANDRA-13284 a few days ago with the intent of starting
> a
> > > > discussion around the topic of breaking the CQL parser out into a
> > > separate
> > > > project.  I see a few benefits to doing it and was wondering what the
> > > folks
> > > > here thought as well.
> > > >
> > > > First off, the Java CQL parser would obviously continue to be the
> > > reference
> > > > parser.  I'd love to see other languages have CQL parsers as well,
> but
> > > the
> > > > intent here isn't for the OSS C* team to be responsible for
> maintaining
> > > > that.  My vision here is simply the ability to have some high level
> > > > CQLParser.parse(statement) call that returns the parse tree, nothing
> > > more.
> > > >
> > > > It would be nice to be able to leverage that parser in other projects
> > > such
> > > > as IDEs, code gen tools, etc.  It would be outstanding to be able to
> > > create
> > > > the parser tests in such a way that they can be referenced by other
> > > parsers
> > > > in other languages.  Yay code reuse.  It also has the benefit of
> making
> > > the
> > > > codebase a little more modular and a bit easier to understand.
> > > >
> > > > Thoughts?
> > > >
> > > > Jon
> > > >
> > >
> >
> > It turns out that a similar thing was done with Hive.
> >
> > https://calcite.apache.org/
> >
> > https://calcite.apache.org/community/#apache-calcite-one-
> planner-fits-all
> >
> > The challenge is typically adoption. The elevator pitch is like:
> > "EVERYONE WILL SHARE THIS AND IT WILL BE AWESOME". Maybe this is the
> wrong
> > word, but lets just say frenemies
> > exist and they do not like control of something moving to a shared
> medium.
> > Technical issues like ANTLR 3 vs ANTRL 4 etc.
> > For something like Hive the challenge is the parser/planner needs only be
> > fast enough for analytic queries but that would not
> > be the right move for say CQL.
> >
>

I believe you could accomplish a similar goal by making a multi-module
project: https://maven.apache.org/guides/mini/guide-multiple-modules.html.
Probably not as easy thanks to ant, but I think that is a better route. Once
there actually are N dependent projects in the wild, you can make the case
for the overhead, which is both technical and ASF-process related.
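For reference, the multi-module layout linked above hinges on an aggregator POM listing each module. A minimal sketch, with purely illustrative module names (these are not actual Cassandra modules):

```xml
<!-- Hypothetical parent pom.xml: the aggregator builds each module in order,
     and downstream projects could depend on just the parser module. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.cassandra</groupId>
  <artifactId>cassandra-parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <modules>
    <module>cql-parser</module> <!-- usable by IDEs, codegen tools, etc. -->
    <module>server</module>
  </modules>
</project>
```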


Re: Can we kill the wiki?

2017-03-19 Thread Edward Capriolo
Wikis are still good for collaborative design, etc. It is a burden to edit
the docs, and they are not the place for all info.

On Friday, March 17, 2017, Murukesh Mohanan 
wrote:

> I wonder if the recent influx has anything to do with GSoC. The student
> application period begins in a few days. I don't see any Cassandra issues
> on the GSoC ideas list, though.
>
> On Sat, 18 Mar 2017 at 10:40 Anthony Grasso  >
> wrote:
>
> +1 to killing the wiki as well. If that is not possible, we should at least
> put a note on there saying it is deprecated and point people to the new
> docs.
>
> On 18 March 2017 at 08:09, Jonathan Haddad  > wrote:
>
> > +1 to killing the wiki.
> >
> > On Fri, Mar 17, 2017 at 2:08 PM Blake Eggleston  >
> > wrote:
> >
> > > With CASSANDRA-8700, docs were moved in tree, with the intention that
> > they
> > > would replace the wiki. However, it looks like we’re still getting
> > regular
> > > requests to edit the wiki. It seems like we should be directing these
> > folks
> > > to the in tree docs and either disabling edits for the wiki, or just
> > > removing it entirely, and replacing it with a link to the hosted docs.
> > I'd
> > > prefer we just remove it myself, makes things less confusing for
> > newcomers.
> > >
> > > Does that seem reasonable to everyone?
> >
>
> --
>
> Murukesh Mohanan,
> Yahoo! Japan
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Code quality, principles and rules

2017-03-19 Thread Edward Capriolo
On Saturday, March 18, 2017, Qingcun Zhou <zhouqing...@gmail.com> wrote:

> I wanted to contribute some unit test cases. However the unit test approach
> in Cassandra seems weird to me after looking into some examples. Not sure
> if anyone else has the same feeling.
>
> Usually, at least for all Java projects I have seen, people use mock
> (mockito, powermock) for dependencies. And then in a particular test case
> you verify the behavior using junit.assert* or mockito.verify. However we
> don't use mockito in Cassandra. Is there any reason for this? Without
> these, how easy do people think about adding unit test cases?
>
>
> Besides that, we have lots of singletons and there are already a handful of
> tickets to eliminate them. Maybe I missed something but I'm not seeing much
> progress. Is anyone actively working on this?
>
> Maybe a related problem. Some unit test cases have method annotated with
> @BeforeClass to do initialization work. However, it not only initializes
> direct dependencies, but also indirect ones, including loading
> cassandra.yaml and initializing indirect dependencies. This seems to me
> more like functional/integration test but not unit test style.
>
>
> > On Fri, Mar 17, 2017 at 2:56 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com>
> wrote:
>
> > https://issues.apache.org/jira/browse/CASSANDRA-7837 may be some
> > interesting context regarding what's been worked on to get rid of
> > singletons and static initialization.
> >
> > > On Mar 17, 2017, at 4:47 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> > >
> > > I'd like to think that if someone refactors existing code, making it
> more
> > > testable (with tests, of course) it should be acceptable on it's own
> > > merit.  In fact, in my opinion it sometimes makes more sense to do
> these
> > > types of refactorings for the sole purpose of improving stability and
> > > testability as opposed to mixing them with features.
> > >
> > > You referenced the issue I fixed in one of the early emails.  The fix
> > > itself was a couple lines of code.  Refactoring the codebase to make it
> > > testable would have been a huge effort.  I wish I had time to do it.  I
> > > created CASSANDRA-13007 as a follow up with the intent of working on
> > > compaction from a purely architectural standpoint.  I think this type
> of
> > > thing should be done throughout the codebase.
> > >
> > > Removing the singletons is a good first step, my vote is we just rip
> off
> > > the bandaid, do it, and move forward.
> > >
> > > On Fri, Mar 17, 2017 at 2:20 PM Edward Capriolo <edlinuxg...@gmail.com>
> > > wrote:
> > >
> > >>> On Fri, Mar 17, 2017 at 2:31 PM, Jason Brown <jasedbr...@gmail.com>
> > wrote:
> > >>>
> > >>> To François's point about code coverage for new code, I think this
> > makes
> > >> a
> > >>> lot of sense wrt large features (like the current work on
> > >> 8457/12229/9754).
> > >>> It's much simpler to (mentally, at least) isolate those changed
> > sections
> > >>> and it'll show up better in a code coverage report. With small
> patches,
> > >>> that might be harder to achieve - however, as the patch should come
> > with
> > >>> *some* tests (unless it's a truly trivial patch), it might just work
> > >> itself
> > >>> out.
> > >>>
> > >>> On Fri, Mar 17, 2017 at 11:19 AM, Jason Brown <jasedbr...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> As someone who spent a lot of time looking at the singletons topic
> in
> > >> the
> > >>>> past, Blake brings a great perspective here. Figuring out and
> > >>> communicating
> > >>>> how best to test with the system we have (and of course
> incrementally
> > >>>> making that system easier to work with/test) seems like an
> achievable
> > >>> goal.
> > >>>>
> > >>>> On Fri, Mar 17, 2017 at 10:17 AM, Edward Capriolo <
> > >> edlinuxg...@gmail.com
> > >>>>
> > >>>> wrote:
> > >>>>
> > >>>>> On Fri, Mar 17, 2017 at 12:33 PM, Blake Eggleston <
> > >> beggles...@apple.com

Re: Code quality, principles and rules

2017-03-17 Thread Edward Capriolo
On Fri, Mar 17, 2017 at 2:31 PM, Jason Brown <jasedbr...@gmail.com> wrote:

> To François's point about code coverage for new code, I think this makes a
> lot of sense wrt large features (like the current work on 8457/12229/9754).
> It's much simpler to (mentally, at least) isolate those changed sections
> and it'll show up better in a code coverage report. With small patches,
> that might be harder to achieve - however, as the patch should come with
> *some* tests (unless it's a truly trivial patch), it might just work itself
> out.
>
> On Fri, Mar 17, 2017 at 11:19 AM, Jason Brown <jasedbr...@gmail.com>
> wrote:
>
> > As someone who spent a lot of time looking at the singletons topic in the
> > past, Blake brings a great perspective here. Figuring out and
> communicating
> > how best to test with the system we have (and of course incrementally
> > making that system easier to work with/test) seems like an achievable
> goal.
> >
> > On Fri, Mar 17, 2017 at 10:17 AM, Edward Capriolo <edlinuxg...@gmail.com
> >
> > wrote:
> >
> >> On Fri, Mar 17, 2017 at 12:33 PM, Blake Eggleston <beggles...@apple.com
> >
> >> wrote:
> >>
> >> > I think we’re getting a little ahead of ourselves talking about DI
> >> > frameworks. Before that even becomes something worth talking about,
> we’d
> >> > need to have made serious progress on un-spaghettifying Cassandra in
> the
> >> > first place. It’s an extremely tall order. Adding a DI framework right
> >> now
> >> > would be like throwing gasoline on a raging tire fire.
> >> >
> >> > Removing singletons seems to come up every 6-12 months, and usually
> >> > abandoned once people figure out how difficult they are to remove
> >> properly.
> >> > I do think removing them *should* be a long term goal, but we really
> >> need
> >> > something more immediately actionable. Otherwise, nothing’s going to
> >> > happen, and we’ll be having this discussion again in a year or so when
> >> > everyone’s angry that Cassandra 5.0 still isn’t ready for production,
> a
> >> > year after it’s release.
> >> >
> >> > That said, the reason singletons regularly get brought up is because
> >> doing
> >> > extensive testing of anything in Cassandra is pretty much impossible,
> >> since
> >> > the code is basically this big web of interconnected global state.
> >> Testing
> >> > anything in isolation can’t be done, which, for a distributed
> database,
> >> is
> >> > crazy. It’s a chronic problem that handicaps our ability to release a
> >> > stable database.
> >> >
> >> > At this point, I think a more pragmatic approach would be to draft and
> >> > enforce some coding standards that can be applied in day to day
> >> development
> >> > that drive incremental improvement of the testing and testability of
> the
> >> > project. What should be tested, how it should be tested. How to write
> >> new
> >> > code that talks to the rest of Cassandra and is testable. How to fix
> >> bugs
> >> > in old code in a way that’s testable. We should also have some
> >> guidelines
> >> > around refactoring the wildly untested sections, how to get started,
> >> what
> >> > to do, what not to do, etc.
> >> >
> >> > Thoughts?
> >>
> >>
> >> To make the conversation practical. There is one class I personally
> really
> >> want to refactor so it can be tested:
> >>
> >> https://github.com/apache/cassandra/blob/trunk/src/java/org/
> >> apache/cassandra/net/OutboundTcpConnection.java
> >>
> >> There is little coverage here. Questions like:
> >> what errors cause the connection to restart?
> > >> when are undroppable messages dropped?
> >> what happens when the queue fills up?
> > >> the infamous throw new AssertionError(ex); (which probably bubbles up to
> > >> nowhere)
> >> what does the COALESCED strategy do in case XYZ.
> >> A nifty label (wow a label you just never see those much!)
> >> outer:
> >> while (!isStopped)
> >>
> >> Comments to jira's that probably are not explicitly tested:
> >> // If we haven't retried this message yet, put it back on the queue to
> >> retry after re-connecting.
> >> // See CASSANDRA-5393 and CASSANDRA-12192.
> >>
> > >> If I were to undertake this cleanup, would there actually be support? I.e.,
> > >> is this going to turn into an "it ain't broken, don't fix it" thing, or a
> > >> "we don't want to change stuff just to add tests"? Like, will someone
> > >> pledge to agree it's kinda wonky and merge the effort in < 1 year's time?
> >>
> >
> >
>

So ... :) If I open a ticket to refactor OutboundTcpConnection.java to do
specific unit testing, and possibly even pull things out to the point that I
can actually open a socket and do an end-to-end test, will you/anyone
support that? (It sounds like you're saying I must/should make a large
feature to add a test.)
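One shape such a refactor could take, per the questions quoted above ("when are undroppable messages dropped?"): pull the drop decision out of the connection loop into a pure static method that can be tested without opening sockets. This is a hypothetical sketch, not the actual OutboundTcpConnection logic, and all names and timeout semantics are assumptions:

```java
import java.util.concurrent.TimeUnit;

// Pure, socket-free stand-in for a "should this message be dropped?" decision.
public class DropPolicy {
    static boolean shouldDrop(boolean droppable, long enqueuedNanos,
                              long nowNanos, long timeoutMillis) {
        if (!droppable)
            return false; // undroppable messages are never dropped here
        long ageMillis = TimeUnit.NANOSECONDS.toMillis(nowNanos - enqueuedNanos);
        return ageMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long now = System.nanoTime();
        long stale = now - TimeUnit.SECONDS.toNanos(5);
        // fresh droppable message: kept
        if (shouldDrop(true, now, now, 2000)) throw new AssertionError();
        // stale droppable message: dropped
        if (!shouldDrop(true, stale, now, 2000)) throw new AssertionError();
        // stale but undroppable: kept
        if (shouldDrop(false, stale, now, 2000)) throw new AssertionError();
        System.out.println("drop policy checks passed");
    }
}
```

Once the decision is isolated like this, each of the questions in the quoted list becomes a one-line unit test instead of a socket-level integration scenario.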


Re: Code quality, principles and rules

2017-03-17 Thread Edward Capriolo
On Fri, Mar 17, 2017 at 6:41 AM, Ryan Svihla <r...@foundev.pro> wrote:

> Different DI frameworks have different initialization costs, even inside of
> spring even depending on how you wire up dependencies (did it use autowire
> with reflection, parse a giant XML of explicit dependencies, etc).
>
> To back this assertion up for awhile in that community benching different
> DI frameworks perf was a thing and you can find benchmarks galore with a
> quick Google.
>
> The practical cost is also dependent on the lifecycles used (transient
> versus Singleton style for example) and features used (Interceptors
> depending on implementation can get expensive).
>
> So I think there should be some quantification of cost before a framework
> is considered, something like dagger2 which uses codegen I wager is only a
> cost at compile time (have not benched it, but looking at it's feature set,
> that's my guess) , Spring I know from experience even with the most optimal
> settings is slower on initialization time than doing by DI "by hand" at
> minimum, and that can sometimes be substantial.
>
>
> On Mar 17, 2017 12:29 AM, "Edward Capriolo" <edlinuxg...@gmail.com> wrote:
>
> On Thu, Mar 16, 2017 at 5:18 PM, Jason Brown <jasedbr...@gmail.com> wrote:
>
> > >> do we have plan to integrate with a dependency injection framework?
> >
> > No, we (the maintainers) have been pretty much against more frameworks
> due
> > to performance reasons, overhead, and dependency management problems.
> >
> > On Thu, Mar 16, 2017 at 2:04 PM, Qingcun Zhou <zhouqing...@gmail.com>
> > wrote:
> >
> > > Since we're here, do we have plan to integrate with a dependency
> > injection
> > > framework like Dagger2? Otherwise it'll be difficult to write unit test
> > > cases.
> > >
> > > On Thu, Mar 16, 2017 at 1:16 PM, Edward Capriolo <
> edlinuxg...@gmail.com>
> > > wrote:
> > >
> > > > On Thu, Mar 16, 2017 at 3:10 PM, Jeff Jirsa <jji...@apache.org>
> wrote:
> > > >
> > > > >
> > > > >
> > > > > On 2017-03-16 10:32 (-0700), François Deliège <
> > franc...@instagram.com>
> > > > > wrote:
> > > > > >
> > > > > > To get this started, here is an initial proposal:
> > > > > >
> > > > > > Principles:
> > > > > >
> > > > > > 1. Tests always pass.  This is the starting point. If we don't
> care
> > > > > about test failures, then we should stop writing tests. A recurring
> > > > failing
> > > > > test carries no signal and is better deleted.
> > > > > > 2. The code is tested.
> > > > > >
> > > > > > Assuming we can align on these principles, here is a proposal for
> > > their
> > > > > implementation.
> > > > > >
> > > > > > Rules:
> > > > > >
> > > > > > 1. Each new release passes all tests (no flakinesss).
> > > > > > 2. If a patch has a failing test (test touching the same code
> > path),
> > > > the
> > > > > code or test should be fixed prior to being accepted.
> > > > > > 3. Bugs fixes should have one test that fails prior to the fix
> and
> > > > > passes after fix.
> > > > > > 4. New code should have at least 90% test coverage.
> > > > > >
> > > > > First I was
> > > > > I agree with all of these and hope they become codified and
> > followed. I
> > > > > don't know anyone who believes we should be committing code that
> > breaks
> > > > > tests - but we should be more strict with requiring green test
> runs,
> > > and
> > > > > perhaps more strict with reverting patches that break tests (or
> cause
> > > > them
> > > > > to be flakey).
> > > > >
> > > > > Ed also noted on the user list [0] that certain sections of the
> code
> > > > > itself are difficult to test because of singletons - I agree with
> the
> > > > > suggestion that it's time to revisit CASSANDRA-7837 and
> > CASSANDRA-10283
> > > > >
> > > > > Finally, we should also recall Jason's previous notes [1] that the
> > > actual
> > > > > test infrastructure available is limited - the system provided by
> > > > Datastax
> > > >
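The "DI by hand" Ryan mentions in the quoted message is plain constructor injection with no framework: zero reflection, zero startup cost beyond ordinary constructor calls. A minimal sketch with purely illustrative names (not actual Cassandra types):

```java
// Hand-wired dependency injection: the dependency is an interface
// passed to the constructor, so production and test wiring differ
// only in which implementation is handed in.
interface Clock {
    long nowMillis();
}

class Scheduler {
    private final Clock clock;

    Scheduler(Clock clock) {
        this.clock = clock; // injected, never looked up globally
    }

    boolean isExpired(long deadlineMillis) {
        return clock.nowMillis() > deadlineMillis;
    }
}

public class HandWiredDI {
    public static void main(String[] args) {
        // Production wiring: the real clock.
        Scheduler prod = new Scheduler(System::currentTimeMillis);

        // Test wiring: a fixed clock - no mocking framework needed.
        Scheduler test = new Scheduler(() -> 1000L);
        if (!test.isExpired(999L)) throw new AssertionError();
        if (test.isExpired(1001L)) throw new AssertionError();
        System.out.println("hand-wired DI checks passed");
    }
}
```

This is the baseline Spring et al. are benchmarked against; a codegen approach like Dagger 2 essentially emits this kind of wiring at compile time.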

Re: Code quality, principles and rules

2017-03-16 Thread Edward Capriolo
On Thu, Mar 16, 2017 at 3:10 PM, Jeff Jirsa  wrote:

>
>
> On 2017-03-16 10:32 (-0700), François Deliège 
> wrote:
> >
> > To get this started, here is an initial proposal:
> >
> > Principles:
> >
> > 1. Tests always pass.  This is the starting point. If we don't care
> about test failures, then we should stop writing tests. A recurring failing
> test carries no signal and is better deleted.
> > 2. The code is tested.
> >
> > Assuming we can align on these principles, here is a proposal for their
> implementation.
> >
> > Rules:
> >
> > 1. Each new release passes all tests (no flakinesss).
> > 2. If a patch has a failing test (test touching the same code path), the
> code or test should be fixed prior to being accepted.
> > 3. Bugs fixes should have one test that fails prior to the fix and
> passes after fix.
> > 4. New code should have at least 90% test coverage.
> >
> First I was
> I agree with all of these and hope they become codified and followed. I
> don't know anyone who believes we should be committing code that breaks
> tests - but we should be more strict with requiring green test runs, and
> perhaps more strict with reverting patches that break tests (or cause them
> to be flakey).
>
> Ed also noted on the user list [0] that certain sections of the code
> itself are difficult to test because of singletons - I agree with the
> suggestion that it's time to revisit CASSANDRA-7837 and CASSANDRA-10283
>
> Finally, we should also recall Jason's previous notes [1] that the actual
> test infrastructure available is limited - the system provided by Datastax
> is not generally open to everyone (and not guaranteed to be permanent), and
> the infrastructure currently available to the ASF is somewhat limited (much
> slower, at the very least). If we require tests passing (and I agree that
> we should), we need to define how we're going to be testing (or how we're
> going to be sharing test results), because the ASF hardware isn't going to
> be able to do dozens of dev branch dtest runs per day in its current form.
>
> 0: https://lists.apache.org/thread.html/f6f3fc6d0ad1bd54a6185ce7bd7a2f
> 6f09759a02352ffc05df92eef6@%3Cuser.cassandra.apache.org%3E
> 1: https://lists.apache.org/thread.html/5fb8f0446ab97644100e4ef987f36e
> 07f44e8dd6d38f5dc81ecb3cdd@%3Cdev.cassandra.apache.org%3E
>
>
>
Ed also noted on the user list [0] that certain sections of the code itself
are difficult to test because of singletons - I agree with the suggestion
that it's time to revisit CASSANDRA-7837 and CASSANDRA-10283

Thanks for the shout out!

I was just looking at a patch about compaction. The patch was to calculate
free space correctly in case X. Compaction is not something that requires
multiple nodes to test. The logic on the surface seems simple: find tables
of similar size, select them, and merge them. The reality turns out not to
be that way. The coverage itself, both branch and line, may be very high,
but what the code does not do is directly account for a wide variety of
scenarios. Without direct tests you end up with a mental approximation of
what it does, and that varies from person to person and accounts for the
cases that fit in your mind. For example, you personally may only be
running LevelDB-inspired compaction.

Given that this is not a multi-node problem, you should be able to refactor
this heavily: pull everything out to a static method where all the
parameters are arguments, or inject a lot of mocks into the current code,
and develop some scenario-based coverage.
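That "static method where all the parameters are arguments" shape can be sketched as follows. This is not Cassandra's actual compaction logic, just an illustration of the refactor being described, with hypothetical names and a made-up size-similarity rule:

```java
import java.util.ArrayList;
import java.util.List;

// A pure, scenario-testable stand-in for size-based candidate selection:
// everything the decision needs arrives as arguments, nothing is global.
public class CandidateSelection {
    // Pick tables whose size is within `ratio` of the smallest one.
    static List<Long> similarSized(List<Long> sortedSizes, double ratio) {
        List<Long> bucket = new ArrayList<>();
        if (sortedSizes.isEmpty())
            return bucket;
        long base = sortedSizes.get(0);
        for (long size : sortedSizes)
            if (size <= base * ratio)
                bucket.add(size);
        return bucket;
    }

    public static void main(String[] args) {
        // Scenario 1: tight cluster of sizes - all selected.
        List<Long> all = similarSized(List.of(100L, 110L, 120L), 1.5);
        if (all.size() != 3) throw new AssertionError(all);

        // Scenario 2: one outlier - excluded.
        List<Long> some = similarSized(List.of(100L, 110L, 10_000L), 1.5);
        if (some.size() != 2) throw new AssertionError(some);
        System.out.println("selected=" + some);
    }
}
```

Each "scenario" from the rescue approach becomes one such call with hand-picked inputs, so the approximation in your head turns into something enforced by the build.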

That is how I typically "rescue" code I take over. I look at the nightmare
and say, "damn, I am really afraid to touch this". I construct 8 scenarios
that test green. Then I force some testing into it through careful
refactoring. Now, I probably know -something- about it. Now, you are fairly
free to do a wide-ranging refactor, because you have at least accounted for
8 scenarios and you put unit test traps in so that some rules are enforced.
(Or the person changing the code has to actively REMOVE your tests,
asserting they were not or are no longer valid.) Later on you (or someone
else) __STILL__ might screw the entire thing up, but at least you can now
build forward.

Anyway, that patch on compaction was great and I am sure it improved things.
That being said, it did not add any tests :). So it can easily be undone by
the next person who does not understand the specific issue being addressed.
Inline comments almost scream to me "we need a test", but not everyone
believes that.
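Rule 3 quoted earlier in this thread (a bug fix ships with a test that fails before the fix and passes after) can be sketched concretely. The free-space arithmetic below is entirely hypothetical, loosely inspired by the compaction patch discussed above:

```java
// A regression test pins the fixed behavior: it fails against the buggy
// version and passes after the fix, so the fix cannot be silently undone.
public class FreeSpaceTest {
    // Hypothetical buggy version (pre-fix) ignored `reserved`:
    //   return total - used;
    // Fixed version accounts for reserved space:
    static long freeSpace(long total, long used, long reserved) {
        return total - used - reserved;
    }

    public static void main(String[] args) {
        // Against the buggy version this returns 60 and the check fails;
        // after the fix it returns 50 and passes.
        long free = freeSpace(100, 40, 10);
        if (free != 50) throw new AssertionError(free);
        System.out.println("free=" + free);
    }
}
```

Had the compaction patch carried a test of this shape, the "next person who does not understand the specific issue" would trip the assertion instead of reintroducing the bug.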


Re: State of triggers

2017-03-04 Thread Edward Capriolo
On Sat, Mar 4, 2017 at 10:26 AM, Jeff Jirsa <jji...@gmail.com> wrote:

>
>
>
> > On Mar 4, 2017, at 7:06 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
> >
> >> On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> >>
> >> On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo <edlinuxg...@gmail.com>
> >> wrote:
> >>
> >>>
> >>> I used them. I built do it yourself secondary indexes with them. They
> >> have
> >>> there gotchas, but so do all the secondary index implementations. Just
> >>> because datastax does not write about something. Lets see like 5 years
> >> ago
> >>> there was this: https://github.com/hmsonline/cassandra-triggers
> >>>
> >>>
> >> Still in use? How'd it work? Production ready? Would you still do it
> that
> >> way in 2017?
> >>
> >>
> >>> There is a fairly large divergence to what actual users do and what
> other
> >>> groups 'say' actual users do in some cases.
> >>>
> >>
> >> A lot of people don't share what they're doing (for business reasons, or
> >> because they don't think it's important, or because they don't know
> >> how/where), and that's fine but it makes it hard for anyone to know what
> >> features are used, or how well they're really working in production.
> >>
> >> I've seen a handful of "how do we use triggers" questions in IRC, and
> they
> >> weren't unreasonable questions, but seemed like a lot of pain, and more
> >> than one of those people ultimately came back and said they used some
> other
> >> mechanism (and of course, some of them silently disappear, so we have no
> >> idea if it worked or not).
> >>
> >> If anyone's actively using triggers, please don't keep it a secret.
> Knowing
> >> that they're being used would be a great way to justify continuing to
> >> maintain them.
> >>
> >> - Jeff
> >>
> >
> > "Still in use? How'd it work? Production ready? Would you still do it
> that way in 2017?"
> >
> > I mean that is a loaded question. How long has cassandra had Secondary
> > Indexes? Did they work well? Would you use them? How many times were
> they re-written?
>
> It wasn't really meant to be a loaded question; I was being sincere
>
> But I'll answer: secondary indexes suck for many use cases, but they're
> invaluable for their actual intended purpose, and I have no idea how many
> times they've been rewritten but they're production ready for their narrow
> use case (defined by cardinality).
>
> Is there a real triggers use case still? Alternative to MVs? Alternative
> to CDC? I've never implemented triggers - since you have, what's the level
> of surprise for the developer?


:) You mention alternatives; let's break them down.

MV:
They seem to have a lot of promise. I.e., you can use them for things other
than equality searches, and I do think the CQL example with the top-N high
scores is pretty useful. Then again, our buddy Mr Roth has a thread named
"Rebuild / remove node with MV is inconsistent". I actually think a lot of
the use cases for MVs fall into the category of "something you should
actually be doing with Storm". I can vibe with the concept of not needing a
streaming platform, but I KNOW Storm would do this correctly. I don't want
to land on something like secondary indexes v1/v2, where there were
fundamental flaws at scale. (Not saying this is the case, but the rebuild
thing seems a bit scary.)

CDC:
I am slightly afraid of this. Rationale: an extensibility point designed
specifically for a closed-source implementation of hub-and-spoke
replication. I have some experience trying to "play along" with extensible
things:
https://issues.apache.org/jira/browse/CASSANDRA-12627
"Thus, I'm -1 on {{PropertyOrEnvironmentSeedProvider}}."

Not a rub, but I can't even get something committed using an existing
extensible interface. Heaven forbid a use case I have would want to *change*
the interface; I would probably get a -12. So I have no desire to try and
maintain a CDC implementation. I see myself falling into the same old "why
do you want to do this? -1" trap.

Coordinator Triggers:
To bring things back really old-school: the coordinator triggers everyone
always wanted. In a nutshell, I DO believe they are easier to reason about
than MV. It is pretty basic: it happens on the coordinator, there are no
batchlogs or whatever; it is best effort, possibly involving more nodes
since the keys might be on different servers. I actually tend to like
features like this. Once something comes on the downswing of "software h
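To make the idea concrete, here is a minimal sketch of a best-effort coordinator trigger of the kind described above. Everything in it is an assumption for illustration — `coordinator_write`, `reverse_index_trigger`, and the transport function are invented names, not Cassandra's actual trigger API:

```python
# Hypothetical sketch of a best-effort coordinator trigger: on each write,
# the coordinator also fires trigger-derived mutations, with no batchlog
# and no atomicity guarantee -- a failed derived write is merely counted.

def coordinator_write(base_write, triggers, send):
    """Apply a write, then best-effort fan out trigger-derived mutations.

    base_write: dict like {"table": ..., "key": ..., "value": ...}
    triggers:   callables mapping a base write to extra derived writes
    send:       transport function; returns True on success
    """
    results = {"applied": send(base_write), "derived_ok": 0, "derived_failed": 0}
    for trig in triggers:
        for derived in trig(base_write):
            if send(derived):
                results["derived_ok"] += 1
            else:
                results["derived_failed"] += 1  # best effort: no retry
    return results

# Example trigger: maintain a DIY reverse index alongside the base table.
def reverse_index_trigger(write):
    yield {"table": write["table"] + "_by_value",
           "key": write["value"], "value": write["key"]}
```

The point of the sketch is the trade-off from the email: no batchlog means the derived writes can simply be lost, which is what makes this easier to reason about than MV but weaker.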

Re: State of triggers

2017-03-04 Thread Edward Capriolo
On Saturday, March 4, 2017, Edward Capriolo <edlinuxg...@gmail.com> wrote:

>
>
> On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo <edlinuxg...@gmail.com>
>> wrote:
>>
>> >
>> > I used them. I built do it yourself secondary indexes with them. They
>> have
>> > there gotchas, but so do all the secondary index implementations. Just
>> > because datastax does not write about something. Lets see like 5 years
>> ago
>> > there was this: https://github.com/hmsonline/cassandra-triggers
>> >
>> >
>> Still in use? How'd it work? Production ready? Would you still do it that
>> way in 2017?
>>
>>
>> > There is a fairly large divergence to what actual users do and what
>> other
>> > groups 'say' actual users do in some cases.
>> >
>>
>> A lot of people don't share what they're doing (for business reasons, or
>> because they don't think it's important, or because they don't know
>> how/where), and that's fine but it makes it hard for anyone to know what
>> features are used, or how well they're really working in production.
>>
>> I've seen a handful of "how do we use triggers" questions in IRC, and they
>> weren't unreasonable questions, but seemed like a lot of pain, and more
>> than one of those people ultimately came back and said they used some
>> other
>> mechanism (and of course, some of them silently disappear, so we have no
>> idea if it worked or not).
>>
>> If anyone's actively using triggers, please don't keep it a secret.
>> Knowing
>> that they're being used would be a great way to justify continuing to
>> maintain them.
>>
>> - Jeff
>>
>
> "Still in use? How'd it work? Production ready? Would you still do it that
> way in 2017?"
>
> I mean that is a loaded question. How long has cassandra had Secondary
> Indexes? Did they work well? Would you use them? How many times were they
> re-written?
>
>
>
The state of triggers, IMHO, was more about the long-standing opinion that
users should not be able to inject code into Cassandra.

That stance reversed and people could inject code; eventually all the
walls toppled: the sandboxes, the mandate on copying a jar to every server.

In the mix, the secondary index implementations (which read before write,
and maybe still do) were pitched as the supported way to do it correctly.

To be fair, I would probably do this in an application server in front of
Cassandra, unless the trigger had to generate events numbering in the
hundreds or thousands.
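As a rough illustration of the trade-off mentioned above: a DIY secondary index maintained in the application tier needs a read-before-write to clean up the stale index entry when an indexed value changes. A minimal in-memory sketch, with dicts standing in for the base and index tables (all names are invented):

```python
# Application-side DIY index maintenance in front of the database:
# updating an indexed column requires reading the old value first
# (read-before-write) so the stale index entry can be removed.

base = {}    # key -> value        (stands in for the base table)
index = {}   # value -> set(keys)  (stands in for the DIY index table)

def write(key, value):
    old = base.get(key)            # the read-before-write
    if old is not None:
        index[old].discard(key)    # remove the stale index entry
        if not index[old]:
            del index[old]
    base[key] = value
    index.setdefault(value, set()).add(key)

def lookup(value):
    return sorted(index.get(value, set()))
```

Skipping the read leaves ghost entries in the index, which is one of the "gotchas" every secondary index implementation shares.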


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: State of triggers

2017-03-04 Thread Edward Capriolo
On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:

> On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
> >
> > I used them. I built do it yourself secondary indexes with them. They
> have
> > there gotchas, but so do all the secondary index implementations. Just
> > because datastax does not write about something. Lets see like 5 years
> ago
> > there was this: https://github.com/hmsonline/cassandra-triggers
> >
> >
> Still in use? How'd it work? Production ready? Would you still do it that
> way in 2017?
>
>
> > There is a fairly large divergence to what actual users do and what other
> > groups 'say' actual users do in some cases.
> >
>
> A lot of people don't share what they're doing (for business reasons, or
> because they don't think it's important, or because they don't know
> how/where), and that's fine but it makes it hard for anyone to know what
> features are used, or how well they're really working in production.
>
> I've seen a handful of "how do we use triggers" questions in IRC, and they
> weren't unreasonable questions, but seemed like a lot of pain, and more
> than one of those people ultimately came back and said they used some other
> mechanism (and of course, some of them silently disappear, so we have no
> idea if it worked or not).
>
> If anyone's actively using triggers, please don't keep it a secret. Knowing
> that they're being used would be a great way to justify continuing to
> maintain them.
>
> - Jeff
>

"Still in use? How'd it work? Production ready? Would you still do it that
way in 2017?"

I mean that is a loaded question. How long has cassandra had Secondary
Indexes? Did they work well? Would you use them? How many times were they
re-written?


Pluggable throttling of read and write queries

2017-02-20 Thread Edward Capriolo
Older versions had a request scheduler API.

On Monday, February 20, 2017, Ben Slater > wrote:

> We’ve actually had several customers where we’ve done the opposite - split
> large clusters apart to separate uses cases. We found that this allowed us
> to better align hardware with use case requirements (for example using AWS
> c3.2xlarge for very hot data at low latency, m4.xlarge for more general
> purpose data) we can also tune JVM settings, etc to meet those uses cases.
>
> Cheers
> Ben
>
> On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma  wrote:
>>
>>> Cassandra is being used on a large scale at Uber. We usually create
>>> dedicated clusters for each of our internal use cases, however that is
>>> difficult to scale and manage.
>>>
>>> We are investigating the approach of using a single shared cluster with
>>> 100s of nodes and handle 10s to 100s of different use cases for different
>>> products in the same cluster. We can define different keyspaces for each of
>>> them, but that does not help in case of noisy neighbors.
>>>
>>> Does anybody in the community have similar large shared clusters and/or
>>> face noisy neighbor issues?
>>>
>>
>> Hi,
>>
>> We've never tried this approach and given my limited experience I would
>> find this a terrible idea from the perspective of maintenance (remember the
>> old saying about basket and eggs?)
>>
>> What potential benefits do you see?
>>
>> Regards,
>> --
>> Alex
>>
>> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: New committers announcement

2017-02-15 Thread Edward Capriolo
Three cheers!
Hip, Hip, NotFound
1 ms later
Hip, Hip, Hooray
1 ms later
Hooray, Hooray, Hooray

On Tue, Feb 14, 2017 at 5:50 PM, Ben Bromhead  wrote:

> Congrats!!
>
> On Tue, 14 Feb 2017 at 13:37 Joaquin Casares 
> wrote:
>
> > Congratulations!
> >
> > +1 John's sentiments. That's a great list of new committers! :)
> >
> > Joaquin Casares
> > Consultant
> > Austin, TX
> >
> > Apache Cassandra Consulting
> > http://www.thelastpickle.com
> >
> > On Tue, Feb 14, 2017 at 3:34 PM, Jonathan Haddad 
> > wrote:
> >
> > > Congratulations! Definitely a lot of great contributions from everyone
> on
> > > the list.
> > > On Tue, Feb 14, 2017 at 1:31 PM Jason Brown 
> > wrote:
> > >
> > > > Hello all,
> > > >
> > > > It's raining new committers here in Apache Cassandra!  I'd like to
> > > announce
> > > > the following individuals are now committers for the project:
> > > >
> > > > Branimir Lambov
> > > > Paulo Motta
> > > > Stefan Pokowinski
> > > > Ariel Weisberg
> > > > Blake Eggleston
> > > > Alex Petrov
> > > > Joel Knighton
> > > >
> > > > Congratulations all! Please keep the excellent contributions coming.
> > > >
> > > > Thanks,
> > > >
> > > > -Jason Brown
> > > >
> > >
> >
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>


Re: If reading from materialized view with a consistency level of quorum am I guaranteed to have the most recent view?

2017-02-12 Thread Edward Capriolo
On Sat, Feb 11, 2017 at 3:03 AM, Benjamin Roth 
wrote:

> For MVs regarding this threads question only the partition key matters.
> Different primary keys can have the same partition key. Which is the case
> in the example in your last comment.
>
> Am 10.02.2017 20:26 schrieb "Kant Kodali" :
>
> @Benjamin Roth: How do you say something is a different PRIMARY KEY now?
> looks like you are saying
>
> The below is same partition key and same primary key?
>
> PRIMARY KEY ((a, b), c, d) and
> PRIMARY KEY ((a, b), d, c)
>
> @Russell Great to see you here! As always that is spot on!
>
> On Fri, Feb 10, 2017 at 11:13 AM, Benjamin Roth 
> wrote:
>
> > Thanks a lot for that post. If I read the code right, then there is one
> > case missing in your post.
> > According to StorageProxy.mutateMV, local updates are NOT put into a
> batch
> > and are instantly applied locally. So a batch is only created if remote
> > mutations have to be applied and only for those mutations.
> >
> > 2017-02-10 19:58 GMT+01:00 DuyHai Doan :
> >
> > > See my blog post to understand how MV is implemented:
> > > http://www.doanduyhai.com/blog/?p=1930
> > >
> > > On Fri, Feb 10, 2017 at 7:48 PM, Benjamin Roth <
> benjamin.r...@jaumo.com>
> > > wrote:
> > >
> > > > Same partition key:
> > > >
> > > > PRIMARY KEY ((a, b), c, d) and
> > > > PRIMARY KEY ((a, b), d, c)
> > > >
> > > > PRIMARY KEY ((a), b, c) and
> > > > PRIMARY KEY ((a), c, b)
> > > >
> > > > Different partition key:
> > > >
> > > > PRIMARY KEY ((a, b), c, d) and
> > > > PRIMARY KEY ((a), b, d, c)
> > > >
> > > > PRIMARY KEY ((a), b) and
> > > > PRIMARY KEY ((b), a)
> > > >
> > > >
> > > > 2017-02-10 19:46 GMT+01:00 Kant Kodali :
> > > >
> > > > > Okies now I understand what you mean by "same" partition key.  I
> > think
> > > > you
> > > > > are saying
> > > > >
> > > > > PRIMARY KEY(col1, col2, col3) == PRIMARY KEY(col2, col1, col3) //
> so
> > > far
> > > > I
> > > > > assumed they are different partition keys.
> > > > >
> > > > > On Fri, Feb 10, 2017 at 10:36 AM, Benjamin Roth <
> > > benjamin.r...@jaumo.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > There are use cases where the partition key is the same. For
> > example
> > > if
> > > > > you
> > > > > > need a sorting within a partition or a filtering different from
> the
> > > > > > original clustering keys.
> > > > > > We actually use this for some MVs.
> > > > > >
> > > > > > If you want "dumb" denormalization with simple append only cases
> > (or
> > > > more
> > > > > > general cases that don't require a read before write on update)
> you
> > > are
> > > > > > maybe better off with batched denormalized atomics writes.
> > > > > >
> > > > > > The main benefit of MVs is if you need denormalization to sort or
> > > > filter
> > > > > by
> > > > > > a non-primary key field.
> > > > > >
> > > > > > 2017-02-10 19:31 GMT+01:00 Kant Kodali :
> > > > > >
> > > > > > > yes thanks for the clarification.  But why would I ever have MV
> > > with
> > > > > the
> > > > > > > same partition key? if it is the same partition key I could
> just
> > > read
> > > > > > from
> > > > > > > the base table right? our MV Partition key contains the columns
> > > from
> > > > > the
> > > > > > > base table partition key but in a different order plus an
> > > additional
> > > > > > column
> > > > > > > (which is allowed as of today)
> > > > > > >
> > > > > > > On Fri, Feb 10, 2017 at 10:23 AM, Benjamin Roth <
> > > > > benjamin.r...@jaumo.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > It depends on your model.
> > > > > > > > If the base table + MV have the same partition key, then the
> MV
> > > > > > mutations
> > > > > > > > are applied synchronously, so they are written as soon the
> > write
> > > > > > request
> > > > > > > > returns.
> > > > > > > > => In this case you can rely on the R+F > RF
> > > > > > > >
> > > > > > > > If the partition key of the MV is different, the partition of
> > the
> > > > MV
> > > > > is
> > > > > > > > probably placed on a different host (or said differently it
> > > cannot
> > > > be
> > > > > > > > guaranteed that it is on the same host). In this case, the MV
> > > > updates
> > > > > > are
> > > > > > > > executed async in a logged batch. So it can be guaranteed
> they
> > > will
> > > > > be
> > > > > > > > applied eventually but not at the time the write request
> > returns.
> > > > > > > > => You cannot rely and there is no possibility to absolutely
> > > > > guarantee
> > > > > > > > anything, not matter what CL you choose. A MV update may
> always
> > > > > "arrive
> > > > > > > > late". I guess it has been implemented like this to not block
> > in
> > > > case
> > > > > > of
> > > > > > > > remote request to prefer the cluster sanity over consistency.
> > > > > > > >
> > > > > > > > Is it now 100% clear?
> > > > > > > >
> > > > > > > > 2017-02-10 
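The rule described in the quoted thread — MV updates are only synchronous when the view keeps the base table's partition key — boils down to a tiny decision sketch. This is a hedged illustration, not Cassandra's actual code path; `mv_update_mode` is an invented helper:

```python
# Sketch of the rule discussed above: if the base table and MV share a
# partition key, the view rows live on the same replicas and the update
# is applied synchronously (so R + W > RF reasoning holds for the view
# too). Otherwise the update goes through an async logged batch, and even
# a QUORUM read of the view may still observe a stale row.

def mv_update_mode(base_partition_key, view_partition_key):
    """Partition keys are given as tuples of column names."""
    if tuple(base_partition_key) == tuple(view_partition_key):
        return "synchronous"        # quorum reads of the MV are reliable
    return "async-logged-batch"     # MV may lag; no CL guarantees recency

# The thread's examples:
# PRIMARY KEY ((a, b), c, d) vs PRIMARY KEY ((a, b), d, c) -> same partition
# PRIMARY KEY ((a, b), c, d) vs PRIMARY KEY ((a), b, d, c) -> different
```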

Re: If reading from materialized view with a consistency level of quorum am I guaranteed to have the most recent view?

2017-02-11 Thread Edward Capriolo
If you want to test these scenarios, this project would be helpful:
http://github.com/edwardcapriolo/ec

I use brute force at different CLs and assert if I detect any consistency
issues. Having MVs would be nice.
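The brute-force check amounts to something like the toy model below: write at CL W, read at CL R, and flag any stale read where R + W > RF should have guaranteed replica overlap. This is an in-memory stand-in for a cluster, not the linked project's actual API:

```python
import random

# Toy 3-replica "cluster": a write is acked by w replicas, a read contacts
# r replicas and the freshest timestamp wins. Whenever r + w > RF, any
# read set must overlap the last write set, so a stale read is a bug.

RF = 3

def write(replicas, key, value, ts, w):
    for node in random.sample(range(RF), w):   # only w replicas ack
        replicas[node][key] = (value, ts)

def read(replicas, key, r):
    answers = [replicas[node].get(key, (None, -1))
               for node in random.sample(range(RF), r)]
    return max(answers, key=lambda a: a[1])[0]  # freshest timestamp wins

def brute_force_check(w, r, rounds=200):
    replicas = [dict() for _ in range(RF)]
    for ts in range(rounds):
        write(replicas, "k", ts, ts, w)
        if r + w > RF and read(replicas, "k", r) != ts:
            return False  # consistency violation at a "strong" CL pair
    return True
```

Real testing of course also needs restarts, partitions, and MV/batchlog paths, but the assertion shape is the same.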


On Saturday, February 11, 2017, Benjamin Roth 
wrote:

> For MVs regarding this threads question only the partition key matters.
> Different primary keys can have the same partition key. Which is the case
> in the example in your last comment.
>
> Am 10.02.2017 20:26 schrieb "Kant Kodali"  >:
>
> @Benjamin Roth: How do you say something is a different PRIMARY KEY now?
> looks like you are saying
>
> The below is same partition key and same primary key?
>
> PRIMARY KEY ((a, b), c, d) and
> PRIMARY KEY ((a, b), d, c)
>
> @Russell Great to see you here! As always that is spot on!
>
> On Fri, Feb 10, 2017 at 11:13 AM, Benjamin Roth  >
> wrote:
>
> > Thanks a lot for that post. If I read the code right, then there is one
> > case missing in your post.
> > According to StorageProxy.mutateMV, local updates are NOT put into a
> batch
> > and are instantly applied locally. So a batch is only created if remote
> > mutations have to be applied and only for those mutations.
> >
> > 2017-02-10 19:58 GMT+01:00 DuyHai Doan  >:
> >
> > > See my blog post to understand how MV is implemented:
> > > http://www.doanduyhai.com/blog/?p=1930
> > >
> > > On Fri, Feb 10, 2017 at 7:48 PM, Benjamin Roth <
> benjamin.r...@jaumo.com >
> > > wrote:
> > >
> > > > Same partition key:
> > > >
> > > > PRIMARY KEY ((a, b), c, d) and
> > > > PRIMARY KEY ((a, b), d, c)
> > > >
> > > > PRIMARY KEY ((a), b, c) and
> > > > PRIMARY KEY ((a), c, b)
> > > >
> > > > Different partition key:
> > > >
> > > > PRIMARY KEY ((a, b), c, d) and
> > > > PRIMARY KEY ((a), b, d, c)
> > > >
> > > > PRIMARY KEY ((a), b) and
> > > > PRIMARY KEY ((b), a)
> > > >
> > > >
> > > > 2017-02-10 19:46 GMT+01:00 Kant Kodali  >:
> > > >
> > > > > Okies now I understand what you mean by "same" partition key.  I
> > think
> > > > you
> > > > > are saying
> > > > >
> > > > > PRIMARY KEY(col1, col2, col3) == PRIMARY KEY(col2, col1, col3) //
> so
> > > far
> > > > I
> > > > > assumed they are different partition keys.
> > > > >
> > > > > On Fri, Feb 10, 2017 at 10:36 AM, Benjamin Roth <
> > > benjamin.r...@jaumo.com 
> > > > >
> > > > > wrote:
> > > > >
> > > > > > There are use cases where the partition key is the same. For
> > example
> > > if
> > > > > you
> > > > > > need a sorting within a partition or a filtering different from
> the
> > > > > > original clustering keys.
> > > > > > We actually use this for some MVs.
> > > > > >
> > > > > > If you want "dumb" denormalization with simple append only cases
> > (or
> > > > more
> > > > > > general cases that don't require a read before write on update)
> you
> > > are
> > > > > > maybe better off with batched denormalized atomics writes.
> > > > > >
> > > > > > The main benefit of MVs is if you need denormalization to sort or
> > > > filter
> > > > > by
> > > > > > a non-primary key field.
> > > > > >
> > > > > > 2017-02-10 19:31 GMT+01:00 Kant Kodali  >:
> > > > > >
> > > > > > > yes thanks for the clarification.  But why would I ever have MV
> > > with
> > > > > the
> > > > > > > same partition key? if it is the same partition key I could
> just
> > > read
> > > > > > from
> > > > > > > the base table right? our MV Partition key contains the columns
> > > from
> > > > > the
> > > > > > > base table partition key but in a different order plus an
> > > additional
> > > > > > column
> > > > > > > (which is allowed as of today)
> > > > > > >
> > > > > > > On Fri, Feb 10, 2017 at 10:23 AM, Benjamin Roth <
> > > > > benjamin.r...@jaumo.com 
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > It depends on your model.
> > > > > > > > If the base table + MV have the same partition key, then the
> MV
> > > > > > mutations
> > > > > > > > are applied synchronously, so they are written as soon the
> > write
> > > > > > request
> > > > > > > > returns.
> > > > > > > > => In this case you can rely on the R+F > RF
> > > > > > > >
> > > > > > > > If the partition key of the MV is different, the partition of
> > the
> > > > MV
> > > > > is
> > > > > > > > probably placed on a different host (or said differently it
> > > cannot
> > > > be
> > > > > > > > guaranteed that it is on the same host). In this case, the MV
> > > > updates
> > > > > > are
> > > > > > > > executed async in a logged batch. So it can be guaranteed
> they
> > > will
> > > > > be
> > > > > > > > applied eventually but not at the time the write request
> > returns.
> > > > > > > > => You cannot rely and there is no possibility to absolutely
> > > > > guarantee
> > > > > > > > anything, not matter what CL you 

Re: Why does CockroachDB github website say Cassandra has no Availability on datacenter failure?

2017-02-07 Thread Edward Capriolo
On Tue, Feb 7, 2017 at 8:12 AM, Kant Kodali  wrote:

> yes agreed with this response
>
> On Tue, Feb 7, 2017 at 5:07 AM, James Carman 
> wrote:
>
> > I think folks might agree that it's not worth the time to worry about
> what
> > they say.  The ASF isn't a commercial entity, so we don't worry about
> > market share or anything.  Sure, it's not cool for folks to say
> misleading
> > or downright false statements about Cassandra, but we can't police the
> > internet.  We would be better served focusing on what we can control,
> which
> > is Cassandra, making it the best NoSQL database it can be.  Perhaps you
> > should write a blog post showing Cassandra survive a failure and we can
> > link to it from the Cassandra site.
> >
> > Now, this doesn't apply to trademarks, as the PMC is responsible for
> > "defending" its marks.
> >
> >
> >
> > On Tue, Feb 7, 2017 at 7:59 AM Kant Kodali  wrote:
> >
> > > @James I don't see how people can agree to it if they know Cassandra or
> > > even better Distributed systems reasonably well
> > >
> > > On Tue, Feb 7, 2017 at 4:54 AM, Bernardo Sanchez <
> > > bernard...@pointclickcare.com> wrote:
> > >
> > > > same. yra
> > > >
> > > > Sent from my BlackBerry - the most secure mobile device - via the
> Bell
> > > > Network
> > > > From: benjamin.r...@jaumo.com
> > > > Sent: February 7, 2017 7:51 AM
> > > > To: dev@cassandra.apache.org
> > > > Reply-to: dev@cassandra.apache.org
> > > > Subject: Re: Why does CockroachDB github website say Cassandra has no
> > > > Availability on datacenter failure?
> > > >
> > > >
> > > > Btw this isn't the Bronx either. It's not incorrect to be polite.
> > > >
> > > > Am 07.02.2017 13:45 schrieb "Bernardo Sanchez" <
> > > > bernard...@pointclickcare.com>:
> > > >
> > > > > guys this isn't twitter. stop your stupid posts
> > > > >
> > > > > From: benjamin.le...@datastax.com
> > > > > Sent: February 7, 2017 7:43 AM
> > > > > To: dev@cassandra.apache.org
> > > > > Reply-to: dev@cassandra.apache.org
> > > > > Subject: Re: Why does CockroachDB github website say Cassandra has
> no
> > > > > Availability on datacenter failure?
> > > > >
> > > > >
> > > > > Do not get angry for that. It does not worth it. :-)
> > > > >
> > > > > On Tue, Feb 7, 2017 at 1:11 PM, Kant Kodali 
> > wrote:
> > > > >
> > > > > > lol. But seriously are they even allowed to say something that is
> > not
> > > > > true
> > > > > > about another product ?
> > > > > >
> > > > > > On Tue, Feb 7, 2017 at 4:05 AM, kurt greaves <
> k...@instaclustr.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Marketing never lies. Ever
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Join their mailing list:
Tony Cassandra: "Your database is a piece of ..."
Cockroach ML: "What are you talking about?"
Tony Cassandra: "You know what I'm talking about, you cockroach."
::grabs chaos monkey and points it at their cluster::


Re: [RELEASE] Apache Cassandra 3.10 released

2017-02-03 Thread Edward Capriolo
On Fri, Feb 3, 2017 at 6:52 PM, Michael Shuler 
wrote:

> The Cassandra team is pleased to announce the release of Apache
> Cassandra version 3.10.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a new feature and bug fix release[1] on the 3.X series.
> As always, please pay attention to the release notes[2] and Let us
> know[3] if you were to encounter any problem.
>
> This is the last tick-tock feature release of Apache Cassandra. Version
> 3.11.0 will continue bug fixes from this point on the cassandra-3.11
> branch in git.
>
> Enjoy!
>
> [1]: (CHANGES.txt) https://goo.gl/J0VghF
> [2]: (NEWS.txt) https://goo.gl/00KNVW
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>
Great job all on this release.


Re: Rollback procedure for Cassandra Upgrade.

2017-01-10 Thread Edward Capriolo
On Tuesday, January 10, 2017, Romain Hardouin 
wrote:

> To be able to downgrade we should be able to pin both commitlog and
> sstables versions, e.g. -Dcassandra.commitlog_version=3
> -Dcassandra.sstable_version=jb
> That would be awesome because it would decorrelate binaries version and
> data version. Upgrades would be much less risky so I guess that adoption of
> new C* versions would increase.
> Best,
> Romain
>
> On Tuesday, January 10, 2017 at 6:03 AM, Brandon Williams wrote:
>
>
>  However, it's good to determine *how* it failed.  If nodetool just died or
> timed out, that's no big deal, it'll finish.
>
> On Mon, Jan 9, 2017 at 11:00 PM, Jonathan Haddad  > wrote:
>
> > There's no downgrade procedure. You either upgrade or you go back to a
> > snapshot from the previous version.
> > On Mon, Jan 9, 2017 at 8:13 PM Prakash Chauhan <
> > prakash.chau...@ericsson.com >
> > wrote:
> >
> > > Hi All ,
> > >
> > > Do we have an official procedure to rollback the upgrade of C* from
> 2.0.x
> > > to 2.1.x ?
> > >
> > >
> > > Description:
> > > I have upgraded C* from 2.0.x to 2.1.x . As a part of upgrade
> procedure ,
> > > I have to run nodetool upgradesstables .
> > > What if the command fails in the middle ? Some of the sstables will be
> in
> > > newer format (*-ka-*) where as other might be in older format(*-jb-*).
> > >
> > > Do we have a standard procedure to do rollback in such cases?
> > >
> > >
> > >
> > > Regards,
> > > Prakash Chauhan.
> > >
> > >
> >
>
>
>


It would be amazing if Cassandra could output commitlogs and sstables at a
specific version so that rollbacks are possible.
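One practical check after an interrupted `nodetool upgradesstables` is to count sstables per on-disk format by the marker in their file names (`-jb-` for 2.0-era files, `-ka-` for 2.1). A hedged sketch — treat the filename pattern as an assumption, since it varies across versions:

```python
import re
from collections import Counter

# Count sstables per format version by the marker embedded in the file
# name, e.g. "ks-t1-ka-6-Data.db" -> "ka". Mixed counts mean an
# interrupted upgradesstables left both old and new formats on disk.

MARKER = re.compile(r"-([a-z]{2})-\d+-Data\.db$")

def sstable_versions(filenames):
    counts = Counter()
    for name in filenames:
        m = MARKER.search(name)
        if m:
            counts[m.group(1)] += 1
    return dict(counts)

files = ["ks-t1-jb-5-Data.db", "ks-t1-ka-6-Data.db",
         "ks-t1-ka-7-Data.db", "ks-t1-ka-7-Index.db"]
```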


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Committer access to CassCI

2016-12-06 Thread Edward Capriolo
I will take this up at the next NYC Cassandra meetup. I have been on the
fence about "charging" for events for a while, but a nice donation piece
would be pretty cool if it can fuel the project.

I have also joked about creating CaSETI (Search for Extra Testing
Infrastructure) and building a Docker image that would phone home for
testing work, which we all could run on our workstations and Xboxes.

On Tue, Dec 6, 2016 at 10:47 AM, Michael Shuler 
wrote:

> (Sent to private@ a couple weeks ago)
>
> We are currently working on configuring newly donated ASF recommended
> compute resources to the ASF Jenkins environment and will be
> transferring unit and dtests over there once the infrastructure is
> running jobs successfully.
>
> We are receiving requests for new committers to receive access to
> CassCI, and in the interim can do so, but please note that we are near
> capacity and jobs are starting to back up there. We are also offering
> new committers assistance in setting up a local environment which may
> give them a faster turnaround time in terms of test results. It is our
> preference to not create new CassCI accounts and to spend our efforts
> and contributions on improving running on ASF infrastructure.
>
>
> (Added notes 12/06)
>
> At this time, JIRA ticket reviewers may need to set up dev branch jobs,
> if patch submitters do not currently have their forks set up on CassCI.
> The goal is to eventually migrate off of CassCI, utilizing ASF Jenkins
> for main branch jobs, which is nearly complete. If compute resources
> materialize for dev branches to run on ASF Jenkins, that's great,
> otherwise, I've set up a model of how to set up Jenkins in-house to run
> jobs.
>
> The ASF Jenkins jobs are configured via Job DSL directly from the
> cassandra-builds git repository:
>
> https://git-wip-us.apache.org/repos/asf?p=cassandra-builds.git;a=summary
>
> There are very limited Jenkins plugins installed on the ASF Jenkins, so
> a base Jenkins install with the Job DSL plugin added should get other
> Jenkins admins up and running pretty quickly. Modifications for running
> Jenkins on a user's repo of Apache Cassandra and custom branches should
> be relatively straightforward, but feel free to ask for help.
>
> With 5 dedicated ASF Jenkins slaves for Apache Cassandra, we currently
> cannot support developer branches on the ASF Jenkins infrastructure - we
> would queue jobs for days/weeks waiting to run. If there are community
> members that have a desire to donate compute resources to ASF Jenkins to
> add testing capacity, here's some background and the related INFRA
> tickets as we started testing on ASF and adding/troubleshooting the
> initial 5 servers:
>
> https://issues.apache.org/jira/browse/INFRA-12366
> https://issues.apache.org/jira/browse/INFRA-12823
> https://issues.apache.org/jira/browse/INFRA-12897
> https://issues.apache.org/jira/browse/INFRA-12943
> https://issues.apache.org/jira/browse/INFRA-13018
>
> --
> Kind regards,
> Michael
>


Re: Failed Dtest will block cutting releases

2016-12-03 Thread Edward Capriolo
I think it is fair to run a flaky test again. If it is determined it flaked
out due to a conflict with another test or something ephemeral in a
long-running process, it is not worth blocking a release.

Just deleting it is probably not a good path.

I actually enjoy writing, fixing, and tweaking tests, so ping me offline or
whatever.
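"Run it again" can itself be automated: a small retry wrapper reports a test that passes on the second attempt as "flaky" instead of red, so it gets triaged rather than blocking a release. A sketch, not anyone's actual CI tooling:

```python
# Re-run a failing test once before declaring it red: a pass on retry is
# reported as "flaky" so it can be triaged instead of blocking a release.

def run_with_retry(test, attempts=2):
    failures = 0
    for _ in range(attempts):
        try:
            test()
            return "pass" if failures == 0 else "flaky"
        except AssertionError:
            failures += 1
    return "fail"

# A test that fails on its first invocation and passes afterwards,
# mimicking an ephemeral environment problem.
calls = {"n": 0}
def sometimes_fails():
    calls["n"] += 1
    assert calls["n"] > 1
```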

On Saturday, December 3, 2016, Benjamin Roth 
wrote:

> Excuse me if I jump into an old thread, but from my experience, I have a
> very clear opinion about situations like that as I encountered them before:
>
> Tests are there to give *certainty*.
> *Would you like to pass a crossing with a green light if you cannot be sure
> if green really means green?*
> Do you want to rely on tests that are green, red, green, red? What if a red
> is a real red and you missed it because you simply ignore it because it's
> flaky?
>
> IMHO there are only 3 options how to deal with broken/red tests:
> - Fix the underlying issue
> - Fix the test
> - Delete the test
>
> If I cannot trust a test, it is better not to have it at all. Otherwise
> people are staring at red lights and start to drive.
>
> This causes:
> - Uncertainty
> - Loss of trust
> - Confusion
> - More work
> - *Less quality*
>
> Just as an example:
> Few days ago I created a patch. Then I ran the utest and 1 test failed.
> Hmmm, did I break it? I had to check it twice by checking out the former
> state, running the tests again just to recognize that it wasn't me who made
> it fail. That's annoying.
>
> Sorry again, I'm rather new here but what I just read reminded me much of
> situations I have been in years ago.
> So: +1, John
>
> 2016-12-03 7:48 GMT+01:00 sankalp kohli  >:
>
> > Hi,
> > I dont see any any update on this thread. We will go ahead and make
> > Dtest a blocker for cutting releasing for anything after 3.10.
> >
> > Please respond if anyone has an objection to this.
> >
> > Thanks,
> > Sankalp
> >
> >
> >
> > On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie  >
> > wrote:
> >
> > > Caveat: I'm strongly in favor of us blocking a release on a non-green
> > test
> > > board of either utest or dtest.
> > >
> > >
> > > > put something in prod which is known to be broken in obvious ways
> > >
> > > In my experience the majority of fixes are actually shoring up
> > low-quality
> > > / flaky tests or fixing tests that have been invalidated by a commit
> but
> > do
> > > not indicate an underlying bug. Inferring "tests are failing so we know
> > > we're asking people to put things in prod that are broken in obvious
> > ways"
> > > is hyperbolic. A more correct statement would be: "Tests are failing so
> > we
> > > know we're shipping with a test that's failing" which is not helpful.
> > >
> > > Our signal to noise ratio with tests has been very poor historically;
> > we've
> > > been trying to address that through aggressive triage and assigning out
> > > test failures however we need far more active and widespread community
> > > involvement if we want to truly *fix* this problem long-term.
> > >
> > > On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad  >
> > > wrote:
> > >
> > > > +1.  Kind of silly to put advise people to put something in prod
> which
> > is
> > > > known to be broken in obvious ways
> > > >
> > > > On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <
> kohlisank...@gmail.com 
> > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > We should not cut a releases if Dtest are not passing. I won't
> > > block
> > > > > 3.10 on this since we are just discussing this.
> > > > >
> > > > > Please provide feedback on this.
> > > > >
> > > > > Thanks,
> > > > > Sankalp
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
I would say start with a mindset like 'people will run this in production'
not like 'why would you expect this to work'.

Now how does this logic affect feature development? Let's use gossip 2.0 as
an example.

I will play my assigned Debbie Downer role. I can imagine writing 1 or 2
dtests, applying the logic of 'don't expect it to work', unleashing 4.0 onto
hordes of newbies with a Twitter announcement of the release, and letting
the bugs trickle in.

One could also do something comprehensive: test on clusters of 2 to 1,000
nodes, test with Jepsen to see what happens during partitions, inject things
like JVM pauses and account for the behavior, and log convergence times
after given events.
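The "log convergence times after given events" step can be sketched as a
dtest-style Python helper. This is a minimal sketch, not anything from the
actual cassandra-dtest suite; the convergence predicate and any `cluster`
object it would close over are hypothetical stand-ins, and only the timing
loop is concrete:

```python
import time

def measure_convergence(check_converged, timeout=120.0, poll_interval=0.5):
    """Poll a convergence predicate after a fault-injection event and
    return the seconds elapsed until it first holds.

    Raises TimeoutError if the predicate never holds within `timeout`.
    """
    start = time.monotonic()
    deadline = start + timeout
    while time.monotonic() < deadline:
        if check_converged():
            # Converged: report how long the cluster took to settle.
            return time.monotonic() - start
        time.sleep(poll_interval)
    raise TimeoutError("did not converge within %.1fs" % timeout)

# Hypothetical usage, e.g. after injecting a JVM pause on one node:
#   elapsed = measure_convergence(lambda: all_nodes_agree_on_ring(cluster))
#   log.info("gossip converged in %.1fs", elapsed)
```

Using `time.monotonic()` rather than `time.time()` keeps the measurement
immune to wall-clock adjustments during the run.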

Take a stand and say: "Look, we engineered and beat the crap out of this
feature. I deployed this released feature at my company and eat my own dog
food. You are not my crash test dummy."


On Saturday, November 19, 2016, Jeff Jirsa <jji...@gmail.com> wrote:

> Any proposal to solve the problem you describe?
>
> --
> Jeff Jirsa
>
>
> > On Nov 19, 2016, at 8:50 AM, Edward Capriolo <edlinuxg...@gmail.com
> <javascript:;>> wrote:
> >
> > This is especially relevant if people wish to focus on removing things.
> >
> > For example, gossip 2.0 sounds great, but seems geared toward huge
> > clusters, which are not likely a majority of users. For those with a
> > 20-node cluster, are the indirect benefits worth it?
> >
> > Also, there seems to be a push to first remove things like compact
> > storage or Thrift. Fine, great. But what is the realistic upgrade path
> > for someone? If the big players are running 2.1 and maintaining
> > backports, the average shop without a dedicated team is going to be
> > stuck saying, "There are great features in 4.0 that improve performance;
> > I would probably switch, but it's not stable, and we have that one
> > compact-storage CF, and who knows what is going to happen
> > performance-wise."
> >
> > We really need to lose this "release won't be stable for 6 minor
> > versions" concept.
> >
> > On Saturday, November 19, 2016, Edward Capriolo <edlinuxg...@gmail.com
> <javascript:;>>
> > wrote:
> >
> >>
> >>
> >> On Friday, November 18, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com
> <javascript:;>
> >> <javascript:_e(%7B%7D,'cvml','jeff.ji...@crowdstrike.com 
> >> <javascript:;>');>>
> wrote:
> >>
> >>> We should assume that we’re ditching tick/tock. I’ll post a thread on
> >>> 4.0-and-beyond here in a few minutes.
> >>>
> >>> The advantage of a prod release every 6 months is fewer incentive to
> push
> >>> unfinished work into a release.
> >>> The disadvantage of a prod release every 6 months is then we either
> have
> >>> a very short lifespan per-release, or we have to maintain lots of
> active
> >>> releases.
> >>>
> >>> 2.1 has been out for over 2 years, and a lot of people (including us)
> are
> >>> running it in prod – if we have a release every 6 months, that means
> we’d
> >>> be supporting 4+ releases at a time, just to keep parity with what we
> have
> >>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+
> year
> >>> old branches.
> >>>
> >>>
> >>> On 11/18/16, 3:10 PM, "beggles...@apple.com <javascript:;> on behalf
> of Blake
> >>> Eggleston" <beggles...@apple.com <javascript:;>> wrote:
> >>>
> >>>>> While stability is important if we push back large "core" changes
> >>> until later we're just setting ourselves up to face the same issues
> later on
> >>>>
> >>>> In theory, yes. In practice, when incomplete features are earmarked
> for
> >>> a certain release, those features are often rushed out, and not always
> >>> fully baked.
> >>>>
> >>>> In any case, I don’t think it makes sense to spend too much time
> >>> planning what goes into 4.0, and what goes into the next major release
> with
> >>> so many release strategy related decisions still up in the air. Are we
> >>> going to ditch tick-tock? If so, what will it’s replacement look like?
> >>> Specifically, when will the next “production” release happen? Without
> >>> knowing that, it's hard to say if something should go in 4.0, or 4.5,
> or
> >>> 5.0, or whatever.
> >>>>
> >>>> The reason I suggested a production release every 6 months is because
> >>> (in my mind) it’s frequen

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
This is especially relevant if people wish to focus on removing things.

For example, gossip 2.0 sounds great, but seems geared toward huge clusters,
which are not likely a majority of users. For those with a 20-node cluster,
are the indirect benefits worth it?

Also, there seems to be a push to first remove things like compact storage
or Thrift. Fine, great. But what is the realistic upgrade path for someone?
If the big players are running 2.1 and maintaining backports, the average
shop without a dedicated team is going to be stuck saying, "There are great
features in 4.0 that improve performance; I would probably switch, but it's
not stable, and we have that one compact-storage CF, and who knows what is
going to happen performance-wise."

We really need to lose this "release won't be stable for 6 minor versions"
concept.

On Saturday, November 19, 2016, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

>
>
> On Friday, November 18, 2016, Jeff Jirsa <jeff.ji...@crowdstrike.com
> <javascript:_e(%7B%7D,'cvml','jeff.ji...@crowdstrike.com');>> wrote:
>
>> We should assume that we’re ditching tick/tock. I’ll post a thread on
>> 4.0-and-beyond here in a few minutes.
>>
>> The advantage of a prod release every 6 months is fewer incentive to push
>> unfinished work into a release.
>> The disadvantage of a prod release every 6 months is then we either have
>> a very short lifespan per-release, or we have to maintain lots of active
>> releases.
>>
>> 2.1 has been out for over 2 years, and a lot of people (including us) are
>> running it in prod – if we have a release every 6 months, that means we’d
>> be supporting 4+ releases at a time, just to keep parity with what we have
>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
>> old branches.
>>
>>
>> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake
>> Eggleston" <beggles...@apple.com> wrote:
>>
>> >> While stability is important if we push back large "core" changes
>> until later we're just setting ourselves up to face the same issues later on
>> >
>> >In theory, yes. In practice, when incomplete features are earmarked for
>> a certain release, those features are often rushed out, and not always
>> fully baked.
>> >
>> >In any case, I don’t think it makes sense to spend too much time
>> planning what goes into 4.0, and what goes into the next major release with
>> so many release strategy related decisions still up in the air. Are we
>> going to ditch tick-tock? If so, what will it’s replacement look like?
>> Specifically, when will the next “production” release happen? Without
>> knowing that, it's hard to say if something should go in 4.0, or 4.5, or
>> 5.0, or whatever.
>> >
>> >The reason I suggested a production release every 6 months is because
>> (in my mind) it’s frequent enough that people won’t be tempted to rush
>> features to hit a given release, but not so frequent that it’s not
>> practical to support. It wouldn’t be the end of the world if some of these
>> tickets didn’t make it into 4.0, because 4.5 would fine.
>> >
>> >On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com)
>> wrote:
>> >
>> >On 18 November 2016 at 18:25, Jason Brown <jasedbr...@gmail.com> wrote:
>> >
>> >> #11559 (enhanced node representation) - decided it's *not* something we
>> >> need wrt #7544 storage port configurable per node, so we are punting on
>> >>
>> >
>> >#12344 - Forward writes to replacement node with same address during
>> replace
>> >depends on #11559. To be honest I'd say #12344 is pretty important,
>> >otherwise it makes it difficult to replace nodes without potentially
>> >requiring client code/configuration changes. It would be nice to get
>> #12344
>> >in for 4.0. It's marked as an improvement but I'd consider it a bug and
>> >thus think it could be included in a later minor release.
>> >
>> >Introducing all of these in a single release seems pretty risky. I think
>> it
>> >> would be safer to spread these out over a few 4.x releases (as they’re
>> >> finished) and give them time to stabilize before including them in an
>> LTS
>> >> release. The downside would be having to maintain backwards
>> compatibility
>> >> across the 4.x versions, but that seems preferable to delaying the
>> release
>> >> of 4.0 to include these, and having another big bang release.
>> >
>> >
>> >I don't think anyone expects 4.0.0 to b

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
On Friday, November 18, 2016, Jeff Jirsa  wrote:

> We should assume that we’re ditching tick/tock. I’ll post a thread on
> 4.0-and-beyond here in a few minutes.
>
> The advantage of a prod release every 6 months is fewer incentive to push
> unfinished work into a release.
> The disadvantage of a prod release every 6 months is then we either have a
> very short lifespan per-release, or we have to maintain lots of active
> releases.
>
> 2.1 has been out for over 2 years, and a lot of people (including us) are
> running it in prod – if we have a release every 6 months, that means we’d
> be supporting 4+ releases at a time, just to keep parity with what we have
> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
> old branches.
>
>
> On 11/18/16, 3:10 PM, "beggles...@apple.com  on behalf of
> Blake Eggleston" > wrote:
>
> >> While stability is important if we push back large "core" changes until
> later we're just setting ourselves up to face the same issues later on
> >
> >In theory, yes. In practice, when incomplete features are earmarked for a
> certain release, those features are often rushed out, and not always fully
> baked.
> >
> >In any case, I don’t think it makes sense to spend too much time planning
> what goes into 4.0, and what goes into the next major release with so many
> release strategy related decisions still up in the air. Are we going to
> ditch tick-tock? If so, what will it’s replacement look like? Specifically,
> when will the next “production” release happen? Without knowing that, it's
> hard to say if something should go in 4.0, or 4.5, or 5.0, or whatever.
> >
> >The reason I suggested a production release every 6 months is because (in
> my mind) it’s frequent enough that people won’t be tempted to rush features
> to hit a given release, but not so frequent that it’s not practical to
> support. It wouldn’t be the end of the world if some of these tickets
> didn’t make it into 4.0, because 4.5 would fine.
> >
> >On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com
> ) wrote:
> >
> >On 18 November 2016 at 18:25, Jason Brown  > wrote:
> >
> >> #11559 (enhanced node representation) - decided it's *not* something we
> >> need wrt #7544 storage port configurable per node, so we are punting on
> >>
> >
> >#12344 - Forward writes to replacement node with same address during
> replace
> >depends on #11559. To be honest I'd say #12344 is pretty important,
> >otherwise it makes it difficult to replace nodes without potentially
> >requiring client code/configuration changes. It would be nice to get
> #12344
> >in for 4.0. It's marked as an improvement but I'd consider it a bug and
> >thus think it could be included in a later minor release.
> >
> >Introducing all of these in a single release seems pretty risky. I think
> it
> >> would be safer to spread these out over a few 4.x releases (as they’re
> >> finished) and give them time to stabilize before including them in an
> LTS
> >> release. The downside would be having to maintain backwards
> compatibility
> >> across the 4.x versions, but that seems preferable to delaying the
> release
> >> of 4.0 to include these, and having another big bang release.
> >
> >
> >I don't think anyone expects 4.0.0 to be stable. It's a major version
> >change with lots of new features; in the production world people don't
> >normally move to a new major version until it has been out for quite some
> >time and several minor releases have passed. Really, most people are only
> >migrating to 3.0.x now. While stability is important if we push back large
> >"core" changes until later we're just setting ourselves up to face the
> same
> >issues later on. There should be enough uptake on the early releases of
> 4.0
> >from new users to help test and get it to a production-ready state.
> >
> >
> >Kurt Greaves
> >k...@instaclustr.com 
>
>
> I don't think anyone expects 4.0.0 to be stable

Someone previously described 3.0 as the "break everything release".

We know that many people are still on 2.1 and 3.0. Cassandra will always be
maintaining 3 or 4 active branches, and it will have adoption issues if
releases are not stable and usable.

Given that Cassandra hit 1.0 years ago, I expect things to be stable.
Half-working features, or "added this, broke that", are not appealing to me.



-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Rough roadmap for 4.0

2016-11-18 Thread Edward Capriolo
These tickets claim to duplicate each other:

https://issues.apache.org/jira/browse/CASSANDRA-12674
https://issues.apache.org/jira/browse/CASSANDRA-12746

But one is marked fixed and the other is still open.

What is the status here?

On Thu, Nov 17, 2016 at 5:20 PM, DuyHai Doan  wrote:

> Be very careful: there is a serious bug in the AND/OR semantics, not solved
> yet and not going to be solved anytime soon:
> https://issues.apache.org/jira/browse/CASSANDRA-12674
>
> On Thu, Nov 17, 2016 at 7:32 PM, Jeff Jirsa 
> wrote:
>
> >
> > We’ll be voting in the very near future on timing of major releases and
> > release strategy. 4.0 won’t happen until that vote takes place.
> >
> > But since you asked, I have ONE tick/tock (3.9) cluster being qualified
> > for production because it needs SASI.
> >
> > - Jeff
> >
> > On 11/17/16, 9:59 AM, "Jonathan Haddad"  wrote:
> >
> > >I think it might be worth considering adopting the release strategy
> before
> > >4.0 release.  Are any PMC members putting tick tock in prod? Does anyone
> > >even trust it?  What's the downside of changing the release cycle
> > >independently from 4.0?
> > >
> > >On Thu, Nov 17, 2016 at 9:03 AM Jason Brown 
> wrote:
> > >
> > >Jason,
> > >
> > >That's a separate topic, but we will have a different vote on how the
> > >branching/release strategy should be for the future.
> > >
> > >On Thursday, November 17, 2016, jason zhao yang <
> > zhaoyangsingap...@gmail.com
> > >>
> > >wrote:
> > >
> > >> Hi,
> > >>
> > >> Will we still use tick-tock release for 4.x and 4.0.x ?
> > >>
> > >> Stefan Podkowinski wrote on Wednesday, November 16, 2016, at 4:52 PM:
> > >>
> > >> > From my understanding, this will also effect EOL dates of other
> > >branches.
> > >> >
> > >> > "We will maintain the 2.2 stability series until 4.0 is released,
> and
> > >3.0
> > >> > for six months after that.".
> > >> >
> > >> >
> > >> > On Wed, Nov 16, 2016 at 5:34 AM, Nate McCall  > >> > wrote:
> > >> >
> > >> > > Agreed. As long as we have a goal I don't see why we have to
> adhere
> > to
> > >> > > arbitrary date for 4.0.
> > >> > >
> > >> > > On Nov 16, 2016 1:45 PM, "Aleksey Yeschenko" <
> alek...@datastax.com
> > >> >
> > >> > wrote:
> > >> > >
> > >> > > > I’ll comment on the broader issue, but right now I want to
> > elaborate
> > >> on
> > >> > > > 3.11/January/arbitrary cutoff date.
> > >> > > >
> > >> > > > Doesn’t matter what the original plan was. We should continue
> with
> > >> 3.X
> > >> > > > until all the 4.0 blockers have been
> > >> > > > committed - and there are quite a few of them remaining yet.
> > >> > > >
> > >> > > > So given all the holidays, and the tickets remaining, I’ll
> > >personally
> > >> > be
> > >> > > > surprised if 4.0 comes out before
> > >> > > > February/March and 3.13/3.14. Nor do I think it’s an issue.
> > >> > > >
> > >> > > > —
> > >> > > > AY
> > >> > > >
> > >> > > > On 16 November 2016 at 00:39:03, Mick Semb Wever (
> > >> > m...@thelastpickle.com 
> > >> > > )
> > >> > > > wrote:
> > >> > > >
> > >> > > > On 4 November 2016 at 13:47, Nate McCall  > >> > wrote:
> > >> > > >
> > >> > > > > Specifically, this should be "new stuff that could/will break
> > >> things"
> > >> > > > > given we are upping
> > >> > > > > the major version.
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > How does this co-ordinate with the tick-tock versioning¹ leading
> > up
> > >> to
> > >> > > the
> > >> > > > 4.0 release?
> > >> > > >
> > >> > > > To just stop tick-tock and then say yeehaa let's jam in all the
> > >> > breaking
> > >> > > > changes we really want seems to be throwing away some of the
> > learnt
> > >> > > wisdom,
> > >> > > > and not doing a very sane transition from tick-tock to
> > >> > > > features/testing/stable². I really hope all this is done in a
> way
> > >> that
> > >> > > > continues us down the path towards a stable-master.
> > >> > > >
> > >> > > > For example, are we fixing the release of 4.0 to November? or
> > >> > continuing
> > >> > > > tick-tocks until we complete the 4.0 roadmap? or starting the
> > >> > > > features/testing/stable branching approach with 3.11?
> > >> > > >
> > >> > > >
> > >> > > > Background:
> > >> > > > ¹) Sylvain wrote in an earlier thread titled "A Home for 4.0"
> > >> > > >
> > >> > > > > And as 4.0 was initially supposed to come after 3.11, which is
> > >> > coming,
> > >> > > > it's probably time to have a home for those tickets.
> > >> > > >
> > >> > > > ²) The new versioning scheme slated for 4.0, per the "Proposal -
> > >> 3.5.1"
> > >> > > > thread
> > >> > > >
> > >> > > > > three branch plan with “features”, “testing”, and “stable”
> > >starting
> > >> > > with
> > >> > > > 4.0?
> > >> > > >
> > >> > > >
> > >> > > > Mick
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>


Re: DataStax role in Cassandra and the ASF

2016-11-05 Thread Edward Capriolo
lic APIs that person will require support from project
> > members to get this done. If such help is not given, no outside change
> > will ever be completed, and no one will invest time in doing something
> > more than fixing typos or common programmer errors, which we all do from
> > time to time. Despite the impersonal nature of communications on the
> > Internet, we still have human interactions, and we all have just one
> > chance to make a first impression. If we make it wrong at the beginning,
> > it is hard to fix later on.
> > Some decisions made in the past by project PMCs led to a situation where
> > the project was forked and maintained outside the ASF (i.e., Stratio
> > Cassandra, which eventually ended up as a Lucene-indexes plugin over a
> > year ago), and some others
> > did hurt users running Cassandra for a long time (i.e., the
> > discontinuation of Thrift). The second decision especially was seen by
> > outsiders, who do not desire a billion writes per second, as marketing
> > driven. This led to people looking for, and finding, alternatives using a
> > compatible interface which might be, ironically, even faster (e.g.,
> > ScyllaDB).
> >
> > And since there was a quote battle on Twitter between Jim Jagielski and
> > Benedict, I can throw some in as well. Over conferences I attended, and
> > even during consultancy services I received, I've spoken with some people
> > with records of DataStax in their resumes, and even they told me,
> > "Collaboration with them [the Cassandra team] was hard." Now imagine what
> > chance an outsider has to get any change done, with such an attitude
> > shown even to their own colleagues? I must also note that TinkerPop is
> > doing better on this front, since it has a much more generic nature.
> > I don't think this whole topic is to say that you (meaning DataStax) did
> > a bad job, or that you are doing wrong by the project; it is about
> > letting others join forces with you to make Cassandra even better. Maybe
> > there are not a lot of people walking around right now, but once you
> > welcome and help them work with you on the code base, you may be sure
> > that others will join, making your development efforts easier and shared
> > across the community.
> >
> > Kind regards,
> > Lukasz
> >
> > > Wiadomość napisana przez Edward Capriolo <edlinuxg...@gmail.com> w
> dniu
> > 4 lis 2016, o godz. 18:55:
> > >
> > > On Thu, Nov 3, 2016 at 11:44 PM, Kelly Sommers <kell.somm...@gmail.com
> >
> > > wrote:
> > >
> > >> I think the community needs some clarification about what's going on.
> > >> There's a really concerning shift going on and the story about why is
> > >> really blurry. I've heard all kinds of wild claims about what's going
> > on.
> > >>
> > >> I've heard people say the ASF is pushing DataStax out because they
> don't
> > >> like how much control they have over Cassandra. I've heard other
> people
> > say
> > >> DataStax and the ASF aren't getting along. I've heard one person who
> has
> > >> pull with a friend in the ASF complained about a feature not getting
> > >> considered (who also didn't go down the correct path of proposing)
> > kicked
> > >> and screamed and started the ball rolling for control change.
> > >>
> > >> I don't know what's going on, and I doubt the truth is in any of
> those,
> > the
> > >> truth is probably somewhere in between. As a former Cassandra MVP and
> > >> builder of some of the larger Cassandra clusters in the last 3 years
> I'm
> > >> concerned.
> > >>
> > >> I've been really happy with Jonathan and DataStax's role in the
> > Cassandra
> > >> community. I think they have done a great job at investing time and
> > money
> > >> towards the good interest in the project. I think it is unavoidable a
> > >> single company bootstraps large projects like this into popularity.
> It's
> > >> those companies investments who give the ability to grow diversity in
> > later
> > >> stages. The committer list in my opinion is the most diverse its ever
> > been,
> > >> hasn't it? Apple is a big player now.
> > >>
> > >> I don't think reducing DataStax's role for the sake of diversity is
> > smart.
> > >> You grow diversity by opening up new opportunities for others. Grow
> the
> > >> committer list perhaps. Mentor new people to join that list. You don't
> > kick
> > >> someone to the curb 

Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-05 Thread Edward Capriolo
On Sat, Nov 5, 2016 at 9:19 AM, Benedict Elliott Smith <bened...@apache.org>
wrote:

> Hi Ed,
>
> I would like to try and clear up what I perceive to be some
> misunderstandings.
>
> Aleksey is relating that for *complex* tickets there are desperately few
> people with the expertise necessary to review them.  In some cases it can
> amount to several weeks' work, possibly requiring multiple people, which is
> a huge investment.  EPaxos is an example where its complexity likely needs
> multiple highly qualified reviewers.
>
> Simpler tickets on the other hand languish due to poor incentives - they
> aren't sexy for volunteers, and aren't important for the corporately
> sponsored contributors, who also have finite resources.  Nobody *wants* to
> do them.
>
> This does contribute to an emergent lack of diversity in the pool of
> contributors, but it doesn't discount Aleksey's point.  We need to find a
> way forward that handles both of these concerns.
>
> Sponsored contributors have invested time into efforts to expand the
> committer pool before, though they have universally failed.  Efforts like
> the "low hanging fruit squad" seem like a good idea that might payoff, with
> the only risk being the cloud hanging over the project right now.  I think
> constructive engagement with potential sponsors is probably the way
> forward.
>
> (As an aside, the policy on test coverage was historically very poor
> indeed, but is I believe much stronger today - try not to judge current
> behaviours on those of the past)
>
>
> On 5 November 2016 at 00:05, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
> > "I’m sure users running Cassandra in production would prefer actual
> proper
> > reviews to non-review +1s."
> >
> > Again, you are implying that only you can do a proper job.
> >
> > Lets be specific here: You and I are working on this one:
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-10825
> >
> > Now, Ariel reported there was no/low code coverage. I went looking at the
> > code and found a problem.
> >
> > If someone were to merge this: I would have more incentive to look for
> > other things, then I might find more bugs and improvements. If this
> process
> > keeps going, I would naturally get exposed to more of the code. Finally
> in
> > maybe (I don't know in 10 or 20 years) I could become one of these
> > specialists.
> >
> > Let's peel this situation apart:
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-10825
> >
> > "If you grep test/src and cassandra-dtest you will find that the string
> > OverloadedException doesn't appear anywhere."
> >
> > Now let me flip this situation around:
> >
> > "I'm sure the users running Cassandra in production would prefer proper
> > coding practice like writing unit and integration test to rubber stamp
> > merges"
> >
> > When the shoe is on the other foot it does not feel so nice.
> >
> > On Fri, Nov 4, 2016 at 7:08 PM, Aleksey Yeschenko <alek...@apache.org>
> > wrote:
> >
> > > Dunno. A sneaky correctness or data corruption bug. A performance
> > > regression. Or something that can take a node/cluster down.
> > >
> > > Of course no process is bullet-proof. The purpose of review is to
> > minimise
> > > the odds of such a thing happening.
> > >
> > > I’m sure users running Cassandra in production would prefer actual
> proper
> > > reviews to non-review +1s.
> > >
> > > --
> > > AY
> > >
> > > On 4 November 2016 at 23:03:23, Edward Capriolo (edlinuxg...@gmail.com
> )
> > > wrote:
> > >
> > > I feel that is really standing up on a soap box. What would be the
> worst
> > > thing that happens here
> >
>

Benedict,

Well said. I think we both see a similar way forward.

"Sponsored contributors have invested time into efforts to expand the
committer pool before, though they have universally failed."

Let's talk about this. I am following a number of tickets. Take, for
example, this one:

https://issues.apache.org/jira/browse/CASSANDRA-12649

September 19th: A user submits a patch along with a clear rationale (it is
right in the description of the ticket).

October 19th: (me) +1 (non-binding); users with unpredictable batch sizes
tend to also have GC problems, and this would aid in insight.

October 28th: Someone else: Would be nice to see this committed. We have
seen a lot of users mistakenly batch against mul
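The batch-size insight argued for in that timeline is cheap to prototype. A
minimal sketch of a warn-threshold tracker follows; the class name,
threshold value, and logger are invented here for illustration and are not
taken from the ticket's actual patch:

```python
import logging

logger = logging.getLogger("batch_size_guardrail")  # hypothetical name

class BatchSizeTracker:
    """Track mutation batch sizes and warn when one exceeds a threshold.

    A stand-in illustration of the kind of operator insight discussed on
    the ticket, not the real Cassandra implementation.
    """

    def __init__(self, warn_threshold_bytes=5 * 1024):
        self.warn_threshold_bytes = warn_threshold_bytes
        self.count = 0
        self.total_bytes = 0
        self.max_bytes = 0

    def record(self, batch_bytes):
        # Update the running statistics for this batch.
        self.count += 1
        self.total_bytes += batch_bytes
        self.max_bytes = max(self.max_bytes, batch_bytes)
        if batch_bytes > self.warn_threshold_bytes:
            logger.warning("Batch of %d bytes exceeds warn threshold %d",
                           batch_bytes, self.warn_threshold_bytes)

    def mean_bytes(self):
        return self.total_bytes / self.count if self.count else 0.0
```

Even this much (a mean, a max, and a warning line in the log) would give an
operator the "unpredictable batch sizes correlate with GC trouble" signal
the comment describes.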

Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-04 Thread Edward Capriolo
"I’m sure users running Cassandra in production would prefer actual proper
reviews to non-review +1s."

Again, you are implying that only you can do a proper job.

Lets be specific here: You and I are working on this one:

https://issues.apache.org/jira/browse/CASSANDRA-10825

Now, Ariel reported there was no/low code coverage. I went looking at the
code and found a problem.

If someone were to merge this, I would have more incentive to look for
other things; then I might find more bugs and improvements. If this process
kept going, I would naturally get exposed to more of the code. Finally,
maybe (I don't know, in 10 or 20 years) I could become one of these
specialists.

Let's peel this situation apart:

https://issues.apache.org/jira/browse/CASSANDRA-10825

"If you grep test/src and cassandra-dtest you will find that the string
OverloadedException doesn't appear anywhere."

Now let me flip this situation around:

"I'm sure the users running Cassandra in production would prefer proper
coding practice like writing unit and integration test to rubber stamp
merges"

When the shoe is on the other foot it does not feel so nice.
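The coverage probe quoted from the ticket is mechanical and easy to repeat.
A small sketch equivalent to that grep follows; the directory and search
string are whatever you pass in, and the `test/src` / `cassandra-dtest`
paths in the usage comment are just the ones the ticket comment names:

```python
import os

def count_occurrences(root, needle):
    """Count lines under `root` that contain `needle`.

    A grep-style coverage probe like the one quoted from the ticket:
    zero hits for an exception name under the test trees suggests the
    code path is untested.
    """
    hits = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    hits += sum(1 for line in f if needle in line)
            except OSError:
                continue  # unreadable file; skip, as grep would warn and move on
    return hits

# Hypothetical usage mirroring the ticket comment:
#   count_occurrences("test/src", "OverloadedException") == 0
# would confirm the "no coverage" observation.
```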


On Fri, Nov 4, 2016 at 7:08 PM, Aleksey Yeschenko <alek...@apache.org>
wrote:

> Dunno. A sneaky correctness or data corruption bug. A performance
> regression. Or something that can take a node/cluster down.
>
> Of course no process is bullet-proof. The purpose of review is to minimise
> the odds of such a thing happening.
>
> I’m sure users running Cassandra in production would prefer actual proper
> reviews to non-review +1s.
>
> --
> AY
>
> On 4 November 2016 at 23:03:23, Edward Capriolo (edlinuxg...@gmail.com)
> wrote:
>
> I feel that is really standing up on a soap box. What would be the worst
> thing that happens here


Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-04 Thread Edward Capriolo
"There is also the issue of specialisation. Very few people can be trusted
with review of arbitrary
Cassandra patches. I can count them all on fingers of one hand."

I have to strongly disagree here. The Cassandra issue tracker holds over
12,000 tickets, and I do not think Cassandra has added 12,000 "features"
since its inception. I reject the notion that only a handful of people are
capable of reviewing and merging things. Firstly, if this process were so
insanely bulletproof, we would never have had alternating tick-tock fix
releases. (Unless someone is going to argue we are still fixing zero-day
bugs from the Facebook code drop :) I, in my spare time, have looked over
the code and found things.

I do not mean this to come off as offensive. There clearly are specialists
and they are well respected. When someone say things like:

"real reviews, not rubber-stamping a +1 formally"

I feel that is really standing up on a soap box. What would be the worst
thing that happens here? A "rubber stamp" review sneaks in and causes bug
12001. OMG! NO SOMEONE RUBBER STAMPED SOMETHING AND CREATED A BUG. THAT
NEVER HAPPENED BEFORE IN THE HISTORY OF THE PROJECT. THERE HAS NEVER BEEN A
UNTESTED FEATURE ADDED WHICH BROKE SOMETHING ELSE. ETC ETC.

Be real about this situation: even the just-added SASI stuff has bugs.





On Fri, Nov 4, 2016 at 6:27 PM, Aleksey Yeschenko 
wrote:

> I’m sure that impactful, important, and/or potentially destabilising
> patches will continue getting reviewed
> by those engineers.
>
> As for patches that no organisation with a strong enough commercial
> interest cares about, they probably won’t.
> Engineering time is quite expensive, most employers are understaffed as it
> is, overloaded with deadlines and
> fires, so it’s hard to justify donating man hours to work that brings no
> value to your employer - be it Instagram,
> Apple, or DataStax.
>
> I don’t want to sound negative here, but I’d rather not fake optimism
> here, either. Expect that kind of patches
> to stay in unreviewed limbo for the most part.
>
> But significant work will still get reviewed and committed, keeping the
> project overall healthy. I wouldn’t worry much.
>
> --
> AY
>
> On 4 November 2016 at 22:13:42, Aleksey Yeschenko (alek...@apache.org)
> wrote:
>
> This has always been a concern. We’ve always had a backlog on unreviewed
> patches.
>
> Reviews (real reviews, not rubber-stamping a +1 formally) are real work,
> often taking as much work
> as creating the patch in question. And taking as much expertise (or more).
>
> It’s also not ‘fun’ and doesn’t lend itself to scratch-your-own-itch
> drive-by style contributions.
>
> In other words, not something people tend to volunteer for. Something done
> mostly by people
> paid to do the work, reviews assigned to them by their managers.
>
> There is also the issue of specialisation. Very few people can be trusted
> with review of arbitrary
> Cassandra patches. I can count them all on fingers of one hand. There are
> islands of expertise
> and people who can review certain subsystems, and most of them are
> employed by a certain one
> company. A few people at Apple, but with no real post-8099, 3.0 code
> experience at the moment.
>
> What I’m saying is that it’s insufficient to just have desire to volunteer
> - you also need the actual
> knowledge and skill to properly review non-trivial work, and for that we
> largely only have DataStax
> employed contributors, with a sprinkle of people at Apple, and that’s
> sadly about it.
>
> We tried to improve it by holding multiple bootcamps, at Summits, and
> privately within major companies,
> at non-trivial expense to the company, but those initiatives mostly failed
> so far :(
>
> This has always been a problem (lack of review bandwidth), and always will
> be. That said, I don’t expect it to get
> much worse than it is now.
>
> --
> AY
>
> On 4 November 2016 at 21:50:20, Nate McCall (zznat...@gmail.com) wrote:
>
> To be clear, getting the community more involved is a super hard,
> critically important problem to which I don't have a concrete answer
> other than I'm going to keep reaching out for opinions, ideas and
> involvement.
>
> Just like this.
>
> Please speak up here if you have ideas on how this could work.
>
> On Sat, Nov 5, 2016 at 10:38 AM, Nate McCall  wrote:
> > [Moved to a new thread because this topic is important by itself]
> >
> > There are some excellent points here - thanks for bringing this up.
> >
> >> What can aspiring developers contribute to 4.0
> >> that would move the project forward to its goals and would be very
> likely
> >> included in the final release?
> >
> > That is a hard question with regards to the tickets I listed. My goal
> > was to list the large, potentially breaking changes which would
> > necessitate a move from '3' to '4' major release. Unfortunately in
> > this context, those types of issues have a huge surface area that
> > requires experience with the code to 

Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Edward Capriolo
On Thu, Nov 3, 2016 at 11:44 PM, Kelly Sommers 
wrote:

> I think the community needs some clarification about what's going on.
> There's a really concerning shift going on and the story about why is
> really blurry. I've heard all kinds of wild claims about what's going on.
>
> I've heard people say the ASF is pushing DataStax out because they don't
> like how much control they have over Cassandra. I've heard other people say
> DataStax and the ASF aren't getting along. I've heard one person who has
> pull with a friend in the ASF complained about a feature not getting
> considered (who also didn't go down the correct path of proposing) kicked
> and screamed and started the ball rolling for control change.
>
> I don't know what's going on, and I doubt the truth is in any of those, the
> truth is probably somewhere in between. As a former Cassandra MVP and
> builder of some of the larger Cassandra clusters in the last 3 years I'm
> concerned.
>
> I've been really happy with Jonathan and DataStax's role in the Cassandra
> community. I think they have done a great job at investing time and money
> towards the good interest in the project. I think it is unavoidable a
> single company bootstraps large projects like this into popularity. It's
> those companies investments who give the ability to grow diversity in later
> stages. The committer list in my opinion is the most diverse its ever been,
> hasn't it? Apple is a big player now.
>
> I don't think reducing DataStax's role for the sake of diversity is smart.
> You grow diversity by opening up new opportunities for others. Grow the
> committer list perhaps. Mentor new people to join that list. You don't kick
> someone to the curb and hope things improve. You add.
>
> I may be way off on what I'm seeing but there's not much to go by but
> gossip (ahaha :P) and some ASF meeting notes and DataStax blog posts.
>
> August 17th 2016 ASF changed the Apache Cassandra chair
> https://www.apache.org/foundation/records/minutes/
> 2016/board_minutes_2016_08_17.txt
>
> "The Board expressed continuing concern that the PMC was not acting
> independently and that one company had undue influence over the project."
>
> August 19th 2016 Jonathan Ellis steps down as chair
> http://www.datastax.com/2016/08/a-look-back-a-look-forward
>
> November 2nd 2016 DataStax moves committers to DSE from Cassandra.
> http://www.datastax.com/2016/11/serving-customers-serving-the-community
>
> I'm really concerned if indeed the ASF is trying to change control and
> diversity  of organizations by reducing DataStax's role. As I said earlier,
> I've been really happy at the direction DataStax and Jonathan has taken the
> project and I would much prefer see additional opportunities along side
> theirs grow instead of subtracting. The ultimate question that's really
> important is whether DataStax and Jonathan have been steering the project
> in the right direction. If the answer is yes, then is there really anything
> broken? Only if the answer is no should change happen, in my opinion.
>
> Can someone at the ASF please clarify what is going on? The ASF meeting
> notes are very concerning.
>
> Thank you for listening,
> Kelly Sommers
>

Kelly,

Thank you for taking the time to mention this. I want to react to this
statement:

"I've heard people say the ASF is pushing DataStax out because they don't
like how much control they have over Cassandra. I've heard other people say
DataStax and the ASF aren't getting along. I've heard one person who has
pull with a friend in the ASF complained about a feature not getting
considered (who also didn't go down the correct path of proposing) kicked
and screamed and started the ball rolling for control change."

There is an important saying in the ASF:
https://community.apache.org/newbiefaq.html

   - If it didn't happen on a mailing list, it didn't happen.

It is natural that communication happens outside of Jira. The rough aim of
this mandate is that a conversation that happens by the water cooler
should be summarized and moved into a forum where it can be recorded and
discussed. There is a danger in repeating something anecdotal or 'things
you have heard'. If that party is being suppressed, that is an issue to
deal with. If a party is unwilling to speak for themselves publicly in the
ASF's public forums, that is on them. Retelling what others told us is
'gossip', as you put it.

"I think it is unavoidable a single company bootstraps large projects like
this into popularity"
"I don't think reducing DataStax's role for the sake of diversity is
smart."

Let me state my opinion as an open source ASF member who was never
directly paid to work on an open source project. I have proposed, and seen
others propose, ideas to several open source projects (inside the ASF and
outside) which were rejected. Later (months, maybe years, later) the exact
idea or roughly the same idea is implemented by a different person in a
slightly different form. There 

Re: Moderation

2016-11-04 Thread Edward Capriolo
Is the message in moderation because
1) it was sent by someone not registered with the list
2) some other reason (anti-spam etc)

If it is case 1: Isn't the correct process to inform the sender and
encourage them to subscribe to the list properly?
If it is case 2: Is there an expected ETA for list moderation events?
(probably not)

I see twitter mentioned. We know that sometimes news and sentiment in
social media move fast and cause reactions based on incorrect/unvetted
information.


On Fri, Nov 4, 2016 at 11:58 AM, Mattmann, Chris A (3010) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hmm. Not excessive but you have a situation where someone is tweeting
> thinking her message didn't go through and conversation is happening there
> when that same conversation could be had on list. If you are ok with that
> continuing to happen then great but I am not. Can someone please moderate
> the message through?
>
> Sent from my iPhone
>
> > On Nov 4, 2016, at 8:54 AM, Mark Thomas  wrote:
> >
> >> On 04/11/2016 15:47, Chris Mattmann wrote:
> >> Hi Folks,
> >>
> >> Kelly Sommers sent a message to dev@cassandra and I'm trying to figure
> out if it's in moderation.
> >>
> >> Can the moderators speak up?
> >
> > Using my infra karma, I checked the mail server. That message is waiting
> > for moderator approval. It has been in moderation for 12 hours which
> > doesn't strike me as at all excessive.
> >
> > Mark
> >
>


Re: Rough roadmap for 4.0

2016-11-04 Thread Edward Capriolo
I would like to propose features around seeds:
https://issues.apache.org/jira/browse/CASSANDRA-12627

I have other follow up issues like getting seeds from Amazon API, or from
JNDI/ DNS, etc.

I was hoping 12627 was an easy way to grease the wheels.


On Fri, Nov 4, 2016 at 8:39 AM, Jason Brown  wrote:

> Hey Nate,
>
> I'd like to add CASSANDRA-11559 (Enhance node representation) to the list,
> including the comment I made on the ticket (different storage ports for
> each node). For us, it's a great "would really like to have" as our group
> will probably need this in production within the next 1 year or less.
> However since it hasn't gotten much attention, I'm not sure if we should
> add it to the list of "must haves" for 4.0. I'm planning on bringing it up
> internally today, as well.
>
> Also, from the previous 4.0 email thread that Jonathan started back in July
> (
> https://mail-archives.apache.org/mod_mbox/cassandra-dev/
> 201607.mbox/%3CCALdd-zhW3qJ%3DOWida9nMXPj0JOsru7guOYh6-
> 7uTjqEBvacrgQ%40mail.gmail.com%3E
> )
>
> - CASSANDRA-5 (thrift removal) - ticket not mentioned explicitly in the
> email, but the notion of removing thrift was
> - CASSANDRA-10857 (Allow dropping COMPACT STORAGE flag)
>
> Thanks,
>
> -Jason
>
> On Thu, Nov 3, 2016 at 8:43 PM, sankalp kohli 
> wrote:
>
> > List looks really good. I will let you know if there is something else we
> > plan to add to this list.
> >
> > On Thu, Nov 3, 2016 at 7:47 PM, Nate McCall  wrote:
> >
> > > It was brought up recently at the PMC level that our goals as a
> > > project are not terribly clear.
> > >
> > > This is a pretty good point as outside of Jira 'Fix Version' labelling
> > > (which we actually suck less at compared to a lot of other ASF
> > > projects) this really isn't tracked anywhere outside of general tribal
> > > knowledge about who is working on what.
> > >
> > > I would like to see us change this for two reasons:
> > > - it's important we are clear with our community about where we are
> going
> > > - we need to start working more closely together
> > >
> > > To that end, i've put together a list (in no particular order) of the
> > > *major* features in which I know folks are interested, have patches
> > > coming, are awaiting design review, etc.:
> > >
> > > - CASSANDRA-9425 Immutable node-local schema
> > > - CASSANDRA-10699 Strongly consistent schema alterations
> > > - CASSANDRA-12229 NIO streaming
> > > - CASSANDRA-8457 NIO messaging
> > > - CASSANDRA-12345 Gossip 2.0
> > > - CASSANDRA-9754 Birch trees
> > >
> > > What did I miss? What else would folks like to see? Specifically, this
> > > should be "new stuff that could/will break things" given we are upping
> > > the major version.
> > >
> > > To be clear, it's not my intention to set this in stone and then beat
> > > people about the head with it. More to have it there to point it at a
> > > high level and foster better communication with our users from the
> > > perspective of an open source project.
> > >
> > > Please keep in mind that given everything else going on, I think it's
> > > a fantastic idea to keep this list small and spend some time focusing
> > > on stability particularly as we transition to a new release process.
> > >
> > > -Nate
> > >
> >
>


Re: Low hanging fruit crew

2016-10-19 Thread Edward Capriolo
I realize that passing small tests and trivial reviews will not catch all
issues. I am not attempting to trivialize the review process.

Both deep and shallow bugs exist. The deep bugs, I am not convinced that
even an expert looking at the contribution for N days can account for a
majority of them.

The shallow ones may appear shallow yet be deep, but given that a bug
already exists, an attempt to fix it usually does not arrive at something
worse.

Many of these issues boil down to simple, seemingly clear fixes. Some are
just basic metric additions. Many see no interaction for weeks.


On Wednesday, October 19, 2016, Jeff Jirsa 
wrote:

> Let’s not get too far in the theoretical weeds. The email thread really
> focused on low hanging tickets – tickets that need review, but definitely
> not 8099 level reviews:
>
> 1) There’s a lot of low hanging tickets that would benefit from outside
> contributors as their first-patch in Cassandra (like
> https://issues.apache.org/jira/browse/CASSANDRA-12541 ,
> https://issues.apache.org/jira/browse/CASSANDRA-12776 )
> 2) Some of these patches already exist and just need to be reviewed and
> eventually committed.
>
> Folks like Ed and Kurt have been really active in Jira lately, and there
> aren’t a ton of people currently reviewing/committing – that’s part of OSS
> life, but the less friction that exists getting those patches reviewed and
> committed, the more people will be willing to contribute in the future, and
> the better off the project will be.
>
>
> On 10/19/16, 9:14 AM, "Jeremy Hanna"  > wrote:
>
> >And just to be clear, I think everyone would welcome more testing for
> both regression and new-code correctness.  I think everyone would
> appreciate the time savings around more automation.  That should give more
> time for a thoughtful review - which is likely what new contributors really
> need to get familiar with different parts of the codebase.  These LHF
> reviews won’t be like the code reviews of the vnode or the 8099 ticket
> obviously, so it won’t be a huge burden.  But it has some very tangible
> benefits imo, as has been said.
> >
> >> On Oct 19, 2016, at 11:08 AM, Jonathan Ellis  > wrote:
> >>
> >> I specifically used the phrase "problems that the test would not" to
> show I
> >> am talking about more than mechanical correctness.  Even if the tests
> are
> >> perfect (and as Jeremiah points out, how will you know that without
> reading
> >> the code?), you can still pass tests with bad code.  And is expecting
> >> perfect tests really realistic for multithreaded code?
> >>
> >> But besides correctness, reviews should deal with
> >>
> >> 1. Efficiency.  Is there a quadratic loop that will blow up when the
> number
> >> of nodes in a cluster gets large?  Is there a linear amount of memory
> used
> >> that will cause problems when a partition gets large?  These are not
> >> theoretical problems.
> >>
> >> 2. Maintainability.  Is this the simplest way to accomplish your
> goals?  Is
> >> there a method in SSTableReader that would make your life easier if you
> >> refactored it a bit instead of reinventing it?  Does this reduce
> technical
> >> debt or add to it?
> >>
> >> 3. "Bus factor."  If only the author understands the code and all anyone
> >> else knows is that tests pass, the project will quickly be in bad shape.
> >> Review should ensure that at least one other person understand the code,
> >> what it does, and why, at a level that s/he could fix bugs later in it
> if
> >> necessary.
> >>
> >> On Wed, Oct 19, 2016 at 10:52 AM, Jonathan Haddad  > wrote:
> >>
> >>> Shouldn't the tests test the code for correctness?
> >>>
> >>> On Wed, Oct 19, 2016 at 8:34 AM Jonathan Ellis  > wrote:
> >>>
>  On Wed, Oct 19, 2016 at 8:27 AM, Benjamin Lerer <
>  benjamin.le...@datastax.com 
> > wrote:
> 
> > Having the test passing does not mean that a patch is fine. Which is
> >>> why
>  we
> > have a review check list.
> > I never put a patch available without having the tests passing but
> most
>  of
> > my patches never pass on the first try. We always make mistakes no
> >>> matter
> > how hard we try.
> > The reviewer job is to catch those mistakes by looking at the patch
> >>> from
> > another angle. Of course, sometime, both of them fail.
> >
> 
>  Agreed.  Review should not just be a "tests pass, +1" rubber stamp,
> but
>  actually checking the code for correctness.  The former is just
> process;
>  the latter actually catches problems that the tests would not.  (And
> this
>  is true even if the tests are much much better than ours.)
> 
>  --
>  Jonathan Ellis
>  co-founder, https://urldefense.proofpoint.com/v2/url?u=http-3A__www.
> 

Re: Low hanging fruit crew

2016-10-19 Thread Edward Capriolo
Yes. The LHFC crew should always pay it forward. Not many of us have a
supercomputer to run all the tests, but for things out there marked Patch
Available we can apply the patch to see that it applies cleanly and, if it
includes a test, run that test (and possibly some related ones in the same
file/folder for quick coverage). A nice initial sweep is a good thing.

I have seen a process before which triggered an auto-build when the patch
was added to the ticket. This took a burden off the committers: by the time
someone got to the ticket they already knew whether the tests passed, so it
was only a style and fine-tuning review.

In this case if we have a good initial pass we can hopefully speed along
the process.

On Wed, Oct 19, 2016 at 4:18 AM, kurt Greaves  wrote:

> On 19 October 2016 at 05:30, Nate McCall  wrote:
>
> > if you are offering up resources for review and test coverage,
> > there is work out there. Most immediately in the department of reviews
> > for "Patch Available."
> >
>
> We can certainly put some minds to this. There are a few of us with a very
> good understanding of working Cassandra yet could use more exposure to the
> code base. We'll start getting out there and looking for things to review.
>
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>


Low hanging fruit crew

2016-10-18 Thread Edward Capriolo
I go through the Cassandra jira weekly and I notice a number of tickets
which appear to be clear issues or requests for simple metrics.

https://issues.apache.org/jira/browse/CASSANDRA-12626

https://issues.apache.org/jira/browse/CASSANDRA-12330

I also have a few Jira issues that (in my opinion) would be simple to
triage and merge. Getting things merged is the primary pathway to
meritocracy in the ASF.

Across some other ASF projects I have seen the number of small patches
grow larger than the bodies available to review them. It can result in a
chicken-and-egg scenario where reviewers feel overburdened, but the pathway
to removing this burden is promoting contributors to committers.

My suggestion:
Assemble a low-hanging-fruit crew. This crew would provide general support
for small commits, logging, metrics, test coverage, things static analysis
reveals, etc. They would have a reasonable goal like "get 2 LHF patches
merged a day/week/whatever". If the process is successful, in a few months there
would hopefully be 1-2 committers graduated who would naturally wish to
move into low hanging fruit duties.

Thoughts?


Re: Proprietary Replication Strategies: Cassandra Driver Support

2016-10-08 Thread Edward Capriolo
I have contemplated using LocalStrategy as a "do it yourself client side
sharding system".

On Sat, Oct 8, 2016 at 12:37 AM, Vladimir Yudovin 
wrote:

> Hi Prasenjit,
> I would like to get the replication factors of the key-spaces using the
> strategies in the same way we get the replication factors for Simple and
> NetworkTopology.
>  Actually LocalStrategy has no replication factor:
>
> SELECT * FROM system_schema.keyspaces WHERE keyspace_name IN ('system',
> 'system_schema');
>  keyspace_name | durable_writes | replication
> ---------------+----------------+---------------------------------------------------------
>  system        |           True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
>  system_schema |           True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
>
>
> It's used for internal tables and not accessible to users:
>
> CREATE KEYSPACE excel WITH replication = {'class': 'LocalStrategy'};
> ConfigurationException: Unable to use given strategy class: LocalStrategy
> is reserved for internal use.
>
>
> Best regards, Vladimir Yudovin,
> Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
> Launch your cluster in minutes.
>
>
>
>
>  On Fri, 07 Oct 2016 17:06:09 -0400 Prasenjit
> Sarkarprasenjit.sar...@datos.io wrote 
>
> Thanks Vlad and Jeremiah.
>
> There were questions about support, so let me address that in more detail.
>
> If I look at the latest Cassandra python driver, the support for
> LocalStrategy is very limited (code snippet shown below) and the support
> for EverywhereStrategy is non-existent. By limited I mean that the
> Cassandra python driver only provides the name of the strategy for
> LocalStrategy and not much else.
>
> What I would like (and happy to help) is for the Cassandra python driver to
> provide support for Local and Everywhere to the same extent it is provided
> for Simple and NetworkTopology. I understand that token aware routing is
> not applicable to either strategy but I would like to get the replication
> factors of the key-spaces using the strategies in the same way we get the
> replication factors for Simple and NetworkTopology.
>
> Hope this helps,
> Prasenjit
>
>
> class LocalStrategy(ReplicationStrategy):
>     def __init__(self, options_map):
>         pass
>
>     def make_token_replica_map(self, token_to_host_owner, ring):
>         return {}
>
>     def export_for_schema(self):
>         """
>         Returns a string version of these replication options which are
>         suitable for use in a CREATE KEYSPACE statement.
>         """
>         return "{'class': 'LocalStrategy'}"
>
>     def __eq__(self, other):
>         return isinstance(other, LocalStrategy)
>
> On Fri, Oct 7, 2016 at 11:56 AM, Jeremiah D Jordan 
> jeremiah.jor...@gmail.com wrote:
>
>  What kind of support are you thinking of? All drivers should support
> them
>  already, drivers shouldn’t care about replication strategy except when
>  trying to do token aware routing.
>  But since anyone can make a custom replication strategy, drivers that
> do
>  token aware routing just need to handle falling back to not doing
> token
>  aware routing if a replication strategy they don’t know about is in
> use.
>  All the open sources drivers I know of do this, so they should all
>  “support” those strategies already.
> 
>  -Jeremiah
> 
>   On Oct 7, 2016, at 1:02 PM, Prasenjit Sarkar &
> lt;prasenjit.sar...@datos.io
>  wrote:
>  
>   Hi everyone,
>  
>   To the best of my understanding that Datastax has proprietary
> replication
>   strategies: Local and Everywhere which are not part of the open
> source
>   Apache Cassandra project.
>  
>   Do we know of any plans in the open source Cassandra driver
> community to
>   support these two replication strategies? Would Datastax have a
> licensing
>   concern if the open source driver community supported these
> strategies?
>  I'm
>   fairly new here and would like to understand the dynamics.
>  
>   Thanks,
>   Prasenjit
> 
> 
>
>
>
>
>
>


Re: Create table with ID - Question

2016-09-28 Thread Edward Capriolo
I have a similar set of problems. I will set the stage: in the past, for a
variety of reasons, I had to create tables (column families) by time range
for an event-processing system.

The main reason was that expiring data (TTL) did not purge easily. It was
easier to simply truncate/drop old column families than to deal with
different evolving compaction strategies.

The main loop of my program looked like this:
public void writeThisStuff(List<Event> events) {
  MutationBatch mb = new MutationBatch();
  for (Event event : events) {
    mb.add(event);
  }
  maybeCreateNeededTables(mb);
  executeBatch(mb);
}

public void maybeCreateNeededTables(MutationBatch mb) {
  Set<String> columnFamiliesToCreate = new HashSet<>();
  for (Mutation mutation : mb) {
    columnFamiliesToCreate.add(extractColumnFamilyFromMutation(mutation));
  }
  for (String cf : columnFamiliesToCreate) {
    if (!hectorAstyanaxFlavorOfTheWeekClientDoesCfExist(cf)) {
      hectorAstyanaxFlavorOfTheWeekClientCreateCf(cf);
    }
  }
}

The size of the batches was in the 5-10K range. For a given batch the
number of target cfs was typically one, but at most two. That means that,
worst case, only a couple would need to be created. Effectively this meant
one metadata read before the write. (You could cache the already-existing
column families as well.) One quick read is not a huge cost when you
consider the savings of batching 5K round trips.
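The caching idea in that parenthetical can be sketched as a memoized existence check. Everything below is hypothetical and only mirrors the pseudocode in this thread; the metadata lookup is simulated rather than a real Hector/Astyanax call:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical memoized column-family existence check: once a cf is known
 * to exist, skip the metadata read on subsequent batches. Misses fall
 * through to the (simulated) metadata lookup.
 */
public class KnownCfCache {
    private final Set<String> knownCfs = ConcurrentHashMap.newKeySet();
    int metadataReads = 0;  // instrumentation for the demo

    // Stands in for a real client call like DoesCfExist(cf)
    private boolean metadataLookup(String cf) {
        metadataReads++;
        return cf.startsWith("events_");  // simulated: only events_* exist
    }

    public boolean exists(String cf) {
        if (knownCfs.contains(cf)) {
            return true;  // cached: no metadata round trip
        }
        boolean found = metadataLookup(cf);
        if (found) {
            // Only cache positive results; an absent cf may be created later.
            knownCfs.add(cf);
        }
        return found;
    }

    public static void main(String[] args) {
        KnownCfCache cache = new KnownCfCache();
        cache.exists("events_2016_09");  // miss -> metadata read, then cached
        cache.exists("events_2016_09");  // hit  -> no metadata read
        cache.exists("other_cf");        // miss -> metadata read, not cached
        System.out.println("metadataReads=" + cache.metadataReads);
    }
}
```

Caching only positive results keeps the sketch correct under the workload described above: tables are created but never expected to disappear mid-batch.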

Even with this type of scenario you can run into a concurrent schema
problem. But you can add whatever gizmo to confirm schema agreement here:

  for (String cf : columnFamiliesToCreate) {
    waitForSchemaToSettleGizmo();
    if (!hectorAstyanaxFlavorOfTheWeekClientDoesCfExist(cf)) {
      waitForSchemaToSettleGizmo();
      hectorAstyanaxFlavorOfTheWeekClientCreateCf(cf);
    }
  }
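For completeness, a minimal sketch of what such a settle gizmo might look like. This is an assumption-laden illustration, not the real thing: the map shape (schema version -> hosts reporting it) mirrors what a call like Thrift's describe_schema_versions returns, and the Supplier stands in for the actual client call:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical waitForSchemaToSettleGizmo(): polls a schema-version report
 * (schema version -> hosts reporting it) until all reachable nodes agree on
 * a single version, or a timeout elapses.
 */
public class SchemaSettleGizmo {
    static boolean waitForSchemaToSettle(Supplier<Map<String, Set<String>>> hostsBySchemaVersion,
                                         long timeoutMillis, long pollMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (hostsBySchemaVersion.get().size() == 1) {
                return true;   // exactly one schema version reported: settled
            }
            Thread.sleep(pollMillis);  // nodes still disagree; poll again
        }
        return false;  // schema never settled within the timeout
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated polls: first shows two versions in flight, second shows agreement.
        Iterator<Map<String, Set<String>>> polls = List.of(
                Map.of("uuid-a", Set.of("10.0.0.1"), "uuid-b", Set.of("10.0.0.2")),
                Map.of("uuid-a", Set.of("10.0.0.1", "10.0.0.2"))).iterator();
        System.out.println(waitForSchemaToSettle(polls::next, 1000, 10) ? "settled" : "timed out");
    }
}
```

A real implementation would also need to decide how to treat unreachable hosts, which this sketch ignores.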

On Wed, Sep 28, 2016 at 12:01 PM, Aleksey Yeschenko 
wrote:

> No way to do that via Thrift I’m afraid, nor will there be one. Sorry.
>
> --
> AY
>
> On 28 September 2016 at 16:43:58, Roman Bielik (roman.bielik@
> openmindnetworks.com) wrote:
>
> Hi,
>
> in CQL it is possible to create a table with explicit ID: CREATE TABLE ...
> WITH ID='xyz'.
>
> Is something like this possible via Thrift interface?
> There is an int32 "id" field in CfDef, but it has no effect on the table
> ID.
>
> My problem is, that concurrent create table (add_column_family) requests
> for the same table name result in clash with somewhat unpredictable
> behavior.
>
> This problem was reported in:
> https://issues.apache.org/jira/browse/CASSANDRA-9933
>
> and seems to be related to changes from ticket:
> https://issues.apache.org/jira/browse/CASSANDRA-5202
>
> A workaround for me could be using the same ID in create table, however I'm
> using Thrift interface only.
>
> Thank you.
> Regards,
> Roman
>
> --
>
> 
> 
>  
>


Re: [Discuss] Adding dtest to project

2016-09-23 Thread Edward Capriolo
I love dtest; I think it is a great thing to have in the tool belt. One
thing that I want to point out: nosetests and dtests are black-box-style
testing. You cannot step through or trace these things very easily.

My dream would be for Cassandra to be re-entrant, making it possible to run
a 3-node cluster in one JVM and set a breakpoint. I think you could prove
out many things much more easily and quickly.

On Thu, Sep 22, 2016 at 11:44 PM, Nate McCall  wrote:

> [Moved from PMC as there is nothing really private involved]
>
> DataStax has graciously offered to contribute cassandra-dtest [0] to
> the project.
>
> There were, however, two issues noted by Jonathan when he presented
> the offer to the PMC:
>
> 1. dtest mixes tests for many cassandra versions together in a single
> project.  So having it live in the main cassandra repo, versioned by
> cassandra release, doesn't really make sense.  Is Infra able to create a
> second repo for this, or is the "one project, one repo" mapping fixed?
>
> 2. DataStax did not require a CLA to contribute to dtest, so the non-DS
> contributors to dtest would need to be contacted for their permission to
> assign copyright to the ASF.  Is the PMC willing to tackle this?
>
> In a brief discussion, it was deduced that #1 can be addressed by
> adding apache/cassandra-dtest to the ASF repo (the example of
> apache/aurora and apache/aurora-packaging was given to justify this).
>
> #2 will be harder as it will require tracking a few people people down
> to sign ASF CLAs.
>
> Before we really put effort into this, I wanted to open the discussion
> up about whether we are willing to take on the development of this in
> the project. Thoughts?
>
> -Nate
>
>
> [0] https://github.com/riptano/cassandra-dtest
>


Re: Question on assert

2016-09-22 Thread Edward Capriolo
Yes, obviously we do not need to go in and replace them all at once. Some
rough guidance/general consensus should be in place, though, because we are
violating the standard usage:

https://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html

Do *not* use assertions for argument checking in public methods.
Do *not* use assertions to do any work that your application requires for
correct operation.

There should be a rationale as to why and when this is right. Otherwise
changes like this might be considered bikeshedding.

In any case I created

https://issues.apache.org/jira/browse/CASSANDRA-12688

since I think we can all agree that Cassandra cannot run without them at
the moment, and we do not want to give someone an incentive to switch them
off, which I feel the claim of a 5% performance gain does.


On Thu, Sep 22, 2016 at 7:29 AM, Benjamin Lerer  wrote:

> I fully agree.
>
> On Thu, Sep 22, 2016 at 11:57 AM, Dave Brosius 
> wrote:
>
> > As an aside, C* for some reason heavily uses asserts in unit tests, which
> > adds to the "can't see the forest for the trees" problem. I see no reason
> > for that. they should all be moved over to junit asserts.
> >
> >
> >
> > On 09/22/2016 03:52 AM, Benjamin Lerer wrote:
> >
> >> We can spend hours arguing about assert vs exceptions. I have seen it
> >> happen in every company I worked for.
> >> Overall, based on the patches I have reviewed, it seems to me that in
> >> general people are using them only as internal safety checks.
> >> Unfortunately, the code changes and we can miss things.
> >> If anybody think that some SPECIFIC assertions should be replaced by
> some
> >> real checks, I think the best way to do it is to open a JIRA ticket to
> >> raise the problem.
> >>
> >>
> >
>


Re: Question on assert

2016-09-21 Thread Edward Capriolo
" potential 5% performance win when you've corrupted all their data."
This is somewhat my point. Why do assertions that are only sometimes
trapped "protect my data" better than a checked exception?

On Wed, Sep 21, 2016 at 1:24 PM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> I hate that comment with a passion. Please please please please do
> yourself a favor and *always* run with asserts on. `-ea` for life. In
> practice I'd be surprised if you actually got a reliable 5% performance win
> and I doubt your customers will care about a potential 5% performance win
> when you've corrupted all their data.
>
> best,
> kjellman
>
> > On Sep 21, 2016, at 10:21 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
> >
> > There are a variety of assert usages in the Cassandra. You can find
> several
> > tickets like mine.
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-12643
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-11537
> >
> > Just to prove that I am not the only one who runs into these:
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-12484
> >
> > To paraphrase another ticket that I read today and can not find,
> > "The problem is X throws Assertion which is not caught by the Exception
> > handler and it bubbles over and creates a thread death."
> >
> > The jvm.properties file claims this:
> >
> > # enable assertions.  disabling this in production will give a modest
> > # performance benefit (around 5%).
> > -ea
> >
> > If assertions incur a "5% penalty" but are not always trapped what value
> do
> > they add?
> >
> > These are common sentiments about how assert should be used: (not trying
> to
> > make this a this is what the internet says type debate)
> >
> > http://stackoverflow.com/questions/2758224/what-does-
> the-java-assert-keyword-do-and-when-should-it-be-used
> >
> > "Assertions
> > <http://docs.oracle.com/javase/specs/jls/se8/html/jls-14.html#jls-14.10>
> (by
> > way of the *assert* keyword) were added in Java 1.4. They are used to
> > verify the correctness of an invariant in the code. They should never be
> > triggered in production code, and are indicative of a bug or misuse of a
> > code path. They can be activated at run-time by way of the -ea option on
> > the
> > java command, but are not turned on by default."
> >
> > http://stackoverflow.com/questions/1957645/when-to-use-
> an-assertion-and-when-to-use-an-exception
> >
> > "An assertion would stop the program from running, but an exception would
> > let the program continue running."
> >
> > I look at how Cassandra uses assert and how it manifests in how the code
> > operates in production. Assert is something like a semi-unchecked
> > exception. All types of internal util classes might throw it; downstream
> > code is essentially unaware and rarely specifically handles it. They do
> > not always result in the hard death one would expect from an assert.
> >
> > I know this is a ballpark type figure, but would the "5% performance
> > penalty" be in the ballpark of a checked exception? Given that they tend
> > to bubble through things uncaught, do they do more harm than good?
>
>


Question on assert

2016-09-21 Thread Edward Capriolo
There are a variety of assert usages in the Cassandra codebase. You can find several
tickets like mine.

https://issues.apache.org/jira/browse/CASSANDRA-12643

https://issues.apache.org/jira/browse/CASSANDRA-11537

Just to prove that I am not the only one who runs into these:

https://issues.apache.org/jira/browse/CASSANDRA-12484

To paraphrase another ticket that I read today and cannot find:
"The problem is X throws Assertion which is not caught by the Exception
handler and it bubbles over and creates a thread death."

The jvm.properties file claims this:

# enable assertions.  disabling this in production will give a modest
# performance benefit (around 5%).
-ea

If assertions incur a "5% penalty" but are not always trapped, what value do
they add?

These are common sentiments about how assert should be used (not trying to
make this a "this is what the internet says" type of debate):

http://stackoverflow.com/questions/2758224/what-does-the-java-assert-keyword-do-and-when-should-it-be-used

"Assertions
<http://docs.oracle.com/javase/specs/jls/se8/html/jls-14.html#jls-14.10> (by
way of the *assert* keyword) were added in Java 1.4. They are used to
verify the correctness of an invariant in the code. They should never be
triggered in production code, and are indicative of a bug or misuse of a
code path. They can be activated at run-time by way of the -ea option on the
java command, but are not turned on by default."

http://stackoverflow.com/questions/1957645/when-to-use-an-assertion-and-when-to-use-an-exception

"An assertion would stop the program from running, but an exception would
let the program continue running."

I look at how Cassandra uses assert and how it manifests in how the code
operates in production. Assert is something like a semi-unchecked exception.
All types of internal util classes might throw it; downstream code is
essentially unaware and rarely specifically handles it. They do not always
result in the hard death one would expect from an assert.

I know this is a ballpark type figure, but would the "5% performance penalty"
be in the ballpark of a checked exception? Given that they tend to bubble
through things uncaught, do they do more harm than good?
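The thread-death behavior described above follows directly from the exception hierarchy: AssertionError extends Error, not Exception, so the common `catch (Exception e)` handler never sees it. A minimal sketch (the class and method names here are made up for illustration, not Cassandra code):

```java
// Demonstrates why `catch (Exception e)` does not trap assertion failures:
// AssertionError extends Error, which sits outside the Exception hierarchy.
public class AssertDemo {
    static void validate(int value) {
        // With -ea this throws AssertionError when value < 0;
        // with assertions disabled (the default) it is a no-op.
        assert value >= 0 : "value must be non-negative: " + value;
    }

    public static void main(String[] args) {
        try {
            validate(-1);
        } catch (Exception e) {
            // Never reached for an assertion failure.
            System.out.println("caught Exception: " + e);
        } catch (AssertionError e) {
            // Only an explicit Error/AssertionError/Throwable handler sees it.
            System.out.println("caught AssertionError: " + e.getMessage());
        }
        System.out.println(AssertionError.class.getSuperclass()); // class java.lang.Error
    }
}
```

Run with `java -ea AssertDemo` to hit the AssertionError branch; without -ea the assert never fires, which is how an uncaught failure in a worker thread ends up killing the thread instead of being handled.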


Re: Proposal - 3.5.1

2016-09-16 Thread Edward Capriolo
If you all have never seen the movie "grandma's boy" I suggest it.

https://www.youtube.com/watch?v=uJLQ5DHmw-U

There is one funny scene where the product/project person says something
like, "The game is ready. We have fixed ALL THE BUGS". The people who made
the movie probably think the coders doing Dance Dance Revolution are funny.
To me the funniest part of the movie is the summary statement that "all the
bugs are fixed".

I agree with Sylvain, that cutting branches really has nothing to do with
"quality". Quality like "production ready" is hard to define.

I am phrasing this next part as questions to encourage deep thought not to
be a jerk.

Someone jokingly said 3.0 was the "break everything" release. What if
4.0 was the "fix everything" release?
What would that mean?
What would we need?
No new features for 6 months?
A vast network of amazon machines to test things?
Jepsen ++?
24 hour integration tests that run CAS operations across a multi-node mixed
version cluster while we chaos monkey nodes?
Could we keep busy for 6 months just looking at the code and fix all the
bugs for Mr. Cheezle?
Could we fix ALL THE BUGS and then from that day it is just feature,
feature, feature?
We sit there and join and unjoin nodes for 2 days while running stress and
at the end use the map reduce export and prove that not a single datum was
lost?








On Fri, Sep 16, 2016 at 2:42 PM, Sylvain Lebresne 
wrote:

> On Fri, Sep 16, 2016 at 6:59 PM, Blake Eggleston 
> wrote:
>
> > Clearly, we won’t get to this point right away, but it should definitely
> > be a goal.
> >
>
> I'm not entirely clear on why anyone would read into what I'm saying that it
> shouldn't be a goal. I'm a huge proponent of this and of putting emphasis
> on quality in general, and because it's Friday night and I'm tired, I'm
> gonna add that I think I have a bigger track record of actually acting on
> improving quality for Cassandra than anyone else that is putting words in my
> mouth.
>
> Mainly, I'm suggesting that we don't have to tie the existence of a clearly
> labeled stable branch (useful to users, especially newcomers) to future
> improvements in the "releasability" of trunk in our design of a new release
> cycle. If we do so, but releasability doesn't improve as quickly as we'd
> hoped, we penalize users in the end. Adopting a release cycle that ensures
> said clearly labeled stable branch exists no matter the rate of
> improvement in trunk's releasability feels safer, and it
> doesn't preclude any effort in improving said releasability, nor
> re-evaluating this in 1-2 years to move to releasing stable releases from
> trunk directly if we have proven we're there.
>
>
>
> >
> > On September 16, 2016 at 9:04:03 AM, Sylvain Lebresne (
> > sylv...@datastax.com) wrote:
> >
> > On Fri, Sep 16, 2016 at 5:18 PM, Jonathan Haddad 
> > wrote:
> >
> > >
> > > This is a different mentality from having a "features" branch, where
> it's
> > > implied that at times it's acceptable that it not be stable.
> >
> >
> > I absolutely never implied that, though I willingly admit my choice of
> > branch
> > names may be to blame. I 100% agree that no releases should be done
> > without a green test board moving forward and if something was implicit
> > in my 'feature' branch proposal, it was that.
> >
> > Where we might not be on the same page is that I just don't believe it's
> > reasonable to expect the project will get any time soon in a state where
> > even a green test board release (with new features) meets the "can be
> > confidently put into production". I'm not even sure it's reasonable to
> > expect from *any* software, and even less so for an open-source
> > project based on volunteering. Not saying it wouldn't be amazing, it
> > would, I just don't believe it's realistic. In a way, the reason why I
> > think
> > tick-tock doesn't work is *exactly* because it's based on that
> unrealistic
> > assumption.
> >
> > Of course, I suppose that's kind of my opinion. I'm sure some will think
> > that the "historical trend" of release instability is simply due to a
> lack
> > of
> > effort (obviously Cassandra developers don't give a shit about users,
> that
> > must the simplest explanation).
> >
>


Re: Proposal - 3.5.1

2016-09-16 Thread Edward Capriolo
"The historical trend with the Cassandra codebase has been to test
minimally,
throw the code over the wall, and get feedback from people putting it in
prod who run into issues."

At the summit Brandon and a couple others were making fun over range
tombstones from thrift
https://issues.apache.org/jira/browse/CASSANDRA-5435

I added the thrift support based on code already in trunk. But there was an
ugly bit in there, and far on down the line someone else got stuck with an
edge case and had to fix it. Now, I actually added a number of tests, unit
tests, and nosetests. I am sure the range tombstones also had their own set
of tests at the storage level.

So as Brandon was making fun of me, I was thinking to myself, "Well I did
not make the bug, I just made it possible for others to find it! So I am
helping!"

The next time I submit a thrift patch I am going to write 5x the unit tests
jk :)

On Fri, Sep 16, 2016 at 11:18 AM, Jonathan Haddad  wrote:

> I've worked on a few projects where we've had a branch that new stuff went
> in before merging to master / trunk.  What you've described reminds me a
> lot of git-flow (http://nvie.com/posts/a-successful-git-branching-model/)
> although not quite the same.  I'll be verbose in this email to minimize the
> reader's assumptions.
>
> The goals of the release cycle should be (in descending order of priority):
>
> 1. Minimize bugs introduced through change
> 2. Allow the codebase to iterate quickly
> 3. Not get caught up in a ton of back porting bug fixes
>
> There is significant benefit to having a releasable trunk.  This is
> different from a trunk which is constantly released.  A releasable trunk
> simply means all tests should *always* pass and PMC & committers should
> feel confident that they could actually put it in prod for a project that
> actually matters.  Having it always be releasable (all tests pass, etc)
> means people can at least test the DB on sample data or evaluate it before
> the release happens, and get feedback to the team when there are bugs.
>
> This is a different mentality from having a "features" branch, where it's
> implied that at times it's acceptable that it not be stable.  The
> historical trend with the Cassandra codebase has been to test minimally,
> throw the code over the wall, and get feedback from people putting it in
> prod who run into issues.  In my experience I have found a general purpose
> "features" branch to result in poor quality codebases.  It shares a lot
> of the same problems as the 1+ year release cycle did previously, with
> things getting merged in and then an attempt to stabilize later.
>
> Improving the state of testing in trunk will catch more bugs, satisfying
> #1, which naturally leads to #2, and by reducing bugs before they get
> released #3 will happen over time.
>
> My suggestion is for a *supported* feature release every 3 months (could just
> as well be 4 or 6) mixed with Benedict's idea of frequent non-supported
> releases (tagged as alpha).  Supported releases should get ~6 months worth
> of bug fixes, which if done right, will decrease over time due to a
> hopefully more stable codebase.  I 100% agree with Mick that semver makes
> sense here, it's not just for frameworks.  Major.Minor.Patch is well
> understood and is pretty standard throughout the world, I don't think we
> need to reinvent versioning.
>
> TL;DR:
> Release every 3 months
> Support for 6
> Keep a stable trunk
> New features get merged into trunk but the standard for code quality and
> testing needs to be properly defined as something closer to "production
> ready" rather than "let the poor user figure it out"
>
> Jon
>
>
>
>
>
>
>
> On Fri, Sep 16, 2016 at 3:05 AM Sylvain Lebresne 
> wrote:
>
> > As probably pretty much everyone at this point, I agree the tick-tock
> > experiment
> > isn't working as well as it should and that it's probably worth course
> > correcting. I happen to have been thinking about this quite a bit already
> > as it
> > turns out so I'm going to share my reasoning and suggestion below, even
> > though
> > it's going to be pretty long, in the hope it can be useful (and if it
> > isn't, so
> > be it).
> >
> > My current thinking is that a good cycle should accommodate 2 main
> > constraints:
> >   1) be useful for users
> >   2) be realistic/limit friction on the development side
> > and let me develop what I mean by both points slightly first.
> >
> > I think users mostly want 2 things out of the release schedule: they
> want a
> > clearly labeled stable branch to know what they should run into
> production,
> > and
> > they want new features and improvements. Let me clarify that different
> > users
> > will want those 2 in different degrees and with variation over time, but
> I
> > believe it's mainly some combination of those. On the development side, I
> > don't
> > think it's realistic to expect more than 2/3 branches/series to be
> > supported at
> > any one time 

Re: Proposal - 3.5.1

2016-09-15 Thread Edward Capriolo
Where did we come from?

We came from a place where we would say, "You probably do not want to run
2.0.X until it reaches 2.0.6"

One thing about Cassandra is we get into a situation where we can only go
forward. For example, when you update from version X to version Y, version
Y might start writing a new versions of sstables.

X - sstables-v1
Y - sstables-v2

This is very scary from the operations side because you cannot bring the
system back to running version X, as Y's data is unreadable.

Where are we at now?

We now seem to be in a place where you say "Problem in 3.5 (trunk on a given
day)? Go to 3.9 (trunk at the last tick-tock release)."

http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/

"To get there, we are investing significant effort in making trunk “always
releasable,” with the goal that each release, or at least each odd-numbered
bugfix release, should be usable in production. "

I support a releasable trunk, but the qualifying statement "or at least each
odd-numbered bugfix release" undoes the assertion of "always releasable". Not
trying to nitpick here. I realize it may be hard to get to the desired state
of a releasable trunk in a short time.

Anecdotally I notice a lot of "movement" in class names/names of functions.
Generally, I can look at a stack trace of a piece of software and I can
bring up the line number in github and it is dead on, or fairly close to
the line of code. Recently I have tried this in versions fairly close
together and seen some drastic changes.

Some things I personally do not like:
1) lack of stable-ish APIs in the codebase
2) use of singletons rather than simple dependency injection (even
constructor-based injection)

IMHO these do not fit well with 'release often' while always producing a
'high quality release'.
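To make the singleton/injection point concrete, here is a sketch of what moving from a static singleton to constructor injection looks like, and why it helps testability. The classes below are hypothetical, invented for illustration, not actual Cassandra code:

```java
// Hypothetical example: a component that consults cluster metadata. The
// singleton form hard-wires the dependency; the injected form lets a unit
// test substitute a stub without bootstrapping any global state.
interface ClusterMetadata {
    int nodeCount();
}

// Singleton style: untestable without initializing the "real" metadata.
class ClusterMetadataSingleton implements ClusterMetadata {
    static final ClusterMetadataSingleton instance = new ClusterMetadataSingleton();
    public int nodeCount() { return 3; /* imagine: reads live gossip state */ }
}

class SingletonStylePlanner {
    boolean canReplicate(int rf) {
        // Dependency is implicit and global; a test cannot swap it out.
        return ClusterMetadataSingleton.instance.nodeCount() >= rf;
    }
}

// Constructor injection: the dependency is explicit and swappable.
class ReplicationPlanner {
    private final ClusterMetadata metadata;
    ReplicationPlanner(ClusterMetadata metadata) { this.metadata = metadata; }
    boolean canReplicate(int rf) { return metadata.nodeCount() >= rf; }
}

public class InjectionDemo {
    public static void main(String[] args) {
        // A unit test can now pass a trivial stub (a lambda, since the
        // interface has one method) instead of loading cassandra.yaml etc.
        ReplicationPlanner planner = new ReplicationPlanner(() -> 5);
        System.out.println(planner.canReplicate(3)); // true
        System.out.println(planner.canReplicate(6)); // false
    }
}
```

This is exactly the property the earlier discussion of mocks and @BeforeClass initialization is after: the test touches only the class under test and a stub, not the whole startup path.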

I do not love the concept of a 'bug fix release'. I would not mind waiting
longer for a feature as long as I could have a high trust factor in it
working right the first time.

Take a feature like trickle_fsync. By the description it sounds like a clear
optimization win. It is off by default. The description says "turn on for
ssd", but elsewhere in the configuration we have # disk_optimization_strategy:
ssd. Are we tuning for ssd by default or not?

By being false, it is not tested in the wild. How is it covered and trusted
during tests? How many tests have it off vs on?

I think the concept that trickle_fsync can be added as a feature, set to
false, and only maybe gain real world coverage is not comforting to me. I do
not want to turn it on and get some weird issue because no one else is
running with it. I would rather it be added on by default with extreme
confidence or not added at all.
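For reference, the settings in question look roughly like this in cassandra.yaml. The values shown are the shipped defaults as best I recall; verify against the version you run:

```yaml
# Off by default, so the trickle-fsync code path gets little exercise in the
# wild; the comment in the shipped file recommends enabling it on SSDs.
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240

# ...while the disk optimization strategy ships commented out, with ssd as
# the suggested value:
# disk_optimization_strategy: ssd
```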



On Thu, Sep 15, 2016 at 1:37 AM, Jonathan Haddad  wrote:

> In this particular case, I'd say adding a bug fix release for every version
> that's affected would be the right thing.  The issue is so easily
> reproducible and will likely result in massive data loss for anyone on 3.X
> WHERE X < 6 and uses the "date" type.
>
> This is how easy it is to reproduce:
>
> 1. Start Cassandra 3.5
> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> 3. use test;
> 4. create table fail (id int primary key, d date);
> 5. delete d from fail where id = 1;
> 6. Stop Cassandra
> 7. Start Cassandra
>
> You will get this, and startup will fail:
>
> ERROR 05:32:09 Exiting due to error while processing commit log during
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$
> CommitLogReplayException:
> Unexpected error deserializing mutation; saved to
> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4rgn/T/
> mutation6313332720566971713dat.
> This may be caused by replaying a mutation against a table with the same
> name but incompatible schema.  Exception follows:
> org.apache.cassandra.serializers.MarshalException: Expected 4 byte long
> for
> date (0)
>
> I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5 (and
> probably the other releases) and requires very little investment from
> anyone.
>
>
> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa 
> wrote:
>
> > We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes,
> but
> > we certainly didn’t/won’t go back and cut new releases from every branch
> > for every critical bug in future releases, so I think we need to draw the
> > line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it seems like
> > you’ve got options (either stay on the tick and go up to 3.7, or bail
> down
> > to 3.0.x)
> >
> > Perhaps, though, this highlights the fact that tick/tock may not be the
> > best option long term. We’ve tried it for a year, perhaps we should
> instead
> > discuss whether or not it should continue, or if there’s another process
> > that gives us a better way to get useful patches into versions people are
> > willing to run in production.
> >
> >
> >
> > On 9/14/16, 8:55 PM, "Jonathan Haddad"  wrote:
> >
> > >Common 

Re: Github pull requests

2016-08-29 Thread Edward Capriolo
>> I think it goes the other way around. When you push to ASF git with the
right commit message then the integration from that side closes the pull
request.

Yes. This is how apache-gossip is set up. Someone makes a JIRA, includes a
link to their branch, and tells me they are done. We review:

git checkout master
git pull otherperson jira-123
git push origin master

The pull request on GitHub is "magically" closed.
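For anyone curious what the "magic" is: as I understand the ASF GitHub integration, pushing a commit to the ASF repo whose message contains a line like "This closes #123" closes the mirrored pull request (check the infra docs for the exact keyword form). A self-contained sketch of the commit side, using a throwaway repo and a made-up JIRA/PR number:

```shell
# Throwaway repo so the sketch is runnable end to end.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q .
git config user.email "dev@example.org"
git config user.name "Dev"

# In the real flow this commit is the merge of the contributor's branch;
# the important part is the trailer line, which the ASF tooling is said to
# match against the mirrored pull request number.
git commit -q --allow-empty -m "CASSANDRA-123: fix widget handling

This closes #123"

# The trailer survives in the pushed history:
git log -1 --format=%B
```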

On Mon, Aug 29, 2016 at 8:45 AM, J. D. Jordan 
wrote:

> I think it goes the other way around. When you push to ASF git with the
> right commit message then the integration from that side closes the pull
> request.
>
> > On Aug 28, 2016, at 11:48 PM, Jonathan Ellis  wrote:
> >
> > Don't we need something on the infra side to turn a merged pull request
> > into a commit to the ASF repo?
> >
> > On Sun, Aug 28, 2016 at 11:07 PM, Nate McCall 
> > wrote:
> >
> >>>
> >>>
> >>> Infra is exploring options for giving PMCs greater control over GitHub
> >>> config (including allowing GitHub to be the master with a golden copy
> >>> held at the ASF) but that is a work in progress.
> >> ^  Per Mark's comment, there is not anything we can really do past what
> >> Jake F. described with Thrift. We dealt with this with Usergrid back in
> >> incubation two years ago (Jake F. actually helped us get it all sorted
> at
> >> the time) when we were using https://github.com/usergrid/usergrid as
> the
> >> source:
> >> http://mail-archives.apache.org/mod_mbox/usergrid-dev/201405.mbox/%
> >> 3CCANyrgvdTVzZQD7w3C96LUHa=h7-h4qmu4h7ajsxoat0gd0f...@mail.gmail.com%3E
> >>
> >> Here is the Thrift guide again for reference:
> >> https://github.com/apache/thrift/blob/master/
> CONTRIBUTING.md#contributing-
> >> via-github-pull-requests
> >>
> >> JClouds also has a nice write up/how-to (we based Usergrid on this,
> >> initially):
> >> https://cwiki.apache.org/confluence/display/JCLOUDS/Git+workflow
> >>
> >> Maybe we just amend our 'how-to-commit' with similar details as the two
> >> references above?
> >> http://cassandra.apache.org/doc/latest/development/how_to_commit.html
> >>
> >> -Nate
> >>
> >> On Mon, Aug 29, 2016 at 10:44 AM, Nate McCall 
> >> wrote:
> >>
> >>>
>  Nate, since you have experience with this from Usergrid, can you
> figure
>  out
>  what we need to do to make this happen and follow up with infra?
> >>>
> >>> Yep - i'll look into this.
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
>


Re: #cassandra-dev IRC logging

2016-08-26 Thread Edward Capriolo
Yes. I did. My bad.

On Fri, Aug 26, 2016 at 4:07 PM, Jason Brown <jasedbr...@gmail.com> wrote:

> Ed, did you mean this to post this to the other active thread today, the
> one about github pull requests? (just want to make sure I'm understanding
> correctly :) )
>
> On Fri, Aug 26, 2016 at 12:28 PM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
> > One thing to watch out for. The way apache-gossip is setup the PR's get
> > sent to the dev list. However the address is not part of the list so the
> > project owners get an email asking to approve/reject every PR and comment
> > on the PR.
> >
> > This is ok because we have a small quiet group but you probably do not
> want
> > that with the number of SCM changes in the cassandra project.
> >
> > On Fri, Aug 26, 2016 at 3:05 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> > wrote:
> >
> > > +1 to both as well
> > >
> > > On 8/26/16, 11:59 AM, "Tyler Hobbs" <ty...@datastax.com> wrote:
> > >
> > > >+1 on doing this and using ASFBot in particular.
> > > >
> > > >On Fri, Aug 26, 2016 at 1:40 PM, Jason Brown <jasedbr...@gmail.com>
> > > wrote:
> > > >
> > > >> @Dave ASFBot looks like a winner. If others are on board with this,
> I
> > > can
> > > >> work on getting it up and going.
> > > >>
> > > >> On Fri, Aug 26, 2016 at 11:27 AM, Dave Lester <
> dave_les...@apple.com>
> > > >> wrote:
> > > >>
> > > >> > +1. Check out ASFBot for logging IRC, along with other
> > > integrations.[1]
> > > >> >
> > >
> > >
> >
>


Re: #cassandra-dev IRC logging

2016-08-26 Thread Edward Capriolo
One thing to watch out for. The way apache-gossip is setup the PR's get
sent to the dev list. However the address is not part of the list so the
project owners get an email asking to approve/reject every PR and comment
on the PR.

This is ok because we have a small quiet group but you probably do not want
that with the number of SCM changes in the cassandra project.

On Fri, Aug 26, 2016 at 3:05 PM, Jeff Jirsa 
wrote:

> +1 to both as well
>
> On 8/26/16, 11:59 AM, "Tyler Hobbs"  wrote:
>
> >+1 on doing this and using ASFBot in particular.
> >
> >On Fri, Aug 26, 2016 at 1:40 PM, Jason Brown 
> wrote:
> >
> >> @Dave ASFBot looks like a winner. If others are on board with this, I
> can
> >> work on getting it up and going.
> >>
> >> On Fri, Aug 26, 2016 at 11:27 AM, Dave Lester 
> >> wrote:
> >>
> >> > +1. Check out ASFBot for logging IRC, along with other
> integrations.[1]
> >> >
>
>


Re: Cassandra Java Driver and DataStax

2016-06-08 Thread Edward Capriolo
What a fun topic. I re-joined the list just for this.

As I understand it, the nature of the Apache Software License is that any
corporate entity is allowed to produce open and closed source software based
on Apache Cassandra; however, the Cassandra name is a trademark of the ASF.

As I understand it, any corporation or person is free to maintain any
documentation about the software in a public or private form.

IMHO the Apache Cassandra wiki is in a sad state, and Corporate site X has
better material, but that is not an indictment of  Corporation X.

I will leave planetcassandra.org to be its own issue.

If someone were to propose a Java/Python driver to be included in the
source code of Cassandra, and said driver were rejected that would be a
clear red flag.

There are several awkward things about the driver being found somewhere
else. These are all hypothetical but have practical implications.
Following the 'itch to scratch' philosophy, perhaps I want to write a
driver over UDP for max performance. Right now, even if it were implemented
in the database, you have a situation where the driver living over there
is ultimately a VETO; you really cannot accomplish one without the other,
and they have to move in lockstep to do reasonable development.

There is a saying in apache something like "if it did not happen on the
list/in jira it did not happen." We have to ask ourselves honestly:

Q: Is it possible that technical writers "over there" are able to come up
with better documentation than the project itself?

A: Yes I wrote the Apache Hive book, and I believe it was more up to date
and complete than the documentation at the time

Q: Is that happening here? Who is to say?

Q: Is the CQL spec "written" in code or in documentation good enough for
someone to reasonably re-create the protocol?

Paraphrased things said on this thread that make me laugh, cry, nod:

"There are plenty of drivers like Kundera, hector"

These projects have been killed off as they were unable to keep up with
ever-changing Cassandra client specs: Thrift 0.6 -> 0.7 breaking
changes, CQL, and the entire deprecation of thrift and the original data
model the database was built around.

"Web server X does not come with a web browser"
This is an established protocol for 20+ years and reasonable clients
already exist. That is not the same as building a new protocol and an
implementation conforming to an existing one; apply the same logic to Google
SPDY.

"Postgres does it like X"
Someone else pointed it out, but this ain't postgres, and this ain't
mongohq. The Apache licence and the Apache way are different things.

"No one at company X commits my patches because I don't work there"
As the minority (non-Facebook) Hive committer for years I can tell you,
"wink wink"


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-13 Thread Edward Capriolo
There was a paging bug in 2.0 and a user just reported a bug sorting a one
row dataset.

So if you want to argue cql has surpassed thrift in all ways, one way it
clearly has not is correctness.

To demonstrate, search the changelog for CQL bugs that return the wrong result.
Then do the same search for thrift bugs that return the wrong result and
compare.

If newbies to the mailing list can pick up bugs and performance regressions,
it is a serious issue.

On Wednesday, March 12, 2014, Jonathan Ellis jbel...@gmail.com wrote:
 I don't know if an IN query already does this without source diving,
 but it could certainly do so without needing extra syntax.

 On Wed, Mar 12, 2014 at 7:16 PM, Nicolas Favre-Felix nico...@acunu.com
wrote:
 If any new use cases
 come to light that can be done with Thrift but not CQL, we will commit
 to supporting those in CQL.

 Hello,

 (going back to the original topic...)

 I just wanted to point out that there is in my opinion an important
 use case that is doable in Thrift but not in CQL, which is to fetch
 several CQL rows from the same partition in a single isolated read. We
 lose the benefit of partition-level isolation if there is no way to
 read rows together.
 Of course we can perform range queries and even scan over
 multi-dimensional clustering keys with CASSANDRA-4851, but we still
 can't fetch rows using a set of clustering keys.

 I couldn't find a JIRA for this feature, does anyone know if there is
one?

 Cheers,
 Nicolas

 --
 For what it's worth, +1 on freezing Thrift.



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.
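The use case Nicolas raises in the quoted thread, fetching several rows of one partition by an explicit set of clustering keys in a single isolated read, would look roughly like this in CQL. The schema and values are invented for illustration; IN restrictions on clustering columns were not supported at the time, which is exactly his point, so check the version you run:

```sql
CREATE TABLE events (
    partition_id int,
    seq int,
    payload text,
    PRIMARY KEY (partition_id, seq)
);

-- One partition, an explicit set of clustering keys, read in a single
-- partition-isolated query rather than a range scan or N separate reads:
SELECT seq, payload FROM events
WHERE partition_id = 42 AND seq IN (1, 5, 9);
```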


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
"I don't know of any use cases for Thrift that can't be
done in CQL"

Can dynamic composites be used from CQL?


On Wed, Mar 12, 2014 at 4:44 AM, Sylvain Lebresne sylv...@datastax.comwrote:

 +1 to Jonathan's proposal.


 On Tue, Mar 11, 2014 at 6:00 PM, Jonathan Ellis jbel...@gmail.com wrote:

  CQL3 is almost two years old now and has proved to be the better API
  that Cassandra needed.  CQL drivers have caught up with and passed the
  Thrift ones in terms of features, performance, and usability.  CQL is
  easier to learn and more productive than Thrift.
 
  With static columns and LWT batch support [1] landing in 2.0.6, and
  UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
  done in CQL.  Contrawise, CQL makes many things easy that are
  difficult to impossible in Thrift.  New development is overwhelmingly
  done using CQL.
 
  To date we have had an unofficial and poorly defined policy of add
  support for new features to Thrift when that is 'easy.'  However,
  even relatively simple Thrift changes can create subtle complications
  for the rest of the server; for instance, allowing Thrift range
  tombtones would make filter conversion for CASSANDRA-6506 more
  difficult.
 
  Thus, I think it's time to officially close the book on Thrift.  We
  will retain it for backwards compatibility, but we will commit to
  adding no new features or changes to the Thrift API after 2.1.0.  This
  will help send an unambiguous message to users and eliminate any
  remaining confusion from supporting two APIs.  If any new use cases
  come to light that can be done with Thrift but not CQL, we will commit
  to supporting those in CQL.
 
  (To a large degree, this merely formalizes what is already de facto
  reality.  Most thrift clients have not even added support for
  atomic_batch_mutate and cas from 2.0, and popular clients like
  Astyanax are migrating to the native protocol.)
 
  Reasonable?
 
  [1] https://issues.apache.org/jira/browse/CASSANDRA-6561
  [2] https://issues.apache.org/jira/browse/CASSANDRA-5590
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced
 



Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
I am glad the project is adopting unambiguous language about its
position. It is nice to have the clarity that volunteer efforts to add
features to thrift will be rejected.

This is a shining example of how a volunteer  apache software foundation
project should be run. If users are attempting to add features, call a vote
and add language to stop them.

+1
On Wednesday, March 12, 2014, Sylvain Lebresne sylv...@datastax.com wrote:
 On Wed, Mar 12, 2014 at 1:38 PM, Edward Capriolo edlinuxg...@gmail.com
wrote:

 , I don't know of any use cases for Thrift that can't be
  done in CQL

 Can dynamic composites be used from CQL?


 Sure, you can use any AbstractType Class you want as type in CQL the same
 way you
 would do it with the thrift API.

 --
 Sylvain





 On Wed, Mar 12, 2014 at 4:44 AM, Sylvain Lebresne sylv...@datastax.com
 wrote:

  +1 to Jonathan's proposal.
 
 
  On Tue, Mar 11, 2014 at 6:00 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
   CQL3 is almost two years old now and has proved to be the better API
   that Cassandra needed.  CQL drivers have caught up with and passed
the
   Thrift ones in terms of features, performance, and usability.  CQL is
   easier to learn and more productive than Thrift.
  
   With static columns and LWT batch support [1] landing in 2.0.6, and
   UDT in 2.1 [2], I don't know of any use cases for Thrift that can't
be
   done in CQL.  Contrawise, CQL makes many things easy that are
   difficult to impossible in Thrift.  New development is overwhelmingly
   done using CQL.
  
   To date we have had an unofficial and poorly defined policy of add
   support for new features to Thrift when that is 'easy.'  However,
   even relatively simple Thrift changes can create subtle complications
   for the rest of the server; for instance, allowing Thrift range
   tombtones would make filter conversion for CASSANDRA-6506 more
   difficult.
  
   Thus, I think it's time to officially close the book on Thrift.  We
   will retain it for backwards compatibility, but we will commit to
   adding no new features or changes to the Thrift API after 2.1.0.
 This
   will help send an unambiguous message to users and eliminate any
   remaining confusion from supporting two APIs.  If any new use cases
   come to light that can be done with Thrift but not CQL, we will
commit
   to supporting those in CQL.
  
   (To a large degree, this merely formalizes what is already de facto
   reality.  Most thrift clients have not even added support for
   atomic_batch_mutate and cas from 2.0, and popular clients like
   Astyanax are migrating to the native protocol.)
  
   Reasonable?
  
   [1] https://issues.apache.org/jira/browse/CASSANDRA-6561
   [2] https://issues.apache.org/jira/browse/CASSANDRA-5590
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder, http://www.datastax.com
   @spyced
  
 



-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
I am -1. For a few reasons:

Cassandra will be the only database (that I know of) where the only
official client to the database lives in source control outside of the
project. I would like some clarity on how this development will go on in an
open-source fashion. Namely:

1) Who does regression testing between the database server and the client,
and how? I.e., when there is no official client, it is hard to say whether
a bug is in the client or in the server.
2) How can an open-source Apache project depend on a non-Apache-managed
resource to accomplish basic development? I.e., how does a Cassandra
committer who does not have commit access on the driver source code get work done?
3) Who has the final word on how a feature is implemented in the native
protocol? Imagine there are two implementations of the CQL native protocol,
native-cql-ruby and native-cql-java, and these libraries have each interpreted
the transport spec differently. One of them has to be broken to fix the
problem. Who resolves this issue, and how?

With static columns and LWT batch support [1] landing in 2.0.6, and
 UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
 done in CQL.

Do we mean CQL the transport, CQL the storage engine, CQL the procedure
engine (auto timestamps), or CQL the language? :)  It's hard for Thrift to
do things when specific read-before-write list collection operations
are impossible to do from a transport.

To a large degree, this merely formalizes what is already de facto
reality.  Most thrift clients have not even added support for
atomic_batch_mutate and cas from 2.0, and popular clients like Astyanax are
migrating to the native protocol.

This is such a loaded statement; most committers have not even committed
to adding features to Thrift. Take for example
https://issues.apache.org/jira/browse/CASSANDRA-5435; adding range
tombstones to Thrift was actually a very simple effort. One day I just got
off my couch and pushed this along. What is happening is a self-fulfilling
prophecy: if everyone throws tons of development effort in one direction,
unsurprisingly the other direction lags behind.



On Tue, Mar 11, 2014 at 1:43 PM, Gary Dusbabek gdusba...@gmail.com wrote:

 +1



 On Tue, Mar 11, 2014 at 12:00 PM, Jonathan Ellis jbel...@gmail.com
 wrote:

  CQL3 is almost two years old now and has proved to be the better API
  that Cassandra needed.  CQL drivers have caught up with and passed the
  Thrift ones in terms of features, performance, and usability.  CQL is
  easier to learn and more productive than Thrift.
 
  With static columns and LWT batch support [1] landing in 2.0.6, and
  UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
  done in CQL.  Contrariwise, CQL makes many things easy that are
  difficult to impossible in Thrift.  New development is overwhelmingly
  done using CQL.
 
  To date we have had an unofficial and poorly defined policy of add
  support for new features to Thrift when that is 'easy.'  However,
  even relatively simple Thrift changes can create subtle complications
  for the rest of the server; for instance, allowing Thrift range
  tombstones would make filter conversion for CASSANDRA-6506 more
  difficult.
 
  Thus, I think it's time to officially close the book on Thrift.  We
  will retain it for backwards compatibility, but we will commit to
  adding no new features or changes to the Thrift API after 2.1.0.  This
  will help send an unambiguous message to users and eliminate any
  remaining confusion from supporting two APIs.  If any new use cases
  come to light that can be done with Thrift but not CQL, we will commit
  to supporting those in CQL.
 
  (To a large degree, this merely formalizes what is already de facto
  reality.  Most thrift clients have not even added support for
  atomic_batch_mutate and cas from 2.0, and popular clients like
  Astyanax are migrating to the native protocol.)
 
  Reasonable?
 
  [1] https://issues.apache.org/jira/browse/CASSANDRA-6561
  [2] https://issues.apache.org/jira/browse/CASSANDRA-5590
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced
 



Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
With support officially deprecated, that will be the only way to go. If a
user wants to add a function to Thrift they will have to fork off
Cassandra, code the function themselves, write the internals, and manage
the internals. I see this as a very hard task because the server could
change rapidly with no regard to them. It could also cause a
proliferation of functions: could you imagine a Thrift server with 300
methods? :) This is why I think keeping the support in trunk and carefully
adding things would be sane, but seemingly no one wants to support it at
all, so a fork is probably in order.


On Tue, Mar 11, 2014 at 7:46 PM, Russ Bradberry rbradbe...@gmail.comwrote:

 I would like to suggest the possibility of having the interface somewhat
 pluggable so another project can provide the Thrift interface as a drop in
 JAR. Thoughts?

 Sent from my iPhone

  On Mar 11, 2014, at 7:26 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
 
  If you are using thrift there probably isn't a reason to upgrade to 2.1
 
  What? Upgrading gets you performance regardless of your api.
 
   We have already gone from no new feature talk to less emphasis on
  testing.
 
  How comforting.
  On Tuesday, March 11, 2014, Dave Brosius dbros...@mebigfatguy.com
 wrote:
 
  +1,
 
   although supporting Thrift in 2.1 seems overly conservative.
 
  If you are using thrift there probably isn't a reason to upgrade to 2.1,
  in fact doing so will become an increasingly dumb idea as lesser and
 lesser
  emphasis will be placed on testing with 2.1+. This would allow us to
  greatly simplify the code footprint in 2.1
 
 
 
 
  On 03/11/2014 01:00 PM, Jonathan Ellis wrote:
 
  CQL3 is almost two years old now and has proved to be the better API
  that Cassandra needed.  CQL drivers have caught up with and passed the
  Thrift ones in terms of features, performance, and usability.  CQL is
  easier to learn and more productive than Thrift.
 
  With static columns and LWT batch support [1] landing in 2.0.6, and
  UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
  done in CQL.  Contrariwise, CQL makes many things easy that are
  difficult to impossible in Thrift.  New development is overwhelmingly
  done using CQL.
 
  To date we have had an unofficial and poorly defined policy of add
  support for new features to Thrift when that is 'easy.'  However,
  even relatively simple Thrift changes can create subtle complications
  for the rest of the server; for instance, allowing Thrift range
  tombstones would make filter conversion for CASSANDRA-6506 more
  difficult.
 
  Thus, I think it's time to officially close the book on Thrift.  We
  will retain it for backwards compatibility, but we will commit to
  adding no new features or changes to the Thrift API after 2.1.0.  This
  will help send an unambiguous message to users and eliminate any
  remaining confusion from supporting two APIs.  If any new use cases
  come to light that can be done with Thrift but not CQL, we will commit
  to supporting those in CQL.
 
  (To a large degree, this merely formalizes what is already de facto
  reality.  Most thrift clients have not even added support for
  atomic_batch_mutate and cas from 2.0, and popular clients like
  Astyanax are migrating to the native protocol.)
 
  Reasonable?
 
  [1] https://issues.apache.org/jira/browse/CASSANDRA-6561
  [2] https://issues.apache.org/jira/browse/CASSANDRA-5590
 
  --
  Sorry this was sent from mobile. Will do less grammar and spell check
 than
  usual.



Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
I meant to say that Thrift provides a facade over the StorageProxy. Without
Thrift, the only user of the Cassandra engine would be CQL. At that point
the storage engine would likely evolve to be less usable and less pluggable.
Thrift has it easy because it has friendly methods like
StorageProxy.batch_mutate() to call. Without that project-level support,
many of the things that pluggable_application_x would want to call are buried
inside a set of interfaces that are designed only with the CQL use case in
mind. In a simple case, imagine something you want inside
cool_new_interface_x is marked private in Cassandra. You then need to fork
the code, or convince upstream to make it accessible.

BTW, I think you know, but I already took a stab at what you're describing:
pluggable, REST, and JVM languages (https://github.com/zznate/intravert-ug)


On Tue, Mar 11, 2014 at 8:16 PM, Russell Bradberry rbradbe...@gmail.comwrote:

 I didn't mean that someone should maintain a fork of Cassandra; more like
 something that could be dropped in. Just like clients have to keep up with
 the server, a project like this would also.  I think if the interface was
 pluggable it would also allow others to expand and come up with new
 interfaces that can possibly expand the user base.  One example would be a
 built in REST interface that doesn't rely on an external web server that
 translates requests to CQL; just drop in a JAR and the interface becomes
 available.

 This would also lend itself to allow anyone to write an interface in any
 (JVM) language they want, if they want to add external stored procedures
 via this interface then they would be able to.   I'm for the removal of
 Thrift in the trunk, but I think there is a use-case for an extensible
 interface.

 I still seem to remember there were a few angry users when Avro was removed.


 On Tue, Mar 11, 2014 at 8:04 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

  With support officially deprecated that will be the only way to go. If a
  user wants to add a function to thrift they will have to fork off
  cassandra, code the function themselves write the internals, manage the
  internals. I see this as being a very hard task because the server could
  change rapidly with no regards to them. Also this could cause a
  proliferation of functions. Could you imagine a thrift server with 300
  methods :). This is why I think keeping the support in trunk and
 carefully
  adding things would be sane, but seemingly no one wants to support it at
  all so a fork is probably in order.
 
 
  On Tue, Mar 11, 2014 at 7:46 PM, Russ Bradberry rbradbe...@gmail.com
  wrote:
 
   I would like to suggest the possibility of having the interface
 somewhat
   pluggable so another project can provide the Thrift interface as a drop
  in
   JAR. Thoughts?
  
   Sent from my iPhone
  
On Mar 11, 2014, at 7:26 PM, Edward Capriolo edlinuxg...@gmail.com
   wrote:
   
If you are using thrift there probably isn't a reason to upgrade to
 2.1
   
What? Upgrading gets you performance regardless of your api.
   
 We have already gone from no new feature talk to less emphasis on
testing.
   
How comforting.
On Tuesday, March 11, 2014, Dave Brosius dbros...@mebigfatguy.com
   wrote:
   
+1,
   
 although supporting Thrift in 2.1 seems overly conservative.
   
If you are using thrift there probably isn't a reason to upgrade to
  2.1,
in fact doing so will become an increasingly dumb idea as lesser and
   lesser
emphasis will be placed on testing with 2.1+. This would allow us to
greatly simplify the code footprint in 2.1
   
   
   
   
On 03/11/2014 01:00 PM, Jonathan Ellis wrote:
   
CQL3 is almost two years old now and has proved to be the better
 API
that Cassandra needed.  CQL drivers have caught up with and passed
  the
Thrift ones in terms of features, performance, and usability.  CQL
 is
easier to learn and more productive than Thrift.
   
With static columns and LWT batch support [1] landing in 2.0.6, and
UDT in 2.1 [2], I don't know of any use cases for Thrift that can't
  be
 done in CQL.  Contrariwise, CQL makes many things easy that are
difficult to impossible in Thrift.  New development is
 overwhelmingly
done using CQL.
   
To date we have had an unofficial and poorly defined policy of add
support for new features to Thrift when that is 'easy.'  However,
even relatively simple Thrift changes can create subtle
 complications
for the rest of the server; for instance, allowing Thrift range
 tombstones would make filter conversion for CASSANDRA-6506 more
difficult.
   
Thus, I think it's time to officially close the book on Thrift.  We
will retain it for backwards compatibility, but we will commit to
adding no new features or changes to the Thrift API after 2.1.0.
   This
will help send an unambiguous message to users and eliminate any
remaining confusion from supporting two APIs.  If any new use cases

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
I can agree with not liking the construction kit approach.

Redis (http://redis.io/commands): 40-plus commands over a telnet-style protocol.

Elasticsearch: JSON over HTTP:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html

CouchDB: JSON over HTTP and JavaScript:
http://docs.couchdb.org/en/latest/intro/tour.html

MongoDB: JSON over a binary API, with JavaScript and in-database map-reduce.

At this point it is just different strokes for different folks; some people
want a query API because they don't get NoSQL, and some don't.


On Tue, Mar 11, 2014 at 8:35 PM, Jonathan Ellis jbel...@gmail.com wrote:

 I don't think we're well-served by the construction kit approach.
 It's difficult enough to evaluate NoSQL without deciding if you should
 run CQLSandra or Hectorsandra or Intravertandra etc.

 On Tue, Mar 11, 2014 at 7:16 PM, Russell Bradberry rbradbe...@gmail.com
 wrote:
  I didn't mean that someone should maintain a fork of Cassandra. More like
  something that could be dropped in. Just like clients have to keep up
 with
  the server, a project like this would also.  I think if the interface was
  pluggable it would also allow others to expand and come up with new
  interfaces that can possibly expand the user base.  One example would be
 a
  built in REST interface that doesn't rely on an external web server that
  translates requests to CQL, just drop in a JAR and the interface comes
  available.
 
  This would also lend itself to allow anyone to write an interface in any
  (JVM) language they want, if they want to add external stored procedures
  via this interface then they would be able to.   I'm for the removal of
  Thrift in the trunk, but I think there is a use-case for an extensible
  interface.
 
  I still seem to remember there were a few angry users when Avro was
 removed.
 
 
  On Tue, Mar 11, 2014 at 8:04 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
 
  With support officially deprecated that will be the only way to go. If a
  user wants to add a function to thrift they will have to fork off
  cassandra, code the function themselves write the internals, manage the
  internals. I see this as being a very hard task because the server could
  change rapidly with no regards to them. Also this could cause a
  proliferation of functions. Could you imagine a thrift server with 300
  methods :). This is why I think keeping the support in trunk and
 carefully
  adding things would be sane, but seemingly no one wants to support it at
  all so a fork is probably in order.
 
 
  On Tue, Mar 11, 2014 at 7:46 PM, Russ Bradberry rbradbe...@gmail.com
  wrote:
 
   I would like to suggest the possibility of having the interface
 somewhat
   pluggable so another project can provide the Thrift interface as a
 drop
  in
   JAR. Thoughts?
  
   Sent from my iPhone
  
On Mar 11, 2014, at 7:26 PM, Edward Capriolo edlinuxg...@gmail.com
 
   wrote:
   
If you are using thrift there probably isn't a reason to upgrade to
 2.1
   
What? Upgrading gets you performance regardless of your api.
   
 We have already gone from no new feature talk to less emphasis on
testing.
   
How comforting.
On Tuesday, March 11, 2014, Dave Brosius dbros...@mebigfatguy.com
 
   wrote:
   
+1,
   
 although supporting Thrift in 2.1 seems overly conservative.
   
If you are using thrift there probably isn't a reason to upgrade to
  2.1,
in fact doing so will become an increasingly dumb idea as lesser and
   lesser
emphasis will be placed on testing with 2.1+. This would allow us to
greatly simplify the code footprint in 2.1
   
   
   
   
On 03/11/2014 01:00 PM, Jonathan Ellis wrote:
   
CQL3 is almost two years old now and has proved to be the better
 API
that Cassandra needed.  CQL drivers have caught up with and passed
  the
Thrift ones in terms of features, performance, and usability.
  CQL is
easier to learn and more productive than Thrift.
   
With static columns and LWT batch support [1] landing in 2.0.6,
 and
UDT in 2.1 [2], I don't know of any use cases for Thrift that
 can't
  be
 done in CQL.  Contrariwise, CQL makes many things easy that are
difficult to impossible in Thrift.  New development is
 overwhelmingly
done using CQL.
   
To date we have had an unofficial and poorly defined policy of
 add
support for new features to Thrift when that is 'easy.'  However,
even relatively simple Thrift changes can create subtle
 complications
for the rest of the server; for instance, allowing Thrift range
 tombstones would make filter conversion for CASSANDRA-6506 more
difficult.
   
Thus, I think it's time to officially close the book on Thrift.
  We
will retain it for backwards compatibility, but we will commit to
adding no new features or changes to the Thrift API after 2.1.0.
   This
will help send an unambiguous message to users and eliminate any
remaining confusion from supporting two APIs

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
If you are using thrift there probably isn't a reason to upgrade to 2.1

What? Upgrading gets you performance regardless of your API.

We have already gone from no new feature talk to less emphasis on
testing.

How comforting.
On Tuesday, March 11, 2014, Dave Brosius dbros...@mebigfatguy.com wrote:

 +1,

 although supporting Thrift in 2.1 seems overly conservative.

 If you are using thrift there probably isn't a reason to upgrade to 2.1,
in fact doing so will become an increasingly dumb idea as less and less
emphasis will be placed on testing with 2.1+. This would allow us to
greatly simplify the code footprint in 2.1




 On 03/11/2014 01:00 PM, Jonathan Ellis wrote:

 CQL3 is almost two years old now and has proved to be the better API
 that Cassandra needed.  CQL drivers have caught up with and passed the
 Thrift ones in terms of features, performance, and usability.  CQL is
 easier to learn and more productive than Thrift.

 With static columns and LWT batch support [1] landing in 2.0.6, and
 UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
 done in CQL.  Contrariwise, CQL makes many things easy that are
 difficult to impossible in Thrift.  New development is overwhelmingly
 done using CQL.

 To date we have had an unofficial and poorly defined policy of add
 support for new features to Thrift when that is 'easy.'  However,
 even relatively simple Thrift changes can create subtle complications
 for the rest of the server; for instance, allowing Thrift range
 tombstones would make filter conversion for CASSANDRA-6506 more
 difficult.

 Thus, I think it's time to officially close the book on Thrift.  We
 will retain it for backwards compatibility, but we will commit to
 adding no new features or changes to the Thrift API after 2.1.0.  This
 will help send an unambiguous message to users and eliminate any
 remaining confusion from supporting two APIs.  If any new use cases
 come to light that can be done with Thrift but not CQL, we will commit
 to supporting those in CQL.

 (To a large degree, this merely formalizes what is already de facto
 reality.  Most thrift clients have not even added support for
 atomic_batch_mutate and cas from 2.0, and popular clients like
 Astyanax are migrating to the native protocol.)

 Reasonable?

 [1] https://issues.apache.org/jira/browse/CASSANDRA-6561
 [2] https://issues.apache.org/jira/browse/CASSANDRA-5590




-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Node side processing

2014-02-27 Thread Edward Capriolo
Check out Intravert on GitHub. I am working to get many of those features into
Cassandra.

On Thursday, February 27, 2014, Brandon Williams dri...@gmail.com wrote:
 A few:

 https://issues.apache.org/jira/browse/CASSANDRA-4914

 https://issues.apache.org/jira/browse/CASSANDRA-5184

 https://issues.apache.org/jira/browse/CASSANDRA-6704

 https://issues.apache.org/jira/browse/CASSANDRA-6167



 On Thu, Feb 27, 2014 at 7:50 AM, David Semeria da...@lmframework.com
wrote:

 Hi List,

 I was wondering whether there have been any past proposals for
 implementing node side processing (NSP) in C*. By NSP, I mean the
passing a
 reference to a Java class which would then process the result set before
it
 being returned to the client.

 In our particular use case our clients typically loop through result sets
 of a million or more rows to produce a tiny amount of output (sums,
means,
 variance, etc). The bottleneck -- quite obviously -- is the need to
 transfer a million rows to the client before processing can take place.
It
 would be extremely useful to execute this processing on the coordinator
 node and only transfer the results to the client.

 I mention this here because I can imagine other C* users having similar
 requirements.

 Thanks

 D.



-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.
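
David's use case above (shipping a million rows to the client just to compute sums, means, and variance) is worth making concrete. Until something like node-side processing exists, the client is stuck doing one-pass streaming aggregation itself. A minimal sketch of that client-side loop, using Welford's online algorithm and a plain Python iterable standing in for a paged result set (assumptions, not any real driver API):

```python
# One-pass (streaming) sum/mean/variance over a large result set,
# using Welford's online algorithm so no rows are buffered in memory.
# The `rows` iterable stands in for a paged Cassandra result set.

def streaming_stats(rows):
    n = 0
    total = 0.0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in rows:
        n += 1
        total += x
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / n if n else 0.0  # population variance
    return n, total, mean, variance

n, total, mean, var = streaming_stats(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(n, total, round(mean, 6), round(var, 6))  # count and moments for the sample
```

Every byte of the million rows still crosses the wire; the point of node-side processing would be to run exactly this loop on the coordinator and return only the four numbers.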


Re: Initialising / maintaining a list of nodes in the cluster

2013-09-07 Thread Edward Capriolo
A Cassandra cluster must always use the same RPC port (default 9160).

On Friday, September 6, 2013, Paul LeoNerd leon...@leonerd.org.uk wrote:
 I'm trying to work out how a client is best to maintain a list of what
 nodes are available in the cluster, for maintaining connections to.

 I understand the general idea is to query the system.peers table, and
 REGISTER an interest in TOPOLOGY_CHANGE and STATUS_CHANGE messages to
 watch for nodes being added/removed or becoming unavailable/available.
 So far so good.

 A few details of this seem a bit awkward though:

  * The system.peers table identifies peers only by their IP address,
not including the port number, whereas TOPOLOGY and STATUS_CHANGE
messages include a port.

What happens if there is more than one copy of a node using the same
IP address? How do I know which TCP port I can use to communicate
CQL with a given peer?

  * The system.peers table doesn't contain any information giving the
current availability status of the nodes, so I don't know if they
are initially up or down.

I can just presume all the known nodes are up until I try connecting
to them - in any case, it could be that Cassandra knows of the
existence of nodes that for some reason my client can't connect to,
so I'd have to handle this case anyway. But it feels like that hint
should be there somewhere.

  * The system.peers table doesn't include the actual node I am
querying it on.

Most of the missing information does appear in the system.local
table, but not the address. The client does know /an/ address it has
connected to that node using, but how can I be sure that this
address is the one that will appear in the peers list on other
nodes? It's quite common for a server to have multiple addresses, so
it may be that I've connected to some address different to that
which the other nodes know it by.

 I'm quite new to Cassandra, so there stands a chance I've overlooked
 something somewhere. Can anyone offer any comment or advice on these
 issues, or perhaps point me in the direction of some client code that
 manages to overcome them?

 Thanks,

 --
 Paul LeoNerd Evans

 leon...@leonerd.org.uk
 ICQ# 4135350   |  Registered Linux# 179460
 http://www.leonerd.org.uk/
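
The bookkeeping Paul describes (seed from system.peers, then apply TOPOLOGY_CHANGE and STATUS_CHANGE events) can be sketched independently of any particular driver. This is only an illustration of the state machine, not a real client: the seed list and the event arguments are hypothetical stand-ins for what a driver would get from the peers table and from native-protocol event registration.

```python
# Sketch of the node-list state machine a native-protocol client keeps.
# `seed_peers` stands in for a "SELECT peer FROM system.peers" result;
# the handler arguments mimic TOPOLOGY_CHANGE / STATUS_CHANGE payloads
# (change type plus node address). Hypothetical shapes, not a driver API.

class NodeList:
    def __init__(self, seed_peers):
        # Assume every known peer is up until proven otherwise; the
        # client must handle unreachable nodes on connect anyway.
        self.nodes = {peer: True for peer in seed_peers}

    def on_topology_change(self, change, node):
        if change == "NEW_NODE":
            self.nodes[node] = True
        elif change == "REMOVED_NODE":
            self.nodes.pop(node, None)

    def on_status_change(self, change, node):
        if node in self.nodes:
            self.nodes[node] = (change == "UP")

    def live_nodes(self):
        return sorted(n for n, up in self.nodes.items() if up)

nl = NodeList(["10.0.0.1", "10.0.0.2"])
nl.on_topology_change("NEW_NODE", "10.0.0.3")
nl.on_status_change("DOWN", "10.0.0.2")
print(nl.live_nodes())  # ['10.0.0.1', '10.0.0.3']
```

It deliberately ignores the port question raised above: it keys nodes by address only, which is exactly why a cluster using one consistent RPC port makes the problem tractable.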



Re: Fw: Fwd: CQL Thrift

2013-08-30 Thread Edward Capriolo
This is always so hard to explain, but:

http://www.datastax.com/dev/blog/thrift-to-cql3

Get to the part that looks like this:

update column family user_profiles
with key_validation_class = UTF8Type
and comparator = UTF8Type
and column_metadata=[]

Since the static column values validation types have been dropped, they
are not available to your client library anymore. In particular, as can be
seen in the output above, cqlsh displays some values in a non-human-readable
format. And unless the client library exposes an easy way to force the
deserialization format for a value, such deserialization will have to be
done manually in client code.

I think the above is the largest reason. Due to the way 'CQL' wants to
present 'thrift' column families, you have to lose your 'thrift' notion of
schema, because it is not compatible with the 'cql' notion of schema. I am
wrapping 'thrift' and 'cql' in quotes because CQL is an access language,
but when you define tables as non-compact storage they gain 'features' that
make them not understandable by non-CQL clients.

They have two different schema systems and two different access languages;
there is some compatibility between the two, but working out which feature
sets mix and match is more effort than just picking one.
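
The "non-human-readable format" point from the blog excerpt above is easy to demonstrate. Once the validation class is dropped to BytesType, the client (or cqlsh) only has raw bytes, and deserialization becomes the application's job. A toy sketch of that manual step, in plain Python with no Cassandra client involved:

```python
# When a column's validation class is dropped (BytesType), clients get
# raw bytes and must know out-of-band that the value was UTF-8 text.
# Toy illustration only; no Cassandra client involved.

raw = "vivek".encode("utf-8")   # what the server now hands back

hex_blob = "0x" + raw.hex()     # roughly how an untyped blob gets rendered
text = raw.decode("utf-8")      # the manual deserialization step

print(hex_blob)  # 0x766976656b
print(text)      # vivek
```

Multiply that by every column whose metadata was dropped, and by every client language, and the "don't mix Thrift and CQL3" advice follows.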


On Fri, Aug 30, 2013 at 2:05 PM, Vivek Mishra vivek.mis...@yahoo.comwrote:

 fyi. Just curious to know the real reason behind not to mix thrift and
 CQL3.

 Any pointers?

 -Vivek



 -- Forwarded message --
 From: Vivek Mishra mishra.v...@gmail.com
 Date: Fri, Aug 30, 2013 at 11:21 PM
 Subject: Re: CQL  Thrift
 To: u...@cassandra.apache.org



 Hi,
 I understand that, but i want to understand the reason behind
 such behavior?  Is it because of maintaining different metadata objects for
 CQL3 and thrift?

 Any suggestion?

 -Vivek



 On Fri, Aug 30, 2013 at 11:15 PM, Jon Haddad j...@jonhaddad.com wrote:

 If you're going to work with CQL, work with CQL.  If you're going to work
 with Thrift, work with Thrift.  Don't mix.
 
 
 On Aug 30, 2013, at 10:38 AM, Vivek Mishra mishra.v...@gmail.com wrote:
 
 Hi,
 If i a create a table with CQL3 as
 
 
 create table user(user_id text PRIMARY KEY, first_name text, last_name
 text, emailid text);
 
 
 and create index as:
 create index on user(first_name);
 
 
 then inserted some data as:
 insert into user(user_id,first_name,last_name,emailId)
 values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');
 
 
 
 
 
 Then if update same column family using Cassandra-cli as:
 
 
 update column family user with key_validation_class='UTF8Type' and
 column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
 index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
 index_type:KEYS}];
 
 
 
 
 
 Now if i connect via cqlsh and explore user table, i can see column
 first_name,last_name are not part of table structure anymore. Here is the
 output:
 
 
 CREATE TABLE user (
   key text PRIMARY KEY
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};
 
 
 cqlsh:cql3usage select * from user;
 
 
  user_id
 -
  @mevivs
 
 
 
 
 
 
 
 
 
 
 I understand that, CQL3 and thrift interoperability is an issue. But
 this looks to me a very basic scenario.
 
 
 
 
 
 
 Any suggestions? Or If anybody can explain a reason behind this?
 
 
 -Vivek
 
 
 
 
 
 
 
 
 


Re: CQL vs Thrift

2013-07-18 Thread Edward Capriolo
If you understand how CQL collections are written you can decode them and
work with them from Thrift. It's quite a chore and I would not suggest
trying to do it, however.

(I suspect Tyler tried it and Jonathan broke his hand, jk)

There is a Perl Cassandra driver that did something like this.
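
To give a flavor of the chore: in the 1.2/2.0-era layout (stated here as an assumption, not a spec quote), each list element is a separate cell whose CompositeType column name packs (cql_column_name, timeuuid) and whose cell value is the element, with every composite component encoded as a 2-byte length, the bytes, and an end-of-component byte. A minimal sketch of packing and unpacking such a composite name:

```python
import struct
import uuid

# Sketch of why decoding CQL3 collections over Thrift is "quite a chore".
# Assumed 1.2-era layout: a list cell's CompositeType name is
# (cql_column_name, timeuuid); each component is encoded as
# <2-byte big-endian length><bytes><end-of-component byte>.

def composite(*components):
    out = b""
    for c in components:
        out += struct.pack(">H", len(c)) + c + b"\x00"
    return out

def split_composite(name):
    parts = []
    i = 0
    while i < len(name):
        (length,) = struct.unpack_from(">H", name, i)
        parts.append(name[i + 2 : i + 2 + length])
        i += 2 + length + 1  # skip the end-of-component byte too
    return parts

# One cell of a hypothetical list column called "emails":
cell_name = composite(b"emails", uuid.uuid1().bytes)
cell_value = b"vivek@example.com"

col, position = split_composite(cell_name)
print(col.decode("ascii"), len(position))  # emails 16
```

And that is only the column names; the cell values still need the element type applied by hand, which is the part no Thrift client knows about.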

On Wednesday, July 17, 2013, Jonathan Ellis jbel...@gmail.com wrote:
 On Wed, Jul 17, 2013 at 4:03 PM, Tyler Hobbs ty...@datastax.com wrote:
 I'll leave it to somebody else to comment on adding collections, etc to
 Thrift.

 Doesn't make sense, since Thrift is all about the raw data cells, and
 collections are an abstraction layer on top of that.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: Major compaction does not seems to free the disk space a lot if wide rows are used.

2013-05-16 Thread Edward Capriolo
This makes sense. Unless you are running major compaction, a delete can
only be purged if the bloom filters confirm the row is not in the sstables
outside the compaction. If your rows are wide, the odds are that they are in
most or all sstables, so finally removing them is tricky.
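
The purge rule being discussed can be modeled in a few lines. This is a simplified sketch, not Cassandra's actual code: a tombstone may only be dropped if gc_grace has passed AND the row key appears in no sstable outside the compaction set, which is exactly why an sstable flushed mid-compaction blocks the purge of a wide, hot row.

```python
# Toy model of the tombstone-purge rule: drop a row's tombstones during
# a compaction only if gc_grace has elapsed AND no sstable outside the
# compaction set may still contain the key. Simplified sketch.

def can_purge(key, compacting, all_sstables, tombstone_ts, gc_grace, now):
    if now - tombstone_ts < gc_grace:
        return False
    outside = [s for s in all_sstables if s not in compacting]
    # If any sstable we are NOT compacting may contain this key,
    # dropping the tombstone could resurrect shadowed data.
    return all(key not in s["keys"] for s in outside)

s1 = {"name": "s1", "keys": {"widerow"}}
s2 = {"name": "s2", "keys": {"widerow"}}
s3 = {"name": "s3", "keys": {"widerow"}}
s4 = {"name": "s4-flushed-during-compaction", "keys": {"widerow"}}

now, ts, gc_grace = 10_000, 0, 600
print(can_purge("widerow", [s1, s2, s3], [s1, s2, s3], ts, gc_grace, now))      # True
print(can_purge("widerow", [s1, s2, s3], [s1, s2, s3, s4], ts, gc_grace, now))  # False
```

This also matches Boris's experiment: with writes stopped, no new sstables appear outside the compaction set, and disk usage drops dramatically.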


On Thu, May 16, 2013 at 12:00 PM, Louvet, Jacques 
jacques_lou...@cable.comcast.com wrote:

  Boris,

  We hit exactly the same issue, and you are correct the newly created
 SSTables are the cause of why most of the column-tombstone not being purged.

  There is an improvement in 1.2 train where both the minimum and maximum
 timestamp for a row is now stored and used during the compaction to
 determine if the portion of the row can be purged.
 However, this only appears to help major compaction; the other
 restriction, where all the files encompassing the deleted rows must be part
 of the compaction for the row to be purged, still remains.

  We have switched to column delete rather that row delete wherever
 practical. A little more work on the app, but a big improvement in reads
 due to much more efficient compaction.

  Regards,
 Jacques

   From: Boris Yen yulin...@gmail.com
 Reply-To: u...@cassandra.apache.org u...@cassandra.apache.org
 Date: Thursday, May 16, 2013 04:07
 To: u...@cassandra.apache.org u...@cassandra.apache.org, 
 dev@cassandra.apache.org dev@cassandra.apache.org
 Subject: Major compaction does not seems to free the disk space a lot if
 wide rows are used.

  Hi All,

 Sorry for the wide distribution.

  Our cassandra is running on 1.0.10. Recently, we are facing a weird
 situation. We have a column family containing wide rows (each row might
 have a few million of columns). We delete the columns on a daily basis and
 we also run major compaction on it everyday to free up disk space (the
 gc_grace is set to 600 seconds).

  However, every time we run the major compaction, only 1 or 2GB disk space
 is freed. We tried to delete most of the data before running compaction,
 however, the result is pretty much the same.

  So, we tried to check the source code. It seems that the column
 tombstones could only be purged when the row key is not in other sstables.
 I know the major compaction should include all sstables, however, in our
 use case, columns get inserted rapidly. This will make the cassandra flush
 the memtables to disk and create new sstables. The newly created sstables
 will have the same keys as the sstables that are being compacted (the
 compaction will take 2 or 3 hours to finish). My question is that will
 these newly created sstables be the cause of why most of the
 column-tombstone not being purged?

  p.s. We also did some other tests. We inserted data to the same CF with
 the same wide-row pattern and deleted most of the data. This time we
 stopped all the writes to cassandra and did the compaction. The disk usage
 decreased dramatically.

  Any suggestions or is this a know issue.

  Thanks and Regards,
  Boris



Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-12 Thread Edward Capriolo
I am not sure about the collection case, but for compact storage you can
specify multiple ranges in a slice query.

https://issues.apache.org/jira/browse/CASSANDRA-3885

I am not sure this will get you all the way to bit-map indexes but in a
wide row scenario it seems like you could support a event contains 1 or
event contains 2 or event contains 3

I am not sure how arbitrarily complex the CQL query handler can/will
become. For intravert (something I am dabling with) the concept is to apply
a server side function to the result of a slice.

https://github.com/zznate/intravert-ug/wiki/Filter-mode

There is a huge win in having multiple indexes behind the plugable index
support, not all of the plugable indexes and query options will be easy to
CQL-ify.
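The set manipulation mrevilgnome asks about below ("all users that performed event 1, 2, and 3, but not 4") reduces to bitwise AND/AND-NOT over bitmaps. A toy Python sketch of that algebra, using ints as bitmaps (the data is made up; this is not C* code):

```python
# Toy bitmap-index sketch (assumption: illustrative only, not Cassandra code).
# Bit i set in users_event[e] means user i performed event e.
users_event = {
    1: 0b1011,  # users 0, 1, 3 performed event 1
    2: 0b0011,  # users 0, 1
    3: 0b0111,  # users 0, 1, 2
    4: 0b0001,  # user 0
}

# "performed events 1, 2 and 3, but not 4": intersect, then subtract.
result = users_event[1] & users_event[2] & users_event[3] & ~users_event[4]
matching_users = [i for i in range(4) if (result >> i) & 1]
print(matching_users)  # [1]
```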




On Fri, Apr 12, 2013 at 10:52 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Something like this?

 SELECT * FROM users
 WHERE user_id IN (select user_id from events where type in (1, 2, 3))
   AND user_id NOT IN (select user_id from events where type=4)

 This doesn't really look like a Cassandra query to me.  More like a
 query for Hive (or Drill, or Impala).

 But, I know Sylvain is looking forward to adding index support to
 Collections [1], so something like this might fit:

 SELECT * FROM users
 WHERE (events CONTAINS 1 OR events CONTAINS 2 OR events CONTAINS 3)
AND NOT (events CONTAINS 4)

 However, even this is more than our current query planner can handle;
 we don't really handle disjunctions at all, except for the special
 case of IN on the partition key (which translates to multiget), let
 alone arbitrary logical predicates.

 I think that between bitmap indexes and query planning, the latter
 is actually the hard part.  QueryProcessor is about at the limits of
 tractable complexity already; I think we'd need a new approach if we
 want to handle arbitrarily complex predicates like that.

 [1] https://issues.apache.org/jira/browse/CASSANDRA-4511


 On Wed, Apr 10, 2013 at 4:40 PM, mrevilgnome mrevilgn...@gmail.com
 wrote:
  What do you think about set manipulation via indexes in Cassandra? I'm
  interested in answering queries such as give me all users that performed
  event 1, 2, and 3, but not 4. If the answer is yes then I can make a case
  for spending my time on C*. The only downside for us would be that our
  current prototype is in C++, so we would lose some performance and the
  ability to dedicate an entire machine to caching/performing queries.
 
 
  On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  If you mean, Can someone help me figure out how to get started updating
  these old patches to trunk and cleaning out the Avro? then yes, I've
 been
  knee-deep in indexing code recently.
 
 
  On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome mrevilgn...@gmail.com
  wrote:
 
   I'm currently building a distributed cluster on top of cassandra to
  perform
   fast set manipulation via bitmap indexes. This gives me the ability to
   perform unions, intersections, and set subtraction across sub-queries.
   Currently I'm storing index information for thousands of dimensions as
   cassandra rows, and my cluster keeps this information cached,
 distributed
   and replicated in order to answer queries.
  
   Every couple of days I think to myself this should really exist in C*.
   Given all the benefits would there be any interest in
   reviving CASSANDRA-1472?
  
   Some downsides are that this is very memory intensive, even for sparse
   bitmaps.
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Edward Capriolo
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less

Does your tool handle the fact that foreign keys do not work? Or, for
that matter, how are you dealing with the fact that a primary key in
cassandra is nothing like a primary key in an RDBMS?

I am generally under the impression that CRUD tools that auto-generate CQL
schemas can give someone the rope to hang themselves.

On Tue, Mar 5, 2013 at 3:46 PM, Andrew Prendergast a...@andrewprendergast.com
 wrote:

 Hi Tristan,

 I've spent the last couple weekends testing the CRUD DML stuff and it's very
 close to meeting that objective (although NULL handling needs some tuning).

 The main hiccups are in the JDBC driver which I have been working through
 with Rick - once he accepts my patches it'll be pretty solid in terms of
 cross-platform compatibility.

 On the DDL, I personally have a need for similar compatibility. One app I'm
 working on  programmatically creates the schema for a rather big ETL
 environment. It includes a very nice abstraction that creates databases and
 tables to accommodate tuples as they pass through the pipeline and behaves
 the same regardless of which DBMS is being used as the storage engine.

 This is possible because it turns out there is a subset of DDL that is
 common to all of the DBMS platforms and it would be very useful to see that
 in Cassandra.

 ap




 On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
 tristan.tarr...@gmail.comwrote:

  On Tue, Mar 5, 2013 at 10:20 AM, Sylvain Lebresne sylv...@datastax.com
  wrote:
 
This is just one of a few small adjustments that can be made to the
   grammar
to make everyone's life easier while still maintaining the spirit of
   NOSQL.
  
   To be clear, I am *not* necessarily against making CQL3 closer to the
   ANSI-SQL
   as a convenience. But only if that doesn't compromise the language
   integrity
   and is justified. Adding a syntax with a well known semantic but
 without
  
 
  To me database DDL (such as the CREATE statement we are talking about) is
  always going to be handled in a custom fashion by applications.
  While ANSI SQL compatibility for CRUD operations is a great objective, I
  don't think it really matters for DDL.
 
  Tristan
 



Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Edward Capriolo
Not to say that you cannot do it, or that it is impossible to do
correctly, but currently Cassandra does not allow its validation to accept
parameters per column. I.e., you can set a column to be varchar (UTF8Type) or
int (Int32Type), but you CAN'T attach more properties to that type, such as
the size of the text or the integer.

I am very wary of Cassandra adding any more schema. I signed up for a
schema-LESS database. If schema can be added that does not conflict with
the original use cases, so be it. However, the latest round of schema has
caused COMPACT TABLES and CQL tables to be very different and essentially
incompatible with each other.

With schema and cassandra less is more.

On Tue, Mar 5, 2013 at 4:08 PM, Edward Capriolo edlinuxg...@gmail.comwrote:


 http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less

 Does your tool handle the fact that foreign keys do not work? Or, for
 that matter, how are you dealing with the fact that a primary key in
 cassandra is nothing like a primary key in an RDBMS?

 I am generally under the impression that CRUD tools that auto-generate CQL
 schemas can give someone the rope to hang themselves.

 On Tue, Mar 5, 2013 at 3:46 PM, Andrew Prendergast 
 a...@andrewprendergast.com wrote:

 Hi Tristan,

 I've spent the last couple weekends testing the CRUD DML stuff and its
 very
 close to meeting that objective (although NULL handling needs some
 tuning).

 The main hiccups are in the JDBC driver which I have been working through
 with Rick - once he accepts my patches it'll be pretty solid in terms of
 cross-platform compatibility.

 On the DDL, I personally have a need for similar compatibility. One app
 I'm
 working on  programmatically creates the schema for a rather big ETL
 environment. It includes a very nice abstraction that creates databases
 and
 tables to accommodate tuples as they pass through the pipeline and behaves
 the same regardless of which DBMS is being used as the storage engine.

 This is possible because it turns out there is a subset of DDL that is
 common to all of the DBMS platforms and it would be very useful to see
 that
 in Cassandra.

 ap




 On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
 tristan.tarr...@gmail.comwrote:

  On Tue, Mar 5, 2013 at 10:20 AM, Sylvain Lebresne sylv...@datastax.com
  wrote:
 
This is just one of a few small adjustments that can be made to the
   grammar
to make everyone's life easier while still maintaining the spirit of
   NOSQL.
  
   To be clear, I am *not* necessarily against making CQL3 closer to the
   ANSI-SQL
   as a convenience. But only if that doesn't compromise the language
   integrity
   and is justified. Adding a syntax with a well known semantic but
 without
  
 
  To me database DDL (such as the CREATE statement we are talking about)
 is
  always going to be handled in a custom fashion by applications.
  While ANSI SQL compatibility for CRUD operations is a great objective, I
  don't think it really matters for DDL.
 
  Tristan
 





Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Edward Capriolo
yes. It doesn't use foreign keys or any constraints, they slow things down.

Exactly what you do not want. Check the history of the features that do
read before write: counters, the old read-before-write secondary indexes,
the new collection functions that impose read before write.

Once people start using them they send an email to the Cassandra mailing
list that goes like this:

Subject: Why is Cassandra so slow?
Message: I am using secondary indexes and as I write data I see my
READ_STAGE filling up. What is going on? I thought Cassandra was faster
than MySQL? Once my database gets bigger than X GB it slows to a crawl.
Please help.

If we make tools that design anti-pattern schemas, people will use them, and
no one wins.


On Tue, Mar 5, 2013 at 4:30 PM, Andrew Prendergast a...@andrewprendergast.com
 wrote:

 *

 http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
 *
 Thanks for the link Ed, I'm aware of all that.

  * Does your tool handle the fact that foreign keys do not work?
  *
  yes. It doesn't use foreign keys or any constraints; they slow things down.

  * how are you dealing with the fact that a primary key in cassandra is
  nothing like a primary key in an RDBMS?
  *
  locality-preserving sequences and natural keys. There are no range queries.

  * Generally under the impression that CRUD tools that auto-generate CQL
  schemas can give someone the rope to hang themselves.
 *
 For those of us that know what we are doing and have had to put up with SQL
 based ETL, refining CQL3 would be life changing and ease the transition.

 ap




 On Wed, Mar 6, 2013 at 8:08 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

 
 
 http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
 
   Does your tool handle the fact that foreign keys do not work? Or, for
   that matter, how are you dealing with the fact that a primary key in
   cassandra is nothing like a primary key in an RDBMS?
  
   I am generally under the impression that CRUD tools that auto-generate CQL
   schemas can give someone the rope to hang themselves.
 
  On Tue, Mar 5, 2013 at 3:46 PM, Andrew Prendergast 
  a...@andrewprendergast.com
   wrote:
 
   Hi Tristan,
  
   I've spent the last couple weekends testing the CRUD DML stuff and its
  very
   close to meeting that objective (although NULL handling needs some
  tuning).
  
   The main hiccups are in the JDBC driver which I have been working
 through
   with Rick - once he accepts my patches it'll be pretty solid in terms
 of
   cross-platform compatibility.
  
   On the DDL, I personally have a need for similar compatibility. One app
  I'm
   working on  programmatically creates the schema for a rather big ETL
   environment. It includes a very nice abstraction that creates databases
  and
   tables to accommodate tuples as they pass through the pipeline and
  behaves
   the same regardless of which DBMS is being used as the storage engine.
  
   This is possible because it turns out there is a subset of DDL that is
   common to all of the DBMS platforms and it would be very useful to see
  that
   in Cassandra.
  
   ap
  
  
  
  
   On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
   tristan.tarr...@gmail.comwrote:
  
On Tue, Mar 5, 2013 at 10:20 AM, Sylvain Lebresne 
  sylv...@datastax.com
wrote:
   
  This is just one of a few small adjustments that can be made to
 the
 grammar
  to make everyone's life easier while still maintaining the spirit
  of
 NOSQL.

 To be clear, I am *not* necessarily against making CQL3 closer to
 the
 ANSI-SQL
 as a convenience. But only if that doesn't compromise the language
 integrity
 and is justified. Adding a syntax with a well known semantic but
   without

   
To me database DDL (such as the CREATE statement we are talking
 about)
  is
always going to be handled in a custom fashion by applications.
While ANSI SQL compatibility for CRUD operations is a great
 objective,
  I
don't think it really matters for DDL.
   
Tristan
   
  
 



Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-02 Thread Edward Capriolo
If the syntax effectively does nothing, I do not see the point of adding it.
CQL is never going to be a 100% ANSI-SQL-compatible dialect.

On Sat, Mar 2, 2013 at 12:19 PM, Michael Kjellman
mkjell...@barracuda.comwrote:

 Might want to create a Jira ticket at issues.apache.org instead of
 submitting the bug report thru email.

 On Mar 2, 2013, at 3:11 AM, Andrew Prendergast a...@andrewprendergast.com
 wrote:

  *DESCRIPTION*
 
  When creating a table in any ANSI-SQL-compliant RDBMS, the VARCHAR
  datatype takes a numeric parameter; however, this parameter generates
  errors in CQL3.
 
  *STEPS TO REPRODUCE*
 
  CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR(256)); // emits
 Bad
  Request: line 1:54 mismatched input '(' expecting ')'
 
  CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR); // this works
 
  *SUGGESTED RESOLUTION*
 
  The current fail-fast approach does not create the column so that the
 user
  is 100% clear that the length parameter means nothing to NOSQL.
 
  I would like to propose that the column length be allowed in the grammar
  (but ignored by cassandra), allowing better ANSI-SQL client
 compatibility.
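The suggested resolution above (accept the length in the grammar but ignore it) can be sketched as a tiny normalization pass over the DDL string. This is an illustrative Python sketch under that assumption, not actual CQL grammar code:

```python
import re

# Sketch (assumption: toy normalizer, not the real CQL grammar):
# accept an optional "(N)" after VARCHAR and drop it before parsing.
def ignore_varchar_length(ddl):
    return re.sub(r'(?i)\bVARCHAR\s*\(\s*\d+\s*\)', 'VARCHAR', ddl)

stmt = "CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR(256))"
print(ignore_varchar_length(stmt))
# CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR)
```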




Re: Notes from committer's meeting: overview

2013-02-25 Thread Edward Capriolo
I am curious what you mean when you say "does the fat client work right
now?"

What does not work about it? I have a fat client app running in the same
JVM as C*, and it seems to work well.



On Monday, February 25, 2013, Jonathan Ellis jbel...@gmail.com wrote:
 Last Thursday, DataStax put together a meeting of the active Cassandra
 committers in San Mateo.  Dave Brosius was unable to make it to the
 West coast, but Brandon, Eric, Gary, Jason, Pavel, Sylvain, Vijay,
 Yuki, and I were able to attend, with Aleksey and Jake able to attend
 part time over Google Hangout.

 We started by asking each committer to outline his top 3 priorities
 for 2.0.  There was pretty broad consensus around the following big
 items, which I will break out into separate threads:

 * Streaming and repair
 * Counters

 There was also a lot of consensus that we'll be able to ship some form
 of Triggers [1] in 2.0.  Gary's suggestion was to focus on getting the
 functionality nailed down first, then worry about classloader voodoo
 to allow live reloading.  There was also general agreement that we
 need to split jar loading from trigger definition, to allow a single
 trigger to be applied to multiple tables.

 There was less consensus around CAS [2], primarily because of
 implementation difficulties.  (I've since read up some more on Paxos
 and Spinnaker and posted my thoughts to the ticket.)

 Other subjects discussed:

 A single Cassandra process does not scale well beyond 12 physical
 cores.  Further research is needed to understand why.  One possibility
 is GC overhead.  Vijay is going to test Azul's Zing VM to confirm or
 refute this.

 Server-side aggregation functions [3].  This would remove the need to
 pull a lot of data over the wire to a client unnecessarily.  There was
 some unease around moving beyond the relatively simple queries we've
 traditionally supported, but I think there was general agreement that
 this can be addressed by fencing aggregation to a single partition
 unless explicitly allowed otherwise a la ALLOW FILTERING [4].

 Extending cross-datacenter forwarding [5] to a star model.  That is,
 in the case of three or more datacenters, instead of the original
 coordinator in DC A sending to replicas in DC B & C, A would forward
 to B, which would forward to C.  Thus, the bandwidth required for any
 one DC would be constant as more datacenters are added.
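The bandwidth argument can be made concrete with a toy count of outbound cross-DC streams per datacenter (an illustrative sketch under the assumption of one replica stream per remote DC, not Cassandra code):

```python
# Toy model (assumption: one outbound replica stream per remote DC).
def outbound_streams_direct(num_dcs):
    # today: the coordinator's DC sends to every other DC itself
    return num_dcs - 1

def outbound_streams_forwarding(num_dcs):
    # proposed: each DC forwards to at most one other DC, so the cost
    # per DC stays constant as datacenters are added
    return min(1, num_dcs - 1)

for n in (2, 3, 5):
    print(n, outbound_streams_direct(n), outbound_streams_forwarding(n))
```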

 Vnode improvements such as a vnode-aware replication strategy [6].

 Cluster merging and splitting -- if I have multiple applications using
 a single cassandra cluster, and one gets a lot more traffic than the
 others, I may want to split that out into its own cluster.  I think
 there was a concrete proposal as to how this could work but someone
 else will have to fill that in because I didn't write it down.

 Auto-paging of SELECT queries for CQL [7], or put another way,
 transparent cursors for the native CQL driver.

 Make the storage engine more CQL-aware.  Low-hanging fruit here
 includes a prefix dictionary for all the composite cell names [8].

 Resurrecting the StorageProxy API aka Fat Client.  (Does it even work
 right now?  Not really.)

 Reducing context switches and increasing fairness in client
 connections.  HSHA prefers to accept new connections vs servicing
 existing ones, so overload situations are problematic.

 Gossip is unreliable at 100s of nodes.  Here again I missed any
 concrete proposals to address this.

 [1] https://issues.apache.org/jira/browse/CASSANDRA-1311.  Start with

https://issues.apache.org/jira/browse/CASSANDRA-1311?focusedCommentId=13492827page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13492827
 for the parts relevant to Vijay's proof of concept patch.
 [2] https://issues.apache.org/jira/browse/CASSANDRA-5062
 [3] https://issues.apache.org/jira/browse/CASSANDRA-4914
 [4] https://issues.apache.org/jira/browse/CASSANDRA-4915
 [5] https://issues.apache.org/jira/browse/CASSANDRA-3577
 [6] https://issues.apache.org/jira/browse/CASSANDRA-4123
 [7] https://issues.apache.org/jira/browse/CASSANDRA-4415
 [8] https://issues.apache.org/jira/browse/CASSANDRA-4175

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: Understanding Read and Writes During Transient States

2013-02-16 Thread Edward Capriolo
When a node is joining/bootstrapping the ring and the replication factor
is 3, the write operation should be delivered to 4 nodes: the three
current natural endpoints and the new one. This way, if the joining
node fails to join, the other nodes have not missed any writes.

The joining node will not answer read requests until it is done bootstrapping.
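A minimal sketch of that routing rule (a toy Python model of the behavior described above, not actual Cassandra code; the function and node names are made up):

```python
# Toy model (assumption: illustrative only): writes go to the natural
# endpoints PLUS any pending (bootstrapping) endpoints; reads never
# include a node that is still bootstrapping.
def write_targets(natural_endpoints, pending_endpoints):
    return natural_endpoints + pending_endpoints

def read_targets(natural_endpoints, pending_endpoints):
    return natural_endpoints  # joining nodes do not serve reads

natural = ["n1", "n2", "n3"]   # RF = 3
pending = ["n4"]               # node currently bootstrapping

print(write_targets(natural, pending))  # ['n1', 'n2', 'n3', 'n4']
print(read_targets(natural, pending))   # ['n1', 'n2', 'n3']
```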

On Fri, Feb 15, 2013 at 5:24 PM, Muntasir Raihan Rahman
muntasir.rai...@gmail.com wrote:
 Hi,

 I am trying to understand what happens to reads and writes to cassandra
 while nodes leave or join the system. Specifically, what happens when a
 node is about to leave or join, but gets a read/write request?

 Any pointers on this?

 Muntasir.

 --
 Best Regards
 Muntasir Raihan Rahman
 Email: muntasir.rai...@gmail.com
 Phone: 1-217-979-9307
 Department of Computer Science,
 University of Illinois Urbana Champaign,
 3111 Siebel Center,
 201 N. Goodwin Avenue,
 Urbana, IL  61801


Re: Proposal: require Java7 for Cassandra 2.0

2013-02-07 Thread Edward Capriolo
Counter proposal: Java 8 and closures. Jk
On Thursday, February 7, 2013, Carl Yeksigian c...@yeksigian.com wrote:
 +1


 On Wed, Feb 6, 2013 at 5:21 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Java 6 EOL is this month.  Java 7 will be two years old when C* 2.0
 comes out (July).  Anecdotally, a bunch of people are running C* on
 Java7 with no issues, except for the Snappy-on-OS-X problem (which
 will be moot if LZ4 becomes our default, as looks likely).

 Upgrading to Java7 lets us take advantage of new (two year old)
 features as well as simplifying interoperability with other
 dependencies, e.g., Jetty's BlockingArrayQueue requires java7.

 Thoughts?

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced




Re: max_compaction_threshold removed - bad move

2013-01-09 Thread Edward Capriolo
:( Seems like a good thing to have; I can think of at least one degenerate
scenario where having it helps: a corrupt sstable. Compaction will never
be able to remove it, and then each compaction will likely try to compact
it again... and fail.

On Wed, Jan 9, 2013 at 10:35 AM, Brandon Williams dri...@gmail.com wrote:

 On Wed, Jan 9, 2013 at 9:21 AM, Radim Kolar h...@filez.com wrote:
  removing max_compaction_threshold in 1.2 was bad move, keeping it low
 helps
  compaction throughput because it lowers number of disk seeks.

 :(



Re: max_compaction_threshold removed - bad move

2013-01-09 Thread Edward Capriolo
If you want to complain about bad names in the code, start with the class
implementing keyspaces being called Table.

OMG that is terrible!

We should only be wrongfully calling a column family a table :)

(In HBase, tables are actually a collection of column families, right? So
that is probably where that came from.)

On Wed, Jan 9, 2013 at 11:25 AM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Wed, Jan 9, 2013 at 5:04 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:

  Was the change well accounted for in the changes.TXT or the readme.txt?
 

 The news file says:
 CQL3 is now considered final in this release. Compared to the beta
  version that is part of 1.1, this final version has a few additions
  (collections), but also some (incompatible) changes in the syntax for the
  options of the create/alter keyspace/table statements.
  (...)
  Please refer to the CQL3 documentation for details

 That last sentence refers to
 http://cassandra.apache.org/doc/cql3/CQL.html and yes, that should be
 in the news file, but that same URL was pointing to
 the 1.1 CQL3 doc before 1.2.0 was released, so I didn't want to list it in
 the news file for the betas and rcs, and I forgot to add back the link to
 that news file for the final, my bad (I'm sorry and I will add the link to
 the NEWS file for the next release). And of course having forgotten to
 update the max_threshold thing in said reference doc was unfortunate, but
 that's fixed now.

 Now I know you are not happy with us having made breaking changes between
 CQL3 beta in 1.1 and CQL3 final in 1.2. I'm sorry we did, but I am happy
 with the coherence of the language we have in that final, so I think it
 was probably worth it in the end. I do want to stress that the goal was to
 have a CQL3 final for which we won't make breaking changes for the
 foreseeable future.


 
  // Note that isCompact means here that no component of the comparator
  corresponds to the column names
  // defined in the CREATE TABLE QUERY. This is not exactly equivalent to
 the
  'WITH COMPACT STORAGE'
  // option when creating a table in that static CF without a composite
  type will have isCompact == false
// even though one must use 'WITH COMPACT STORAGE' to declare them.
 
 
  Confused
 

 Granted that is not the cleanest thing ever, and we could probably rename
 that isCompact variable, but you do realize that is just an implementation
 detail that has no impact whatsoever on users. If you want to complain
 about bad names in the code, start with the class implementing keyspaces
 being called Table.

 --
 Sylvain



Re: [VOTE CLOSED] Release Apache Cassandra 1.2.0-rc1

2013-01-02 Thread Edward Capriolo
With Thrift, methods cannot be overloaded, but objects can have optional
parameters.

In the future should we avoid:

CqlResult execute_cql3_query(1:required binary query, 2:required
Compression compression, 3:required ConsistencyLevel consistency)
throws (1:InvalidRequestException ire,
2:UnavailableException ue,
3:TimedOutException te,
4:SchemaDisagreementException sde)

Instead

CqlResult execute_cql3_query(1:required CqlRequestObject object)

and the CqlRequestObject can contain all optional parameters. I cannot
find the exact reference, but I remember reading that this is the way
Google has suggested using protobufs: mark all fields optional, always,
for maximum compatibility.
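That request-object style can be sketched in plain Python (hypothetical field names; not the actual Thrift IDL): every field is optional with a default, so a client built against an older version of the struct keeps working when new fields are added, and the server fills in defaults for anything the client omitted.

```python
# Sketch (assumption: illustrative model of the request-object pattern;
# field names are hypothetical, not the real Thrift struct).
from dataclasses import dataclass
from typing import Optional

@dataclass
class CqlRequest:
    # all fields optional with defaults, so new fields can be added
    # without breaking older clients or servers
    query: Optional[bytes] = None
    compression: Optional[str] = None
    consistency: Optional[str] = None

def execute(request):
    # server applies a default for any field the client omitted
    consistency = request.consistency or "ONE"
    return (request.query, consistency)

# old client: only knows about the query field
print(execute(CqlRequest(query=b"SELECT * FROM t")))
# (b'SELECT * FROM t', 'ONE')
```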

On Tue, Jan 1, 2013 at 2:25 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Tue, Jan 1, 2013 at 11:42 AM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
  Question. 1.2.0-beta2
 
  Why does the thrift interface have 2 CQL methods?

 To preserve cql2 compatibility.  cql3 pulls consistencylevel into the
 method call instead of the query language.

  Is this something we are going to continue?

 When necessary for compatibility, yes.

  I wish we could have done the cassandra 0.6.X - 0.7.X
  migration this way:)

 In retrospect, I agree.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder, http://www.datastax.com
 @spyced



Re: [VOTE CLOSED] Release Apache Cassandra 1.2.0-rc1

2013-01-01 Thread Edward Capriolo
Question. 1.2.0-beta2

Why does the thrift interface have 2 CQL methods?

  CqlResult execute_cql_query(1:required binary query, 2:required
Compression compression)
throws (1:InvalidRequestException ire,
2:UnavailableException ue,
3:TimedOutException te,
4:SchemaDisagreementException sde)

  CqlResult execute_cql3_query(1:required binary query, 2:required
Compression compression, 3:required ConsistencyLevel consistency)
throws (1:InvalidRequestException ire,
2:UnavailableException ue,
3:TimedOutException te,
4:SchemaDisagreementException sde)

Is this something we are going to continue? Just naming methods
execute_cql3_query? I wish we could have done the cassandra 0.6.X - 0.7.X
migration this way:)

get(String keyspace, String column family, String rowkey, String column)

get7(String columnFamily, binay rowkey, binary column )


On Mon, Dec 3, 2012 at 12:34 PM, Sylvain Lebresne sylv...@datastax.comwrote:

 Alright, seems we can use a beta 3 before calling this a RC1.
 So I'm closing this vote and I'll rebrand this as beta3 and do a short 24h
 with that. And hopefully we'll have a true RC1 quickly after that.

 Stay tuned.

 --
 Sylvain


 On Mon, Dec 3, 2012 at 5:57 AM, Brandon Williams dri...@gmail.com wrote:

  On Sun, Dec 2, 2012 at 10:45 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
   I'm not a fan of blocking a new rc because of bugs that are not
   regressions new in that release.  I'd also like to get more testing on
   the 1.2 fixes since b2.  But we can call it b3 instead of rc1 if you
   want.
 
  I agree with everything you've said.  I'm fine with calling it b3,
  though I expect we'll have that ticket closed soon and could re-roll
  an rc1 on Tuesday.
 
  -Brandon
 



Re: Compund/Composite column names

2012-12-17 Thread Edward Capriolo
This was discussed in one of the tickets. The problem is that CQL3's sparse
tables have different metadata that has NOT been added to Thrift's
CFMetaData. Thus Thrift is unaware of exactly how to verify the insert.

Originally it was made impossible for Thrift to see a sparse table, but
that restriction has been lifted, it seems. It is probably a bad idea to
do Thrift inserts into a sparse table until Cassandra no longer has two
distinct sources of meta information.





On Mon, Dec 17, 2012 at 9:52 AM, Vivek Mishra vivek.mis...@yahoo.comwrote:

 Looks like Thrift API is not working as expected?

 -Vivek




 
  From: Brian O'Neill b...@alumni.brown.edu
 To: dev@cassandra.apache.org
 Cc: Vivek Mishra vivek.mis...@yahoo.com
 Sent: Monday, December 17, 2012 8:12 PM
 Subject: Re: Compund/Composite column names

 FYI -- I'm still seeing this on 1.2-beta1.

 If you create a table via CQL, then insert into it (via the Java API) with
 an incorrect number of components, the insert works, but select *
 from CQL results in a TSocket read error.

 I showed this in the webinar last week, just in case people ran into
 it.  It would be great to translate the ArrayIndexOutofBoundsException
 from the server side into something meaningful in cqlsh to help people
 diagnose the problem.  (a regular user probably doesn't have access to
 the server-side logs)

 You can see it at minute 41 in the video from the webinar:
 http://www.youtube.com/watch?v=AdfugJxfd0o&feature=youtu.be

 -brian


 On Tue, Oct 9, 2012 at 9:39 AM, Jonathan Ellis jbel...@gmail.com wrote:
  Sounds like you're running into the keyspace drop bug.  It's mostly
 fixed
  in 1.1.5 but you might need the latest from 1.1 branch.  1.1.6 will be
  released soon with the final fix.
  On Oct 9, 2012 1:58 AM, Vivek Mishra vivek.mis...@yahoo.com wrote:
 
 
 
  Ok. I am able to understand the problem now. Issue is:
 
  If i create a column family altercations as:
 
 
 
 **8
  CREATE TABLE altercations (
 instigator text,
 started_at timestamp,
 ships_destroyed int,
 energy_used float,
 alliance_involvement boolean,
 PRIMARY KEY (instigator,started_at,ships_destroyed)
 );
  /
 INSERT INTO altercations (instigator, started_at, ships_destroyed,
   energy_used, alliance_involvement)
   VALUES ('Jayne Cobb', '2012-07-23', 2, 4.6,
 'false');
 
 
 *
 
  It works!
 
  But if i create a column family with compound primary key with 2
 composite
  column as:
 
 
 
 *
  CREATE TABLE altercations (
 instigator text,
 started_at timestamp,
 ships_destroyed int,
 energy_used float,
 alliance_involvement boolean,
 PRIMARY KEY (instigator,started_at)
 );
 
 
 
 *
  and Then drop this column family:
 
 
 
 *
  drop columnfamily altercations;
 
 
 *
 
  and then try to create same one with primary compound key with 3
 composite
  column:
 
 
 
 *
 
  CREATE TABLE altercations (
 instigator text,
 started_at timestamp,
 ships_destroyed int,
 energy_used float,
 alliance_involvement boolean,
 PRIMARY KEY (instigator,started_at,ships_destroyed)
 );
 
 
 *
 
  it gives me error: TSocket read 0 bytes
 
  Rest, as no column family is created, so nothing onwards will work.
 
  Is this an issue?
 
  -Vivek
 
 
  
   From: Jonathan Ellis jbel...@gmail.com
  To: dev@cassandra.apache.org; Vivek Mishra vivek.mis...@yahoo.com
  Sent: Tuesday, October 9, 2012 9:08 AM
  Subject: Re: Compund/Composite column names
 
  Works for me on latest 1.1 in cql3 mode.  cql2 mode gives a parse error.
 
  On Mon, Oct 8, 2012 at 9:18 PM, Vivek Mishra vivek.mis...@yahoo.com
  wrote:
   Hi All,
  
   I am trying to use compound primary key column name and i am referring
  to:
   http://www.datastax.com/dev/blog/whats-new-in-cql-3-0
  
  
   As mentioned on this example, i tried to create a column family
  containing compound primary key (one or more) as:
 

Re: Stable Hector version with cassandra 1.1.6

2012-12-04 Thread Edward Capriolo
One thing to note: the Maven repo has moved from me.prettyprint to
org.Hector-client, so that should aid in your searches of the Maven repo.

On Tuesday, December 4, 2012, Bisht, Jaikrit bis...@visa.com wrote:

 Hi,

 Could someone recommend the stable version of Hector libraries for
Cassandra 1.1.6?

 Regards
 Jay




Re: 2.0

2012-11-30 Thread Edward Capriolo
Good idea. Let's remove thrift. CQL3 is still beta, but I am willing to
upgrade to a version that removes thrift. Then, when all our clients cannot
connect, they will be forced to get with the program.

On Fri, Nov 30, 2012 at 5:33 PM, Jason Brown jasedbr...@gmail.com wrote:

 Hi Jonathan,

 I'm in favor of paying off the technical debt, as well, and I wonder if
 there is value in removing support for thrift with 2.0? We're currently in
 'do as little as possible' mode with thrift, so should we aggressively cast
 it off and push the binary CQL protocol? Seems like a jump to '2.0', along
 with the other initiatives, would be a reasonable time/milestone to do so.

 Thanks,

 -Jason


 On Fri, Nov 30, 2012 at 12:12 PM, Jonathan Ellis jbel...@gmail.com
 wrote:

  The more I think about it, the more I think we should call 1.2-next,
  2.0.  I'd like to spend some time paying off our technical debt:
 
  - replace supercolumns with composites (CASSANDRA-3237)
  - rewrite counters (CASSANDRA-4775)
  - improve storage engine support for wide rows
  - better stage management to improve latency (disruptor? lightweight
  threads?  custom executor + queue?)
  - improved repair (CASSANDRA-3362, 2699)
 
  Of course, we're planning some new features as well:
  - triggers (CASSANDRA-1311)
  - improved query fault tolerance (CASSANDRA-4705)
  - row size limits (CASSANDRA-3929)
  - cql3 integration for hadoop (CASSANDRA-4421)
  - improved caching (CASSANDRA-1956, 2864)
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder, http://www.datastax.com
  @spyced
 



Re: maximum sstable size

2012-11-03 Thread Edward Capriolo
I have another ticket open for this.

On Sat, Nov 3, 2012 at 6:29 PM, Radim Kolar h...@filez.com wrote:
 done
 https://issues.apache.org/jira/browse/CASSANDRA-4897


Re: findbugs

2012-07-30 Thread Edward Capriolo
I am sure no one would have an issue with an optional findbugs target.

On Mon, Jul 30, 2012 at 10:32 AM, Radim Kolar h...@filez.com wrote:
 Was any decision about findbugs made? Do you not consider the code style
 recommended by findbugs good practice that should be followed?

 I can submit a few findbugs patches, but it will probably turn into a
 WE vs FINDBUGS flamewar like this one:
 https://issues.apache.org/jira/browse/HADOOP-8619

 The findbugs problems are pretty easy to fix and there are just 70 of them;
 it could be done in two days.

 I do not care much about the findbugs+cassandra-dev issue because I need to
 fork Cassandra anyway to get performance patches in. It's just a matter of
 schedule whether I should feed you findbugs patches before I fork it.


Re: Welcome committers Dave Brosius and Yuki Morishita!

2012-05-22 Thread Edward Capriolo
Congrats!

On Tue, May 22, 2012 at 10:43 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Thanks to both of you for your help!

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


Re: how to upgrade my cassandra from SizeTieredCompaction to LeveledCompaction

2012-05-13 Thread Edward Capriolo
As soon as you use the CLI to change the compaction strategy for a
column family, Cassandra will consider all SSTables level 0 and begin
leveling them.  With that much data, think hard before making the
change. You have to understand how leveled compaction will work with
your workload.
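
The change itself is a one-liner from cassandra-cli; the syntax below is
from the 1.0/1.1 era and the option values are illustrative, so verify
against your version before running it against 2TB of data:

    [default@MyKeyspace] update column family MyCF
        with compaction_strategy = 'LeveledCompactionStrategy'
        and compaction_strategy_options = {sstable_size_in_mb: 10};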

On Sun, May 13, 2012 at 10:09 PM, zhangcheng zhangch...@jike.com wrote:

 There is 2TB of data on each server. Can someone give me some advice?

 Thanks.


Re: java.net.SocketException

2012-04-12 Thread Edward Capriolo
If you are using ~[na:1.6.0_14 you should upgrade to a later 1.6 JVM
before trying to troubleshoot anything else.

On Thu, Apr 12, 2012 at 11:05 PM, Chao Wang chao.w...@ericsson.com wrote:
 Hi,
 Is this java.net.SocketException: No buffer space available related to
 Cassandra?
 /Chao
 11:00:30.269 
 [Hector.me.prettyprint.cassandra.connection.CassandraHostRetryService-1] INFO 
  [m.p.c.c.CassandraHostRetryService:113] - Downed Host retry status true with 
 host: localhost(127.0.0.1):7160
 11:00:30.300 
 [Hector.me.prettyprint.cassandra.connection.CassandraHostRetryService-1] 
 ERROR [m.p.c.connection.HConnectionManager:109] - Transport exception host to 
 HConnectionManager: localhost(127.0.0.1):7160
 me.prettyprint.hector.api.exceptions.HectorTransportException: Unable to open 
 transport to localhost(127.0.0.1):7160 , java.net.SocketException: No buffer 
 space available (maximum connections reached?): connect
        at 
 me.prettyprint.cassandra.connection.HThriftClient.open(HThriftClient.java:128)
  ~[na:na]
        at 
 me.prettyprint.cassandra.connection.ConcurrentHClientPool.init(ConcurrentHClientPool.java:48)
  ~[na:na]
        at 
 me.prettyprint.cassandra.connection.RoundRobinBalancingPolicy.createConnection(RoundRobinBalancingPolicy.java:68)
  ~[na:na]
        at 
 me.prettyprint.cassandra.connection.HConnectionManager.addCassandraHost(HConnectionManager.java:104)
  ~[na:na]
        at 
 me.prettyprint.cassandra.connection.CassandraHostRetryService$RetryRunner.run(CassandraHostRetryService.java:115)
  ~[na:na]
        at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) 
 ~[na:1.6.0_14]
        at 
 java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) 
 ~[na:1.6.0_14]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) 
 ~[na:1.6.0_14]
        at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
  ~[na:1.6.0_14]
        at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
  ~[na:1.6.0_14]
        at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
  ~[na:1.6.0_14]
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  ~[na:1.6.0_14]
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  ~[na:1.6.0_14]
        at java.lang.Thread.run(Thread.java:619) ~[na:1.6.0_14]
 Caused by: org.apache.thrift.transport.TTransportException: 
 java.net.SocketException: No buffer space available (maximum connections 
 reached?): connect
        at org.apache.thrift.transport.TSocket.open(TSocket.java:183) ~[na:na]
        at 
 org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) 
 ~[na:na]
        at 
 me.prettyprint.cassandra.connection.HThriftClient.open(HThriftClient.java:122)
  ~[na:na]
        ... 13 common frames omitted
 Caused by: java.net.SocketException: No buffer space available (maximum 
 connections reached?): connect
        at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.6.0_14]
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333) 
 ~[na:1.6.0_14]
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195) 
 ~[na:1.6.0_14]
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182) 
 ~[na:1.6.0_14]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) 
 ~[na:1.6.0_14]
        at java.net.Socket.connect(Socket.java:519) ~[na:1.6.0_14]
        at org.apache.thrift.transport.TSocket.open(TSocket.java:178) ~[na:na]
        ... 15 common frames omitted



Re: Document storage

2012-03-29 Thread Edward Capriolo
The issue with these super complex types is that to do anything useful with
them you would need either scanners or coprocessors. As it stands
right now, complex data like JSON is fairly opaque to Cassandra.
Getting Cassandra to natively speak protobufs, or whatever
flavor-of-the-week serialization framework is hip right now, would make
the codebase very large. How is that field sorted? How is it indexed? This
is starting to go very far against the schema-less NoSQL grain. Where does
this end: with users wanting to store binary XML, index it, and feed
Cassandra XPath queries?


On Thu, Mar 29, 2012 at 11:23 AM, Ben McCann b...@benmccann.com wrote:
 Creating materialized paths may well be a possible solution.  If that were
 the solution the community were to agree upon then I would like it to be a
 standardized and well-documented best practice.  I asked how to store a
 list of values on the user list
 (http://www.mail-archive.com/user@cassandra.apache.org/msg21274.html) and
 no one suggested [fieldName, TimeUUID]: fieldValue.  It would be a
 huge pain right now to create materialized paths like this for each of my
 objects, so client library support would definitely be needed.  And the
 client libraries should agree.  If Astyanax and lazyboy both add support
 for materialized path and I write an object to Cassandra with Astyanax,
 then I should be able to read it back with lazyboy.  The benefit of using
 JSON/SMILE is that it's very clear that there's exactly one way to
 serialize and deserialize the data and it's very easy.  It's not clear to
 me that this is true using materialized paths.


  On Thu, Mar 29, 2012 at 8:21 AM, Tyler Patterson
  tpatter...@datastax.com wrote:

 
 
  Would there be interest in adding a JsonType?


 What about checking that data inserted into a JsonType is valid JSON? How
 would you do it, and would the overhead be something we are concerned
 about, especially if the JSON string is large?
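
To make the overhead question concrete, here is a minimal sketch of the
kind of check a JsonType could run on insert (Python for illustration; this
is not Cassandra's validator API, and the names are invented):

```python
import json

def validate_json(value: bytes) -> None:
    """Reject values that are not well-formed UTF-8 JSON.

    The cost is a full parse of the value, so validation overhead grows
    linearly with the size of the JSON string -- the concern raised above.
    """
    try:
        json.loads(value.decode("utf-8"))
    except (UnicodeDecodeError, ValueError) as exc:
        raise ValueError("invalid JSON value: %s" % exc)

validate_json(b'{"ships_destroyed": 3}')   # well-formed: passes silently
try:
    validate_json(b'{"unterminated": ')    # malformed: rejected on insert
except ValueError:
    print("rejected")
```

Whether a server-side parse like this is acceptable for multi-megabyte
documents is exactly the trade-off being debated.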



Re: Document storage

2012-03-28 Thread Edward Capriolo
Some work I did stores JSON blobs in columns. The question on JSON
type is how to sort it.

On Wed, Mar 28, 2012 at 7:35 PM, Jeremy Hanna
jeremy.hanna1...@gmail.com wrote:
 I don't speak for the project, but you might give it a day or two for people 
 to respond and/or perhaps create a jira ticket.  Seems like that's a 
 reasonable data type that would get some traction - a json type.  However, 
 what would validation look like?  That's one of the main reasons there are 
 the data types and validators, in order to validate on insert.

 On Mar 29, 2012, at 12:27 AM, Ben McCann wrote:

 Any thoughts?  I'd like to submit a patch, but only if it will be accepted.

 Thanks,
 Ben


 On Wed, Mar 28, 2012 at 8:58 AM, Ben McCann b...@benmccann.com wrote:

 Hi,

 I was wondering if it would be interesting to add some type of
 document-oriented data type.

 I've found it somewhat awkward to store document-oriented data in
 Cassandra today.  I can make a JSON/Protobuf/Thrift, serialize it, and
 store it, but Cassandra cannot differentiate it from any other string or
 byte array.  However, if my column validation_class could be a JsonType
 that would allow tools to potentially do more interesting introspection on
 the column value.  E.g. bug 
 3647https://issues.apache.org/jira/browse/CASSANDRA-3647calls for 
 supporting arbitrarily nested documents in CQL.  Running a
 query against the JSON column in Pig is possible as well, but again in this
 use case it would be helpful to be able to encode in column metadata that
 the column is stored as JSON.  For debugging, running nightly reports, etc.
 it would be quite useful compared to the opaque string and byte array types
 we have today.  JSON is appealing because it would be easy to implement.
 Something like Thrift or Protocol Buffers would actually be interesting
 since they would be more space efficient.  However, they would also be a
 bit more difficult to implement because of the extra typing information
 they provide.  I'm hoping with Cassandra 1.0's addition of compression that
 storing JSON is not too inefficient.

 Would there be interest in adding a JsonType?  I could look at putting a
 patch together.

 Thanks,
 Ben





Re: RFC: Cassandra Virtual Nodes

2012-03-21 Thread Edward Capriolo
On Wed, Mar 21, 2012 at 9:50 AM, Eric Evans eev...@acunu.com wrote:
 On Tue, Mar 20, 2012 at 9:53 PM, Jonathan Ellis jbel...@gmail.com wrote:
 It's reasonable that we can attach different levels of importance to
 these things.  Taking a step back, I have two main points:

 1) vnodes add enormous complexity to *many* parts of Cassandra.  I'm
 skeptical of the cost:benefit ratio here.

 1a) The benefit is lower in my mind because many of the problems
 solved by vnodes can be solved well enough for most people, for
 some value of those two phrases, without vnodes.

 2) I'm not okay with a commit something half-baked and sort it out
 later approach.

 I must admit I find this a little disheartening.  The discussion has
 barely started.  No one has had a chance to discuss implementation
 specifics so that the rest of us could understand *how* disruptive it
 would be (a necessary requirement in weighing cost:benefit), or what
 an incremental approach would look like, and yet work has already
 begun on shutting this down.

 Unless I'm reading you wrong, your mandate (I say mandate because you
 hinted at a veto elsewhere), is No to anything complex or invasive
 (for some value of each).  The only alternative would seem to be a
 phased or incremental approach, but you seem to be saying No to that
 as well.

 There seems to be quite a bit of interest in having virtual nodes (and
 there has been for as long as I can remember), the only serious
 reservations relate to the difficulty/complexity.  Is there really no
 way to put our heads together and figure out how to properly manage
 that aspect?

 On Tue, Mar 20, 2012 at 11:10 AM, Richard Low r...@acunu.com wrote:
 On 20 March 2012 14:55, Jonathan Ellis jbel...@gmail.com wrote:
 Here's how I see Sam's list:

 * Even load balancing when growing and shrinking the cluster

 Nice to have, but post-bootstrap load balancing works well in practice
 (and is improved by TRP).

 Post-bootstrap load balancing without vnodes necessarily streams more
 data than is necessary.  Vnodes streams the minimal amount.

 In fact, post-bootstrap load balancing currently streams a constant
 fraction of your data - the network traffic involved in a rebalance
 increases linearly with the size of your cluster.  With vnodes it
 decreases linearly.

 Including removing the ops overhead of running the load balance and
 calculating new tokens, this makes removing post-bootstrap load
 balancing a pretty big deal.

 * Greater failure tolerance in streaming

 Directly addressed by TRP.

 Agreed.

 * Evenly distributed impact of streaming operations

 Not a problem in practice with stream throttling.

 Throttling slows them down, increasing rebuild times so increasing downtime.

 * Possibility for active load balancing

 Not really a feature of vnodes per se, but as with the other load
 balancing point, this is also improved by TRP.

 Again with the caveat that more data is streamed with TRP.  Vnodes
 removes the need for any load balancing with RP.

 * Distributed rebuild

 This is the 20% that TRP does not address.  Nice to have?  Yes.  Can I
 live without it?  I have so far.  Is this alone worth the complexity
 of vnodes?  No, it is not.  Especially since there are probably other
 approaches that we can take to mitigate this, one of which Rick has
 suggested in a separate sub-thread.

 Distributed rebuild means you can store more data per node with the
 same failure probabilities.  This is frequently a limiting factor on
 how much data you can store per node, increasing cluster sizes
 unnecessarily.  I'd argue that this alone is worth the complexity of
 vnodes.

 Richard.



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



 --
 Eric Evans
 Acunu | http://www.acunu.com | @acunu

I have also thought about how I would like vnodes to work from an
operational perspective rather than a software one. I would like these
features:
1) No more RAID 0. If a machine is responsible for 4 vnodes, they
should correspond to four JBOD disks.

2) Vnodes should be able to be hot-plugged. My normal cassandra chassis
would be a 2U with 6 drive bays. Imagine I have 10 nodes. Now if my
chassis dies I should be able to take the disks out and physically
plug them into another chassis. Then in cassandra I should be able to
run a command like
nodetool attach '/mnt/disk6'. disk6 should contain all its data and its
vnode information.

Now this would be awesome for upgrades/migrations/etc.


Re: RFC: Cassandra Virtual Nodes

2012-03-21 Thread Edward Capriolo
On Wed, Mar 21, 2012 at 3:24 PM, Tom Wilkie t...@acunu.com wrote:
 Hi Edward

 1) No more raid 0. If a machine is responsible for 4 vnodes they
 should correspond to for JBOD.

 So each vnode corresponds to a disk?  I suppose we could have a
 separate data directory per disk, but I think this should be a
 separate, subsequent change.

I think having more micro-ranges makes the process much easier.
Imagine a token ring 1-30:

Node1 | major range 0-10  | disk 1 0-2,   disk 2 3-4,   disk 3 5-7,   disk 4 8-10
Node2 | major range 11-20 | disk 1 11-12, disk 2 13-14, disk 3 15-17, disk 4 18-20
Node3 | major range 21-30 | disk 1 21-22, disk 2 23-24, disk 3 25-27, disk 4 28-30

Adding a 4th node is easy:
If you are at the data center, just take disk 4 out of each node and
place it in the new server :)
Software-wise it is the same deal. Each node streams off only disk 4
to the new node.

At this point disk 4 is idle and each machine should rebalance
its own data across its 4 disks.
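
The layout in the example can be sketched as an even split of each node's
major range across its disks (Python, purely illustrative; the hand-picked
boundaries above differ slightly from an even split):

```python
def split_range(start, end, disks):
    """Split an inclusive token range into one contiguous sub-range per disk."""
    total = end - start + 1
    base, extra = divmod(total, disks)
    ranges, lo = [], start
    for d in range(disks):
        # The first `extra` disks get one additional token to absorb the remainder.
        hi = lo + base - 1 + (1 if d < extra else 0)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# Three nodes, four disks each, as in the layout above
for node, (start, end) in enumerate([(0, 10), (11, 20), (21, 30)], 1):
    print("Node%d:" % node, split_range(start, end, 4))
```

Moving a node's fourth sub-range then corresponds to pulling one disk.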

 However, do note that making the vnode ~size of a disk (and only have
 4-8 per machine) would make any non-hotswap rebuilds slower.  To get
 the fast distributed rebuilds, you need to have at least as many
 vnodes per node as you do nodes in the cluster.  And you would still
 need the distributed rebuilds to deal with disk failure.

 2) Vnodes should be able to be hot pluged. My normal cassandra chassis
 would be a 2U with 6 drive bays. Imagine I have 10 nodes. Now if my
 chassis dies I should be able to take the disks out and physically
 plug them into another chassis. Then in cassandra I should be able to
 run a command like.
 nodetool attach '/mnt/disk6'. disk6 should contain all data an it's
 vnode information.

 Now this would be awesome for upgrades/migrations/etc.

  You know, you're not the first person I've spoken to who has asked for
  this!  I do wonder whether it is optimising for the right thing though
  - in my experience, disks fail more often than machines.

 Thanks

 Tom


Re: RFC: Cassandra Virtual Nodes

2012-03-21 Thread Edward Capriolo
I just see vnodes as a way to make the problem smaller, and by making the
problem smaller the overall system is more agile. That is, rather than 1
node streaming 100 GB, 4 nodes stream 25 GB each. Moves by hand are not so
bad because they take a quarter of the time.

The simplest vnode implementation is VMware: just make sure that 3
consecutive vnodes do not end up on the same host. This is wasteful
because we have 4 JVMs.

I envision vnodes as a Cassandra master being a shared cache, memtables,
and manager for what we today consider a Cassandra instance. That makes it
simple to think about.

On Wednesday, March 21, 2012, Peter Schuller peter.schul...@infidyne.com
wrote:
 Software wise it is the same deal. Each node streams off only disk 4
 to the new node.

 I think an implication on software is that if you want to make
 specific selections of partitions to move, you are effectively
 incompatible with deterministically generating the mapping of
 partition to responsible node. I.e., it probably means the vnode
 information must be kept as state. It is probably difficult to
 reconcile with balancing solutions like consistent hashing/crush/etc.

 --
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)



Re: RFC: Cassandra Virtual Nodes

2012-03-19 Thread Edward Capriolo
On Mon, Mar 19, 2012 at 4:15 PM, Sam Overton s...@acunu.com wrote:
 On 19 March 2012 09:23, Radim Kolar h...@filez.com wrote:


 Hi Radim,

 The number of virtual nodes for each host would be configurable by the
 user, in much the same way that initial_token is configurable now. A host
 taking a larger number of virtual nodes (tokens) would have
 proportionately
 more of the data. This is how we anticipate support for heterogeneity in
 cluster hardware.

 Yes, but this is good only for random partitioner. For ordered you need to
 be able split token space on highly loaded servers. With virtual tokens it
 will move load to random node.
 What if random node will be also hotspot node? Administration will be more
 difficult because you don't know where workload lands after you reduce
 number of tokens held by node.

 For OPP we envisage an external management process performing active
 load balancing. The initial token assignment would be random within
 some user-specified range corresponding to the range of their keys.
 The load would then be monitored and hot-spots would be moved by
 reassigning virtual nodes to lightly loaded machines, or introducing
 new tokens into hot ranges. It makes sense that this would not be a
 manual process, but there would certainly be more control than just
 increasing or decreasing the number of tokens assigned to a node.

 --
 Sam Overton
 Acunu | http://www.acunu.com | @acunu

For OPP the problem of load balancing is more profound. Now you need
vnodes per keyspace because you cannot expect each keyspace to have
the same distribution. With three keyspaces you are now unsure as to
which one is causing the hotness. I think OPP should just go away.


Re: RFC: Cassandra Virtual Nodes

2012-03-19 Thread Edward Capriolo
On Mon, Mar 19, 2012 at 4:24 PM, Sam Overton s...@acunu.com wrote:
 For OPP the problem of load balancing is more profound. Now you need
 vnodes per keyspace because you can not expect each keyspace to have
 the same distribution. With three keyspaces you are not unsure as to
 which was is causing the hotness. I think OPP should just go away.

 That's a good point, but isn't that the same problem with trying to
 balance tokens with OPP currently?

Yes. I was bringing this up because the external management process
you suggested for performing active load balancing will have to be smart
enough to understand this. Right now, since this is done manually, it is
the user's problem.


Re: RFC: Cassandra Virtual Nodes

2012-03-17 Thread Edward Capriolo
 I agree that having smaller regions would help the rebalancing situation
both with RP and BOP. However, I am not sure if dividing tables across
disks will give any better performance. You will have more seeking spindles
and can possibly subdivide token ranges into separate files, but the FS
cache will get shared across all disks, so that is a wash.

On Saturday, March 17, 2012, Eric Evans eev...@acunu.com wrote:
 On Sat, Mar 17, 2012 at 11:15 AM, Radim Kolar h...@filez.com wrote:
 I don't like that every node will have same portion of data.

 1. We are using nodes with different HW sizes (number of disks)
 2.  especially with ordered partitioner there tends to be hotspots and
you
 must assign smaller portion of data to nodes holding hotspots

 Yes, these are exactly the sorts of problems that virtual nodes are
 meant to make easier.

 --
 Eric Evans
 Acunu | http://www.acunu.com | @acunu



Re: [VOTE] Release Apache Cassandra 1.0.8

2012-02-22 Thread Edward Capriolo
+1 (non binding) Great that 1.0.7 had a nice shelf life, and that
1.0.8 has just a couple minor patches.



On Wed, Feb 22, 2012 at 10:22 PM, Jonathan Ellis jbel...@gmail.com wrote:
 +1

 On Wed, Feb 22, 2012 at 8:57 AM, Sylvain Lebresne sylv...@datastax.com 
 wrote:
 Been some time since 1.0.7 and bugs have been fixed, I thus propose the
 following artifacts for release as 1.0.8.

 sha1: fe6980eb7d2df6cb53fd8a83226f91789beaced8
 Git: 
 http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.0.8-tentative
 Artifacts: 
 https://repository.apache.org/content/repositories/orgapachecassandra-008/org/apache/cassandra/apache-cassandra/1.0.8/
 Staging repository:
 https://repository.apache.org/content/repositories/orgapachecassandra-008/

 The artifacts as well as the debian package are also available here:
 http://people.apache.org/~slebresne/

 The vote will be open for 72 hours (longer if needed).

 [1]: http://goo.gl/vDKh9 (CHANGES.txt)
 [2]: http://goo.gl/sbvXW (NEWS.txt)



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


Re: [VOTE] Release Apache Cassandra 1.1.0-beta1

2012-02-20 Thread Edward Capriolo
I have a GitHub fork of Hector that deals with the removed key and row
cache settings in the column family metadata.  I needed this because it
helped me build Hive 0.8 and C* in the same JVM. But it is useful because
older Hector clients have trouble talking to 1.1 if you try anything
metadata related.

https://github.com/edwardcapriolo/hector

I have not checked in to see if hector has caught up with these changes,
but if you need a hector client for 1.1, just  to kick the tires, this code
does work.

On Monday, February 20, 2012, Sylvain Lebresne sylv...@datastax.com wrote:
 I count 3 binding +1's, one other +1 and no -1's. The vote passes.
 I'll get the artifacts published.

 --
 Sylvain

 On Fri, Feb 17, 2012 at 5:34 AM, Peter Schuller
 peter.schul...@infidyne.com wrote:
 +1

 --
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)



Re: Discussion: release quality

2011-11-29 Thread Edward Capriolo
On Tue, Nov 29, 2011 at 6:16 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:

 I'd like to start a discussion about ideas to improve release quality for
 Cassandra.  Specifically I wonder if the community can do more to help the
 project as a whole become more solid.  Cassandra has an active and vibrant
 community using Cassandra for a variety of things.  If we all pitch in a
 little bit, it seems like we can make a difference here.

 Release quality is difficult, especially for a distributed system like
 Cassandra.  The core devs have done an amazing job with this considering
 how complicated it is.  Currently, there are several things in place to
 make sure that a release is generally usable:
 - review-then-commit
 - 72 hour voting period
 - at least 3 binding +1 votes
 - unit tests
 - integration tests
 Then there is the personal responsibility aspect - testing a release in a
 staging environment before pushing it to production.

 I wonder if more could be done here to give more confidence in releases.
  I wanted to see if there might be ways that the community could help out
 without being too burdensome on either the core devs or the community.

 Some ideas:
 More automation: run YCSB and stress with various setups.  Maybe people
 can rotate donating cloud instances (or simply money for them) but have a
 common set of scripts to do this in the source.

 Dedicated distributed test suite: I know there has been work done on
 various distributed test suites (which is great!) but none have really
 caught on so far.

 I know what the apache guidelines say, but what if the community could
 help out with the testing effort in a more formal way.  For example, for
 each release to be finalized, what if there needed to be 3 community
 members that needed to try it out in their own environment?

 What if there was a post release +1 vote for the community to sign off on
 - sort of a works for me kind of thing to reassure others that it's safe
 to try.  So when the release email gets posted to the user list, start a
 tradition of people saying +1 in reply if they've tested it out and it
 works for them.  That's happening informally now when there are problems,
 but it might be nice to see a vote of confidence.  Just another idea.

 Any other ideas or variations?


I am no software engineering guru, but whenever I +1 a Hive release I
actually do check out the code and run a couple of queries. Mostly I do
that because there are just so many things that are not unit testable, like
those gosh-darn bash scripts that launch Java applications. There have been
times when, even after multiple patch revisions and passing unit tests,
something just does not work in the real world. So I never +1 a binary
release I don't spend an hour with, and if possible I try twisting the
knobs on any new feature, or at least try the basics. Hive is aiming for
something like quarterly releases.

So it is possibly better to have Cassandra do time-based releases. They do
not have to be quarterly, but if people want bleeding-edge features
(something committed 2 days ago) they really should go and build something
from trunk.

It seems like the Cassandra devs have the voting and releasing down to a
science, but from my world the types of bugs I worry about are data file
corruption and any weird bug that would result in data faults, like
read_repair not working, writes not going to the right nodes, or bloom
filters giving a faulty result. New features are great and I love seeing
them, but I can wait for those.

Updates, even trivial ones, get political; you just never want to be the
guy that champions an update and then not have it go well :)

Most users of Cassandra are going to have large clusters, and really the
project should not outstrip the common user's ability to stay up to date.
You have to figure that for a large cluster, say 20 nodes with maybe 200GB
of data per node, doing a rolling restart without degrading performance is
going to take some time. This is more than 'yum update cassandra;
/etc/init.d/cassandra restart', and with the risk of something going wrong,
people need time for QA and time for ops. This type of person does not like
to fall many releases behind and likewise cannot be updating too often
either.

I have never had to roll back a release, but I usually do wait a month
before running one, to make sure there is not another release following
soon.


Re: ByteBuffers and StorageProxy

2011-10-11 Thread Edward Capriolo
On Tue, Oct 11, 2011 at 12:25 PM, Todd Burruss bburr...@expedia.com wrote:

 My recent bug was that I was sending a zero length ByteBuffer (because I
 forgot to flip) for a column name.  The problem I have is that the insert
 was accepted by the server.  Should an exception be thrown?  The end result
 of allowing the insert is that the server will not restart if the data is
 still in the commit log (and maybe later too, not sure).


You may not have hit this at the Thrift layer because the AbstractTypes
have validation methods. Since StorageProxy is the expert interface, having
redundant checks might cause code bloat. Open a JIRA and put in the patch.
The worst that can happen is it gets -1'ed.
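
The flip bug itself is easy to reproduce in isolation (a standalone sketch,
not Cassandra code):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FlipDemo {
    public static void main(String[] args) {
        byte[] col = "colname".getBytes(StandardCharsets.UTF_8);

        // An exactly-sized buffer, filled but never flipped:
        ByteBuffer name = ByteBuffer.allocate(col.length);
        name.put(col);

        // position == limit, so any reader sees zero remaining bytes --
        // i.e. the zero-length column name the server silently accepted.
        System.out.println("before flip: " + name.remaining());

        name.flip();  // position -> 0, limit -> end of the written data
        System.out.println("after flip: " + name.remaining());
    }
}
```

Expected output is `before flip: 0` followed by `after flip: 7`, which is
why the unflipped buffer reads as an empty column name.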


Re: AW: interest in creating a cassandra-gossip library?

2011-09-02 Thread Edward Capriolo
Would this help?
http://svn.apache.org/viewvc?view=revision&revision=1157967

On Fri, Sep 2, 2011 at 10:01 AM, Jake Farrell jfarr...@apache.org wrote:

 Roland, I was also interested in this for very similar reasons. I was
 planning on working on this after tackling some of the thrift build clean
 and transport requests i've been working on. I created CASSANDRA-3125 to
 help track the progress of this a little easier

 -Jake

 On Sep 2, 2011, at 5:05 AM, Roland Gude wrote:

  Hi,
 
  Is there any progress on this topic?
  I would really like to see such a library.
 
  -Original Message-
  From: matthew hawthorne [mailto:mhawtho...@gmail.com]
  Sent: Wednesday, 22 December 2010 20:07
  To: dev@cassandra.apache.org
  Subject: interest in creating a cassandra-gossip library?
 
  hello,
 
  I'm starting a project at my day job to deploy a gossip protocol
  implementation.  part of my initial work is to evaluate existing
  implementations.
 
  being loosely familiar with Cassandra, I read
  http://wiki.apache.org/cassandra/ArchitectureGossip and have looked
  over the related code a bit.
 
  is there interest in breaking out the gossip-related portions of
  Cassandra into a library that could be reused by other projects?  I
  work on a team that is ready and willing to contribute heavily.  we'd
  just need some guidance as to how to structure the Cassandra
  subcomponent(s) and properly integrate them with the builds, tests,
  etc.
 
  here are a few examples of functionality we're looking to add:
 
  1) hierarchical state - our use case is cross data center gossip,
  where we don't want every node in the 2 clusters communicating, but do
  want a node from cluster1 to send a summary of the cluster's state to
  cluster2, and vice versa.  essentially I'm talking about rolling up
  the state of multiple nodes into a single virtual node
 
  2) mutual authentication - nodes verifying the identity of other nodes
  before gossipping
 
  3) encryption - encrypted traffic, especially for the cross data center
 case
 
  any opinions on this? thanks in advance for any feedback!
 
  -matt
 




Re: Announcements List

2011-07-20 Thread Edward Capriolo
On Wed, Jul 20, 2011 at 12:48 PM, David Boxenhorn da...@citypath.com wrote:

 I am not going to argue the point, because it's not really the point that I
 wanted to make. Maybe I'm an atypical user.

 The point I wanted to make is that there should be someplace for users to
 go
 to find out what's going on with Cassandra, with out all the noise of the
 user and dev lists, or JIRA. Upcoming features are important, because users
 make decisions based on them. We want to know what's in the pipeline. I'd
 argue that major architecture changes are important too - like the
 implementation of supercolumns or compaction - because Cassandra users have
 to know about these things sometimes, and make decisions because of them...
 but I won't argue for 100% of what I want if it will prevent me from
 getting
 90%!


 On Wed, Jul 20, 2011 at 7:34 PM, Jonathan Ellis jbel...@gmail.com wrote:

  You missed the point completely, then.
 
  We recommend not using Supercolumns if you don't know what you're
  doing, because almost everyone who does is doing it for the wrong
  reasons.
 
  If you are using Supercolumns for the right reasons (or even the wrong
  ones) you don't need to worry because the API is NOT going to change.
 
  On Wed, Jul 20, 2011 at 11:28 AM, David Boxenhorn da...@citypath.com
  wrote:
   Not true. I am a user. I consider this to be effectively the same as
   deprecating supercolumns (with support for the old API for backward
   compatibility). The fact that it is in the presentation that I linked
 to
  -
   from a DataStax employee! - with essentially the same message (i.e.
 don't
   use them if you're just starting), is more evidence that users should
  care
   about it.
  
   On Wed, Jul 20, 2011 at 7:21 PM, Jonathan Ellis jbel...@gmail.com
  wrote:
  
   That's exactly the kind of thing that *shouldn't* be on an announce
   list (and stay on the dev list), precisely because it deals with
   internals that users don't care about.
  
   On Wed, Jul 20, 2011 at 11:18 AM, David Boxenhorn da...@citypath.com
 
   wrote:
I would like to see this list also used for announcing upcoming
  features.
   At
some point a decision is made that some future version will include
  some
important feature. I don't want that information to be buried in a
  JIRA
ticket or a user/dev list discussion.
   
For example, I was surprised to learn, by accident, from
http://www.slideshare.net/mattdennis/cassandra-antipatterns , that
supercolumns will be replaced, internally, by composite columns.
 This
  is
something that we've discussed in the past, and that I have
 advocated
myself, but until now I have seen no indication that it would be
 done,
  or
that it was even viewed favorably by a consensus of decision makers.
   
   
On Mon, Jul 18, 2011 at 6:52 PM, Nick Bailey n...@datastax.com
  wrote:
   
DataStax has had requests for something like this. It seems like
something that would be generally useful for the community though.
   
Regarding twitter, I'm not sure a twitter account should be
 required
to get that information. I think you can follow a twitter account
 as
an rss feed though, so that might be a solution. That and the
  google
alert or email filter solutions just seem to be introducing more
difficulty for anyone trying to get that information. Perhaps the
demand for this isn't as high as I am imagining though.
   
My opinion on the list if we decide to go with that is that only
committers would be able to post to it and yes it would go to the
users list as well.
   
On Mon, Jul 18, 2011 at 10:32 AM, Sylvain Lebresne 
   sylv...@datastax.com
wrote:
 I have mixed feeling about that.

 On the one side, I agree with Gary that it doesn't add any real
  value.
 There is twitter,
 and we use consistent tagged subjects for release email, so it's
  easy
 to subscribe
 to the user list and set up a filter.

 That being said, I could understand that some people may find it
 cleaner to have a
 separate announce list and it is not something unheard of, so I'm
  ok
 with that if enough
 people thinks it's a good idea. But I think there is at least 2
 questions that come along:
  - should it be moderated ?
  - should announces still be sent to the user list ?

 --
 Sylvain

 On Mon, Jul 18, 2011 at 4:50 PM, Gary Dusbabek 
  gdusba...@gmail.com
wrote:
 Following @cassandra on twitter or a google alert would be
 simple
   enough
I
 think.

 Gary.

 On Mon, Jul 18, 2011 at 14:26, Nick Bailey n...@datastax.com
   wrote:

 What do we think about having a separate mailing list for just
 cassandra related announcements. The main purpose being
  announcing
   new
 releases once they pass a vote and are put up on the website. I
   think
 there is a desire for a way to be informed when new releases
 are
 

Re: [VOTE] Release Apache Cassandra 0.8.0 (take #3)

2011-05-31 Thread Edward Capriolo
On Tue, May 31, 2011 at 5:42 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:

 +1 non-binding

 fwiw
 - I ran some basic pig scripts that I have with 0.8 and they worked fine
 including using a UDF, filtering data, and outputting to Cassandra.
 - I also tried the pig and word_count examples in the src download. For
 some reason the cassandra output reducer didn't output anything to Cassandra

 I don't think that problem is a showstopper because one, it's an example
 and two, it seems isolated to the example - I got pig to output to
 Cassandra, so it doesn't appear to be a bug in the CFOF or other things in
 core.  I did log a ticket on it as I am swamped at the moment and didn't see
 anything obvious.  It did output to the filesystem just fine though.  See
 https://issues.apache.org/jira/browse/CASSANDRA-2727

 On May 30, 2011, at 2:04 PM, Eric Evans wrote:

  OK, let's try this yet again; I propose the following artifacts for
 release
  as 0.8.0 (final).
 
  SVN:
 https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8.0@r1129278
  Artifacts:
 https://repository.apache.org/content/repositories/orgapachecassandra-018/
  Driver Artifacts and Debian Package: http://people.apache.org/~eevans
 
  The vote will remain open for 72 hours (longer if needed).
 
  [1]: http://goo.gl/QY5dm (CHANGES.txt)
  [2]: http://goo.gl/CrJqJ (NEWS.txt)
 
  --
  Eric Evans
  eev...@rackspace.com
 
 
 


@Jeremy is this just because Pig's output does not use the OutputFormat?

If the OutputFormat does not work, then the cat is out of the bag, and I
would say it should get fixed. I would be a non-binding -1.

It could be significant because the OutputFormat was Avro and now it has
been moved to Thrift.

 switch to native Thrift for Hadoop map/reduce (CASSANDRA-2667)

So this would be a defect created by a feature of this release.


hinted handoff should use its own sstables

2011-04-26 Thread Edward Capriolo
So maybe this idea has been sent around before, but I would like to
know what everyone thinks. We have a huge column family called bigdata,
let's say 200 GB a node. We have used Cassandra as you would expect: we
never read before writing, and during our bulk loading we can get rates
like 2000 inserts per second per node. This morning I noticed this CF
on only some nodes had a lot of reads which went on for hours.

Since our apps should not have been reading, I dove in. What was
happening was a node was down during the bulk load period. As a result,
when it came alive, the other nodes with hints went to deliver them. The
problem was the other node was under heavy I/O trying to deliver the
hints. I see why.

Cassandra does NOT read before write EXCEPT when delivering a handoff.

This is not a good thing. It means the bigger the bigdata CF gets, the
more intensive delivering the hints will be on the sender side. The write
rate may be 2000, but hints cannot be read back that fast.

I know you can now drop and throttle HH in 0.7.0, but this is not good
enough, since it only takes longer to get consistent, or you never
get consistent. So here is my thinking...

Store hints in separate physical files, and/or possibly deliver those
files by streaming.

Maybe there is already a JIRA out there on this. I just woke up, so to
me it is an original idea :)


Re: current stable

2011-04-25 Thread Edward Capriolo
On Mon, Apr 25, 2011 at 12:53 PM, Jonathan Ellis jbel...@gmail.com wrote:
 Back when I used to run postgresql, I saw the same cycle:

  - most people don't bother testing until stable .0 is released
  - consequently, most people don't deploy to production until .1 is released

 I think moving the stable label would be futile since the majority
 will just wait that much longer to test. We did 4 RCs for 0.7; I don't
 think another (by whatever name) would have made much difference.

 It's worth pointing out that 0.7 was an unusually big release with a
 correspondingly unusually high bug count.  We had a much easier time
 getting the four previous releases to gel, and we've deliberately
 limited 0.8 to a similar scope as those.

 On Mon, Apr 25, 2011 at 8:00 AM, Jeremy Hanna
 jeremy.hanna1...@gmail.com wrote:
 As 0.8 approaches final status in the next few weeks, I wondered about how 
 releases receive the label, current stable.  I don't know if there's any 
 precedent for this, but I thought it might be nice to do a separate vote 
 when new major releases are out and weigh heavily those in the community 
 that can test the release against their use cases and perhaps client 
 developers (probably a subset of the former).  So for example, 0.8 comes out 
 and it is not labeled current stable until a separate vote has been taken 
 and it can be verified by a good portion of those doing testing against it 
 that it is in fact stable.

 I know that changes were put into place to get releases out faster, but I 
 think this change would be good so that current stable can have much more 
 meaning to people.  It's hard enough to pick up a new technology that has a 
 high learning curve without having to do testing on what is supposed to be 
 stable.

 Along with this, is it possible to separate out the releases in the apache 
 debian repo as David Strauss suggested so that we can have a stable line and 
 other labels for lines?

 Anyway, just wanted to propose something be done so that there could be more 
 credibility could be attached to current stable, and hopefully cassandra as 
 a whole could gain a more positive reputation for being stable as a result 
 (especially among new adopters).



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


I can agree that 0.7.X was a big, complex release. However, calling
0.7.1 and 0.7.2 "Generally Available" like MySQL does might make more
sense. This would let a user know that the version should be ready for
battle, but has not been proven in that context yet. This would also
let the user make the conscious choice to use the newer, edgier version,
rather than turning that user into a tester.


Re: Welcome committer Jake Luciani

2011-01-13 Thread Edward Capriolo
Three cheers!

On Thu, Jan 13, 2011 at 1:45 PM, Jake Luciani jak...@gmail.com wrote:
 Thanks Jonathan and Cassandra PMC!
 Happy to help Cassandra take over the world!
 -Jake

 On Thu, Jan 13, 2011 at 1:41 PM, Jonathan Ellis jbel...@gmail.com wrote:

 The Cassandra PMC has voted to add Jake as a committer.  (Jake is also
 a committer on Thrift.)

 Welcome, Jake, and thanks for the hard work!

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



