Re: March 2015 QA retrospective

Ariel Weisberg Fri, 10 Apr 2015 13:41:43 -0700

Hi,

*CASSANDRA-8550 - Internal pagination in CQL3 index queries creating
substantial overhead*


Disabling things via flags would be a good safeguard, like the Java -XX
flags that enable experimental features. It's just more work. And yes, I am
talking about users. When you release monthly and discourage long-running
feature branches, you end up with things on trunk that are functional but
not ready for prime time (although ready in the regression-free sense).

I have only done feature flags for things that are all or nothing, such as
switching between two fundamentally different implementations of something:
one that is in progress, and the legacy version that people use while the
new one is in progress.
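
As a rough illustration of the all-or-nothing kind of flag I mean, here is a
minimal Python sketch. The names (LegacyStore, NewStore,
EXPERIMENTAL_NEW_STORE) are hypothetical and are not Cassandra options; the
only point is that the in-progress implementation stays off unless someone
opts in explicitly.

```python
import os


class LegacyStore:
    """Stand-in for the existing implementation that users rely on."""

    name = "legacy"

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class NewStore(LegacyStore):
    """Stand-in for the in-progress replacement (hypothetical)."""

    name = "new"


def make_store(env=None):
    """Pick an implementation based on an opt-in flag.

    EXPERIMENTAL_NEW_STORE is an invented variable name used only for
    this sketch. Anything other than an explicit "true" keeps the legacy
    implementation, so the default never changes by accident.
    """
    env = os.environ if env is None else env
    value = env.get("EXPERIMENTAL_NEW_STORE", "false")
    enabled = value.strip().lower() == "true"
    return NewStore() if enabled else LegacyStore()
```

Callers only ever go through make_store(), so removing the flag later means
deleting one branch rather than hunting down scattered checks.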

Sometimes this is things like the flags for the various memtable
implementations or cache implementations. If you want to get militant about
how those played out, we might actually say we don't let users configure
that sort of thing, and then make sure we deliver an implementation that
meets everyone's needs and force everyone to switch over.

The current arrangement where we have multiple implementations is not a
great outcome. We are splitting our efforts and not really taking each one
far enough.

Ariel

On Fri, Apr 10, 2015 at 3:00 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> On Fri, Apr 10, 2015 at 10:56 AM, Ariel Weisberg <
> ariel.weisb...@datastax.com> wrote:
>
> > *CASSANDRA-7910 - wildcard prepared statements are incorrect after a
> > column is added to the table*
> >
> > I was thinking of this more as an issue with how we test ALTER TABLE than
> > with testing prepared statements. A lot of bugs arise from every kind of
> > schema change, whether it is add/drop keyspace/table or alter table,
> > because most code is written assuming that things aren't changing.
> >
> > You would kind of hope that testing unprepared statements everywhere you
> > test prepared statements would be redundant and that you would only have
> > to test the preparing process. Once the statement has been retrieved
> > (prepared or unprepared) everything should be the same. If it isn't then
> > it's worth thinking about what that implies for testing.
> >
>
> Aside from issues of result metadata (like 7910), prepared and unprepared
> statements have fairly different behavior due to how query parameters are
> handled.  (Technically you can pass binary query parameters with an
> unprepared statement, but in practice this doesn't work well for
> non-statically typed drivers like the python driver.)  So, we do really
> need good coverage of both prepared and unprepared statements.
>
> However, I do agree that it makes sense to organize those tests around
> ALTERing rather than prepared statements.
>
>
> >
> > *CASSANDRA-8264 - Problems with multicolumn relations and COMPACT STORAGE*
> >
> > I would look at the docs to find the settings then evaluate how they need
> > to be tested as if you were shipping it for the first time. That is for
> > filling in missing coverage for this kind of attribute.
> >
> > If the bug was in IN clauses permuted with compact storage, maybe what we
> > are looking for is the set of meaningful things COMPACT STORAGE permutes
> > with. Identifying things that we should permute is helpful, but I think
> > trying to do it after the fact by reviewing code might not be very
> > productive. Maybe there is someone who worked on it that can comment on
> > existing and missing coverage.
> >
>
> COMPACT STORAGE changes the on-disk storage format as well as some of the
> classes that are used for querying and representing the data in memory.
> So, really, COMPACT STORAGE can affect all (supported) queries and schemas.
>
>
> >
> >
> > *CASSANDRA-8288 - cqlsh describe needs to show 'sstable_compression'*
> >
> > OK, what would the priority on this be? How important is DESCRIBE
> > correctness? Who is consuming it and making decisions off of it?
> >
>
> The most important use of DESCRIBE is probably saving schemas for backups
> or dev/staging clusters.  Correctness of the table options is pretty
> important.  Opened https://issues.apache.org/jira/browse/CASSANDRA-9169.
>
>
> >
> > *CASSANDRA-8302 - Filtering for CONTAINS (KEY) on frozen
> > collection clustering columns within a partition does not work*
> >
> > Which feature are we talking about, CONTAINS (KEY) or frozen collection?
> >
>
> I was referring to the frozen collections feature.
>
>
> > This looks like something that follows the form schema + data + query =
> > bug. It's a matter of figuring out how to explore the supported space of
> > schema, data, and queries. We know what the supported queries are, and we
> > know what the supported schemas are, and we can try and guess at what
> > representative data is. Then it's a matter of trying to permute this
> > stuff effectively rather than manually enumerating cases. I think the
> > challenge with the enumeration approach is knowing what answer the
> > database should give for a given query.
> >
> > What I have seen done that is "easy" with a SQL database is that you run
> > the queries against your implementation and your reference
> > implementation (whose syntax and behaviors you are imitating) and then
> > compare the results. For automatically generating this kind of thing we
> > don't have a reference CQL implementation we can use to know what the
> > answer to queries should be.
> >
> > We could build one, but that seems like a time sink and it would still
> > share a lot of code with the real implementation. Maybe something crazy
> > like mapping CQL to SQL? Nothing sounds very attractive.
> >
>
> Developing a rule set for what queries can be executed under what
> conditions seems like the sanest approach, although it's obviously not
> simple to do.  The upside is that it would be fairly easy to maintain as we
> expand the capabilities of CQL.
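
A minimal Python sketch of the rule-set idea, combined with permuting schemas
and queries instead of enumerating cases by hand. The schema options, query
shapes, and the single rule below are invented for illustration and are not
a statement of actual CQL restrictions; a real rule set would have to be
derived from the CQL documentation.

```python
from itertools import product

# Hypothetical, heavily simplified dimensions of the test space.
SCHEMAS = [
    {"compact_storage": compact, "indexed": indexed}
    for compact, indexed in product([False, True], repeat=2)
]
QUERIES = [
    {"filter_on": "partition_key"},
    {"filter_on": "regular_column"},
]


def expected_to_execute(schema, query):
    """Toy rule set: decide whether a query should be accepted.

    The single rule here (filtering on a regular column needs an index)
    is an illustrative assumption, not a statement of CQL semantics.
    """
    if query["filter_on"] == "regular_column":
        return schema["indexed"]
    return True


def generate_cases():
    """Yield every schema/query combination with its expected outcome."""
    for schema, query in product(SCHEMAS, QUERIES):
        yield schema, query, expected_to_execute(schema, query)
```

Each generated case would then be run against a real cluster and the
accept/reject result compared with the rule's prediction. This only covers
whether a query should execute; checking the returned rows would still need
some other oracle.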
>
>
> >
> >
> > *CASSANDRA-8410 - Select with many IN values on clustering columns
> > can result in a StackOverflowError*
> >
> > OK, can you file a ticket for the boundary conditions of IN? If you can
> > link to the existing test coverage just so we know what is already
> > tested, that would be helpful.
> >
>
> The boundary coverage for IN is sufficient now, I think.  But I think it
> would still be useful to test boundary conditions for some of the other
> cases I mentioned, so I'll open a ticket for that.
>
>
> >
> >
> > *CASSANDRA-8490 - DISTINCT queries with LIMITs or paging are incorrect
> > when partitions are deleted*
> >
> > OK, can you file and label a ticket for this?
> >
>
> Opened https://issues.apache.org/jira/browse/CASSANDRA-9171
>
>
> >
> > *CASSANDRA-8512 - cqlsh unusable after encountering schema mismatch*
> >
> > I think we should be able to test cqlsh against a degraded cluster using
> > the kitchen sink harness. It would just be one more test module that runs
> > concurrently with everything else. Since the harness will degrade the
> > cluster, the test module should be able to test how cqlsh handles that.
> >
> > Can you create and label a ticket for this?
> >
>
> Opened https://issues.apache.org/jira/browse/CASSANDRA-9172
>
>
> >
> > *CASSANDRA-8550 - Internal pagination in CQL3 index queries
> > creating substantial overhead*
> >
> > I bring this up mostly because the original implementor worked on a
> > feature that had performance as part of its success criteria, and I want
> > to float my thoughts on QA performance expectations for new features.
> >
> > Knowing the performance is part of it, but it's pretty common with
> > monthly releases that you will release a feature and the performance is
> > not great, or not great in some use case. Releasing the implementation of
> > a feature doesn't mean you have to announce and encourage its use.
> >
> > In fact with monthly releases I think we need to have a policy that
> > people not use in progress functionality that we don't document. When we
> > release off of trunk and develop off of trunk there is always going to be
> > stuff that isn't ready for prime time.
> >
>
> Wouldn't a feature-enabling flag be more effective than a policy?  Or by
> "have a policy that people not use in progress functionality", do you mean
> C* developers using it in other parts of the code, not users?
>
>
> >
> > Circling back to performance, I think the criteria QA-wise for a new
> > feature is that we don't have people use it until it is "fit for purpose",
> > as in it performs adequately for some use cases, and if there is a use
> > case where it is a bad idea, that should be part of the documentation.
> >
> > Performance tends to evolve because a lot of optimizations involve
> > additional code and complexity that an initial implementation won't
> > require to be correct or useful (for some use cases). Monthly releases
> > make these intermediate steps more visible.
> >
> > Once we have performance workloads in CI then having a workload and graph
> > based on the feature should probably be part of "done".
> >
>
> Seems reasonable to me, but others should probably weigh in here.
>
>
> >
> > *CASSANDRA-8563 - cqlsh broken for some thrift created tables*
> >
> > How much do we want to invest in this? Historically how bad has the
> > compatibility been? Minor issues like cqlsh not working or are there
> > corruption and loss issues associated with this?
> >
>
> No corruption or data loss, mostly schema metadata problems or failing
> queries like https://issues.apache.org/jira/browse/CASSANDRA-8178.
>
> Eventually we really need 100% Thrift compatibility or migration options
> for all of the corner cases.  (I think some of the schema work in 3.0 and
> 8099 will help here.)
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
