Michaël Figuière created CASSANDRA-9202:
-------------------------------------------
Summary: Introduce CQL Error Codes
Key: CASSANDRA-9202
URL: https://issues.apache.org/jira/browse/CASSANDRA-9202
Project: Cassandra
Issue Type: Improvement
Reporter: Michaël Figuière
As features have been added or modified over the past years, the constraints
applied to the CQL language itself have evolved rapidly, making it challenging
for users, as well as for documentation writers, testers and developers of
tooling, to keep up with what is and isn't possible, and to understand the
meaning of each error message and the rule behind it.
Moreover, since the message for any given error may change over time to fix
a typo or to rephrase it, messages cannot serve as a stable reference for errors.
It feels like the right time to make error handling more formal in Cassandra,
by introducing a very classic mechanism in the database world: error codes.
The purpose of this ticket is to introduce these codes; it *does not* cover
the way they should then be exposed. Below is my proposal, along with the
strategy I used:
*Gathering Cassandra errors*
I've walked through the source of all the past releases of Cassandra in its git
repo using a script in order to capture all the CQL-related
{{InvalidRequestException}} that are thrown, considering that they represent
most of the CQL errors that may be returned by Cassandra.
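The script itself is not attached to this ticket. As a rough illustration only, the extraction step could look like the sketch below. The regular expression, the function name {{extract_messages}} and the sample Java snippet are all assumptions made for the example, and the pattern only captures the first string literal of each message.

```python
import re

# Assumed pattern: matches `new InvalidRequestException("literal"...`,
# optionally wrapped in String.format(...), and captures the first literal.
THROW_RE = re.compile(
    r'new\s+InvalidRequestException\(\s*(?:String\.format\(\s*)?"((?:[^"\\]|\\.)*)"'
)


def extract_messages(java_source: str) -> list[str]:
    """Return the leading message literal of each InvalidRequestException built in the source."""
    return [match.group(1) for match in THROW_RE.finditer(java_source)]


# Invented sample input, used only to exercise the function.
sample = '''
if (limit <= 0)
    throw new InvalidRequestException("LIMIT must be strictly positive");
throw new InvalidRequestException(String.format("Unknown keyspace %s", name));
'''
messages = extract_messages(sample)
```

Running such a function over every {{.java}} file at each release tag would give the per-version lists; messages built by string concatenation would need extra handling that this sketch leaves out.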
Below is the list of all errors that have been introduced and modified between
Cassandra 1.2.0 and the current trunk:
https://gist.github.com/mfiguiere/3036c2a54af016bbeb58
The complete list of CQL errors declared in each Cassandra source file, along
with the range of versions in which they appeared, is as follows:
https://gist.github.com/mfiguiere/42166586647c34b1a41c
That's a lot of them... Clearly we can only focus on Cassandra 3.0+ here, that
is on the current trunk.
*Categorizing errors*
It's common for databases to categorize errors in order to make it simpler for
the user to understand their nature or to walk through a list of them:
* PostgreSQL (http://www.postgresql.org/docs/9.4/static/errcodes-appendix.html)
* MySQL (http://dev.mysql.com/doc/refman/5.6/en/error-messages-server.html)
* Oracle (http://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm)
One issue that can be observed in these 3 examples is that the codes they use
are fairly cryptic and not really readable.
It felt to me that a categorization by feature would be helpful for Cassandra.
And rather than building hexadecimal prefixes, let's use readable string ones.
We then end up with the following list of CQL error codes:
https://gist.github.com/mfiguiere/7a19f8368b3ab4fbef3a
That's about 260 errors overall for the current trunk, but broken into
categories it feels to me that it remains very easy to browse and review.
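To illustrate why readable, feature-prefixed codes are easy to browse, here is a small sketch that groups codes by category. The code names and the {{CATEGORY.NAME}} convention are invented for the example; the actual proposed list is the one in the gist above and may use a different naming scheme.

```python
from collections import defaultdict

# Hypothetical codes, not taken from the gist.
CODES = [
    "SELECT.INVALID_LIMIT",
    "SELECT.UNKNOWN_COLUMN",
    "BATCH.COUNTER_MIXED_WITH_NON_COUNTER",
    "INDEX.ALREADY_EXISTS",
]


def group_by_category(codes):
    """Split each code on its first '.' and group the names by category prefix."""
    grouped = defaultdict(list)
    for code in codes:
        category, _, name = code.partition(".")
        grouped[category].append(name)
    return dict(grouped)


categories = group_by_category(CODES)
```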
*Native Protocol Error Codes vs. CQL Error Codes*
We actually already introduced the concept of error codes into the Native
Protocol specification. These codes represent execution exceptions and are used
by the clients to understand the kind of problem that occurred.
I propose to rename these error codes to "Execution Error Codes" and to
introduce with this ticket "CQL Error Codes", as they address two different
kinds of issues and audiences.
*Introducing CQL Error Codes*
Once agreement is reached on the list of errors, the strategy to
introduce them into the codebase would be as follows:
1. We have to wait until CASSANDRA-8099 is merged, as it'll significantly
change the way Exceptions are manipulated in the CQL codebase.
2. We introduce a {{cql_errors.spec}} file that defines the list of all CQL
errors that may be thrown by Cassandra.
3. We modify the sources to introduce the appropriate {{cqlErrorCode}} along with
any error that is declared.
4. Once merged, any subsequent addition or modification of an error in the
sources should lead to the corresponding update of the
{{cql_errors.spec}} file in order to keep it in sync.
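Step 4 could be backed by an automated check. The sketch below is only one possible shape for it and rests on two assumptions that this ticket does not define: a spec format with one {{CODE: description}} entry per line, and sources that reference codes as {{CqlErrorCode.SOME_NAME}}.

```python
import re

# Assumed convention for how sources would reference a code.
CODE_REF_RE = re.compile(r"CqlErrorCode\.([A-Z][A-Z0-9_]*)")


def parse_spec(spec_text):
    """Parse the assumed spec format: `CODE: description` per line, `#` for comments."""
    codes = {}
    for line in spec_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code, _, description = line.partition(":")
        codes[code.strip()] = description.strip()
    return codes


def check_sync(spec_text, sources):
    """Return (codes used but not declared, codes declared but never used)."""
    declared = set(parse_spec(spec_text))
    used = set()
    for source in sources:
        used.update(CODE_REF_RE.findall(source))
    return sorted(used - declared), sorted(declared - used)


# Invented sample data.
spec = """
# cql_errors.spec (hypothetical content)
INVALID_LIMIT: LIMIT must be strictly positive
INDEX_ALREADY_EXISTS: An index with this name already exists
"""
sources = [
    "throw invalidRequest(CqlErrorCode.INVALID_LIMIT, msg);",
    "throw invalidRequest(CqlErrorCode.UNKNOWN_KEYSPACE, msg);",
]
missing_from_spec, unused_in_sources = check_sync(spec, sources)
```

A check of this kind could run as part of the build, so that a source change that forgets the spec file is caught before it is merged.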
*Benefits*
I see several benefits in this approach:
* Provides an immediate, comprehensive documentation of the CQL Errors (and
thus the corresponding rules and constraints).
* Easy to maintain. Easy to repair in case of a missed update through some greps
in the codebase.
* Being guaranteed to be maintained, it can serve as a solid reference for any
of the more detailed documentation that is produced (CQL spec, Cassandra
doc, ...).
* Provides a clear summary of the errors thrown by each feature of Cassandra,
making it simple to catch inconsistencies, lack of normalization, and
duplicates.
* Will enable easier implementation of sophisticated features in monitoring and
developer tools such as counters of error codes, help for errors sent by
Cassandra, external CQL validation, ...
* SEO / StackOverflow friendly.
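As a small illustration of the monitoring benefit, the sketch below counts rejected requests by code and by message. The codes and messages are invented for the example. The point is that a rephrased message splits a message-based counter, while a code-based counter stays stable.

```python
from collections import Counter

# Hypothetical log of rejected requests as (error code, message) pairs.
# The first two entries are the same error with two phrasings of the message.
rejections = [
    ("SELECT.INVALID_LIMIT", "LIMIT must be strictly positive"),
    ("SELECT.INVALID_LIMIT", "LIMIT must be a strictly positive value"),
    ("INDEX.ALREADY_EXISTS", "Index already exists"),
]

by_code = Counter(code for code, _ in rejections)
by_message = Counter(message for _, message in rejections)
```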