Sylvain Lebresne created CASSANDRA-5226:
-------------------------------------------
Summary: CQL3 refactor to allow conversion function
Key: CASSANDRA-5226
URL: https://issues.apache.org/jira/browse/CASSANDRA-5226
Project: Cassandra
Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
In CASSANDRA-5198, we've fixed CQL3 type validation and talked about adding
conversion functions to ease working with the different data types. However,
the current CQL3 code makes it fairly hard to add such functions in a non-hacky
way. In fact, we already support a few conversion functions (token,
minTimeuuid, maxTimeuuid, now) but the way we support them is extremely ugly
(the token function is competely special cased and min/maxTimeuuid are ugly
hacks in TimeUUIDType that I'm really not proud of).
So I'm attaching a refactor that cleans that up, making it easy to add new
conversion functions. Now, said refactor is a big one. While the goal is to
make it easy to add functions, I think it also improve the code in the
following ways:
* It much more clearly separate the phase of validating the query from
executing it. And in particular, it moves more work in the preparing phase.
Typically, the parsing of constants is now done in the preparing phase, not the
execution one. It also groups validation code much more cleanly imo.
* It simplify UpdateStatement. The Operation business was not very clean and in
particular the same operations where not handled by the same code depending on
whether they were prepared or not, which was error prone. This is no longer the
case.
* It somewhat simplify the parser. A few parsing rules were a bit too
convoluted, trying to enforce invariants that are much more easily checked post
parsing (and doing it post parsing often allow better error messages, the
parser tends to send cryptic errors).
The first attached part is the initial refactor. It also adds some relatively
generic code for adding conversion functions (it would typically not be very
hard to allow user defined functions, though that's not part of the patch at
all) and uses that to handle the existing token, minTimeuuid and maxTimeuuid
functions.
It's also worth mentioning that this first patch introduces type casts. The
main reason is that it allows multiple overloads of the same function.
Typically, the minTimeuuid allows both strings arguments (for dates) or integer
ones (for timestamp), but so when you have:
{noformat}
SELECT * FROM foo WHERE t > minTimeuuid(?);
{noformat}
then the code doesn't know which function to use. So it complains. But you can
remove the ambiguity with
{noformat}
SELECT * FROM foo WHERE t > minTimeuuid((bigint)?);
{noformat}
The 2nd patch finishes what the first one started by extending this conversion
functions support to select clauses. So after this 2nd patch you can do stuff
like:
{noformat}
SELECT token(k), k FROM foo;
{noformat}
for instance.
The 3rd patch builds on that to actually add new conversion functions. Namely,
for every existing CQL3 <type> it adds a blobTo<type> and a <type>ToBlob
function that convert from and to blobs. And so you can do (not that this
example is particularly smart):
{noformat}
SELECT varintToBlob(v) FROM foo WHERE v > blobToVarint(bigintToBlob(3));
{noformat}
Honestly this last patch is more for demonstration purpose and we can discuss
this separately. In particular, we may want better different names for those
functions. But at least it should highlight that adding new function is easy
(this could be used to add methods to work with dates for instance).
Now, at least considering the 2 first patches, this is not a small amount of
code but I would still suggest pushing this in 1.2 (the patches are against
1.2) for the following reasons:
* It fixes a few existing bugs (CASSANDRA-5198 has broke prepared statement
for instance, which this patch fixes) and add missing validation in a few
places (we are allowing sets literals like \{ 1, 1 \} for instance, which is
kind of wrong as it suggests we support multisets). We could fix those
separatly but honestly I'm not we won't miss some.
* We do have a fair amount of CQL dtests and i've check all pass. The refactor
also cleans up some part of the code quite a bit imo. So overall I think I'm
almost more confident in the code post-refactor than the current one.
* We're early in 1.2 and it's an improvement after all. It would be a bit sad
to have to wait for 2.0 to get this.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira