[ https://issues.apache.org/jira/browse/CASSANDRA-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-5198:
----------------------------------------
Attachment: 0003-Respect-partitioner-type-for-Token-function.txt
0002-Improve-printing-of-type-in-error-message.txt
0001-Respect-CQL3-constant-types.txt
Attached 3 patches related to the proposed changes above:
# the first one adds proper type validation. In other words, it rejects a string
value when the column is an int, and rejects an int value when the column is a blob
(instead of interpreting it as a hex value, which I'm pretty sure is
counter-intuitive). It does, however, also reject a string value when the
column is a blob, because I'm far from convinced that interpreting the content
of the string as a hex value is particularly intuitive. But to allow inserting
blobs, it allows a new type of hex constant (which must start with '0x'). In
other words, if b is a blob column:
{noformat}
UPDATE ... SET b = '00ff' ...
{noformat}
is not valid anymore, but
{noformat}
UPDATE ... SET b = 0x00ff ...
{noformat}
is. I note that the patch isn't tiny because it required a few refactorings here
and there to be done properly, but overall I think those refactorings actually
improve the code.
# the second patch is mainly cosmetic and makes sure we use CQL3 types in CQL3
error messages, i.e. 'map<text, int>' rather than
'org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.Int32Type)'.
# the third patch makes sure we take the partitioner token type into account. So
if your partitioner is M3P you should provide a bigint value, if it's RP a
varint one, and if it's OPP a blob one.
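As a rough illustration of the rules described in the list above (this is not the code from the patches), the stricter constant typing can be sketched in Python. The names classify, is_valid, ACCEPTED and TOKEN_TYPE are hypothetical, the regular expressions are a simplification of the real CQL3 grammar, and only the column types mentioned here are covered:

```python
import re

# Hypothetical literal categories; the names are illustrative only.
STRING, INTEGER, HEX = "string", "integer", "hex"

# Literal category accepted by each column type under the stricter rules.
# Only the types discussed in this comment are listed.
ACCEPTED = {
    "text": STRING,
    "int": INTEGER,
    "blob": HEX,
}

# Type expected for a token value, keyed by the partitioner abbreviations
# used in this comment (third patch).
TOKEN_TYPE = {"M3P": "bigint", "RP": "varint", "OPP": "blob"}


def classify(literal):
    """Return the category of a literal, or raise ValueError if unknown."""
    if re.fullmatch(r"'(?:[^']|'')*'", literal):
        return STRING   # quoted string, '' is an escaped quote
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", literal):
        return HEX      # new hex constant, must start with 0x
    if re.fullmatch(r"-?[0-9]+", literal):
        return INTEGER
    raise ValueError("unrecognized literal: " + literal)


def is_valid(column_type, literal):
    """True if the literal's category matches what the column type accepts."""
    return ACCEPTED[column_type] == classify(literal)
```

With this sketch, is_valid("blob", "0x00ff") holds, while is_valid("blob", "'00ff'") and is_valid("blob", "255") do not, which mirrors the two UPDATE examples in the first item.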
Those patches don't yet add support for the token function in the select clause
that I talked about above. I also want to add conversion functions that allow
one to, say, convert a string or a uuid to a blob, but I want to refactor the
(currently ugly) handling of functions a bit to do that, so that will follow
later (and it can be done in another ticket).
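Until such conversion functions exist, a client would have to build the 0x constant itself. A minimal Python sketch of that, assuming (and this is only an assumption, since the functions are not written yet) that text maps to its UTF-8 bytes and a uuid to its 16 raw bytes; both helper names are hypothetical:

```python
import uuid


def text_to_blob_constant(value):
    """Render a text value as a 0x hex constant, assuming UTF-8 encoding."""
    return "0x" + value.encode("utf-8").hex()


def uuid_to_blob_constant(value):
    """Render a uuid string as a 0x hex constant of its 16 raw bytes."""
    return "0x" + uuid.UUID(value).bytes.hex()
```

For example, text_to_blob_constant("bsmith") gives 0x62736d697468, which could then be used where a blob constant is expected.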
> token () function automatically coerces types leading to confusing output
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-5198
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5198
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 1.2.1
> Reporter: Edward Capriolo
> Priority: Minor
> Attachments: 0001-Respect-CQL3-constant-types.txt,
> 0002-Improve-printing-of-type-in-error-message.txt,
> 0003-Respect-partitioner-type-for-Token-function.txt
>
>
> This works as it should.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token('') ;
> username | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
> bsmith | null | null | bob | smith | null
> scapriolo | null | null | stacey | capriolo | null
> ecapriolo | null | null | edward | capriolo | null
> cqlsh:movies> select * from users where token (username) > token('bsmith') ;
> username | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
> scapriolo | null | null | stacey | capriolo | null
> ecapriolo | null | null | edward | capriolo | null
> cqlsh:movies> select * from users where token (username) > token('scapriolo') ;
> username | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
> ecapriolo | null | null | edward | capriolo | null
> {noformat}
> But look what happens when you supply numbers to the token function.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token(0) ;
> username | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
> ecapriolo | null | null | edward | capriolo | null
> cqlsh:movies> select * from users where token (username) > token(1134314) ;
> username | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
> bsmith | null | null | bob | smith | null
> scapriolo | null | null | stacey | capriolo | null
> ecapriolo | null | null | edward | capriolo | null
> cqlsh:movies> select * from users where token (username) > token(113431431) ;
> username | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
> scapriolo | null | null | stacey | capriolo | null
> ecapriolo | null | null | edward | capriolo | null
> cqlsh:movies> select * from users where token (username) > token(1134) ;
> username | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
> ecapriolo | null | null | edward | capriolo | null
> cqlsh:movies> select * from users where token (username) > token(1134434) ;
> username | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
> scapriolo | null | null | stacey | capriolo | null
> {noformat}
> This does not make sense to me. The token function is apparently converting
> integers to strings, leading to seemingly unpredictable results.
> However, I find this syntax odd. I feel like I should be able to say
> 'token(username) > 0 and token(username) < 10' because from the thrift side I
> can page tokens or I can page keys. In this case, I guess, I am only able to
> page keys because the token is not returned to the user.
> Is token 0 = ''? How do I arrive at the minimal token for an int column?
> Should the token() function at least be smart enough to reject integers for
> string columns?