[jira] [Updated] (CASSANDRA-5198) token () function automatically coerces types leading to confusing output

Sylvain Lebresne (JIRA) Wed, 30 Jan 2013 09:01:18 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sylvain Lebresne updated CASSANDRA-5198:
----------------------------------------

    Attachment: 0003-Respect-partitioner-type-for-Token-function.txt
                0002-Improve-printing-of-type-in-error-message.txt
                0001-Respect-CQL3-constant-types.txt

Attached 3 patches related to the proposed changes above:
# the first one adds proper type validation. In other words, it rejects a string 
value when the column is an int, and rejects an int value when the column is a blob 
(instead of interpreting it as a hex value, which I'm pretty sure is 
counter-intuitive). It does, however, also reject a string value when the 
column is a blob, because I'm far from convinced that interpreting the content 
of the string as a hex value is particularly intuitive. To still allow inserting 
blobs, it adds a new type of constant: hex constants (which must start with '0x'). In 
other words, if b is a blob column:
{noformat}
UPDATE ... SET b = '00ff' ...
{noformat}
is not valid anymore, but
{noformat}
UPDATE ... SET b = 0x00ff ...
{noformat}
is. I note that the patch isn't tiny because it required some refactoring here 
and there to be done properly, but overall I think the refactoring actually 
improves the code.
# the second patch is mainly cosmetic and makes sure we use CQL3 types in CQL3 
error messages, i.e. 'map<text, int>' rather than 
'org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.Int32Type)'.
# the third patch makes sure we take the partitioner token type into account. So 
if your partitioner is M3P you should provide a bigint value, if it's RP a 
varint one, and if it's OPP a blob one.
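
To make the rules in the three items above concrete, here is a minimal Python sketch of the validation they describe. It is not the code from the attached patches: the names (literal_kind, validate, validate_token_value), the regular expressions, and the subset of column types accepted per literal kind are all assumptions made for illustration. Only the blob/'0x' rule and the partitioner-to-type mapping come from the description above.

```python
import re

# Hypothetical literal kinds; the names are illustrative, not Cassandra's.
STRING, INTEGER, HEX = "string", "integer", "hex"

# Assumed (incomplete) subset of CQL3 types each literal kind may be used for.
ACCEPTED = {
    STRING: {"text", "ascii", "varchar"},
    INTEGER: {"int", "bigint", "varint"},
    HEX: {"blob"},
}

# Token value type expected per partitioner, as stated in the third item.
TOKEN_TYPE = {"M3P": "bigint", "RP": "varint", "OPP": "blob"}


def literal_kind(literal):
    """Classify a constant by its spelling in the statement."""
    if re.fullmatch(r"'.*'", literal, re.DOTALL):
        return STRING
    if re.fullmatch(r"0[xX][0-9a-fA-F]*", literal):
        return HEX  # hex constants must start with '0x'
    if re.fullmatch(r"-?[0-9]+", literal):
        return INTEGER
    raise ValueError(f"unrecognized literal: {literal}")


def validate(literal, column_type):
    """Reject a constant whose kind does not match the column type."""
    kind = literal_kind(literal)
    if column_type not in ACCEPTED[kind]:
        raise TypeError(
            f"{kind} constant {literal} is not valid for type {column_type}"
        )


def validate_token_value(literal, partitioner):
    """Check a raw token value against the partitioner's token type."""
    validate(literal, TOKEN_TYPE[partitioner])


validate("0x00ff", "blob")            # accepted: hex constant for a blob
validate_token_value("42", "M3P")     # accepted: integer for a bigint token
try:
    validate("'00ff'", "blob")        # rejected: string for a blob column
except TypeError as err:
    print(err)
```

Under these assumptions a constant is never reinterpreted to fit the column: '00ff' stays a string and 0x00ff stays a blob, so a mismatch is reported as an error instead of producing a surprising value.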

Those patches don't yet add support for the token function in the select clause 
that I talked about above. I also want to add conversion functions that allow, 
say, converting a string or a uuid to a blob, but I want to refactor the 
(currently ugly) handling of functions a bit to do that, so that will follow 
later (and it can be done in another ticket).

                
> token () function automatically coerces types leading to confusing output
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5198
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5198
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Edward Capriolo
>            Priority: Minor
>         Attachments: 0001-Respect-CQL3-constant-types.txt, 
> 0002-Improve-printing-of-type-in-error-message.txt, 
> 0003-Respect-partitioner-type-for-Token-function.txt
>
>
> This works as it should.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token('') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>     bsmith |         null |  null |       bob |    smith |     null
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token('bsmith') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token('scapriolo') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo |         null |  null |    edward | capriolo |     null
> {noformat}
> But look what happens when you supply numbers into the token function.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token(0) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(1134314) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>     bsmith |         null |  null |       bob |    smith |     null
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(113431431) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(1134) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(1134434) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo |         null |  null |    stacey | capriolo |     null
> {noformat}
> This does not make sense to me. The token function is apparently converting 
> integers to strings, leading to seemingly unpredictable results. 
> However, I find this syntax odd. I feel like I should be able to say 
> 'token(username) > 0 and token(username) < 10' because from the thrift side I 
> can page tokens or I can page keys. In this case, I guess, I am only able to 
> page keys because the token is not returned to the user.
> Is token 0 = ''? How do I arrive at the minimal token for an int column? 
> Should the token() function at least be smart enough to reject integers for 
> string columns?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
