[ 
https://issues.apache.org/jira/browse/CASSANDRA-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550990#comment-13550990
 ] 

Sylvain Lebresne edited comment on CASSANDRA-4936 at 1/11/13 9:00 AM:
----------------------------------------------------------------------

I really think the bug here is that TimeUUIDType.fromString() accepts a date as 
input. But a date is *not* a valid representation of a timeuuid, and the 
fromString method does *arbitrarily* pick some 0's for parts of the resulting 
UUID.

In other words, the SELECT query above should be invalid.

Now don't get me wrong, selecting timeuuid based on dates is useful but that is 
a slightly different problem. So what I think we should do is:
# refuse dates as valid timeuuid values because they just are not.
# add convenience methods to translate dates to precise timeuuid. For querying 
we would have 'startOf()' and 'endOf()' (where 'startOf(<date> )' (resp. 
'endOf(<date> )') would return the *smallest* (resp. *biggest*) possible 
timeuuid at time <date> ). And for insertion we could optionally add 
'random(<date> )' that would return a random timeuuid at time '<date>' (we 
could even accept 'now' as syntactic sugar for 'random(now)' if we feel like 
it).

That would also mean that cqlsh should stop this non-sense of displaying 
timeuuid like date. Again, I understand the intention of making it more 
readable but this will confuse generations of CQL3 users. I do am in favor of 
finding a non confusing way to make it readable for users. In fact one solution 
could be to handle that on the CQL side and to allow 'SELECT dateOf(x ) FROM 
...' that would return a date string from timeuuid x (but now it's clear, 
you've explicitly asked for a lossy representation of x).

I note that this suggestion pretty much fixes the problem discussed in 
CASSANDRA-4284 too.

I note that Tyler's solution of basically automatically generating the 
startOf() and endOf() method under the cover based on whether we've use an 
inclusive of exclusive operation may appear seductive but I don't think we 
should do that because:
# if you do that, what about SELECT ... WHERE activity_id = '2012-11-07 
18:18:22-0800'. You still have no solution for that and by doing magic under 
the carpet for < and >, you've in fact blurred what = really does.
# "it would just require passing along information about whether to create the 
highest or lowest TimeUUID representation for a given datestamp based on the 
comparison operator that's used" <- while this seem simple on principle, this 
will yield very *very* ugly special cases internally. This is *not* 2 lines of 
code.
# more generally, this doesn't solve the fact that date *are not* valid 
representation of timeuuid. For example, I still think the first point 
mentioned in CASSANDRA-4284 is a bug in its own right.

Allowing dates as valid representation of timeuuid is a bug, let's fix it.

                
      was (Author: slebresne):
    I really think the bug here is that TimeUUIDType.fromString() accepts a 
date as input. But a date is *not* a valid representation of a timeuuid, and 
the fromString method does *arbitrarily* pick some 0's for parts of the 
resulting UUID.

In other words, the SELECT query above should be invalid.

Now don't get me wrong, selecting timeuuid based on dates is useful but that is 
a slightly different problem. So what I think we should do is:
# refuse dates as valid timeuuid values because they just are not.
# add convenience methods (say 'startOf()' and 'endOf()') to translate dates to 
precise timeuuid. For querying we would have 'startOf()' and 'endOf()' (where 
'startOf(<date>)' (resp. 'endOf(<date>)') would return the *smallest* (resp. 
*biggest*) possible timeuuid at time <date>). And for insertion we could 
optionally add 'random(<date>)' that would return a random timeuuid at time 
<date> (we could even accept 'now' as syntactic sugar for 'random(now)' if we 
feel like it).

That would also mean that cqlsh should stop this non-sense of displaying 
timeuuid like date. Again, I understand the intention of making it more 
readable but this will confuse generations of CQL3 users. I do am in favor of 
finding a non confusing way to make it readable for users. In fact one solution 
could be to handle that on the CQL side and to allow 'SELECT dateOf(x) FROM 
...' that would return a date string from timeuuid x (but now it's clear, 
you've explicitly asked for a lossy representation of x).

I note that this suggestion pretty much fixes the problem discussed in 
CASSANDRA-4284 too.

I note that Tyler's solution of basically automatically generating the 
startOf() and endOf() method under the cover based on whether we've use an 
inclusive of exclusive operation may appear seductive but I don't think we 
should do that because:
# if you do that, what about SELECT ... WHERE activity_id = '2012-11-07 
18:18:22-0800'. You still have no solution for that and by doing magic under 
the carpet for < and >, you've in fact blurred what = really does.
# "it would just require passing along information about whether to create the 
highest or lowest TimeUUID representation for a given datestamp based on the 
comparison operator that's used" <- while this seem simple on principle, this 
will yield very *very* ugly special cases internally. This is *not* 2 lines of 
code.
# more generally, this doesn't solve the fact that date *are not* valid 
representation of timeuuid. For example, I still think the first point 
mentioned in CASSANDRA-4284 is a bug in its own right.

Allowing dates as valid representation of timeuuid is a bug, let's fix it.

                  
> Less than operator when comparing timeuuids behaves as less than equal.
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-4936
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4936
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>         Environment: Linux CentOS.
> Linux localhost.localdomain 2.6.18-308.16.1.el5 #1 SMP Tue Oct 2 22:01:37 EDT 
> 2012 i686 i686 i386 GNU/Linux
>            Reporter: Cesar Lopez-Nataren
>             Fix For: 1.2.2
>
>
> If we define the following column family using CQL3:
> CREATE TABLE useractivity (
>   user_id int,
>   activity_id 'TimeUUIDType',
>   data text,
>   PRIMARY KEY (user_id, activity_id)
> );
> Add some values to it.
> And then query it like:
> SELECT * FROM useractivity WHERE user_id = '3' AND activity_id < '2012-11-07 
> 18:18:22-0800' ORDER BY activity_id DESC LIMIT 1;
> the record with timeuuid '2012-11-07 18:18:22-0800' returns in the results.
> According to the documentation, on CQL3 the '<' and '>' operators are strict, 
> meaning not inclusive, so this seems to be a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to