tl;dr +1 yup raise a jira to discuss how now() should behave in a single
statement (and possible extend to batch statements).

The values of now should be the same if you assume that now() works like it
does in relational databases such as postgres or mysql, however at the
moment it instead works like sysdate() in mysql. Given that CQL is supposed
to be SQL like, I think the assumption around the behaviour of now() was a
fair one to make.

I definitely agree that raising a jira ticket would be a great place to
discuss what the behaviour of now() should be for Cassandra. Personally I
would be in favour of seeing the deterministic component (the actual time
part) being the same across multiple calls in the one statement or multiple
statements in a batch.

Cassandra documentation does not make any claims as to how now() works
within a single statement and reading the code it shows the intent is to
work like sysdate() from MySQL rather than now(). One of the identified
dangers of making cql similar to sql is that, while yes it aids adoption,
users will find that SQL like things don't behave as expected. Of course as
a user, one shouldn't have to read the source code to determine correct
behaviour.

Given that a timeuuid is made up of deterministic and (pseudo)
non-deterministic components I can see why this issue has been largely
ignored and hasn't had a chance for the behaviour to be formally defined
(you would expect now to return the same time in the one statement despite
multiple calls, but you wouldn't expect the same behaviour for say a call
to rand()).







On Wed, 30 Nov 2016 at 19:54 Cody Yancey <yan...@uber.com> wrote:

>     This is not a bug, and in fact changing it would be a serious bug.
>
> False. Absolutely no consumer would be broken by a change to guarantee an
> identical time component that isn't broken already, for the simple reason
> your code already has to handle that case, as it is in fact the majority
> case RIGHT NOW. Users can hit this bug, in production, because unit tests
> might not experienced it! The time component should be the time that the
> command was processed by the coordinator node.
>
>      would one expect a java/py/bash script that loops
>
> Individual Cassandra writes (which is what OP is referring to
> specifically) are not loops. They are in almost every case atomic
> operations that either succeed completely or fail completely. Allowing a
> single atomic operation to witness multiple times in these corner cases is
> not only surprising, as this thread demonstrates, it is also needlessly
> restricting to what developers can use the database for, and provides NO
> BENEFIT.
>
>     Calling now PRIOR to initiating multiple inserts is in most cases
> exactly what one does...the ONLY practice is to set the value before
> initiating the sequence of calls
>
> Also false. Cassandra does not have a way of doing this on the coordinator
> node rather than the client device, and as I already showed, the client
> device is the wrong place to do it in situations where guaranteeing bounded
> clock-skew actually makes a difference one way or the other.
>
> Thanks,
> Cody
>
>
>
> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
> This is not a bug, and in fact changing it would be a serious bug.
>
> What it is is a wonderful case of bad coding: would one expect a
> java/py/bash script that loops on a bunch of read/execut/update calls where
> each iteration calls time to return the same exact time for the duration of
> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>
> Every call to a system call is unique, including within C*. Calling now
> PRIOR to initiating multiple inserts is in most cases exactly what one does
> to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
> identical system time as would be the uuid of the row, one tries to call
> time as close to just before the insert as possible. Then repeat.
>
> You have a logic issue in your code. If you want the same value for a set
> of calls, the ONLY practice is to set the value before initiating the
> sequence of calls.
>
>
>
> *.......*
>
>
>
> *Daemeon C.M. ReiydelleUSA (+1) 415.501.0198 <(415)%20501-0198>London
> (+44) (0) 20 8144 9872 <+44%2020%208144%209872>*
>
> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey <yan...@uber.com> wrote:
>
> Getting the same TimeUUID values might be a major problem. Getting two
> different TimeUUIDs that at least have time component would not be a major
> problem as this is the main case today. Getting different time components
> is actually the corner case, and it is a corner case that breaks
> Internet-of-Things applications. We can tightly control clock skew in our
> cluster. We most definitely CANNOT control clock skew on the thousands of
> sensors that write to our cluster.
>
> Thanks,
> Cody
>
> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille <rwi...@fold3.com> wrote:
>
> In my opinion, this is not broken and “fixing” it would break existing
> code. Consider a batch that includes multiple inserts, each of which
> inserts the value returned by now(). Getting the same UUID for each insert
> would be a major problem.
>
> Cheers
>
> Robert
>
>
> On Nov 30, 2016, at 4:46 PM, Todd Fast <t...@digitalexistence.com> wrote:
>
> FWIW I'd suggest opening a bug--this behavior is certainly quite
> unexpected and more than just a documentation issue. In general I can't
> imagine any desirable properties of the current implementation, and there
> are likely a bunch of latent bugs sitting out there, so it should be fixed.
>
> Todd
>
> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu <t...@turnitin.com> wrote:
>
> Sorry for my typo. Obviously, I meant:
> "It appears that a single query that calls Cassandra's`now()` time
> function *multiple times *may actually cause a query to write or return
> different times."
>
> Less of a surprise now that I realize more about the implementation, but I
> agree that more explicit documentation around when exactly the "execution"
> of each now() statement happens and what implications it has for the
> resulting timestamps would be helpful when running into this.
>
> Thanks for the quick responses!
>
> -Terry
>
>
>
> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek <msval...@gmail.com> wrote:
>
> every now() call in statement is under the hood "replaced" with newly
> generated uuid.
>
> It can happen that they belong to  different milliseconds in time.
>
> If you need to have same timestamps you need to set them on the client
> side.
>
>
> @msvaljek <https://twitter.com/msvaljek>
>
> 2016-11-29 22:49 GMT+01:00 Terry Liu <t...@turnitin.com>:
>
> It appears that a single query that calls Cassandra's `now()` time
> function may actually cause a query to write or return different times.
>
> Is this the expected or defined behavior, and if so, why does it behave
> like this rather than evaluating `now()` once across an entire statement?
>
> This really affects UPDATE statements but to test it more easily, you
> could try something like:
>
> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
> FROM keyspace.table
> LIMIT 100;
>
> If you run that a few times, you should eventually see that the timestamp
> returned moves onto the next millisecond mid-query.
>
> --
> *Software Engineer*
> Turnitin - http://www.turnitin.com
> t...@turnitin.com
>
>
>
>
>
> --
> *Software Engineer*
> Turnitin - http://www.turnitin.com
> t...@turnitin.com
>
>
>
>
>
> --
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer

Reply via email to