Re: Why does `now()` produce different times within the same query?

Benjamin Roth Wed, 30 Nov 2016 22:43:36 -0800

Great comment. +1

On 01.12.2016 06:29, "Ben Bromhead" <b...@instaclustr.com> wrote:


> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
> statement (and possibly extend to batch statements).
>
> The values of now should be the same if you assume that now() works like
> it does in relational databases such as postgres or mysql, however at the
> moment it instead works like sysdate() in mysql. Given that CQL is supposed
> to be SQL like, I think the assumption around the behaviour of now() was a
> fair one to make.
>
> I definitely agree that raising a jira ticket would be a great place to
> discuss what the behaviour of now() should be for Cassandra. Personally I
> would be in favour of seeing the deterministic component (the actual time
> part) being the same across multiple calls in the one statement or multiple
> statements in a batch.
>
> Cassandra documentation does not make any claims as to how now() works
> within a single statement, and reading the code shows the intent is to
> work like sysdate() from MySQL rather than now(). One of the identified
> dangers of making cql similar to sql is that, while yes it aids adoption,
> users will find that SQL like things don't behave as expected. Of course as
> a user, one shouldn't have to read the source code to determine correct
> behaviour.
>
> Given that a timeuuid is made up of deterministic and (pseudo)
> non-deterministic components I can see why this issue has been largely
> ignored and hasn't had a chance for the behaviour to be formally defined
> (you would expect now to return the same time in the one statement despite
> multiple calls, but you wouldn't expect the same behaviour for say a call
> to rand()).
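A minimal sketch of the distinction drawn above between the time part of a timeuuid and the rest of it, assuming the standard RFC 4122 version-1 layout. The helper name `make_timeuuid` and the node value are hypothetical, and this is not Cassandra's own generator; it only illustrates that two distinct timeuuids can carry an identical time component.

```python
import uuid


def make_timeuuid(ts_100ns: int, clock_seq: int, node: int) -> uuid.UUID:
    """Build a version-1 UUID from explicit parts (hypothetical helper).

    ts_100ns is the 60-bit count of 100 ns intervals since 1582-10-15.
    """
    time_low = ts_100ns & 0xFFFFFFFF
    time_mid = (ts_100ns >> 32) & 0xFFFF
    time_hi = (ts_100ns >> 48) & 0x0FFF
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi = (clock_seq >> 8) & 0x3F
    # version=1 makes the uuid module set the version and variant bits.
    return uuid.UUID(
        fields=(time_low, time_mid, time_hi, clock_seq_hi, clock_seq_low, node),
        version=1,
    )


if __name__ == "__main__":
    ts = uuid.uuid1().time       # read the clock once
    node = 0x0123456789AB        # assumed node id, for illustration only
    a = make_timeuuid(ts, 1, node)
    b = make_timeuuid(ts, 2, node)
    print(a, b)
    print("same time part:", a.time == b.time, "distinct values:", a != b)
```

Here the two values differ only in the clock sequence, so uniqueness is kept while the time part stays fixed, which is the behaviour being argued for within a single statement.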
>
> On Wed, 30 Nov 2016 at 19:54 Cody Yancey <yan...@uber.com> wrote:
>
>>     This is not a bug, and in fact changing it would be a serious bug.
>>
>> False. Absolutely no consumer would be broken by a change to guarantee an
>> identical time component that isn't broken already, for the simple reason
>> your code already has to handle that case, as it is in fact the majority
>> case RIGHT NOW. Users can hit this bug, in production, because unit tests
>> might not have experienced it! The time component should be the time that the
>> command was processed by the coordinator node.
>>
>>      would one expect a java/py/bash script that loops
>>
>> Individual Cassandra writes (which is what OP is referring to
>> specifically) are not loops. They are in almost every case atomic
>> operations that either succeed completely or fail completely. Allowing a
>> single atomic operation to witness multiple times in these corner cases is
>> not only surprising, as this thread demonstrates, it also needlessly
>> restricts what developers can use the database for, and provides NO
>> BENEFIT.
>>
>>     Calling now PRIOR to initiating multiple inserts is in most cases
>> exactly what one does...the ONLY practice is to set the value before
>> initiating the sequence of calls
>>
>> Also false. Cassandra does not have a way of doing this on the
>> coordinator node rather than the client device, and as I already showed,
>> the client device is the wrong place to do it in situations where
>> guaranteeing bounded clock-skew actually makes a difference one way or the
>> other.
>>
>> Thanks,
>> Cody
>>
>>
>>
>> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle <daeme...@gmail.com>
>> wrote:
>>
>> This is not a bug, and in fact changing it would be a serious bug.
>>
>> What it is is a wonderful case of bad coding: would one expect a
>> java/py/bash script that loops on a bunch of read/execute/update calls where
>> each iteration calls time to return the same exact time for the duration of
>> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>>
>> Every call to a system call is unique, including within C*. Calling now
>> PRIOR to initiating multiple inserts is in most cases exactly what one does
>> to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
>> identical system time as would be the uuid of the row, one tries to call
>> time as close to just before the insert as possible. Then repeat.
>>
>> You have a logic issue in your code. If you want the same value for a set
>> of calls, the ONLY practice is to set the value before initiating the
>> sequence of calls.
>>
>>
>>
>> *.......*
>>
>>
>>
>> *Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872*
>>
>> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey <yan...@uber.com> wrote:
>>
>> Getting the same TimeUUID values might be a major problem. Getting two
>> different TimeUUIDs that at least have the same time component would not be a major
>> problem as this is the main case today. Getting different time components
>> is actually the corner case, and it is a corner case that breaks
>> Internet-of-Things applications. We can tightly control clock skew in our
>> cluster. We most definitely CANNOT control clock skew on the thousands of
>> sensors that write to our cluster.
>>
>> Thanks,
>> Cody
>>
>> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille <rwi...@fold3.com> wrote:
>>
>> In my opinion, this is not broken and “fixing” it would break existing
>> code. Consider a batch that includes multiple inserts, each of which
>> inserts the value returned by now(). Getting the same UUID for each insert
>> would be a major problem.
>>
>> Cheers
>>
>> Robert
>>
>>
>> On Nov 30, 2016, at 4:46 PM, Todd Fast <t...@digitalexistence.com> wrote:
>>
>> FWIW I'd suggest opening a bug--this behavior is certainly quite
>> unexpected and more than just a documentation issue. In general I can't
>> imagine any desirable properties of the current implementation, and there
>> are likely a bunch of latent bugs sitting out there, so it should be fixed.
>>
>> Todd
>>
>> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu <t...@turnitin.com> wrote:
>>
>> Sorry for my typo. Obviously, I meant:
>> "It appears that a single query that calls Cassandra's `now()` time
>> function *multiple times* may actually cause a query to write or return
>> different times."
>>
>> Less of a surprise now that I realize more about the implementation, but
>> I agree that more explicit documentation around when exactly the
>> "execution" of each now() statement happens and what implications it has
>> for the resulting timestamps would be helpful when running into this.
>>
>> Thanks for the quick responses!
>>
>> -Terry
>>
>>
>>
>> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek <msval...@gmail.com>
>> wrote:
>>
>> Every now() call in a statement is, under the hood, "replaced" with a newly
>> generated UUID.
>>
>> It can happen that they belong to different milliseconds in time.
>>
>> If you need to have the same timestamps, you need to set them on the client
>> side.
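The client-side approach suggested above could be sketched as follows. The table `ks.events`, its columns, and the helper `build_update` are hypothetical, and the `%s` placeholder style is an assumption that may need adjusting for your driver. The point is only that the clock is read once and the same value is bound to every column. As noted elsewhere in this thread, the value then reflects the client's clock, not the coordinator's.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical table and column names, used only for illustration.
UPDATE_CQL = (
    "UPDATE ks.events SET created_at = %s, updated_at = %s WHERE id = %s"
)


def build_update(row_id: uuid.UUID):
    """Return the statement and its bound values, reading the clock once."""
    now = datetime.now(timezone.utc)  # single clock read for the statement
    return UPDATE_CQL, (now, now, row_id)


if __name__ == "__main__":
    statement, params = build_update(uuid.uuid4())
    print(statement)
    print(params)
```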
>>
>>
>> @msvaljek <https://twitter.com/msvaljek>
>>
>> 2016-11-29 22:49 GMT+01:00 Terry Liu <t...@turnitin.com>:
>>
>> It appears that a single query that calls Cassandra's `now()` time
>> function may actually cause a query to write or return different times.
>>
>> Is this the expected or defined behavior, and if so, why does it behave
>> like this rather than evaluating `now()` once across an entire statement?
>>
>> This really affects UPDATE statements but to test it more easily, you
>> could try something like:
>>
>> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
>> FROM keyspace.table
>> LIMIT 100;
>>
>> If you run that a few times, you should eventually see that the timestamp
>> returned moves onto the next millisecond mid-query.
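As a rough analogue outside Cassandra, the sketch below reads the system clock twice in a row and counts how often the two reads fall in different milliseconds. The function name `count_straddles` is hypothetical, and the result depends on the machine and its clock, so no particular count is implied; it only shows why two independent clock reads in one operation can disagree.

```python
import time


def count_straddles(samples: int) -> int:
    """Count pairs of back-to-back clock reads that land in different ms."""
    straddles = 0
    for _ in range(samples):
        first_ms = time.time_ns() // 1_000_000
        second_ms = time.time_ns() // 1_000_000
        if first_ms != second_ms:
            straddles += 1
    return straddles


if __name__ == "__main__":
    n = 1_000_000
    print(f"{count_straddles(n)} of {n} pairs crossed a millisecond boundary")
```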
>>
>> --
>> *Software Engineer*
>> Turnitin - http://www.turnitin.com
>> t...@turnitin.com
>>
>> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>
