Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Jeff Jirsa


On 2016-12-03 08:44 (-0800), Edward Capriolo wrote:
> On Sat, Dec 3, 2016 at 11:01 AM, Edward Capriolo 
> wrote:
> 
> >
> >
> >  A new unique timeuuid (at the time where the statement using it is
> > executed).
> >
> > This indicates that each statement has one unique timeuuid. Calling the
> > function twice in one statement and getting different results disagrees
> > with the documentation.
> >
> 
> https://issues.apache.org/jira/browse/CASSANDRA-12989
> 

Seems like a reasonable change to me. Doc change committed to trunk; it'll
make its way to the site soon-ish.



Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Sat, Dec 3, 2016 at 11:01 AM, Edward Capriolo 
wrote:

>
>
> On Saturday, December 3, 2016, Edward Capriolo 
> wrote:
>
>>
>>
>> On Saturday, December 3, 2016, Jonathan Haddad wrote:
>>
>>> That isn't what the original thread is about. The thread is about the
>>> timestamp portion of the UUID being different.
>>>
>>> Having UUID() return the same thing for all rows in a batch would be the
>>> unexpected thing virtually every time.
>>> On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo 
>>> wrote:
>>>


 On Friday, December 2, 2016, Jonathan Haddad wrote:

> This isn't about using the same UUID though. It's about the timestamp
> bits in the UUID.
>
> What is the use case for generating multiple UUIDs in a single row?
> Why do you need to extract the timestamp out of both?
> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo 
> wrote:
>
>>
>> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <
>> sylv...@datastax.com> wrote:
>>
>>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <
>>> edlinuxg...@gmail.com> wrote:
>>>

 I am not sure you saw my reply on the thread, but I believe everyone's
 needs can be met. I will copy it here:

>>>
>>> I saw it, but the real problem that was raised initially was not
>>> that of UDFs and of allowing both behaviors. It's a matter of people
>>> being confused by the behavior of a non-UDF function, now(), and
>>> suggesting it should be changed.
>>>
>>> The Hive idea is interesting I guess, and we can switch to
>>> discussing that, but it's a different problem really and I'm not fond
>>> of derailing threads. I will just note though that if we're not
>>> talking about a confusion issue but rather how to get a timeuuid to be
>>> fixed within a statement, then there is a much, much more trivial
>>> solution: generate it client side. The `now()` function is a small
>>> convenience, but there is nothing you cannot do without it client side,
>>> and that actually stands for almost any use of (non-aggregate)
>>> functions in Cassandra currently.
>>>
>>>


 "Food for thought: Hive's UDFs introduced an annotation
 @UDFType(deterministic = false)

 http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/

 The effect is that the query planner can see when such a UDF is in use
 and determine the value once at the start of a very long query."

 Essentially, Hive had a similar if not identical problem: during a
 long-running distributed process like map/reduce, some users wanted the
 semantics of:

 1) Each call should have a new timestamp

 While other users wanted the semantics of:

 2) Each call should generate the same timestamp

 The solution implemented was to add an annotation to the UDF such that
 the query planner would pick up the annotation and act accordingly.

 (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)

 As a result you can essentially implement two UDFs:

 @UDFType(deterministic = false)
 public class UDFNow

 and for the other people

 @UDFType(deterministic = true)
 public class UDFNowOnce extends UDFNow

 Both use cases are met in a sensible way.

>>>
>>>
>> The `now()` function is a small convenience, but there is nothing you
>> cannot do without it client side, and that actually stands for almost
>> any use of (non-aggregate) functions in Cassandra currently.
>>
>> Cassandra's changing philosophy over which entity should create such
>> information (client, server, or driver) does not make this problem easy.
>>
>> If you take into account that you have users who do not understand all
>> the intricacies of UUIDs, the problem is compounded. I.e., how does one
>> generate a UUID in each of C#, Python, Java, etc., with the 47 random
>> bits and so on? That is not super easy information to find. Maybe you
>> find a Stack Overflow post that actually gives bad advice.
>>
>> Many times in Cassandra you are using a UUID because you do not have a
>> unique key in the insert and you wish to create one. If you are inserting
>> more than a single record using that same UUID and you do not want the
>> burden of generating it yourself, you would have to do write>>read>>write,
>> which is an anti-pattern.
>>
>
 Not multiple ids for a single row. The same id for multiple inserts in
 a batch.


Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Saturday, December 3, 2016, Edward Capriolo 
wrote:

>
>
> On Saturday, December 3, 2016, Jonathan Haddad wrote:
>
>> That isn't what the original thread is about. The thread is about the
>> timestamp portion of the UUID being different.
>>
>> Having UUID() return the same thing for all rows in a batch would be the
>> unexpected thing virtually every time.
>> On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo 
>> wrote:
>>
>>>
>>>
>>> On Friday, December 2, 2016, Jonathan Haddad  wrote:
>>>
 This isn't about using the same UUID though. It's about the timestamp
 bits in the UUID.

 What is the use case for generating multiple UUIDs in a single row? Why
 do you need to extract the timestamp out of both?
 On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo 
 wrote:

>
> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <
> sylv...@datastax.com> wrote:
>
>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <
>> edlinuxg...@gmail.com> wrote:
>>
>>>
>>> I am not sure you saw my reply on the thread, but I believe everyone's
>>> needs can be met. I will copy it here:
>>>
>>
>> I saw it, but the real problem that was raised initially was not that
>> of UDFs and of allowing both behaviors. It's a matter of people being
>> confused by the behavior of a non-UDF function, now(), and suggesting it
>> should be changed.
>>
>> The Hive idea is interesting I guess, and we can switch to discussing
>> that, but it's a different problem really and I'm not fond of derailing
>> threads. I will just note though that if we're not talking about a
>> confusion issue but rather how to get a timeuuid to be fixed within a
>> statement, then there is a much, much more trivial solution: generate it
>> client side. The `now()` function is a small convenience, but there is
>> nothing you cannot do without it client side, and that actually stands
>> for almost any use of (non-aggregate) functions in Cassandra currently.
>>
>>
>>>
>>>
>>> "Food for thought: Hive's UDFs introduced an annotation
>>> @UDFType(deterministic = false)
>>>
>>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>>>
>>> The effect is that the query planner can see when such a UDF is in use
>>> and determine the value once at the start of a very long query."
>>>
>>> Essentially, Hive had a similar if not identical problem: during a
>>> long-running distributed process like map/reduce, some users wanted the
>>> semantics of:
>>>
>>> 1) Each call should have a new timestamp
>>>
>>> While other users wanted the semantics of:
>>>
>>> 2) Each call should generate the same timestamp
>>>
>>> The solution implemented was to add an annotation to the UDF such that
>>> the query planner would pick up the annotation and act accordingly.
>>>
>>> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>>>
>>> As a result you can essentially implement two UDFs:
>>>
>>> @UDFType(deterministic = false)
>>> public class UDFNow
>>>
>>> and for the other people
>>>
>>> @UDFType(deterministic = true)
>>> public class UDFNowOnce extends UDFNow
>>>
>>> Both use cases are met in a sensible way.
>>>
>>
>>
> The `now()` function is a small convenience, but there is nothing you
> cannot do without it client side, and that actually stands for almost
> any use of (non-aggregate) functions in Cassandra currently.
>
> Cassandra's changing philosophy over which entity should create such
> information (client, server, or driver) does not make this problem easy.
>
> If you take into account that you have users who do not understand all
> the intricacies of UUIDs, the problem is compounded. I.e., how does one
> generate a UUID in each of C#, Python, Java, etc., with the 47 random
> bits and so on? That is not super easy information to find. Maybe you
> find a Stack Overflow post that actually gives bad advice.
>
> Many times in Cassandra you are using a UUID because you do not have a
> unique key in the insert and you wish to create one. If you are inserting
> more than a single record using that same UUID and you do not want the
> burden of generating it yourself, you would have to do write>>read>>write,
> which is an anti-pattern.
>

>>> Not multiple ids for a single row. The same id for multiple inserts in a
>>> batch.
>>>
>>> For example, let's say I have an application where my data has no
>>> unique key.
>>>
>>> Table poke
>>> Poker, pokee, time
>>>
>>> Suppose I consume pokes from Kafka, build

Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Saturday, December 3, 2016, Jonathan Haddad wrote:

> That isn't what the original thread is about. The thread is about the
> timestamp portion of the UUID being different.
>
> Having UUID() return the same thing for all rows in a batch would be the
> unexpected thing virtually every time.
> On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo wrote:
>
>>
>>
>> On Friday, December 2, 2016, Jonathan Haddad wrote:
>>
>>> This isn't about using the same UUID though. It's about the timestamp
>>> bits in the UUID.
>>>
>>> What is the use case for generating multiple UUIDs in a single row? Why
>>> do you need to extract the timestamp out of both?
>>> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo 
>>> wrote:
>>>

 On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne wrote:

> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo wrote:
>
>>
>> I am not sure you saw my reply on the thread, but I believe everyone's
>> needs can be met. I will copy it here:
>>
>
> I saw it, but the real problem that was raised initially was not that
> of UDFs and of allowing both behaviors. It's a matter of people being
> confused by the behavior of a non-UDF function, now(), and suggesting it
> should be changed.
>
> The Hive idea is interesting I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not fond of derailing
> threads. I will just note though that if we're not talking about a
> confusion issue but rather how to get a timeuuid to be fixed within a
> statement, then there is a much, much more trivial solution: generate it
> client side. The `now()` function is a small convenience, but there is
> nothing you cannot do without it client side, and that actually stands
> for almost any use of (non-aggregate) functions in Cassandra currently.
>
>
>>
>>
>> "Food for thought: Hive's UDFs introduced an annotation  
>> @UDFType(deterministic
>> = false)
>>
>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-
>> map-and-reduce-side-in-hive/
>>
>> The effect is the query planner can see when such a UDF is in use and
>> determine the value once at the start of a very long query."
>>
>> Essentially, Hive had a similar if not identical problem: during a
>> long-running distributed process like map/reduce, some users wanted the
>> semantics of:
>>
>> 1) Each call should have a new timestamp
>>
>> While other users wanted the semantics of:
>>
>> 2) Each call should generate the same timestamp
>>
>> The solution implemented was to add an annotation to the UDF such that
>> the query planner would pick up the annotation and act accordingly.
>>
>> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>>
>> As a result you can essentially implement two UDFs:
>>
>> @UDFType(deterministic = false)
>> public class UDFNow
>>
>> and for the other people
>>
>> @UDFType(deterministic = true)
>> public class UDFNowOnce extends UDFNow
>>
>> Both use cases are met in a sensible way.
>>
>
>
 The `now()` function is a small convenience, but there is nothing you
 cannot do without it client side, and that actually stands for almost
 any use of (non-aggregate) functions in Cassandra currently.

 Cassandra's changing philosophy over which entity should create such
 information (client, server, or driver) does not make this problem easy.

 If you take into account that you have users who do not understand all
 the intricacies of UUIDs, the problem is compounded. I.e., how does one
 generate a UUID in each of C#, Python, Java, etc., with the 47 random
 bits and so on? That is not super easy information to find. Maybe you
 find a Stack Overflow post that actually gives bad advice.

 Many times in Cassandra you are using a UUID because you do not have a
 unique key in the insert and you wish to create one. If you are inserting
 more than a single record using that same UUID and you do not want the
 burden of generating it yourself, you would have to do write>>read>>write,
 which is an anti-pattern.

>>>
>> Not multiple ids for a single row. The same id for multiple inserts in a
>> batch.
>>
>> For example, let's say I have an application where my data has no unique
>> key.
>>
>> Table poke
>> Poker, pokee, time
>>
>> Suppose I consume pokes from Kafka, build a batch of 30k, and insert them.
>> You probably want to denormalize into two tables:
>> Primary key (poker, time)
>> Primary key (pokee, time)
>>
>> It makes 

Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Jonathan Haddad
That isn't what the original thread is about. The thread is about the
timestamp portion of the UUID being different.

Having UUID() return the same thing for all rows in a batch would be the
unexpected thing virtually every time.
On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo 
wrote:

>
>
> On Friday, December 2, 2016, Jonathan Haddad wrote:
>
> This isn't about using the same UUID though. It's about the timestamp bits
> in the UUID.
>
> What is the use case for generating multiple UUIDs in a single row? Why do
> you need to extract the timestamp out of both?
> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo 
> wrote:
>
>
> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne 
> wrote:
>
> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo 
> wrote:
>
>
> I am not sure you saw my reply on the thread, but I believe everyone's
> needs can be met. I will copy it here:
>
>
> I saw it, but the real problem that was raised initially was not that
> of UDFs and of allowing both behaviors. It's a matter of people being
> confused by the behavior of a non-UDF function, now(), and suggesting it
> should be changed.
>
> The Hive idea is interesting I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not fond of derailing
> threads. I will just note though that if we're not talking about a
> confusion issue but rather how to get a timeuuid to be fixed within a
> statement, then there is a much, much more trivial solution: generate it
> client side. The `now()` function is a small convenience, but there is
> nothing you cannot do without it client side, and that actually stands
> for almost any use of (non-aggregate) functions in Cassandra currently.
>
>
>
>
> "Food for thought: Hive's UDFs introduced an annotation  
> @UDFType(deterministic
> = false)
>
>
> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>
> The effect is the query planner can see when such a UDF is in use and
> determine the value once at the start of a very long query."
>
> Essentially, Hive had a similar if not identical problem: during a
> long-running distributed process like map/reduce, some users wanted the
> semantics of:
>
> 1) Each call should have a new timestamp
>
> While other users wanted the semantics of:
>
> 2) Each call should generate the same timestamp
>
> The solution implemented was to add an annotation to the UDF such that the
> query planner would pick up the annotation and act accordingly.
>
> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>
> As a result you can essentially implement two UDFs:
>
> @UDFType(deterministic = false)
> public class UDFNow
>
> and for the other people
>
> @UDFType(deterministic = true)
> public class UDFNowOnce extends UDFNow
>
> Both use cases are met in a sensible way.
>
>
>
> The `now()` function is a small convenience, but there is nothing you
> cannot do without it client side, and that actually stands for almost
> any use of (non-aggregate) functions in Cassandra currently.
>
> Cassandra's changing philosophy over which entity should create such
> information (client, server, or driver) does not make this problem easy.
>
> If you take into account that you have users who do not understand all
> the intricacies of UUIDs, the problem is compounded. I.e., how does one
> generate a UUID in each of C#, Python, Java, etc., with the 47 random
> bits and so on? That is not super easy information to find. Maybe you
> find a Stack Overflow post that actually gives bad advice.
>
> Many times in Cassandra you are using a UUID because you do not have a
> unique key in the insert and you wish to create one. If you are inserting
> more than a single record using that same UUID and you do not want the
> burden of generating it yourself, you would have to do write>>read>>write,
> which is an anti-pattern.
>
>
> Not multiple ids for a single row. The same id for multiple inserts in a
> batch.
>
> For example, let's say I have an application where my data has no unique
> key.
>
> Table poke
> Poker, pokee, time
>
> Suppose I consume pokes from Kafka, build a batch of 30k, and insert them.
> You probably want to denormalize into two tables:
> Primary key (poker, time)
> Primary key (pokee, time)
>
> It makes sense that they all have the same UUID if you want it to be the
> UUID of the batch. This would make it easy to correlate all the events,
> and easy to delete them all as well.
>
> The do-it-client-side argument is totally valid, but it has been a
> justification for not adding features, many of which are eventually added
> anyway.
>
>
>
>
> --
> Sorry, this was sent from mobile. I will do less grammar and spell checking
> than usual.
>


Re: Why does `now()` produce different times within the same query?

2016-12-03 Thread Edward Capriolo
On Friday, December 2, 2016, Jonathan Haddad wrote:

> This isn't about using the same UUID though. It's about the timestamp bits
> in the UUID.
>
> What is the use case for generating multiple UUIDs in a single row? Why do
> you need to extract the timestamp out of both?
> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo wrote:
>
>>
>> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne wrote:
>>
>>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo wrote:
>>>

 I am not sure you saw my reply on the thread, but I believe everyone's
 needs can be met. I will copy it here:

>>>
>>> I saw it, but the real problem that was raised initially was not that
>>> of UDFs and of allowing both behaviors. It's a matter of people being
>>> confused by the behavior of a non-UDF function, now(), and suggesting
>>> it should be changed.
>>>
>>> The Hive idea is interesting I guess, and we can switch to discussing
>>> that, but it's a different problem really and I'm not fond of derailing
>>> threads. I will just note though that if we're not talking about a
>>> confusion issue but rather how to get a timeuuid to be fixed within a
>>> statement, then there is a much, much more trivial solution: generate
>>> it client side. The `now()` function is a small convenience, but there
>>> is nothing you cannot do without it client side, and that actually
>>> stands for almost any use of (non-aggregate) functions in Cassandra
>>> currently.
>>>
>>>


 "Food for thought: Hive's UDFs introduced an annotation  
 @UDFType(deterministic
 = false)

 http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-
 map-and-reduce-side-in-hive/

 The effect is the query planner can see when such a UDF is in use and
 determine the value once at the start of a very long query."

 Essentially, Hive had a similar if not identical problem: during a
 long-running distributed process like map/reduce, some users wanted the
 semantics of:

 1) Each call should have a new timestamp

 While other users wanted the semantics of:

 2) Each call should generate the same timestamp

 The solution implemented was to add an annotation to the UDF such that the
 query planner would pick up the annotation and act accordingly.

 (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)

 As a result you can essentially implement two UDFs:

 @UDFType(deterministic = false)
 public class UDFNow

 and for the other people

 @UDFType(deterministic = true)
 public class UDFNowOnce extends UDFNow

 Both use cases are met in a sensible way.
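The two strategies can be sketched in plain Java. This is a hypothetical illustration of the planner behaviour described above, not Hive's API: `PlannerSketch`, `QueryFunction`, `evaluateForRows` and the boolean flag are invented stand-ins for `@UDFType(deterministic = ...)`, following the UDFNow / UDFNowOnce split (a function flagged deterministic is evaluated once per query, otherwise once per row). A fake clock that advances on every read makes the difference visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical illustration only; Hive's real planner and UDF classes work differently.
public class PlannerSketch {

    /** A zero-argument function plus a flag telling the planner it may evaluate it once. */
    static final class QueryFunction {
        final String name;
        final boolean deterministic;
        final LongSupplier body;

        QueryFunction(String name, boolean deterministic, LongSupplier body) {
            this.name = name;
            this.deterministic = deterministic;
            this.body = body;
        }
    }

    /** Returns the value the function produces for each of the given number of rows. */
    static List<Long> evaluateForRows(QueryFunction fn, int rows) {
        List<Long> results = new ArrayList<>();
        if (fn.deterministic) {
            long once = fn.body.getAsLong();        // evaluated a single time up front
            for (int i = 0; i < rows; i++) {
                results.add(once);
            }
        } else {
            for (int i = 0; i < rows; i++) {
                results.add(fn.body.getAsLong());   // evaluated again for every row
            }
        }
        return results;
    }

    public static void main(String[] args) {
        long[] ticks = {1000};                      // fake clock state
        LongSupplier clock = () -> ticks[0]++;      // returns a new value on every read

        QueryFunction now = new QueryFunction("UDFNow", false, clock);
        QueryFunction nowOnce = new QueryFunction("UDFNowOnce", true, clock);

        System.out.println(now.name + " " + evaluateForRows(now, 3));         // UDFNow [1000, 1001, 1002]
        System.out.println(nowOnce.name + " " + evaluateForRows(nowOnce, 3)); // UDFNowOnce [1003, 1003, 1003]
    }
}
```

Where such a flag would live in CQL (annotation, separate function name, or planner rule) is a design choice this sketch does not settle.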

>>>
>>>
>> The `now()` function is a small convenience, but there is nothing you
>> cannot do without it client side, and that actually stands for almost
>> any use of (non-aggregate) functions in Cassandra currently.
>>
>> Cassandra's changing philosophy over which entity should create such
>> information (client, server, or driver) does not make this problem easy.
>>
>> If you take into account that you have users who do not understand all
>> the intricacies of UUIDs, the problem is compounded. I.e., how does one
>> generate a UUID in each of C#, Python, Java, etc., with the 47 random
>> bits and so on? That is not super easy information to find. Maybe you
>> find a Stack Overflow post that actually gives bad advice.
>>
>> Many times in Cassandra you are using a UUID because you do not have a
>> unique key in the insert and you wish to create one. If you are inserting
>> more than a single record using that same UUID and you do not want the
>> burden of generating it yourself, you would have to do write>>read>>write,
>> which is an anti-pattern.
>>
>
Not multiple ids for a single row. The same id for multiple inserts in a
batch.

For example, let's say I have an application where my data has no unique key.

Table poke
Poker, pokee, time

Suppose I consume pokes from Kafka, build a batch of 30k, and insert them.
You probably want to denormalize into two tables:
Primary key (poker, time)
Primary key (pokee, time)

It makes sense that they all have the same UUID if you want it to be the
UUID of the batch. This would make it easy to correlate all the events,
and easy to delete them all as well.

The do-it-client-side argument is totally valid, but it has been a
justification for not adding features, many of which are eventually added
anyway.




-- 
Sorry, this was sent from mobile. I will do less grammar and spell checking
than usual.


Re: Why does `now()` produce different times within the same query?

2016-12-02 Thread Jonathan Haddad
This isn't about using the same UUID though. It's about the timestamp bits
in the UUID.

What is the use case for generating multiple UUIDs in a single row? Why do
you need to extract the timestamp out of both?
On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo 
wrote:

>
> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne 
> wrote:
>
> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo 
> wrote:
>
>
> I am not sure you saw my reply on the thread, but I believe everyone's
> needs can be met. I will copy it here:
>
>
> I saw it, but the real problem that was raised initially was not that
> of UDFs and of allowing both behaviors. It's a matter of people being
> confused by the behavior of a non-UDF function, now(), and suggesting it
> should be changed.
>
> The Hive idea is interesting I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not fond of derailing
> threads. I will just note though that if we're not talking about a
> confusion issue but rather how to get a timeuuid to be fixed within a
> statement, then there is a much, much more trivial solution: generate it
> client side. The `now()` function is a small convenience, but there is
> nothing you cannot do without it client side, and that actually stands
> for almost any use of (non-aggregate) functions in Cassandra currently.
>
>
>
>
> "Food for thought: Hive's UDFs introduced an annotation  
> @UDFType(deterministic
> = false)
>
>
> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>
> The effect is the query planner can see when such a UDF is in use and
> determine the value once at the start of a very long query."
>
> Essentially, Hive had a similar if not identical problem: during a
> long-running distributed process like map/reduce, some users wanted the
> semantics of:
>
> 1) Each call should have a new timestamp
>
> While other users wanted the semantics of:
>
> 2) Each call should generate the same timestamp
>
> The solution implemented was to add an annotation to the UDF such that the
> query planner would pick up the annotation and act accordingly.
>
> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>
> As a result you can essentially implement two UDFs:
>
> @UDFType(deterministic = false)
> public class UDFNow
>
> and for the other people
>
> @UDFType(deterministic = true)
> public class UDFNowOnce extends UDFNow
>
> Both use cases are met in a sensible way.
>
>
>
> The `now()` function is a small convenience, but there is nothing you
> cannot do without it client side, and that actually stands for almost
> any use of (non-aggregate) functions in Cassandra currently.
>
> Cassandra's changing philosophy over which entity should create such
> information (client, server, or driver) does not make this problem easy.
>
> If you take into account that you have users who do not understand all
> the intricacies of UUIDs, the problem is compounded. I.e., how does one
> generate a UUID in each of C#, Python, Java, etc., with the 47 random
> bits and so on? That is not super easy information to find. Maybe you
> find a Stack Overflow post that actually gives bad advice.
>
> Many times in Cassandra you are using a UUID because you do not have a
> unique key in the insert and you wish to create one. If you are inserting
> more than a single record using that same UUID and you do not want the
> burden of generating it yourself, you would have to do write>>read>>write,
> which is an anti-pattern.
>


Re: Why does `now()` produce different times within the same query?

2016-12-02 Thread Edward Capriolo
On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne 
wrote:

> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo 
> wrote:
>
>>
>> I am not sure you saw my reply on the thread, but I believe everyone's
>> needs can be met. I will copy it here:
>>
>
> I saw it, but the real problem that was raised initially was not that
> of UDFs and of allowing both behaviors. It's a matter of people being
> confused by the behavior of a non-UDF function, now(), and suggesting it
> should be changed.
>
> The Hive idea is interesting I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not fond of derailing
> threads. I will just note though that if we're not talking about a
> confusion issue but rather how to get a timeuuid to be fixed within a
> statement, then there is a much, much more trivial solution: generate it
> client side. The `now()` function is a small convenience, but there is
> nothing you cannot do without it client side, and that actually stands
> for almost any use of (non-aggregate) functions in Cassandra currently.
>
>
>>
>>
>> "Food for thought: Hive's UDFs introduced an annotation
>> @UDFType(deterministic = false)
>>
>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>>
>> The effect is that the query planner can see when such a UDF is in use and
>> determine the value once at the start of a very long query."
>>
>> Essentially, Hive had a similar if not identical problem: during a
>> long-running distributed process like map/reduce, some users wanted the
>> semantics of:
>>
>> 1) Each call should have a new timestamp
>>
>> While other users wanted the semantics of:
>>
>> 2) Each call should generate the same timestamp
>>
>> The solution implemented was to add an annotation to the UDF such that the
>> query planner would pick up the annotation and act accordingly.
>>
>> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>>
>> As a result you can essentially implement two UDFs:
>>
>> @UDFType(deterministic = false)
>> public class UDFNow
>>
>> and for the other people
>>
>> @UDFType(deterministic = true)
>> public class UDFNowOnce extends UDFNow
>>
>> Both use cases are met in a sensible way.
>>
>
>
The `now()` function is a small convenience, but there is nothing you cannot
do without it client side, and that actually stands for almost any use of
(non-aggregate) functions in Cassandra currently.

Cassandra's changing philosophy over which entity should create such
information (client, server, or driver) does not make this problem easy.

If you take into account that you have users who do not understand all the
intricacies of UUIDs, the problem is compounded. I.e., how does one generate
a UUID in each of C#, Python, Java, etc., with the 47 random bits and so on?
That is not super easy information to find. Maybe you find a Stack Overflow
post that actually gives bad advice.

Many times in Cassandra you are using a UUID because you do not have a
unique key in the insert and you wish to create one. If you are inserting
more than a single record using that same UUID and you do not want the
burden of generating it yourself, you would have to do write>>read>>write,
which is an anti-pattern.
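On the question of how to generate a timeuuid client side: `java.util.UUID` only offers factories for version 4 (`randomUUID()`) and version 3 (`nameUUIDFromBytes()`), so a minimal Java sketch has to assemble a version-1 UUID from the RFC 4122 fields itself. Assumptions: the node ID is 47 random bits with the multicast bit set instead of a MAC address, the clock sequence is random per call, and the `pokes_by_poker` / `pokes_by_pokee` tables and their columns are hypothetical. Client drivers often ship a tested timeuuid generator, which is usually the better choice in practice.

```java
import java.security.SecureRandom;
import java.util.List;
import java.util.UUID;

// Minimal sketch of RFC 4122 version-1 (time-based) UUID generation on the client.
// Assumptions: random node ID and clock sequence instead of a MAC address; hypothetical tables.
public final class ClientTimeUuid {
    // Number of 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
    static final long UUID_EPOCH_OFFSET = 122_192_928_000_000_000L;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds a version-1 UUID whose time component is the given Unix time in milliseconds. */
    static UUID fromUnixMillis(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;  // 60-bit UUID timestamp
        long msb = (ts & 0xFFFF_FFFFL) << 32                 // time_low
                | ((ts >>> 32) & 0xFFFFL) << 16              // time_mid
                | 0x1000L                                    // version 1
                | ((ts >>> 48) & 0x0FFFL);                   // time_hi
        long clockSeq = RANDOM.nextInt(1 << 14);             // 14-bit clock sequence
        long node = (RANDOM.nextLong() & 0xFFFF_FFFF_FFFFL)  // 48 random bits, then set
                | (1L << 40);                                // the multicast bit (random node)
        long lsb = (0x8000L | clockSeq) << 48 | node;        // variant bits "10"
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        // Generate the id once on the client and reuse it in every statement of the batch,
        // so no read is needed to find out which value the server picked.
        UUID id = fromUnixMillis(System.currentTimeMillis());
        for (String table : List.of("pokes_by_poker", "pokes_by_pokee")) {
            System.out.println("INSERT INTO " + table
                    + " (poker, pokee, time) VALUES ('alice', 'bob', " + id + ");");
        }
        System.out.println("version=" + id.version() + ", variant=" + id.variant());
    }
}
```

Because the id is fixed before the batch is sent, every statement carries the same value and the write>>read>>write round trip is avoided.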


Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Ben Bromhead
>
>
>
> I will note that Ben seems to suggest keeping the return of now() unique
> across calls while keeping the time component equal, thus varying the
> rest of the UUID bytes. However:
>  - I'm starting to wonder what this would buy us. Why would someone be
>    super confused by the time changing across calls (in a single
>    statement/batch), but not at all confused by the full return values
>    being different?
>
Given that a common way of interacting with timeuuids is with toTimestamp, I
can see the confusion and the assumptions around its behaviour.
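For reference, the conversion toTimestamp performs amounts to reading the time component of a version-1 UUID. A minimal Java sketch under that assumption (`TimeUuidMillis` and `toUnixMillis` are hypothetical names, not Cassandra's code, and the sample UUID is arbitrary): `java.util.UUID.timestamp()` returns the 60-bit count of 100-nanosecond intervals since 1582-10-15, so subtracting the offset to the Unix epoch and dividing by 10,000 yields milliseconds. Two timeuuids that differ only in their node bits map to the same value.

```java
import java.time.Instant;
import java.util.UUID;

// Sketch of the conversion behind toTimestamp(): read the time component of a timeuuid.
// TimeUuidMillis / toUnixMillis are hypothetical names, not Cassandra's implementation.
public class TimeUuidMillis {
    // Number of 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
    static final long UUID_EPOCH_OFFSET = 122_192_928_000_000_000L;

    /** Returns the Unix time in milliseconds encoded in a version-1 UUID. */
    static long toUnixMillis(UUID timeuuid) {
        if (timeuuid.version() != 1) {
            throw new IllegalArgumentException("not a version-1 UUID: " + timeuuid);
        }
        // UUID.timestamp() reassembles the 60-bit count of 100-ns intervals since 1582-10-15.
        return (timeuuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }

    public static void main(String[] args) {
        UUID sample = UUID.fromString("50554d6e-29bb-11e5-b345-feff819cdc9f"); // arbitrary v1 UUID
        System.out.println(sample + " -> " + Instant.ofEpochMilli(toUnixMillis(sample)));
    }
}
```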

>    And how is that actually useful: you're getting a different result
>    anyway, and you're letting the server pick the timestamp in the first
>    place, so you probably don't care about millisecond precision of that
>    timestamp.
>
If you want consistent timestamps within your query, as the OP did, I can
see how this is useful. Postgres claims this is a "feature".

>  - This would basically be a violation of the timeuuid spec
>

Not quite... Version 1 UUIDs let you replace the node component with 47
randomly generated bits, setting the multicast bit to mark the node ID as
random (https://www.ietf.org/rfc/rfc4122.txt, section 4.5).
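A minimal Java sketch of that suggestion, assuming the statement's time component has already been captured once in a version-1 UUID: each further value keeps the most significant half (timestamp and version) and the clock sequence, and replaces the node field with 47 random bits plus the multicast bit. `SharedTimeUuids` and `withRandomNode` are hypothetical names and the sample UUID is arbitrary; this is not how Cassandra's now() is implemented.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Sketch of the suggestion above: several version-1 UUIDs that share one time component
// but carry different random node IDs. Hypothetical helper, not Cassandra's now().
public class SharedTimeUuids {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final long VARIANT_AND_CLOCK_SEQ = 0xFFFF_0000_0000_0000L; // top 16 bits of lsb
    private static final long NODE_BITS = 0x0000_FFFF_FFFF_FFFFL;             // low 48 bits of lsb
    private static final long MULTICAST_BIT = 1L << 40; // RFC 4122 section 4.5: marks a random node

    /** Returns a UUID with base's timestamp and clock sequence but a fresh random node ID. */
    static UUID withRandomNode(UUID base) {
        long node = (RANDOM.nextLong() & NODE_BITS) | MULTICAST_BIT; // 47 random bits + multicast bit
        long lsb = (base.getLeastSignificantBits() & VARIANT_AND_CLOCK_SEQ) | node;
        return new UUID(base.getMostSignificantBits(), lsb);
    }

    public static void main(String[] args) {
        // Stand-in for the value captured once when the statement starts (arbitrary v1 UUID).
        UUID statementTime = UUID.fromString("50554d6e-29bb-11e5-b345-feff819cdc9f");
        UUID first = withRandomNode(statementTime);
        UUID second = withRandomNode(statementTime);
        System.out.println(first + "\n" + second);
        System.out.println("same timestamp: " + (first.timestamp() == second.timestamp()));
        System.out.println("distinct: " + !first.equals(second));
    }
}
```

Every value is unique, yet toTimestamp would return the same time for all of them, which is the behaviour being proposed for calls within one statement.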

>  - This would be a big pain in the code and make now() a special case
>    among functions. I'm unconvinced special cases make things easier in
>    general.
>

On reflection, I have to agree here: now() has been around forever, and
this is the first anecdote I've seen of someone getting caught out.

However, with my user-advocate hat on, I think it would be worth
investigating beyond a documentation update if others find it a sticking
point in Cassandra adoption.


> So I'm all for improving the documentation if this confuses users due to
> expectations (mistakenly) carried from prior experiences, and please
> feel free to open a JIRA for that. I'm a lot less in agreement that there
> is something wrong with the way the function behaves in principle.
>


> > I can see why this issue has been largely ignored and hasn't had a
> > chance for the behaviour to be formally defined
>
> Don't make too many assumptions. The behavior is perfectly well defined:
> now() is a "normal" function and is evaluated whenever it's called
> according to the timeuuid spec (or as close to it as we can make it).
>
Maybe "formally defined" is the wrong term... "Formally documented"?

>
> On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth 
> wrote:
>
> Great comment. +1
>
> On 01.12.2016 06:29, "Ben Bromhead" wrote:
>
> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
> statement (and possibly extend to batch statements).
>
> The values of now should be the same if you assume that now() works like
> it does in relational databases such as postgres or mysql, however at the
> moment it instead works like sysdate() in mysql. Given that CQL is supposed
> to be SQL like, I think the assumption around the behaviour of now() was a
> fair one to make.
>
> I definitely agree that raising a jira ticket would be a great place to
> discuss what the behaviour of now() should be for Cassandra. Personally I
> would be in favour of seeing the deterministic component (the actual time
> part) being the same across multiple calls in the one statement or multiple
> statements in a batch.
>
> Cassandra documentation does not make any claims as to how now() works
> within a single statement and reading the code it shows the intent is to
> work like sysdate() from MySQL rather than now(). One of the identified
> dangers of making cql similar to sql is that, while yes it aids adoption,
> users will find that SQL like things don't behave as expected. Of course as
> a user, one shouldn't have to read the source code to determine correct
> behaviour.
>
> Given that a timeuuid is made up of deterministic and (pseudo)
> non-deterministic components I can see why this issue has been largely
> ignored and hasn't had a chance for the behaviour to be formally defined
> (you would expect now to return the same time in the one statement despite
> multiple calls, but you wouldn't expect the same behaviour for say a call
> to rand()).
>
>
>
>
>
>
>
> On Wed, 30 Nov 2016 at 19:54 Cody Yancey  wrote:
>
> This is not a bug, and in fact changing it would be a serious bug.
>
> False. Absolutely no consumer would be broken by a change to guarantee an
> identical time component that isn't broken already, for the simple reason
> your code already has to handle that case, as it is in fact the majority
> case RIGHT NOW. Users can hit this bug, in production, because unit tests
> might not have experienced it! The time component should be the time that the
> command was processed by the coordinator node.
>
>  would one expect a java/py/bash script that loops
>
> Individual Cassandra writes (which is what OP is referring to
> specifically) are not loops. They are in almost every case atomic
> operations that either 

Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Marko Švaljek
One millisecond is not an issue in most Internet of Things projects out
there. There are lots of connection-related things that add far more
latency to the requests than that, especially if you take into account the
time it takes for the data to actually reach a Cassandra node in the
background, etc. I'm simply not aware of any larger projects where edge
devices write directly to Cassandra.

Requests almost always come in to some sort of gateway before that. The
usual pattern is storing the timestamp measured on the device (if it even
has its own clock) and the timestamp when it was received on the platform
side. Having two identical millisecond-level timestamps in one insert
statement generated by now() simply doesn't add that much to the table.

The only case that comes to mind would be time-series bucketing of inserts,
placing measurements in partitions based on some sort of mapping function
applied to the results of now(). But then again, this is usually done on the
server side, and I'm not sure it would be best practice to do it within the
insert.
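Here is a minimal sketch of the bucketing Marko mentions, done in application code at the gateway rather than inside the INSERT. The device id format and the one-partition-per-UTC-day scheme are assumptions for illustration. The point is that the received time is captured once, and both the stored timestamp and the bucket are derived from that single value.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class TimeBuckets {
    // One partition per UTC day; this bucketing scheme is an assumption for the sketch.
    static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Partition key such as "sensor-7:2016-12-01" for a hypothetical per-device table.
    static String bucketFor(String deviceId, Instant receivedAt) {
        return deviceId + ":" + DAY.format(receivedAt);
    }

    public static void main(String[] args) {
        Instant receivedAt = Instant.now(); // read the clock once at the gateway
        String bucket = bucketFor("sensor-7", receivedAt);
        // Both values below come from the same Instant, so they always agree.
        System.out.println(bucket + " received_at=" + receivedAt.toEpochMilli());
    }
}
```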

Even if it were done that way, analytics (be it near real time or batch)
usually takes that kind of thing into account and compensates; reports
rarely show millisecond-level dynamics.

In the end it just wouldn't be a good idea to change the behaviour of a
function that has been around for quite some time.


@msvaljek 

2016-12-01 18:10 GMT+01:00 Cody Yancey :

> On Thu, Dec 1, 2016 at 11:09 AM Sylvain Lebresne 
> wrote:
>
>> there is a much more trivial solution: generate it client side. The
>> `now()` function is a small convenience but there is nothing you cannot do
>> without it client side
>>
>
> Please see my post above as to why this is a bad idea for inserts based on
> request time where knowing the time the request was made is actually
> important.
>
> Cody
>
>>
>


Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Cody Yancey
On Thu, Dec 1, 2016 at 11:09 AM Sylvain Lebresne 
wrote:

> there is a much more trivial solution: generate it client side. The
> `now()` function is a small convenience but there is nothing you cannot do
> without it client side
>

Please see my post above as to why this is a bad idea for inserts based on
request time where knowing the time the request was made is actually
important.

Cody

>


Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Edward Capriolo
On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne 
wrote:

> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo 
> wrote:
>
>>
>> I am not sure you saw my reply on thread but I believe everyone's needs
>> can be met. I will copy that here:
>>
>
> I saw it, but the real problem that was raised initially was not that of
> UDF and of allowing both behavior. It's a matter of people being confused
> by the behavior of a non-UDF function, now(), and suggesting it should be
> changed.
>
> The Hive idea is interesting I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not fond of derailing
> threads. I will just note though that if we're not talking about a
> confusion issue but rather how to get a timeuuid to be fixed within a
> statement, then there is a much more trivial solution: generate it
> client side. The `now()` function is a small convenience but there is
> nothing you cannot do without it client side, and that actually basically
> stands for almost any use of (non aggregate) function in Cassandra
> currently.
>
>
>>
>>
>> "Food for thought: Hive's UDFs introduced an annotation
>> @UDFType(deterministic = false)
>>
>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>>
>> The effect is the query planner can see when such a UDF is in use and
>> determine the value once at the start of a very long query."
>>
>> Essentially Hive had a similar, if not identical, problem: during a long
>> running distributed process like map/reduce some users wanted the semantics
>> of:
>>
>> 1) Each call should have a new timestamp
>>
>> While other users wanted the semantics of:
>>
>> 2) Each call should generate the same timestamp
>>
>> The solution implemented was to add an annotation to udf such that the
>> query planner would pick up the annotation and act accordingly.
>>
>> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>>
>> As a result you can essentially implement two UDFs
>>
>> @UDFType(deterministic = false)
>> public class UDFNow
>>
>> and for the other people
>>
>> @UDFType(deterministic = true)
>> public class UDFNowOnce extends UDFNow
>>
>> Both use cases are met in a sensible way.
>>
>
>
I agree that changing the semantics of something already in existence is a
bad idea. What is there "now" (no pun intended) works and should stay working
as is.

I will also point out that Presto addresses this issue with specific
functions:

https://prestodb.io/docs/current/functions/datetime.html

localtime -> time
    Returns the current time as of the start of the query.

localtimestamp -> timestamp
    Returns the current timestamp as of the start of the query.
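The difference between Presto's localtimestamp (or SQL NOW() in Postgres) and MySQL's SYSDATE() is whether the clock is read once per query or once per call. Below is a small sketch of both behaviours using `java.time.Clock`. `QueryClocks` and the demo-only `SteppingClock` are hypothetical names. `SteppingClock` advances one millisecond per read so that the difference is visible.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.function.Supplier;

class QueryClocks {
    // SYSDATE()-style: every call reads the clock again.
    static Supplier<Instant> perCall(Clock clock) {
        return clock::instant;
    }

    // localtimestamp / SQL NOW()-style: the clock is read once when the query
    // starts, and every call inside that query returns the same snapshot.
    static Supplier<Instant> perQuery(Clock clock) {
        Instant queryStart = clock.instant();
        return () -> queryStart;
    }

    // Demo-only clock that moves forward one millisecond on each read.
    static final class SteppingClock extends Clock {
        private Instant next;

        SteppingClock(Instant start) {
            this.next = start;
        }

        @Override
        public ZoneId getZone() {
            return ZoneOffset.UTC;
        }

        @Override
        public Clock withZone(ZoneId zone) {
            return this; // the zone is irrelevant for this demo
        }

        @Override
        public Instant instant() {
            Instant now = next;
            next = next.plusMillis(1);
            return now;
        }
    }

    public static void main(String[] args) {
        Supplier<Instant> sysdate = perCall(new SteppingClock(Instant.EPOCH));
        Supplier<Instant> localtimestamp = perQuery(new SteppingClock(Instant.EPOCH));
        System.out.println(sysdate.get() + " vs " + sysdate.get());
        System.out.println(localtimestamp.get() + " vs " + localtimestamp.get());
    }
}
```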


Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Jonathan Haddad
+1 to everything Sylvain said.
On Thu, Dec 1, 2016 at 11:09 AM Sylvain Lebresne 
wrote:

> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo 
> wrote:
>
>>
>> I am not sure you saw my reply on thread but I believe everyone's needs
>> can be met. I will copy that here:
>>
>
> I saw it, but the real problem that was raised initially was not that of
> UDF and of allowing both behavior. It's a matter of people being confused
> by the behavior of a non-UDF function, now(), and suggesting it should be
> changed.
>
> The Hive idea is interesting I guess, and we can switch to discussing
> that, but it's a different problem really and I'm not fond of derailing
> threads. I will just note though that if we're not talking about a
> confusion issue but rather how to get a timeuuid to be fixed within a
> statement, then there is a much more trivial solution: generate it
> client side. The `now()` function is a small convenience but there is
> nothing you cannot do without it client side, and that actually basically
> stands for almost any use of (non aggregate) function in Cassandra
> currently.
>
>>
>> "Food for thought: Hive's UDFs introduced an annotation
>> @UDFType(deterministic = false)
>>
>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>>
>> The effect is the query planner can see when such a UDF is in use and
>> determine the value once at the start of a very long query."
>>
>> Essentially Hive had a similar, if not identical, problem: during a long
>> running distributed process like map/reduce some users wanted the semantics
>> of:
>>
>> 1) Each call should have a new timestamp
>>
>> While other users wanted the semantics of:
>>
>> 2) Each call should generate the same timestamp
>>
>> The solution implemented was to add an annotation to udf such that the
>> query planner would pick up the annotation and act accordingly.
>>
>> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>>
>> As a result you can essentially implement two UDFs
>>
>> @UDFType(deterministic = false)
>> public class UDFNow
>>
>> and for the other people
>>
>> @UDFType(deterministic = true)
>> public class UDFNowOnce extends UDFNow
>>
>> Both use cases are met in a sensible way.
>>


Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Sylvain Lebresne
On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo 
wrote:

>
> I am not sure you saw my reply on thread but I believe everyone's needs
> can be met. I will copy that here:
>

I saw it, but the real problem that was raised initially was not that of
UDF and of allowing both behavior. It's a matter of people being confused
by the behavior of a non-UDF function, now(), and suggesting it should be
changed.

The Hive idea is interesting I guess, and we can switch to discussing that,
but it's a different problem really and I'm not fond of derailing
threads. I will just note though that if we're not talking about a
confusion issue but rather how to get a timeuuid to be fixed within a
statement, then there is a much more trivial solution: generate it
client side. The `now()` function is a small convenience but there is
nothing you cannot do without it client side, and that actually basically
stands for almost any use of (non aggregate) function in Cassandra
currently.


>
>
> "Food for thought: Hive's UDFs introduced an annotation
> @UDFType(deterministic = false)
>
> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>
> The effect is the query planner can see when such a UDF is in use and
> determine the value once at the start of a very long query."
>
> Essentially Hive had a similar, if not identical, problem: during a long
> running distributed process like map/reduce some users wanted the semantics
> of:
>
> 1) Each call should have a new timestamp
>
> While other users wanted the semantics of:
>
> 2) Each call should generate the same timestamp
>
> The solution implemented was to add an annotation to udf such that the
> query planner would pick up the annotation and act accordingly.
>
> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>
> As a result you can essentially implement two UDFs
>
> @UDFType(deterministic = false)
> public class UDFNow
>
> and for the other people
>
> @UDFType(deterministic = true)
> public class UDFNowOnce extends UDFNow
>
> Both use cases are met in a sensible way.
>

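To make the Hive idea concrete, here is a self-contained sketch of annotation-driven evaluation, in the spirit of what Edward describes. `FunctionType`, `NoArgFunction`, `TickNow`, `TickNowOnce` and `TinyPlanner` are hypothetical stand-ins modelled on Hive's `@UDFType(deterministic = ...)`. They are not Hive's or Cassandra's API. A counter replaces the clock so that the behaviour is reproducible.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.function.Supplier;

// Hypothetical annotation modelled on Hive's @UDFType(deterministic = ...).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface FunctionType {
    boolean deterministic() default false;
}

interface NoArgFunction<T> {
    T call();
}

// Stand-in for now(): a counter replaces the clock so every call differs.
@FunctionType(deterministic = false)
class TickNow implements NoArgFunction<Long> {
    private long ticks;

    @Override
    public Long call() {
        return ++ticks;
    }
}

// Same function, but declared safe to evaluate once per query.
@FunctionType(deterministic = true)
class TickNowOnce extends TickNow {
}

class TinyPlanner {
    // Deterministic functions are evaluated once when the query is planned and
    // the value is reused; anything else is evaluated on every call.
    static <T> Supplier<T> plan(NoArgFunction<T> fn) {
        FunctionType type = fn.getClass().getAnnotation(FunctionType.class);
        if (type == null || !type.deterministic()) {
            return fn::call;
        }
        T once = fn.call();
        return () -> once;
    }

    public static void main(String[] args) {
        Supplier<Long> perCall = TinyPlanner.plan(new TickNow());
        Supplier<Long> perQuery = TinyPlanner.plan(new TickNowOnce());
        System.out.println(perCall.get() + ", " + perCall.get());   // 1, 2
        System.out.println(perQuery.get() + ", " + perQuery.get()); // 1, 1
    }
}
```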

Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Bruce Heath


From: Edward Capriolo <edlinuxg...@gmail.com>
Sent: Thursday, December 1, 2016 10:44:10 AM
To: user@cassandra.apache.org
Subject: Re: Why does `now()` produce different times within the same query?



On Thu, Dec 1, 2016 at 4:06 AM, Sylvain Lebresne
<sylv...@datastax.com> wrote:
One can of course always open a JIRA, but I'm going to strongly disagree with a
change here (outside of a documentation one, that is).

The now() function is a timeuuid generator, and it thus generates a unique
timeuuid on every call, as specified by the timeuuid spec. I'll note that the
documentation lists it under "Timeuuid functions", and has sentences like
"the value returned by now() is guaranteed to be unique", so while I'm sure the
documentation can be further clarified, I think it's pretty clear it's not the
now() of SQL, and getting unique values on every call shouldn't be *that*
surprising.

Also, now() was primarily meant for use on timeuuid clustering columns for a
time-series like table, something like:
  CREATE TABLE ts (
      k int,
      t timeuuid,
      v text,
      PRIMARY KEY (k, t)
  );
and if you use it multiple times in a batch, this would look something like:
  BEGIN BATCH
      INSERT INTO ts (k, t, v) VALUES (0, now(), 'foo');
      INSERT INTO ts (k, t, v) VALUES (0, now(), 'bar');
  APPLY BATCH;
and you definitely want that to insert 2 "events", not just one.

This is also why changing the behavior of this method *would* be a breaking
change.

Another reason this works the way it does is that functions in CQL are just that,
functions. Each execution is unique and they have no notion of being executed in
the same statement/batch/whatever. I actually think this is sensible, assuming
one stops being obsessed with what other databases that aren't Apache Cassandra
do.

I will note that Ben seems to suggest keeping the return of now() unique across
calls while keeping the time component equal, thus varying the rest of the uuid
bytes. However:
 - I'm starting to wonder what this would buy us. Why would someone be super
   confused by the time changing across calls (in a single statement/batch), but
   be totally not confused by the actual full return not being equal? And how is
   that actually useful: you're getting a different result anyway and you're
   letting the server pick the timestamp in the first place, so you're probably
   not caring about millisecond precision of that timestamp.
 - This would basically be a violation of the timeuuid spec
 - This would be a big pain in the code and make now() a special case
   among functions. I'm unconvinced special cases are making things easier
   in general.

So I'm all for improving the documentation if this confuses users due to
expectations (mistakenly) carried from prior experiences, and please
feel free to open a JIRA for that. I'm a lot less in agreement that there is
something wrong with the way the function behaves in principle.

> I can see why this issue has been largely ignored and hasn't had a chance for
> the behaviour to be formally defined

Don't make too many assumptions. The behavior is perfectly well defined: now()
is a "normal" function and is evaluated whenever it's called according to the
timeuuid spec (or as close to it as we can make it).

On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth
<benjamin.r...@jaumo.com> wrote:

Great comment. +1

On 01.12.2016 06:29, "Ben Bromhead" <b...@instaclustr.com> wrote:
tl;dr +1 yup raise a jira to discuss how now() should behave in a single
statement (and possibly extend to batch statements).

The values of now should be the same if you assume that now() works like it 
does in relational databases such as postgres or mysql, however at the moment 
it instead works like sysdate() in mysql. Given that CQL is supposed to be SQL 
like, I think the assumption around the behaviour of now() was a fair one to 
make.

I definitely agree that raising a jira ticket would be a great place to discuss 
what the behaviour of now() should be for Cassandra. Personally I would be in 
favour of seeing the deterministic component (the actual time part) being the 
same across multiple calls in the one statement or multiple statements in a 
batch.

Cassandra documentation does not make any claims as to how now() works within a 
single statement and reading the code it shows the intent is to work like 
sysdate() from MySQL rather than now(). One of the identified dangers of making 
cql similar to sql is that, while yes it aids adoption, users will find that 
SQL like things don't behave as expected. Of course as a user, one shouldn't 
have to read the source code to determine correct behaviour.

Given that a timeuuid is made up of deterministic and 

Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Edward Capriolo
On Thu, Dec 1, 2016 at 4:06 AM, Sylvain Lebresne 
wrote:

> One can of course always open a JIRA, but I'm going to strongly disagree
> with a change here (outside of a documentation one, that is).
>
> The now() function is a timeuuid generator, and it thus generates a unique
> timeuuid on every call, as specified by the timeuuid spec. I'll note that
> the documentation lists it under "Timeuuid functions", and has sentences
> like "the value returned by now() is guaranteed to be unique", so while I'm
> sure the documentation can be further clarified, I think it's pretty clear
> it's not the now() of SQL, and getting unique values on every call
> shouldn't be *that* surprising.
>
> Also, now() was primarily meant for use on timeuuid clustering columns for
> a time-series like table, something like:
>   CREATE TABLE ts (
>       k int,
>       t timeuuid,
>       v text,
>       PRIMARY KEY (k, t)
>   );
> and if you use it multiple times in a batch, this would look something
> like:
>   BEGIN BATCH
>       INSERT INTO ts (k, t, v) VALUES (0, now(), 'foo');
>       INSERT INTO ts (k, t, v) VALUES (0, now(), 'bar');
>   APPLY BATCH;
> and you definitely want that to insert 2 "events", not just one.
>
> This is also why changing the behavior of this method *would* be a breaking
> change.
>
> Another reason this works the way it does is that functions in CQL are just
> that, functions. Each execution is unique and they have no notion of being
> executed in the same statement/batch/whatever. I actually think this is
> sensible, assuming one stops being obsessed with what other databases that
> aren't Apache Cassandra do.
>
> I will note that Ben seems to suggest keeping the return of now() unique
> across calls while keeping the time component equal, thus varying the rest
> of the uuid bytes. However:
>  - I'm starting to wonder what this would buy us. Why would someone be super
>    confused by the time changing across calls (in a single statement/batch),
>    but be totally not confused by the actual full return not being equal?
>    And how is that actually useful: you're getting a different result anyway
>    and you're letting the server pick the timestamp in the first place, so
>    you're probably not caring about millisecond precision of that timestamp.
>  - This would basically be a violation of the timeuuid spec
>  - This would be a big pain in the code and make now() a special case
>    among functions. I'm unconvinced special cases are making things easier
>    in general.
>
> So I'm all for improving the documentation if this confuses users due to
> expectations (mistakenly) carried from prior experiences, and please
> feel free to open a JIRA for that. I'm a lot less in agreement that there
> is something wrong with the way the function behaves in principle.
>
> > I can see why this issue has been largely ignored and hasn't had a
> > chance for the behaviour to be formally defined
>
> Don't make too many assumptions. The behavior is perfectly well defined:
> now() is a "normal" function and is evaluated whenever it's called
> according to the timeuuid spec (or as close to it as we can make it).
>
> On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth 
> wrote:
>
>> Great comment. +1
>>
>> On 01.12.2016 06:29, "Ben Bromhead" wrote:
>>
>>> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
>>> statement (and possibly extend to batch statements).
>>>
>>> The values of now should be the same if you assume that now() works like
>>> it does in relational databases such as postgres or mysql, however at the
>>> moment it instead works like sysdate() in mysql. Given that CQL is supposed
>>> to be SQL like, I think the assumption around the behaviour of now() was a
>>> fair one to make.
>>>
>>> I definitely agree that raising a jira ticket would be a great place to
>>> discuss what the behaviour of now() should be for Cassandra. Personally I
>>> would be in favour of seeing the deterministic component (the actual time
>>> part) being the same across multiple calls in the one statement or multiple
>>> statements in a batch.
>>>
>>> Cassandra documentation does not make any claims as to how now() works
>>> within a single statement and reading the code it shows the intent is to
>>> work like sysdate() from MySQL rather than now(). One of the identified
>>> dangers of making cql similar to sql is that, while yes it aids adoption,
>>> users will find that SQL like things don't behave as expected. Of course as
>>> a user, one shouldn't have to read the source code to determine correct
>>> behaviour.
>>>
>>> Given that a timeuuid is made up of deterministic and (pseudo)
>>> non-deterministic components I can see why this issue has been largely
>>> ignored and hasn't had a chance for the behaviour to be formally defined
>>> (you would expect now to return the same time in the one statement despite

Re: Why does `now()` produce different times within the same query?

2016-12-01 Thread Sylvain Lebresne
One can of course always open a JIRA, but I'm going to strongly disagree
with a change here (outside of a documentation one, that is).

The now() function is a timeuuid generator, and it thus generates a unique
timeuuid on every call, as specified by the timeuuid spec. I'll note that
the documentation lists it under "Timeuuid functions", and has sentences like
"the value returned by now() is guaranteed to be unique", so while I'm sure
the documentation can be further clarified, I think it's pretty clear it's
not the now() of SQL, and getting unique values on every call shouldn't be
*that* surprising.

Also, now() was primarily meant for use on timeuuid clustering columns for a
time-series like table, something like:
  CREATE TABLE ts (
      k int,
      t timeuuid,
      v text,
      PRIMARY KEY (k, t)
  );
and if you use it multiple times in a batch, this would look something like:
  BEGIN BATCH
      INSERT INTO ts (k, t, v) VALUES (0, now(), 'foo');
      INSERT INTO ts (k, t, v) VALUES (0, now(), 'bar');
  APPLY BATCH;
and you definitely want that to insert 2 "events", not just one.

This is also why changing the behavior of this method *would* be a breaking
change.
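Here is why identical timeuuids would be a breaking change. Rows in `ts` are identified by `(k, t)`, and a CQL INSERT on an existing primary key overwrites that row rather than adding a new one. The sketch below models one partition as a `HashMap` keyed by `t`. This is a simplifying assumption that ignores clustering order, and random UUIDs stand in for timeuuids. With two distinct values of `t`, the partition ends up with two rows. With the same value of `t` twice, it ends up with one row, and 'bar' overwrites 'foo'.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class UpsertDemo {
    // Models one partition (k = 0) of ts(k, t, v): rows keyed by clustering column t.
    // Map.put() overwrites an existing key, much like an INSERT on an existing primary key.
    static Map<UUID, String> applyBatch(UUID firstT, UUID secondT) {
        Map<UUID, String> partition = new HashMap<>();
        partition.put(firstT, "foo");
        partition.put(secondT, "bar");
        return partition;
    }

    public static void main(String[] args) {
        // Random (version 4) UUIDs stand in for timeuuids in this sketch.
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();
        System.out.println("distinct t: " + applyBatch(a, b).size() + " rows");
        System.out.println("same t:     " + applyBatch(a, a).size() + " row");
    }
}
```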

Another reason this works the way it does is that functions in CQL are just
that, functions. Each execution is unique and they have no notion of being
executed in the same statement/batch/whatever. I actually think this is
sensible, assuming one stops being obsessed with what other databases that
aren't Apache Cassandra do.

I will note that Ben seems to suggest keeping the return of now() unique
across calls while keeping the time component equal, thus varying the rest of
the uuid bytes. However:
 - I'm starting to wonder what this would buy us. Why would someone be super
   confused by the time changing across calls (in a single statement/batch),
   but be totally not confused by the actual full return not being equal? And
   how is that actually useful: you're getting a different result anyway and
   you're letting the server pick the timestamp in the first place, so you're
   probably not caring about millisecond precision of that timestamp.
 - This would basically be a violation of the timeuuid spec
 - This would be a big pain in the code and make now() a special case
   among functions. I'm unconvinced special cases are making things easier
   in general.

So I'm all for improving the documentation if this confuses users due to
expectations (mistakenly) carried from prior experiences, and please
feel free to open a JIRA for that. I'm a lot less in agreement that there is
something wrong with the way the function behaves in principle.

> I can see why this issue has been largely ignored and hasn't had a chance
> for the behaviour to be formally defined

Don't make too many assumptions. The behavior is perfectly well defined:
now() is a "normal" function and is evaluated whenever it's called according
to the timeuuid spec (or as close to it as we can make it).

On Thu, Dec 1, 2016 at 7:25 AM, Benjamin Roth 
wrote:

> Great comment. +1
>
> On 01.12.2016 06:29, "Ben Bromhead" wrote:
>
>> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
>> statement (and possibly extend to batch statements).
>>
>> The values of now should be the same if you assume that now() works like
>> it does in relational databases such as postgres or mysql, however at the
>> moment it instead works like sysdate() in mysql. Given that CQL is supposed
>> to be SQL like, I think the assumption around the behaviour of now() was a
>> fair one to make.
>>
>> I definitely agree that raising a jira ticket would be a great place to
>> discuss what the behaviour of now() should be for Cassandra. Personally I
>> would be in favour of seeing the deterministic component (the actual time
>> part) being the same across multiple calls in the one statement or multiple
>> statements in a batch.
>>
>> Cassandra documentation does not make any claims as to how now() works
>> within a single statement and reading the code it shows the intent is to
>> work like sysdate() from MySQL rather than now(). One of the identified
>> dangers of making cql similar to sql is that, while yes it aids adoption,
>> users will find that SQL like things don't behave as expected. Of course as
>> a user, one shouldn't have to read the source code to determine correct
>> behaviour.
>>
>> Given that a timeuuid is made up of deterministic and (pseudo)
>> non-deterministic components I can see why this issue has been largely
>> ignored and hasn't had a chance for the behaviour to be formally defined
>> (you would expect now to return the same time in the one statement despite
>> multiple calls, but you wouldn't expect the same behaviour for say a call
>> to rand()).
>>
>>
>>
>>
>>
>>
>>
>> On Wed, 30 Nov 2016 at 19:54 Cody Yancey  wrote:
>>
>>> This is not a bug, and in fact changing it would be a serious bug.
>>>
>>> 

Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Benjamin Roth
Great comment. +1

On 01.12.2016 06:29, "Ben Bromhead" wrote:

> tl;dr +1 yup raise a jira to discuss how now() should behave in a single
> statement (and possibly extend to batch statements).
>
> The values of now should be the same if you assume that now() works like
> it does in relational databases such as postgres or mysql, however at the
> moment it instead works like sysdate() in mysql. Given that CQL is supposed
> to be SQL like, I think the assumption around the behaviour of now() was a
> fair one to make.
>
> I definitely agree that raising a jira ticket would be a great place to
> discuss what the behaviour of now() should be for Cassandra. Personally I
> would be in favour of seeing the deterministic component (the actual time
> part) being the same across multiple calls in the one statement or multiple
> statements in a batch.
>
> Cassandra documentation does not make any claims as to how now() works
> within a single statement and reading the code it shows the intent is to
> work like sysdate() from MySQL rather than now(). One of the identified
> dangers of making cql similar to sql is that, while yes it aids adoption,
> users will find that SQL like things don't behave as expected. Of course as
> a user, one shouldn't have to read the source code to determine correct
> behaviour.
>
> Given that a timeuuid is made up of deterministic and (pseudo)
> non-deterministic components I can see why this issue has been largely
> ignored and hasn't had a chance for the behaviour to be formally defined
> (you would expect now to return the same time in the one statement despite
> multiple calls, but you wouldn't expect the same behaviour for say a call
> to rand()).
>
>
>
>
>
>
>
> On Wed, 30 Nov 2016 at 19:54 Cody Yancey  wrote:
>
>> This is not a bug, and in fact changing it would be a serious bug.
>>
>> False. Absolutely no consumer would be broken by a change to guarantee an
>> identical time component that isn't broken already, for the simple reason
>> your code already has to handle that case, as it is in fact the majority
>> case RIGHT NOW. Users can hit this bug, in production, because unit tests
>> might not have experienced it! The time component should be the time that the
>> command was processed by the coordinator node.
>>
>>  would one expect a java/py/bash script that loops
>>
>> Individual Cassandra writes (which is what OP is referring to
>> specifically) are not loops. They are in almost every case atomic
>> operations that either succeed completely or fail completely. Allowing a
>> single atomic operation to witness multiple times in these corner cases is
>> not only surprising, as this thread demonstrates, it is also needlessly
>> restricting to what developers can use the database for, and provides NO
>> BENEFIT.
>>
>> Calling now PRIOR to initiating multiple inserts is in most cases
>> exactly what one does...the ONLY practice is to set the value before
>> initiating the sequence of calls
>>
>> Also false. Cassandra does not have a way of doing this on the
>> coordinator node rather than the client device, and as I already showed,
>> the client device is the wrong place to do it in situations where
>> guaranteeing bounded clock-skew actually makes a difference one way or the
>> other.
>>
>> Thanks,
>> Cody
>>
>>
>>
>> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle 
>> wrote:
>>
>> This is not a bug, and in fact changing it would be a serious bug.
>>
>> What it is is a wonderful case of bad coding: would one expect a
>> java/py/bash script that loops on a bunch of read/execute/update calls where
>> each iteration calls time to return the same exact time for the duration of
>> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>>
>> Every call to a system call is unique, including within C*. Calling now
>> PRIOR to initiating multiple inserts is in most cases exactly what one does
>> to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
>> identical system time as would be the uuid of the row, one tries to call
>> time as close to just before the insert as possible. Then repeat.
>>
>> You have a logic issue in your code. If you want the same value for a set
>> of calls, the ONLY practice is to set the value before initiating the
>> sequence of calls.
>>
>>
>>
>> *...*
>>
>>
>>
>> *Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872*
>>
>> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey  wrote:
>>
>> Getting the same TimeUUID values might be a major problem. Getting two
>> different TimeUUIDs that at least have the same time component would not be a major
>> problem as this is the main case today. Getting different time components
>> is actually the corner case, and it is a corner case that breaks
>> Internet-of-Things applications. We can tightly control clock skew in our
>> cluster. We most 

Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Ben Bromhead
tl;dr +1, yup, raise a JIRA to discuss how now() should behave in a single
statement (and possibly extend that to batch statements).

The values of now() should be the same if you assume that now() works like it
does in relational databases such as PostgreSQL or MySQL; however, at the
moment it instead works like SYSDATE() in MySQL. Given that CQL is supposed
to be SQL-like, I think the assumption around the behaviour of now() was a
fair one to make.
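
The difference between the two behaviours can be sketched in Java. This is a hypothetical illustration, not Cassandra code: `NowSemantics`, `fakeClock`, `evaluatePerCall` and `evaluatePerStatement` are invented names, and the fake clock advances 1 ms on every read so that the result is deterministic instead of depending on hitting a real millisecond boundary.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch contrasting two ways of evaluating now() inside one statement.
public class NowSemantics {

    // Fake clock: each read returns a value 1 ms later than the previous read.
    static LongSupplier fakeClock(long startMillis) {
        AtomicLong t = new AtomicLong(startMillis);
        return t::getAndIncrement;
    }

    // SYSDATE()-style: every occurrence of now() reads the clock again.
    static long[] evaluatePerCall(LongSupplier clock, int occurrences) {
        long[] out = new long[occurrences];
        for (int i = 0; i < occurrences; i++) {
            out[i] = clock.getAsLong();
        }
        return out;
    }

    // NOW()-style: the clock is read once and reused for every occurrence.
    static long[] evaluatePerStatement(LongSupplier clock, int occurrences) {
        long statementTime = clock.getAsLong();
        long[] out = new long[occurrences];
        Arrays.fill(out, statementTime);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(evaluatePerCall(fakeClock(1000L), 2)));      // [1000, 1001]
        System.out.println(Arrays.toString(evaluatePerStatement(fakeClock(1000L), 2))); // [1000, 1000]
    }
}
```

With a real clock the per-call variant usually returns equal values and only occasionally straddles a millisecond, which is why the query in the original post only shows the difference after a few runs.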

I definitely agree that raising a JIRA ticket would be a great place to
discuss what the behaviour of now() should be for Cassandra. Personally, I
would be in favour of seeing the deterministic component (the actual time
part) be the same across multiple calls in a single statement or across
multiple statements in a batch.

The Cassandra documentation does not make any claims as to how now() works
within a single statement, and reading the code shows that the intent is for
it to work like SYSDATE() from MySQL rather than NOW(). One of the identified
dangers of making CQL similar to SQL is that, while it aids adoption, users
will find that SQL-like things don't behave as expected. Of course, as a
user, one shouldn't have to read the source code to determine correct
behaviour.

Given that a timeuuid is made up of deterministic and (pseudo-)
non-deterministic components, I can see why this issue has been largely
ignored and why the behaviour has never been formally defined (you would
expect now() to return the same time within one statement despite multiple
calls, but you wouldn't expect the same behaviour from, say, a call
to rand()).
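
To illustrate the split between the time component and the rest of a timeuuid, here is a hand-rolled RFC 4122 version-1 UUID builder in Java. It is a sketch under stated assumptions: `TimeUuidSketch`, `timeUuid` and `toUnixMillis` are hypothetical names, and varying the 14-bit clock sequence to keep the UUIDs distinct is only one option, not necessarily how Cassandra's own generator guarantees uniqueness. It shows that two UUIDs can be unequal while `java.util.UUID.timestamp()`, and therefore a millisecond view analogous to what `toTimestamp()` reports, is identical.

```java
import java.util.UUID;

// Hypothetical sketch: two version-1 UUIDs sharing a time component but not equal.
public class TimeUuidSketch {

    // Number of 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    static UUID timeUuid(long unixMillis, int clockSeq, long node) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET; // 60-bit timestamp
        long msb = ((ts & 0xFFFFFFFFL) << 32)          // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16)     // time_mid
                 | 0x1000L                             // version 1
                 | ((ts >>> 48) & 0x0FFFL);            // time_hi
        long lsb = 0x8000000000000000L                  // RFC 4122 variant (binary 10)
                 | ((clockSeq & 0x3FFFL) << 48)        // 14-bit clock sequence
                 | (node & 0xFFFFFFFFFFFFL);           // 48-bit node id
        return new UUID(msb, lsb);
    }

    // Millisecond view of a version-1 UUID's time component.
    static long toUnixMillis(UUID u) {
        return (u.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }

    public static void main(String[] args) {
        long statementTime = 1480456140000L; // hypothetical time captured once per statement
        UUID a = timeUuid(statementTime, 1, 0x0123456789ABL);
        UUID b = timeUuid(statementTime, 2, 0x0123456789ABL);
        System.out.println(a.equals(b));                        // false
        System.out.println(toUnixMillis(a) == toUnixMillis(b)); // true
    }
}
```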

On Wed, 30 Nov 2016 at 19:54 Cody Yancey  wrote:

> This is not a bug, and in fact changing it would be a serious bug.
>
> False. Absolutely no consumer would be broken by a change to guarantee an
> identical time component that isn't broken already, for the simple reason
> your code already has to handle that case, as it is in fact the majority
> case RIGHT NOW. Users can hit this bug, in production, because unit tests
> might not have experienced it! The time component should be the time that the
> command was processed by the coordinator node.
>
>  would one expect a java/py/bash script that loops
>
> Individual Cassandra writes (which is what OP is referring to
> specifically) are not loops. They are in almost every case atomic
> operations that either succeed completely or fail completely. Allowing a
> single atomic operation to witness multiple times in these corner cases is
> not only surprising, as this thread demonstrates, it is also needlessly
> restricting to what developers can use the database for, and provides NO
> BENEFIT.
>
> Calling now PRIOR to initiating multiple inserts is in most cases
> exactly what one does...the ONLY practice is to set the value before
> initiating the sequence of calls
>
> Also false. Cassandra does not have a way of doing this on the coordinator
> node rather than the client device, and as I already showed, the client
> device is the wrong place to do it in situations where guaranteeing bounded
> clock-skew actually makes a difference one way or the other.
>
> Thanks,
> Cody
>
>
>
> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle 
> wrote:
>
> This is not a bug, and in fact changing it would be a serious bug.
>
> What it is is a wonderful case of bad coding: would one expect a
> java/py/bash script that loops on a bunch of read/execute/update calls where
> each iteration calls time to return the same exact time for the duration of
> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>
> Every call to a system call is unique, including within C*. Calling now
> PRIOR to initiating multiple inserts is in most cases exactly what one does
> to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
> identical system time as would be the uuid of the row, one tries to call
> time as close to just before the insert as possible. Then repeat.
>
> You have a logic issue in your code. If you want the same value for a set
> of calls, the ONLY practice is to set the value before initiating the
> sequence of calls.
>
>
>
> *...*
>
>
>
> *Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872*
>
> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey  wrote:
>
> Getting the same TimeUUID values might be a major problem. Getting two
> different TimeUUIDs that at least have the same time component would not be a major
> problem as this is the main case today. Getting different time components
> is actually the corner case, and it is a corner case that breaks
> Internet-of-Things applications. We can tightly control clock skew in our
> cluster. We most definitely CANNOT control clock skew on the thousands of
> sensors that write to our cluster.
>
> Thanks,
> Cody
>
> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille  wrote:
>
> In my opinion, this is not broken 

Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Edward Capriolo
On Wed, Nov 30, 2016 at 10:53 PM, Cody Yancey  wrote:

> This is not a bug, and in fact changing it would be a serious bug.
>
> False. Absolutely no consumer would be broken by a change to guarantee an
> identical time component that isn't broken already, for the simple reason
> your code already has to handle that case, as it is in fact the majority
> case RIGHT NOW. Users can hit this bug, in production, because unit tests
> might not have experienced it! The time component should be the time that the
> command was processed by the coordinator node.
>
>  would one expect a java/py/bash script that loops
>
> Individual Cassandra writes (which is what OP is referring to
> specifically) are not loops. They are in almost every case atomic
> operations that either succeed completely or fail completely. Allowing a
> single atomic operation to witness multiple times in these corner cases is
> not only surprising, as this thread demonstrates, it is also needlessly
> restricting to what developers can use the database for, and provides NO
> BENEFIT.
>
> Calling now PRIOR to initiating multiple inserts is in most cases
> exactly what one does...the ONLY practice is to set the value before
> initiating the sequence of calls
>
> Also false. Cassandra does not have a way of doing this on the coordinator
> node rather than the client device, and as I already showed, the client
> device is the wrong place to do it in situations where guaranteeing bounded
> clock-skew actually makes a difference one way or the other.
>
> Thanks,
> Cody
>
>
>
> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle 
> wrote:
>
>> This is not a bug, and in fact changing it would be a serious bug.
>>
>> What it is is a wonderful case of bad coding: would one expect a
>> java/py/bash script that loops on a bunch of read/execute/update calls where
>> each iteration calls time to return the same exact time for the duration of
>> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>>
>> Every call to a system call is unique, including within C*. Calling now
>> PRIOR to initiating multiple inserts is in most cases exactly what one does
>> to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
>> identical system time as would be the uuid of the row, one tries to call
>> time as close to just before the insert as possible. Then repeat.
>>
>> You have a logic issue in your code. If you want the same value for a set
>> of calls, the ONLY practice is to set the value before initiating the
>> sequence of calls.
>>
>>
>>
>> *...*
>>
>>
>>
>> *Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872*
>>
>> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey  wrote:
>>
>>> Getting the same TimeUUID values might be a major problem. Getting two
>>> different TimeUUIDs that at least have the same time component would not be a major
>>> problem as this is the main case today. Getting different time components
>>> is actually the corner case, and it is a corner case that breaks
>>> Internet-of-Things applications. We can tightly control clock skew in our
>>> cluster. We most definitely CANNOT control clock skew on the thousands of
>>> sensors that write to our cluster.
>>>
>>> Thanks,
>>> Cody
>>>
>>> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille  wrote:
>>>
 In my opinion, this is not broken and “fixing” it would break existing
 code. Consider a batch that includes multiple inserts, each of which
 inserts the value returned by now(). Getting the same UUID for each insert
 would be a major problem.

 Cheers

 Robert


 On Nov 30, 2016, at 4:46 PM, Todd Fast 
 wrote:

 FWIW I'd suggest opening a bug--this behavior is certainly quite
 unexpected and more than just a documentation issue. In general I can't
 imagine any desirable properties of the current implementation, and there
 are likely a bunch of latent bugs sitting out there, so it should be fixed.

 Todd

 On Wed, Nov 30, 2016 at 12:37 PM Terry Liu  wrote:

> Sorry for my typo. Obviously, I meant:
> "It appears that a single query that calls Cassandra's `now()` time
> function *multiple times* may actually cause a query to write or
> return different times."
>
> Less of a surprise now that I realize more about the implementation,
> but I agree that more explicit documentation around when exactly the
> "execution" of each now() statement happens and what implications it has
> for the resulting timestamps would be helpful when running into this.
>
> Thanks for the quick responses!
>
> -Terry
>
>
>
> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek 
> wrote:
>
> every now() call in statement is under the hood 

Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Cody Yancey
> This is not a bug, and in fact changing it would be a serious bug.

False. Absolutely no consumer would be broken by a change to guarantee an
identical time component that isn't broken already, for the simple reason
that your code already has to handle that case, as it is in fact the majority
case RIGHT NOW. Users can hit this bug in production because unit tests
might not have experienced it! The time component should be the time that the
command was processed by the coordinator node.

> would one expect a java/py/bash script that loops

Individual Cassandra writes (which is what the OP is referring to specifically)
are not loops. They are in almost every case atomic operations that either
succeed completely or fail completely. Allowing a single atomic operation
to witness multiple times in these corner cases is not only surprising, as
this thread demonstrates, it also needlessly restricts what developers can
use the database for, and provides NO BENEFIT.

> Calling now PRIOR to initiating multiple inserts is in most cases
> exactly what one does...the ONLY practice is to set the value before
> initiating the sequence of calls

Also false. Cassandra does not have a way of doing this on the coordinator
node rather than the client device, and as I already showed, the client
device is the wrong place to do it in situations where guaranteeing bounded
clock-skew actually makes a difference one way or the other.

Thanks,
Cody



On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle 
wrote:

> This is not a bug, and in fact changing it would be a serious bug.
>
> What it is is a wonderful case of bad coding: would one expect a
> java/py/bash script that loops on a bunch of read/execut/update calls where
> each iteration calls time to return the same exact time for the duration of
> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>
> Every call to a system call is unique, including within C*. Calling now
> PRIOR to initiating multiple inserts is in most cases exactly what one does
> to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
> identical system time as would be the uuid of the row, one tries to call
> time as close to just before the insert as possible. Then repeat.
>
> You have a logic issue in your code. If you want the same value for a set
> of calls, the ONLY practice is to set the value before initiating the
> sequence of calls.
>
>
>
> *...*
>
>
>
> *Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872*
>
> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey  wrote:
>
>> Getting the same TimeUUID values might be a major problem. Getting two
>> different TimeUUIDs that at least have the same time component would not be a major
>> problem as this is the main case today. Getting different time components
>> is actually the corner case, and it is a corner case that breaks
>> Internet-of-Things applications. We can tightly control clock skew in our
>> cluster. We most definitely CANNOT control clock skew on the thousands of
>> sensors that write to our cluster.
>>
>> Thanks,
>> Cody
>>
>> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille  wrote:
>>
>>> In my opinion, this is not broken and “fixing” it would break existing
>>> code. Consider a batch that includes multiple inserts, each of which
>>> inserts the value returned by now(). Getting the same UUID for each insert
>>> would be a major problem.
>>>
>>> Cheers
>>>
>>> Robert
>>>
>>>
>>> On Nov 30, 2016, at 4:46 PM, Todd Fast 
>>> wrote:
>>>
>>> FWIW I'd suggest opening a bug--this behavior is certainly quite
>>> unexpected and more than just a documentation issue. In general I can't
>>> imagine any desirable properties of the current implementation, and there
>>> are likely a bunch of latent bugs sitting out there, so it should be fixed.
>>>
>>> Todd
>>>
>>> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu  wrote:
>>>
 Sorry for my typo. Obviously, I meant:
 "It appears that a single query that calls Cassandra's `now()` time
 function *multiple times* may actually cause a query to write or
 return different times."

 Less of a surprise now that I realize more about the implementation,
 but I agree that more explicit documentation around when exactly the
 "execution" of each now() statement happens and what implications it has
 for the resulting timestamps would be helpful when running into this.

 Thanks for the quick responses!

 -Terry



 On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek 
 wrote:

 Every now() call in a statement is, under the hood, "replaced" with a
 newly generated UUID.

 It can happen that they fall in different milliseconds.

 If you need the same timestamps, you need to set them on the client
 side.



Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread daemeon reiydelle
This is not a bug, and in fact changing it would be a serious bug.

What it is is a wonderful case of bad coding: would one expect a
java/py/bash script that loops on a bunch of read/execute/update calls where
each iteration calls time to return the same exact time for the duration of
the execution of the code? Whether the code runs for 5 seconds or 5 hours?

Every call to a system call is unique, including within C*. Calling now
PRIOR to initiating multiple inserts is in most cases exactly what one does
to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
identical system time as would be the uuid of the row, one tries to call
time as close to just before the insert as possible. Then repeat.

You have a logic issue in your code. If you want the same value for a set
of calls, the ONLY practice is to set the value before initiating the
sequence of calls.



*...*



*Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872*

On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey  wrote:

> Getting the same TimeUUID values might be a major problem. Getting two
> different TimeUUIDs that at least have the same time component would not be a major
> problem as this is the main case today. Getting different time components
> is actually the corner case, and it is a corner case that breaks
> Internet-of-Things applications. We can tightly control clock skew in our
> cluster. We most definitely CANNOT control clock skew on the thousands of
> sensors that write to our cluster.
>
> Thanks,
> Cody
>
> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille  wrote:
>
>> In my opinion, this is not broken and “fixing” it would break existing
>> code. Consider a batch that includes multiple inserts, each of which
>> inserts the value returned by now(). Getting the same UUID for each insert
>> would be a major problem.
>>
>> Cheers
>>
>> Robert
>>
>>
>> On Nov 30, 2016, at 4:46 PM, Todd Fast  wrote:
>>
>> FWIW I'd suggest opening a bug--this behavior is certainly quite
>> unexpected and more than just a documentation issue. In general I can't
>> imagine any desirable properties of the current implementation, and there
>> are likely a bunch of latent bugs sitting out there, so it should be fixed.
>>
>> Todd
>>
>> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu  wrote:
>>
>>> Sorry for my typo. Obviously, I meant:
>>> "It appears that a single query that calls Cassandra's `now()` time
>>> function *multiple times* may actually cause a query to write or return
>>> different times."
>>>
>>> Less of a surprise now that I realize more about the implementation, but
>>> I agree that more explicit documentation around when exactly the
>>> "execution" of each now() statement happens and what implications it has
>>> for the resulting timestamps would be helpful when running into this.
>>>
>>> Thanks for the quick responses!
>>>
>>> -Terry
>>>
>>>
>>>
>>> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek 
>>> wrote:
>>>
>>> Every now() call in a statement is, under the hood, "replaced" with a
>>> newly generated UUID.
>>>
>>> It can happen that they fall in different milliseconds.
>>>
>>> If you need the same timestamps, you need to set them on the client
>>> side.
>>>
>>>
>>> @msvaljek 
>>>
>>> 2016-11-29 22:49 GMT+01:00 Terry Liu :
>>>
>>> It appears that a single query that calls Cassandra's `now()` time
>>> function may actually cause a query to write or return different times.
>>>
>>> Is this the expected or defined behavior, and if so, why does it behave
>>> like this rather than evaluating `now()` once across an entire statement?
>>>
>>> This really affects UPDATE statements but to test it more easily, you
>>> could try something like:
>>>
>>> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
>>> FROM keyspace.table
>>> LIMIT 100;
>>>
>>> If you run that a few times, you should eventually see that the
>>> timestamp returned moves onto the next millisecond mid-query.
>>>
>>> --
>>> *Software Engineer*
>>> Turnitin - http://www.turnitin.com
>>> t...@turnitin.com
>>>
>>>
>>>
>>>
>>>
>>> --
>>> *Software Engineer*
>>> Turnitin - http://www.turnitin.com
>>> t...@turnitin.com
>>>
>>
>>
>


Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Cody Yancey
Getting the same TimeUUID values might be a major problem. Getting two
different TimeUUIDs that at least have the same time component would not be a major
problem as this is the main case today. Getting different time components
is actually the corner case, and it is a corner case that breaks
Internet-of-Things applications. We can tightly control clock skew in our
cluster. We most definitely CANNOT control clock skew on the thousands of
sensors that write to our cluster.

Thanks,
Cody

On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille  wrote:

> In my opinion, this is not broken and “fixing” it would break existing
> code. Consider a batch that includes multiple inserts, each of which
> inserts the value returned by now(). Getting the same UUID for each insert
> would be a major problem.
>
> Cheers
>
> Robert
>
>
> On Nov 30, 2016, at 4:46 PM, Todd Fast  wrote:
>
> FWIW I'd suggest opening a bug--this behavior is certainly quite
> unexpected and more than just a documentation issue. In general I can't
> imagine any desirable properties of the current implementation, and there
> are likely a bunch of latent bugs sitting out there, so it should be fixed.
>
> Todd
>
> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu  wrote:
>
>> Sorry for my typo. Obviously, I meant:
>> "It appears that a single query that calls Cassandra's `now()` time
>> function *multiple times* may actually cause a query to write or return
>> different times."
>>
>> Less of a surprise now that I realize more about the implementation, but
>> I agree that more explicit documentation around when exactly the
>> "execution" of each now() statement happens and what implications it has
>> for the resulting timestamps would be helpful when running into this.
>>
>> Thanks for the quick responses!
>>
>> -Terry
>>
>>
>>
>> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek 
>> wrote:
>>
>> Every now() call in a statement is, under the hood, "replaced" with a
>> newly generated UUID.
>>
>> It can happen that they fall in different milliseconds.
>>
>> If you need the same timestamps, you need to set them on the client
>> side.
>>
>>
>> @msvaljek 
>>
>> 2016-11-29 22:49 GMT+01:00 Terry Liu :
>>
>> It appears that a single query that calls Cassandra's `now()` time
>> function may actually cause a query to write or return different times.
>>
>> Is this the expected or defined behavior, and if so, why does it behave
>> like this rather than evaluating `now()` once across an entire statement?
>>
>> This really affects UPDATE statements but to test it more easily, you
>> could try something like:
>>
>> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
>> FROM keyspace.table
>> LIMIT 100;
>>
>> If you run that a few times, you should eventually see that the timestamp
>> returned moves onto the next millisecond mid-query.
>>
>> --
>> *Software Engineer*
>> Turnitin - http://www.turnitin.com
>> t...@turnitin.com
>>
>>
>>
>>
>>
>> --
>> *Software Engineer*
>> Turnitin - http://www.turnitin.com
>> t...@turnitin.com
>>
>
>


Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Robert Wille
In my opinion, this is not broken and “fixing” it would break existing code. 
Consider a batch that includes multiple inserts, each of which inserts the 
value returned by now(). Getting the same UUID for each insert would be a major 
problem.
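
Robert's concern is easiest to see if the now() value is used as (part of) the primary key; that is an assumption, since the message does not say so. Cassandra writes are upserts, so rows that share a key overwrite each other. In this hypothetical sketch, `SameUuidBatch` and `rowsAfterBatch` are invented names, a `HashMap` stands in for the table, and fixed UUIDs stand in for timeuuids.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: a HashMap stands in for a table keyed by the now() value.
public class SameUuidBatch {

    // Applies one "insert" per key and returns how many rows survive.
    static int rowsAfterBatch(UUID[] keys) {
        Map<UUID, String> table = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            table.put(keys[i], "insert #" + i); // upsert: an equal key replaces the earlier row
        }
        return table.size();
    }

    public static void main(String[] args) {
        // One UUID reused by every insert in the batch.
        UUID shared = new UUID(0L, 1L);
        System.out.println(rowsAfterBatch(new UUID[] {shared, shared, shared})); // 1
        // A distinct UUID per insert.
        System.out.println(rowsAfterBatch(new UUID[] {
            new UUID(0L, 1L), new UUID(0L, 2L), new UUID(0L, 3L)})); // 3
    }
}
```

Sharing only the time component, with the remaining bits still distinct, would avoid this collision while still giving every insert the same timestamp.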

Cheers

Robert

On Nov 30, 2016, at 4:46 PM, Todd Fast wrote:

FWIW I'd suggest opening a bug--this behavior is certainly quite unexpected and 
more than just a documentation issue. In general I can't imagine any desirable 
properties of the current implementation, and there are likely a bunch of 
latent bugs sitting out there, so it should be fixed.

Todd

On Wed, Nov 30, 2016 at 12:37 PM Terry Liu wrote:
Sorry for my typo. Obviously, I meant:
"It appears that a single query that calls Cassandra's `now()` time function
multiple times may actually cause a query to write or return different times."

Less of a surprise now that I realize more about the implementation, but I 
agree that more explicit documentation around when exactly the "execution" of 
each now() statement happens and what implications it has for the resulting 
timestamps would be helpful when running into this.

Thanks for the quick responses!

-Terry



On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek wrote:
Every now() call in a statement is, under the hood, "replaced" with a newly generated
UUID.

It can happen that they fall in different milliseconds.

If you need the same timestamps, you need to set them on the client side.


@msvaljek

2016-11-29 22:49 GMT+01:00 Terry Liu:
It appears that a single query that calls Cassandra's `now()` time function may 
actually cause a query to write or return different times.

Is this the expected or defined behavior, and if so, why does it behave like 
this rather than evaluating `now()` once across an entire statement?

This really affects UPDATE statements but to test it more easily, you could try 
something like:

SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
FROM keyspace.table
LIMIT 100;

If you run that a few times, you should eventually see that the timestamp 
returned moves onto the next millisecond mid-query.

--
Software Engineer
Turnitin - http://www.turnitin.com
t...@turnitin.com




--
Software Engineer
Turnitin - http://www.turnitin.com
t...@turnitin.com



Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Todd Fast
FWIW I'd suggest opening a bug--this behavior is certainly quite unexpected
and more than just a documentation issue. In general I can't imagine any
desirable properties of the current implementation, and there are likely a
bunch of latent bugs sitting out there, so it should be fixed.

Todd

On Wed, Nov 30, 2016 at 12:37 PM Terry Liu  wrote:

> Sorry for my typo. Obviously, I meant:
> "It appears that a single query that calls Cassandra's `now()` time
> function *multiple times* may actually cause a query to write or return
> different times."
>
> Less of a surprise now that I realize more about the implementation, but I
> agree that more explicit documentation around when exactly the "execution"
> of each now() statement happens and what implications it has for the
> resulting timestamps would be helpful when running into this.
>
> Thanks for the quick responses!
>
> -Terry
>
>
>
> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek  wrote:
>
> Every now() call in a statement is, under the hood, "replaced" with a
> newly generated UUID.
>
> It can happen that they fall in different milliseconds.
>
> If you need the same timestamps, you need to set them on the client
> side.
>
>
> @msvaljek 
>
> 2016-11-29 22:49 GMT+01:00 Terry Liu :
>
> It appears that a single query that calls Cassandra's `now()` time
> function may actually cause a query to write or return different times.
>
> Is this the expected or defined behavior, and if so, why does it behave
> like this rather than evaluating `now()` once across an entire statement?
>
> This really affects UPDATE statements but to test it more easily, you
> could try something like:
>
> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
> FROM keyspace.table
> LIMIT 100;
>
> If you run that a few times, you should eventually see that the timestamp
> returned moves onto the next millisecond mid-query.
>
> --
> *Software Engineer*
> Turnitin - http://www.turnitin.com
> t...@turnitin.com
>
>
>
>
>
> --
> *Software Engineer*
> Turnitin - http://www.turnitin.com
> t...@turnitin.com
>


Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Terry Liu
Sorry for my typo. Obviously, I meant:
"It appears that a single query that calls Cassandra's `now()` time
function *multiple times* may actually cause a query to write or return
different times."

Less of a surprise now that I realize more about the implementation, but I
agree that more explicit documentation around when exactly the "execution"
of each now() statement happens and what implications it has for the
resulting timestamps would be helpful when running into this.

Thanks for the quick responses!

-Terry



On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek  wrote:

> Every now() call in a statement is, under the hood, "replaced" with a
> newly generated UUID.
>
> It can happen that they fall in different milliseconds.
>
> If you need the same timestamps, you need to set them on the client
> side.
>
>
> @msvaljek 
>
> 2016-11-29 22:49 GMT+01:00 Terry Liu :
>
>> It appears that a single query that calls Cassandra's `now()` time
>> function may actually cause a query to write or return different times.
>>
>> Is this the expected or defined behavior, and if so, why does it behave
>> like this rather than evaluating `now()` once across an entire statement?
>>
>> This really affects UPDATE statements but to test it more easily, you
>> could try something like:
>>
>> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
>> FROM keyspace.table
>> LIMIT 100;
>>
>> If you run that a few times, you should eventually see that the timestamp
>> returned moves onto the next millisecond mid-query.
>>
>> --
>> *Software Engineer*
>> Turnitin - http://www.turnitin.com
>> t...@turnitin.com
>>
>
>


-- 
*Software Engineer*
Turnitin - http://www.turnitin.com
t...@turnitin.com


Re: Why does `now()` produce different times within the same query?

2016-11-29 Thread Marko Švaljek
Every now() call in a statement is, under the hood, "replaced" with a
newly generated UUID.

It can happen that they fall in different milliseconds.

If you need the same timestamps, you need to set them on the client side.


@msvaljek 

2016-11-29 22:49 GMT+01:00 Terry Liu :

> It appears that a single query that calls Cassandra's `now()` time
> function may actually cause a query to write or return different times.
>
> Is this the expected or defined behavior, and if so, why does it behave
> like this rather than evaluating `now()` once across an entire statement?
>
> This really affects UPDATE statements but to test it more easily, you
> could try something like:
>
> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
> FROM keyspace.table
> LIMIT 100;
>
> If you run that a few times, you should eventually see that the timestamp
> returned moves onto the next millisecond mid-query.
>
> --
> *Software Engineer*
> Turnitin - http://www.turnitin.com
> t...@turnitin.com
>


Re: Why does `now()` produce different times within the same query?

2016-11-29 Thread Ariel Weisberg
Hi,

The function is defined here[1]. I hope my email client isn't
butchering the code.

    public static final Function nowFct = new NativeScalarFunction("now", TimeUUIDType.instance)
    {
        // Each invocation returns a newly generated timeuuid; nothing is cached per statement.
        public ByteBuffer execute(ProtocolVersion protocolVersion, List<ByteBuffer> parameters)
        {
            return ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes());
        }
    };

It's documented as
http://cassandra.apache.org/doc/latest/cql/functions.html#timeuuid-functions:
> The now function takes no arguments and generates, on the coordinator
> node, a new unique timeuuid (at the time where the statement using it
> is executed).

Now, is the behavior consistent with the documentation? Well, it depends
on how you define "statement" (a CQL statement or a function call), I
suppose. I do think the doc needs to be updated, because in terms of the
principle of least surprise, yes, this is a little surprising.


I know of a couple of systems that associate a timestamp with each
transaction and will always return the same time when you request the
current time. However, you aren't requesting the current time; you are
requesting a UUID using a function named now(). I think we are stuck
with the behavior and need an additional function that does what one would
expect from a function named now().


As a workaround, you can pass the time in as a parameter, and then you
can guarantee it will be the same in each position.
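
A minimal Java sketch of that workaround: read the clock once on the client and bind the same value to every position that should agree. It assumes a hypothetical `ks.events` table with `created_at`/`updated_at` timestamp columns; `ClientSideNow` and `bindValues` are invented names, the driver call that would actually prepare and execute the statement is deliberately left out, and whether your driver accepts `java.time.Instant` for a timestamp column depends on the driver and its version.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one client-side clock read reused for every bind marker.
public class ClientSideNow {

    // Hypothetical statement: both timestamp columns should receive the same value.
    static final String CQL =
        "UPDATE ks.events SET created_at = ?, updated_at = ? WHERE id = ?";

    // Builds the bind values in marker order, reusing a single clock reading.
    static List<Object> bindValues(Instant statementTime, Object id) {
        return Arrays.<Object>asList(statementTime, statementTime, id);
    }

    public static void main(String[] args) {
        Instant now = Instant.now(); // the client clock is read exactly once
        List<Object> values = bindValues(now, 42);
        // CQL and values would be handed to your driver's prepared-statement API (not shown).
        System.out.println(values.get(0).equals(values.get(1))); // true
    }
}
```

The trade-off, as Cody notes elsewhere in the thread, is that the value now comes from the client's clock rather than the coordinator's, so any client clock skew comes with it.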


There is also the implicit write time of each column (WRITETIME, see
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWritetime.html).
WRITETIME didn't seem to have any hits in the Apache docs, so I linked to
the DataStax docs. I'll see about getting them updated.


Regards,

Ariel



On Tue, Nov 29, 2016, at 04:49 PM, Terry Liu wrote:

> It appears that a single query that calls Cassandra's `now()`
> time function may actually cause a query to write or return
> different times.
>
> Is this the expected or defined behavior, and if so, why does it
> behave like this rather than evaluating `now()` once across an entire
> statement?
>
> This really affects UPDATE statements but to test it more easily, you
> could try something like:
>
> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
> FROM keyspace.table
> LIMIT 100;
>
> If you run that a few times, you should eventually see that the
> timestamp returned moves onto the next millisecond mid-query.
>
> -- 
> *Software Engineer*
> Turnitin - http://www.turnitin.com[2]
> t...@turnitin.com




Links:

  1. https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java#L54
  2. http://www.turnitin.com/