On Sat, Dec 3, 2016 at 11:01 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> On Saturday, December 3, 2016, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>> On Saturday, December 3, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>>> That isn't what the original thread is about. The thread is about the
>>> timestamp portion of the UUID being different.
>>>
>>> Having UUID() return the same thing for all rows in a batch would be the
>>> unexpected thing virtually every time.
>>>
>>> On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo <edlinuxg...@gmail.com>
>>> wrote:
>>>
>>>> On Friday, December 2, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>>> This isn't about using the same UUID though. It's about the timestamp
>>>>> bits in the UUID.
>>>>>
>>>>> What is the use case for generating multiple UUIDs in a single row?
>>>>> Why do you need to extract the timestamp out of both?
>>>>>
>>>>> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo <edlinuxg...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>>
>>>>>>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I am not sure you saw my reply on thread but I believe everyone's
>>>>>>>> needs can be met. I will copy that here:
>>>>>>>>
>>>>>>>
>>>>>>> I saw it, but the real problem that was raised initially was not
>>>>>>> that of UDF and of allowing both behaviors. It's a matter of people being
>>>>>>> confused by the behavior of a non-UDF function, now(), and suggesting it
>>>>>>> should be changed.
>>>>>>>
>>>>>>> The Hive idea is interesting I guess, and we can switch to
>>>>>>> discussing that, but it's a different problem really and I'm not fond of
>>>>>>> derailing threads.
>>>>>>> I will just note though that if we're not talking about
>>>>>>> a confusion issue but rather how to get a timeuuid to be fixed within a
>>>>>>> statement, then there is a much, much more trivial solution: generate it
>>>>>>> client side. The `now()` function is a small convenience but there is
>>>>>>> nothing you cannot do without it client side, and that actually
>>>>>>> basically stands for almost any use of (non aggregate) function in
>>>>>>> Cassandra currently.
>>>>>>>
>>>>>>>> "Food for thought: Hive's UDFs introduced an annotation
>>>>>>>> @UDFType(deterministic = false)
>>>>>>>>
>>>>>>>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>>>>>>>>
>>>>>>>> The effect is the query planner can see when such a UDF is in use
>>>>>>>> and determine the value once at the start of a very long query."
>>>>>>>>
>>>>>>>> Essentially Hive had a similar if not identical problem: during a
>>>>>>>> long-running distributed process like map/reduce, some users wanted the
>>>>>>>> semantics of:
>>>>>>>>
>>>>>>>> 1) Each call should have a new timestamp
>>>>>>>>
>>>>>>>> While other users wanted the semantics of:
>>>>>>>>
>>>>>>>> 2) Each call should generate the same timestamp
>>>>>>>>
>>>>>>>> The solution implemented was to add an annotation to the UDF such that
>>>>>>>> the query planner would pick up the annotation and act accordingly.
>>>>>>>>
>>>>>>>> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>>>>>>>>
>>>>>>>> As a result you can essentially implement two UDFs:
>>>>>>>>
>>>>>>>> @UDFType(deterministic = false)
>>>>>>>> public class UDFNow
>>>>>>>>
>>>>>>>> and for the other people
>>>>>>>>
>>>>>>>> @UDFType(deterministic = true)
>>>>>>>> public class UDFNowOnce extends UDFNow
>>>>>>>>
>>>>>>>> Both use cases are met in a sensible way.
>>>>>>>>
>>>>>>>
>>>>>> The `now()` function is a small convenience but there is nothing you
>>>>>> cannot do without it client side, and that actually basically stands for
>>>>>> almost any use of (non aggregate) function in Cassandra currently.
>>>>>>
>>>>>> Cassandra's changing philosophy over which entity should create such
>>>>>> information (client/server/driver) does not make this problem easy.
>>>>>>
>>>>>> If you take into account that you have users who do not understand
>>>>>> all the intricacies of UUIDs, the problem is compounded. I.e., how does
>>>>>> one generate a UUID in each of C#, Python, Java, etc.? With the 47 random
>>>>>> bits of bla bla. That is not super easy information to find. Maybe you
>>>>>> find a Stack Overflow post that actually gives bad advice, etc.
>>>>>>
>>>>>> Many times in Cassandra you are using a UUID because you do not have
>>>>>> a unique key in the insert and you wish to create one. If you are
>>>>>> inserting more than a single record using that same UUID and you do not
>>>>>> want the burden of doing it yourself, you would have to do
>>>>>> write>>read>>write, which is an anti-pattern.
>>>>>>
>>>>>
>>>> Not multiple ids for a single row. The same id for multiple inserts in
>>>> a batch.
>>>>
>>>> For example, let's say I have an application where my data has no unique
>>>> key.
>>>>
>>>> Table poke
>>>> Poker, pokee, time
>>>>
>>>> Suppose I consume pokes from Kafka, build a batch of 30k and insert them.
>>>> You probably want to denormalize into two tables:
>>>> Primary key (poker, time)
>>>> Primary key (pokee, time)
>>>>
>>>> It makes sense that they all have the same uuid if you want it to be
>>>> the uuid of the batch. This would make it easy to correlate all the events.
>>>> Easy to delete them all as well.
>>>>
>>>> The do-it-client-side argument is totally valid, but has been a
>>>> justification for not adding features, many of which are eventually added
>>>> anyway.
>>>>
>>>> --
>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>> than usual.
>>>>
>>>
>> Debatable.
>>
>> Cassandra, for example, always said batch mutations happen all at
>> once, but it was not until snaptree that you could see effects of half a
>> batch. Even now a multi-partition batch does not happen all at once.
>>
>> What people expect does not always align with reality. Point me to a
>> unit test that documents said behavior and proves it does not change.
>>
>> Maybe people expect a query planner to fold constants; many people might
>> think a smart query engine could memoize calls to the same function with
>> no args; many expect that things happen in isolation.
>>
>
> A new unique timeuuid (at the time where the statement using it is
> executed).
>
> Indicates that each statement has one unique time uuid. Calling the udf
> twice in one statement and getting different results disagrees with the
> documentation.
>
https://issues.apache.org/jira/browse/CASSANDRA-12989
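
For reference, the "generate it client side" route discussed in the thread could look roughly like the Java sketch below. It is only an illustration under stated assumptions: the class name BatchTimeUuid, the helper timeUuid, the table names pokes_by_poker / pokes_by_pokee and the batch_id column are all hypothetical and come from neither the thread nor any Cassandra driver. The clock sequence and node bits are random rather than derived from a MAC address. The sketch builds one version 1 (time-based) UUID with java.util.UUID and reuses it for every row of a batch.

```java
import java.security.SecureRandom;
import java.util.UUID;

class BatchTimeUuid {
    // Number of 100 ns intervals between the UUID epoch (1582-10-15)
    // and the Unix epoch (1970-01-01).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    static final SecureRandom RANDOM = new SecureRandom();

    // Hypothetical helper: builds a version 1 (time-based) UUID for the
    // given Unix time in milliseconds, with a random clock sequence and node.
    static UUID timeUuid(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;  // 60-bit timestamp
        long msb = ((ts & 0xFFFFFFFFL) << 32)                // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16)           // time_mid
                 | 0x1000L                                   // version 1
                 | ((ts >>> 48) & 0x0FFFL);                  // time_hi
        long clockSeq = RANDOM.nextInt(1 << 14);             // 14 random bits
        long node = (RANDOM.nextLong() & 0xFFFFFFFFFFFFL)    // 48 random bits
                  | 0x010000000000L;                         // multicast bit: not a real MAC
        long lsb = 0x8000000000000000L                       // variant bits "10"
                 | (clockSeq << 48)
                 | node;
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        // Generate the id once, client side, and reuse it for the whole batch.
        UUID batchId = timeUuid(System.currentTimeMillis());
        String[][] pokes = { {"alice", "bob"}, {"carol", "dave"} };
        for (String[] p : pokes) {
            // Hypothetical denormalized tables; batch_id correlates the rows.
            System.out.printf(
                "INSERT INTO pokes_by_poker (poker, pokee, batch_id) VALUES ('%s', '%s', %s);%n",
                p[0], p[1], batchId);
            System.out.printf(
                "INSERT INTO pokes_by_pokee (pokee, poker, batch_id) VALUES ('%s', '%s', %s);%n",
                p[1], p[0], batchId);
        }
    }
}
```

Calling timestamp() on the resulting UUID returns the embedded count of 100 ns intervals, which is the "timestamp portion" the thread is arguing about. Because the id is created once before the loop, every statement in the batch carries the same value regardless of how the server evaluates now().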