On Sat, Dec 3, 2016 at 11:01 AM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

>
>
> On Saturday, December 3, 2016, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
>>
>>
>> On Saturday, December 3, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>
>>> That isn't what the original thread is about. The thread is about the
>>> timestamp portion of the UUID being different.
>>>
>>> Having UUID() return the same thing for all rows in a batch would be the
>>> unexpected thing virtually every time.
>>> On Sat, Dec 3, 2016 at 7:09 AM Edward Capriolo <edlinuxg...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Friday, December 2, 2016, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>>
>>>>> This isn't about using the same UUID though. It's about the timestamp
>>>>> bits in the UUID.
>>>>>
>>>>> What is the use case for generating multiple UUIDs in a single row?
>>>>> Why do you need to extract the timestamp out of both?
>>>>> On Fri, Dec 2, 2016 at 10:24 AM Edward Capriolo <edlinuxg...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Thu, Dec 1, 2016 at 11:09 AM, Sylvain Lebresne <
>>>>>> sylv...@datastax.com> wrote:
>>>>>>
>>>>>>> On Thu, Dec 1, 2016 at 4:44 PM, Edward Capriolo <
>>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure you saw my reply on the thread, but I believe everyone's
>>>>>>>> needs can be met. I will copy that here:
>>>>>>>>
>>>>>>>
>>>>>>> I saw it, but the real problem that was raised initially was not
>>>>>>> about UDFs or about allowing both behaviors. It's a matter of people
>>>>>>> being confused by the behavior of a non-UDF function, now(), and
>>>>>>> suggesting it should be changed.
>>>>>>>
>>>>>>> The Hive idea is interesting I guess, and we can switch to
>>>>>>> discussing that, but it's a different problem really and I'm not fond
>>>>>>> of derailing threads. I will just note though that if we're not talking
>>>>>>> about a confusion issue but rather how to get a timeuuid to be fixed
>>>>>>> within a statement, then there is a much more trivial solution: generate
>>>>>>> it client side. The `now()` function is a small convenience but there is
>>>>>>> nothing you cannot do without it client side, and that actually
>>>>>>> basically stands for almost any use of a (non aggregate) function in
>>>>>>> Cassandra currently.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> "Food for thought: Hive's UDFs introduced an annotation
>>>>>>>> @UDFType(deterministic = false)
>>>>>>>>
>>>>>>>> http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/
>>>>>>>>
>>>>>>>> The effect is the query planner can see when such a UDF is in use
>>>>>>>> and determine the value once at the start of a very long query."
>>>>>>>>
>>>>>>>> Essentially Hive had a similar if not identical problem: during a
>>>>>>>> long running distributed process like map/reduce some users wanted the
>>>>>>>> semantics of:
>>>>>>>>
>>>>>>>> 1) Each call should have a new timestamp
>>>>>>>>
>>>>>>>> While other users wanted the semantics of:
>>>>>>>>
>>>>>>>> 2) Each call should generate the same timestamp
>>>>>>>>
>>>>>>>> The solution implemented was to add an annotation to the UDF such that
>>>>>>>> the query planner would pick up the annotation and act accordingly.
>>>>>>>>
>>>>>>>> (Here is a related issue: https://issues.apache.org/jira/browse/HIVE-1986)
>>>>>>>>
>>>>>>>> As a result you can essentially implement two UDFs:
>>>>>>>>
>>>>>>>> @UDFType(deterministic = false)
>>>>>>>> public class UDFNow
>>>>>>>>
>>>>>>>> and for the other people
>>>>>>>>
>>>>>>>> @UDFType(deterministic = true)
>>>>>>>> public class UDFNowOnce extends UDFNow
>>>>>>>>
>>>>>>>> Both use cases are met in a sensible way.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> The `now()` function is a small convenience but there is nothing you
>>>>>> cannot do without it client side, and that actually basically stands for
>>>>>> almost any use of (non aggregate) function in Cassandra currently.
>>>>>>
>>>>>> Cassandra's changing philosophy over which entity should create such
>>>>>> information (client, server, or driver) does not make this problem easy.
>>>>>>
>>>>>> If you take into account that you have users who do not understand
>>>>>> all the intricacies of UUIDs, the problem is compounded. I.e., how does
>>>>>> one generate a UUID in C#, Python, Java, etc., with the 47 random bits
>>>>>> of bla bla? That is not super easy information to find. Maybe you find a
>>>>>> Stack Overflow post that actually gives bad advice, etc.
>>>>>>
>>>>>> Many times in Cassandra you are using a UUID because you do not have
>>>>>> a unique key in the insert and you wish to create one. If you are
>>>>>> inserting more than a single record using that same UUID and you do not
>>>>>> want the burden of doing it yourself, you would have to do
>>>>>> write>>read>>write, which is an anti-pattern.
>>>>>>
>>>>>
>>>> Not multiple ids for a single row. The same id for multiple inserts in
>>>> a batch.
>>>>
>>>> For example, let's say I have an application where my data has no unique
>>>> key.
>>>>
>>>> Table poke
>>>> Poker, pokee, time
>>>>
>>>> Suppose I consume pokes from Kafka, build a batch of 30k, and insert them.
>>>> You probably want to denormalize into two tables:
>>>> Primary key (poker, time)
>>>> Primary key (pokee, time)
>>>>
>>>> It makes sense that they all have the same uuid if you want it to be
>>>> the uuid of the batch. This would make it easy to correlate all the events.
>>>> Easy to delete them all as well.
>>>>
>>>> The "do it client side" argument is totally valid, but it has been a
>>>> justification for not adding features, many of which are eventually added
>>>> anyway.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sorry this was sent from mobile. Will do less grammar and spell check
>>>> than usual.
>>>>
>>>
>> Debatable.
>>
>> Cassandra, for example, always said batch mutations happen all at
>> once, but it was not until SnapTree that you could see effects of half a
>> batch. Even now a multi partition batch does not happen all at once.
>>
>> What people expect does not always align with reality. Point me to a
>> unit test that documents said behavior and proves it does not change.
>>
>> Maybe people expect a query planner to fold constants; many people might
>> think a smart query engine could memoize calls to the same function with
>> no args; many expect that things happen in isolation.
>>
>>
>
>  A new unique timeuuid (at the time where the statement using it is
> executed).
>
> This indicates that each statement has one unique timeuuid. Calling the UDF
> twice in one statement and getting different results disagrees with the
> documentation.
>
>

https://issues.apache.org/jira/browse/CASSANDRA-12989
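
For the batch use case quoted above (one id shared by every insert in the batch), the "generate it client side" option can be sketched roughly as follows. This is only an illustration using the JDK: the class name `ClientSideTimeUuid` and its methods are hypothetical and not part of Cassandra or any driver, and the node field is filled with 47 random bits plus the multicast bit, as RFC 4122 allows when no MAC address is used. Drivers usually ship an equivalent helper, which is the safer choice in practice.

```java
import java.security.SecureRandom;
import java.util.UUID;

/** Hypothetical helper for illustration; not a Cassandra or driver API. */
final class ClientSideTimeUuid {
    // Number of 100-ns intervals between 1582-10-15 (UUID epoch)
    // and 1970-01-01 (Unix epoch).
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds a version 1 (time-based) UUID for a Unix time in milliseconds. */
    static UUID fromMillis(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;

        // Most significant long: time_low | time_mid | version (1) | time_hi.
        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHi = (ts >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;

        // Least significant long: variant bits (10) | 14-bit clock sequence | 48-bit node.
        long clockSeq = RANDOM.nextInt(1 << 14);
        // 47 random bits plus the multicast bit, marking a non-MAC node id.
        long node = (RANDOM.nextLong() & 0xFFFFFFFFFFFFL) | 0x010000000000L;
        long lsb = 0x8000000000000000L | (clockSeq << 48) | node;

        return new UUID(msb, lsb);
    }

    /** Extracts the Unix time in milliseconds from a version 1 UUID. */
    static long unixMillis(UUID timeUuid) {
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }
}
```

The idea is to call `ClientSideTimeUuid.fromMillis(System.currentTimeMillis())` once per batch and bind the resulting value to every insert, in both denormalized tables, so no write>>read>>write round trip is needed. Two calls within the same millisecond differ only in their random clock sequence and node bits, so this sketch makes collisions unlikely rather than impossible.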
