Re: Generating unique id for a column in Row without breaking into RDD and joining back

Tony Lane Fri, 05 Aug 2016 09:08:57 -0700

Mike,

I have figured out how to do this. Thanks for the suggestion. It works
great. I am now trying to figure out the performance impact of this.


Thanks again


On Fri, Aug 5, 2016 at 9:25 PM, Tony Lane <tonylane....@gmail.com> wrote:

> @mike - this looks great. How can I do this in Java? What is the
> performance implication on a large dataset?
>
> @sonal - I can't have a collision in the values.
>
> On Fri, Aug 5, 2016 at 9:15 PM, Mike Metzger <m...@flexiblecreations.com>
> wrote:
>
>> You can use the monotonically_increasing_id function to generate guaranteed
>> unique (but not necessarily consecutive) IDs, for example:
>>
>> df.withColumn("id", monotonically_increasing_id())
>>
>> You don't mention which language you're using, but you'll need to import
>> the function from the sql.functions package.
>>
>> Mike
>>
>> On Aug 5, 2016, at 9:11 AM, Tony Lane <tonylane....@gmail.com> wrote:
>>
>> Ayan - basically I have a dataset with the following structure, where the
>> bid values are unique strings:
>>
>> bid: String
>> val: integer
>>
>> I need unique int values for these string bids to do some processing on
>> the dataset, like:
>>
>> id: int   (unique integer id for each bid)
>> bid: String
>> val: integer
>>
>>
>>
>> -Tony
>>
>> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> Can you explain a little further?
>>>
>>> best
>>> Ayan
>>>
>>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com>
>>> wrote:
>>>
>>>> I have a row with a structure like
>>>>
>>>> identifier: String
>>>> value: int
>>>>
>>>> All identifiers are unique, and I want to generate a unique long id for
>>>> the data and get a row object back for further processing.
>>>>
>>>> I understand I could use the zipWithUniqueId function on an RDD, but that
>>>> would mean first converting to an RDD and then joining the RDD back to the
>>>> dataset.
>>>>
>>>> What is the best way to do this?
>>>>
>>>> -Tony
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
>
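
On the Java question raised in the thread: the same function is exposed in
Java as org.apache.spark.sql.functions.monotonically_increasing_id(), so the
call becomes df.withColumn("id", functions.monotonically_increasing_id()).
The Spark documentation describes the current implementation as putting the
partition ID in the upper 31 bits and the record number within each partition
in the lower 33 bits. The sketch below is plain Java, not Spark code; the class
name and the idFor helper are hypothetical and only illustrate that documented
layout, which is why the IDs are unique but not consecutive.

```java
public class MonotonicIdLayout {
    // Number of low-order bits the Spark docs say are used for the
    // record number within a partition.
    static final int RECORD_BITS = 33;

    // Hypothetical helper (not a Spark API): builds an ID from a partition
    // index and a per-partition record number, following the documented layout.
    static long idFor(int partitionId, long recordNumber) {
        return ((long) partitionId << RECORD_BITS) + recordNumber;
    }

    public static void main(String[] args) {
        // First two rows of partition 0, then first two rows of partition 1.
        System.out.println(idFor(0, 0)); // 0
        System.out.println(idFor(0, 1)); // 1
        System.out.println(idFor(1, 0)); // 8589934592
        System.out.println(idFor(1, 1)); // 8589934593

        // IDs from any partition other than 0 do not fit in a 32-bit int.
        System.out.println(idFor(1, 0) > Integer.MAX_VALUE); // true
    }
}
```

Under this layout the first row of partition 1 already gets 8589934592, which
is larger than Integer.MAX_VALUE, so the generated column needs to stay a long
rather than an int once the data spans more than one partition.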
