Not that I've seen, at least not in any worker-independent way.  To guarantee
consecutive values you'd have to create a UDF or some such that provides a new
row id.  This probably isn't an issue on small data sets, but it would cause a
lot of added communication on larger clusters / datasets.
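
To make the communication cost concrete, here is a minimal plain-Python sketch
of the usual two-pass approach. It is not Spark code: the nested lists standing
in for partitions and the consecutive_ids helper are made up for illustration.
Each partition first has to report its row count, and only then can any
partition know where its own numbering starts. As far as I know this is roughly
what RDD.zipWithIndex does, which is why it needs an extra job when there is
more than one partition.

```python
from itertools import accumulate


def consecutive_ids(partitions):
    """Assign ids 0..N-1 across partitions, keeping partition order.

    `partitions` is a list of lists; each inner list stands in for the
    rows held by one worker (an assumption made for this illustration).
    """
    # Pass 1: every partition reports its row count. On a cluster this
    # is the extra round of communication.
    counts = [len(part) for part in partitions]
    # A partition's starting offset is the total row count of all
    # partitions before it.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: each partition numbers its own rows from its offset.
    return [
        [(offset + i, row) for i, row in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


partitions = [["a", "b"], ["c"], [], ["d", "e", "f"]]
result = consecutive_ids(partitions)
ids = [rid for part in result for rid, _ in part]
print(ids)  # [0, 1, 2, 3, 4, 5]
```

The second pass is cheap and local; the cost is in the first pass, since no
partition can start numbering until every earlier count is known.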

Mike
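
P.S. For context on why monotonically_increasing_id is unique but leaves gaps:
the Spark docs describe the current implementation as putting the partition ID
in the upper 31 bits and the record number within the partition in the lower
33 bits. The plain-Python sketch below only mimics that layout; monotonic_id
and assign_ids are hypothetical helper names, not Spark functions.

```python
def monotonic_id(partition_index, row_in_partition):
    """Mimic the documented layout: partition index in the upper bits,
    row number within the partition in the lower 33 bits."""
    return (partition_index << 33) + row_in_partition


def assign_ids(partitions):
    """Give every row an id using only information local to its partition."""
    return [
        [(monotonic_id(p, i), row) for i, row in enumerate(part)]
        for p, part in enumerate(partitions)
    ]


demo = assign_ids([["a", "b"], ["c", "d"]])
demo_ids = [rid for part in demo for rid, _ in part]
print(demo_ids)  # [0, 1, 8589934592, 8589934593]
```

No partition needs to know anything about the others, so there is no extra
communication, but the ids jump by 2^33 at each partition boundary. Note that
these are 64-bit values, so they won't fit in an int once there is more than
one partition.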

> On Aug 5, 2016, at 11:21 AM, janardhan shetty <janardhan...@gmail.com> wrote:
> 
> Mike,
> 
> Any suggestions on doing it for consecutive IDs?
> 
>> On Aug 5, 2016 9:08 AM, "Tony Lane" <tonylane....@gmail.com> wrote:
>> Mike.
>> 
>> I have figured out how to do this. Thanks for the suggestion. It works great.
>> I am trying to figure out the performance impact of this.
>> 
>> thanks again
>> 
>> 
>>> On Fri, Aug 5, 2016 at 9:25 PM, Tony Lane <tonylane....@gmail.com> wrote:
>>> @mike - this looks great. How can I do this in Java? What is the
>>> performance implication on a large dataset?
>>> 
>>> @sonal - I can't have a collision in the values.
>>> 
>>>> On Fri, Aug 5, 2016 at 9:15 PM, Mike Metzger <m...@flexiblecreations.com> 
>>>> wrote:
>>>> You can use the monotonically_increasing_id method to generate guaranteed
>>>> unique (but not necessarily consecutive) IDs.  Call something like:
>>>> 
>>>> df.withColumn("id", monotonically_increasing_id())
>>>> 
>>>> You don't mention which language you're using, but you'll need to import
>>>> it from the sql.functions library (org.apache.spark.sql.functions in
>>>> Scala/Java, pyspark.sql.functions in Python).
>>>> 
>>>> Mike
>>>> 
>>>>> On Aug 5, 2016, at 9:11 AM, Tony Lane <tonylane....@gmail.com> wrote:
>>>>> 
>>>>> Ayan - basically I have a dataset with the following structure, where the
>>>>> bid values are unique strings:
>>>>> 
>>>>> bid: String
>>>>> val: integer
>>>>> 
>>>>> I need unique int values for these string bids to do some processing on
>>>>> the dataset
>>>>> 
>>>>> like 
>>>>> 
>>>>> id:int   (unique integer id for each bid)
>>>>> bid:String
>>>>> val:integer
>>>>> 
>>>>> 
>>>>> 
>>>>> -Tony
>>>>> 
>>>>>> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>>>> Hi
>>>>>> 
>>>>>> Can you explain a little further? 
>>>>>> 
>>>>>> best
>>>>>> Ayan
>>>>>> 
>>>>>>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com> 
>>>>>>> wrote:
>>>>>>> I have rows with a structure like:
>>>>>>> 
>>>>>>> identifier: String
>>>>>>> value: int
>>>>>>> 
>>>>>>> All identifiers are unique, and I want to generate a unique long id for
>>>>>>> each row and get a row object back for further processing.
>>>>>>> 
>>>>>>> I understand I could use the zipWithUniqueId function on an RDD, but that
>>>>>>> would mean first converting to an RDD and then joining the RDD back to
>>>>>>> the dataset.
>>>>>>> 
>>>>>>> What is the best way to do this?
>>>>>>> 
>>>>>>> -Tony 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Best Regards,
>>>>>> Ayan Guha
