Mike, I have figured out how to do this. Thanks for the suggestion; it works great. I am now trying to work out its performance impact.
Thanks again.

On Fri, Aug 5, 2016 at 9:25 PM, Tony Lane <tonylane....@gmail.com> wrote:

> @mike - this looks great. How can I do this in Java? What is the
> performance implication on a large dataset?
>
> @sonal - I can't have a collision in the values.
>
> On Fri, Aug 5, 2016 at 9:15 PM, Mike Metzger <m...@flexiblecreations.com>
> wrote:
>
>> You can use the monotonically_increasing_id method to generate guaranteed
>> unique (but not necessarily consecutive) IDs. Calling something like:
>>
>> df.withColumn("id", monotonically_increasing_id())
>>
>> You don't mention which language you're using but you'll need to pull in
>> the sql.functions library.
>>
>> Mike
>>
>> On Aug 5, 2016, at 9:11 AM, Tony Lane <tonylane....@gmail.com> wrote:
>>
>> Ayan - basically I have a dataset with the structure below, where the
>> bid values are unique strings
>>
>> bid: String
>> val: integer
>>
>> I need unique int values for these string bids to do some processing in
>> the dataset
>>
>> like
>>
>> id: int (unique integer id for each bid)
>> bid: String
>> val: integer
>>
>> -Tony
>>
>> On Fri, Aug 5, 2016 at 6:35 PM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> Can you explain a little further?
>>>
>>> best
>>> Ayan
>>>
>>> On Fri, Aug 5, 2016 at 10:14 PM, Tony Lane <tonylane....@gmail.com>
>>> wrote:
>>>
>>>> I have a row with a structure like
>>>>
>>>> identifier: String
>>>> value: int
>>>>
>>>> All identifiers are unique and I want to generate a unique long id for
>>>> the data and get a row object back for further processing.
>>>>
>>>> I understand using the zipWithUniqueId function on RDD, but that would
>>>> mean first converting to RDD and then joining back the RDD and dataset
>>>>
>>>> What is the best way to do this?
>>>>
>>>> -Tony
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
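
A note on the Java and performance questions raised in the thread. In Java the function is a static method on org.apache.spark.sql.functions (Spark 1.6 and later), so the call is df.withColumn("id", functions.monotonically_increasing_id()). The Spark documentation describes the generated value as a 64-bit long with the partition index in the upper 31 bits and the record number within the partition in the lower 33 bits. That layout suggests each partition can number its own rows without a shuffle or a join back, which is why the IDs are unique but not consecutive. It also means the values will generally not fit in an int, which matters if an int id is required. The sketch below only models that documented layout in plain Java; the class and method names (MonotonicIdSketch, idFor, fitsInInt) are made up for illustration and are not part of Spark.

```java
// Illustrative model of the ID layout described in the Spark docs.
// This is NOT Spark's source code; all names here are hypothetical.
final class MonotonicIdSketch {
    // Assumption taken from the Spark docs: the lower 33 bits hold the
    // record number within a partition, the bits above hold the partition index.
    static final int RECORD_BITS = 33;

    /** ID for the record at position recordNumber inside partition partitionIndex. */
    static long idFor(int partitionIndex, long recordNumber) {
        // The docs assume fewer than ~1 billion partitions and
        // fewer than ~8 billion records per partition.
        if (partitionIndex < 0 || partitionIndex >= (1 << 30)
                || recordNumber < 0 || recordNumber >= (1L << RECORD_BITS)) {
            throw new IllegalArgumentException("outside the documented limits");
        }
        return ((long) partitionIndex << RECORD_BITS) | recordNumber;
    }

    /** True if the ID could be stored in a Java int without loss. */
    static boolean fitsInInt(long id) {
        return id >= Integer.MIN_VALUE && id <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(idFor(0, 0));            // 0
        System.out.println(idFor(0, 1));            // 1
        System.out.println(idFor(1, 0));            // 8589934592
        System.out.println(fitsInInt(idFor(1, 0))); // false
    }
}
```

Under this layout, the first row of the second partition already gets 8589934592, so on any dataset with more than one partition the id column has to stay a long. If small consecutive ints are needed, a different approach (such as the zipWithIndex or zipWithUniqueId route mentioned earlier in the thread) would be required, at the cost of the RDD round trip.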