I'm not sure what the R equivalent of Python's difflib (SequenceMatcher) is. If you can find one, it should work for this.
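
For reference, here is a rough sketch of what I mean, in Python, using the names from your examples. Only `difflib.SequenceMatcher` and `re` come from the standard library; the helpers `normalize`, `similarity` and `group_names`, the rule of dropping parenthesised text, and the 0.75 threshold are my own illustrative assumptions, so please tune them against your real data.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop parenthesised qualifiers, and turn punctuation into spaces."""
    name = name.lower()
    name = re.sub(r"\([^)]*\)", " ", name)   # assumption: text in (...) is a location or trust note
    name = re.sub(r"[^a-z0-9]+", " ", name)  # hyphens and other punctuation become single spaces
    return name.strip()


def similarity(a, b):
    """SequenceMatcher ratio (0.0 to 1.0) between the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_names(names, threshold=0.75):
    """Greedy grouping: each name joins the first group whose first member is similar enough."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group matched, so this name starts a new one.
            groups.append([name])
    return groups


names = [
    "SANKARA EYE HOSPITAL(GUNTUR)",
    "SANKARA EYE HOSPITAL",
    "SANKARA EYE HOSPITAL ( A UNIT OF SRI KANCHI KAMA KOTI MEDICAL TRUST)",
    "ASHIRWAD HEART HOSPITAL ( GHATKOPAR )",
    "Ashirwad Heart Hospital",
    "ASHIRWAD HEART HOSPITAL ( GHATKOPAR )",
    "Ashirwad Heart Hospita-Ghatkopar",
]

groups = group_names(names)
for group in groups:
    # The first member of each group can serve as the canonical name.
    print(group[0], "<-", len(group), "variant(s)")
```

This compares every name with the first member of each group, which is fine for a few thousand rows but slow for very large lists. Two different hospitals with near-identical names would also be merged, so it is worth reviewing the groups by hand before deleting anything.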

Sent from a handheld device. Pardon the brevity and typos.

On Aug 25, 2020, 20:09 +0530, [email protected] <[email protected]>, wrote:
> Hi,
>
> I have collected hospital data from multiple sources. However, each source
> uses a different name for the same hospital. I am trying to clean the list
> so that it has no duplicates. I am using R and couldn't resolve this with
> stringdist_join. I would appreciate it if you could suggest an approach.
>
> For example, a hospital in Guntur (A.P) is listed under the following
> names. Can we mark (or eliminate) the duplicates?
>
> Example 1
> SANKARA EYE HOSPITAL(GUNTUR)
> SANKARA EYE HOSPITAL
> SANKARA EYE HOSPITAL ( A UNIT OF SRI KANCHI KAMA KOTI MEDICAL TRUST)
>
> Example 2
> ASHIRWAD HEART HOSPITAL ( GHATKOPAR )
> Ashirwad Heart Hospital
> ASHIRWAD HEART HOSPITAL ( GHATKOPAR )
> Ashirwad Heart Hospita-Ghatkopar
>
> Thanks
> Ram
> --
> Datameet is a community of Data Science enthusiasts in India. Know more about
> us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the Google Groups
> "datameet" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/datameet/19ee8101-84ec-42b0-974a-43035b5902f1n%40googlegroups.com.
