Hi Ram,

Faced with similar issues, the following worked for me:
1. Make everything lower or upper case using tolower/toupper.
2. Use grep to match the common pattern in the name.

Best,
Sudatta

> On Aug 25, 2020, at 7:52 AM, Rahul Gupta <[email protected]> wrote:
>
> Hi Ram,
>
> I am not sure whether R has something very similar to FuzzyWuzzy (Python),
> but you can try agrep:
> https://astrostatistics.psu.edu/su07/R/html/base/html/agrep.html
>
> It does a similar kind of approximate string matching. You can set your own
> threshold criteria and filter the data accordingly.
>
>> On Tue, 25 Aug, 2020, 8:09 pm [email protected],
>> <[email protected]> wrote:
>> Hi,
>>
>> I have collected hospital data from multiple sources. However, each source
>> uses a different name. I am trying to clean the list so that it has no
>> duplicates. I am using R and couldn't resolve this with stringdist_join.
>> I would appreciate it if you could suggest an approach.
>>
>> For example, Guntur (A.P) is listed with the following names. Can we mark
>> (or eliminate) the duplicates?
>>
>> Example 1
>> SANKARA EYE HOSPITAL(GUNTUR)
>> SANKARA EYE HOSPITAL
>> SANKARA EYE HOSPITAL ( A UNIT OF SRI KANCHI KAMA KOTI MEDICAL TRUST)
>>
>> Example 2
>> ASHIRWAD HEART HOSPITAL ( GHATKOPAR )
>> Ashirwad Heart Hospital
>> ASHIRWAD HEART HOSPITAL ( GHATKOPAR )
>> Ashirwad Heart Hospita-Ghatkopar
>>
>> Thanks
>> Ram
>> --
>> Datameet is a community of Data Science enthusiasts in India. Know more
>> about us by visiting http://datameet.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "datameet" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/datameet/19ee8101-84ec-42b0-974a-43035b5902f1n%40googlegroups.com.
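
As a rough sketch of the two steps suggested at the top of this reply (normalise the case, then match on the common part of the name), combined with the threshold-based approximate matching mentioned for agrep, something like the following may help. It is written in Python with only the standard library (difflib) rather than R; in R the same steps would map roughly onto tolower, gsub/grepl and agrep's max.distance. The helper names (normalize, same_hospital, group_duplicates), the 0.85 threshold and the idea of comparing the shorter name against a prefix of the longer one are illustrative assumptions, not something taken from the thread.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a name and strip the parts that vary between sources."""
    name = name.lower()
    # Drop parenthesised qualifiers such as "(GUNTUR)" or "( A UNIT OF ... )".
    name = re.sub(r"\([^)]*\)", " ", name)
    # Turn any remaining punctuation into spaces, then collapse whitespace.
    name = re.sub(r"[^a-z0-9]+", " ", name)
    return " ".join(name.split())


def same_hospital(a: str, b: str, threshold: float = 0.85) -> bool:
    """Approximate match on the common leading part of two names.

    Assumption: the hospital name comes first and the locality (if any)
    is appended, so the shorter normalised name is compared against an
    equally long prefix of the longer one.
    """
    shorter, longer = sorted((normalize(a), normalize(b)), key=len)
    if not shorter:
        return False
    ratio = SequenceMatcher(None, shorter, longer[: len(shorter)]).ratio()
    return ratio >= threshold


def group_duplicates(names, threshold: float = 0.85):
    """Greedy grouping: each name joins the first group whose first member matches."""
    groups = []
    for name in names:
        for group in groups:
            if same_hospital(group[0], name, threshold):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# The example names from the original question.
names = [
    "SANKARA EYE HOSPITAL(GUNTUR)",
    "SANKARA EYE HOSPITAL",
    "SANKARA EYE HOSPITAL ( A UNIT OF SRI KANCHI KAMA KOTI MEDICAL TRUST)",
    "ASHIRWAD HEART HOSPITAL ( GHATKOPAR )",
    "Ashirwad Heart Hospital",
    "ASHIRWAD HEART HOSPITAL ( GHATKOPAR )",
    "Ashirwad Heart Hospita-Ghatkopar",
]

groups = group_duplicates(names)
for group in groups:
    print(normalize(group[0]), "<-", group)
```

The threshold would need tuning on the real list. Note also that dropping the locality means two branches of the same chain in different cities would be merged, so if the sources carry a city or pincode column it is probably safer to group within each city.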
