Re: [datameet] Help with R logic - near similar name

[email protected] Tue, 25 Aug 2020 17:52:11 -0700

Hi Ram

In addition to the helpful suggestions made earlier in the thread, here are 
some R-specific pointers:
- stringr is an extremely helpful package for most of the string 
manipulation recommended in those replies (whitespace removal, tokenisation, 
regex matching).
- You may also need a package that computes ‘distances’ between the strings 
you are comparing; stringdist is one such package. However, with Indian 
names, I found some of the phonetic encoding algorithms (rogerroot, 
soundex) in the phonics package much more helpful.
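A phonetic encoder maps words that sound alike to the same short key, so transliteration variants collide on purpose. The sketch below is a hand-written, base-R version of classic American Soundex. It is used here only so the example runs without installing anything. It is not the phonics package's implementation, and soundex_key() and name_key() are hypothetical helper names. With phonics installed, you would call its soundex() or rogerroot() instead.

```r
# Hypothetical base-R stand-in for a phonetic encoder (classic American Soundex).
# Not the phonics package's code; with phonics installed, use its soundex()/rogerroot().
soundex_key <- function(word) {
  letters_only <- toupper(gsub("[^A-Za-z]", "", word))
  if (nchar(letters_only) == 0) return("")
  chars <- strsplit(letters_only, "")[[1]]
  # Consonants that sound alike share a digit; vowels and Y are treated as 0
  codes <- c(B = 1, F = 1, P = 1, V = 1,
             C = 2, G = 2, J = 2, K = 2, Q = 2, S = 2, X = 2, Z = 2,
             D = 3, T = 3, L = 4, M = 5, N = 5, R = 6)
  code_of <- function(ch) if (ch %in% names(codes)) codes[[ch]] else 0
  out <- chars[1]                # keep the first letter as-is
  prev <- code_of(chars[1])
  for (ch in chars[-1]) {
    if (ch %in% c("H", "W")) next      # H and W do not break a run of equal codes
    code <- code_of(ch)
    if (code != 0 && code != prev) out <- paste0(out, code)
    prev <- code                        # a vowel resets the run
    if (nchar(out) == 4) break
  }
  substr(paste0(out, "000"), 1, 4)      # pad with zeros to 4 characters
}

# Hypothetical helper: encode each word of a name and join the keys
name_key <- function(name) {
  words <- strsplit(trimws(name), "\\s+")[[1]]
  paste(vapply(words, soundex_key, character(1)), collapse = " ")
}

# "SHANKARA" is a hypothetical spelling variant, not from Ram's data
name_key("SANKARA EYE HOSPITAL")
name_key("SHANKARA EYE HOSPITAL")
```

Both calls return "S526 E000 H213". Soundex keys also collide for unrelated names, so a shared key is best treated as a candidate match to review rather than as a confirmed duplicate.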


Hope this helps! Good luck!
Madhu

On Wednesday, 26 August 2020 at 00:48:45 UTC+5:30 [email protected] wrote:

> Hi Ram,
>
> Faced with similar issues, the following worked for me:
>
> 1. Make everything lower or upper case using tolower() / toupper().
> 2. Use grep() to match the common part of each name.
>
> Best,
> Sudatta
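Here is a minimal base-R sketch of Sudatta's two steps, using a few of the names from Ram's examples. The patterns are chosen by hand, which is an assumption: this approach works when you already know the common stem of each hospital's name.

```r
# A few of the name variants from Ram's question
hospitals <- c("SANKARA EYE HOSPITAL(GUNTUR)",
               "SANKARA EYE HOSPITAL",
               "Ashirwad Heart Hospital",
               "Ashirwad Heart Hospita-Ghatkopar")

# Step 1: fold case so "Ashirwad" and "ASHIRWAD" compare equal
lowered <- tolower(hospitals)

# Step 2: grepl() returns TRUE for each entry containing the common part of the name.
# fixed = TRUE treats the pattern as a literal string rather than a regex.
is_sankara  <- grepl("sankara eye hospital", lowered, fixed = TRUE)
is_ashirwad <- grepl("ashirwad heart hospita", lowered, fixed = TRUE)

hospitals[is_sankara]
hospitals[is_ashirwad]
```

The Ashirwad pattern deliberately stops at "hospita" so that it also catches the truncated "Hospita-Ghatkopar" entry. A fixed pattern like this cannot absorb typos elsewhere in a name.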
>
> On Aug 25, 2020, at 7:52 AM, Rahul Gupta <[email protected]> wrote:
>
> Hi Ram,
>
> Not sure if there is something very similar to FuzzyWuzzy (Python) in R, 
> but you can try agrep() from base R:
> https://astrostatistics.psu.edu/su07/R/html/base/html/agrep.html
>
> It does a similar kind of approximate string matching. You can set your own 
> threshold and filter the data accordingly.
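Here is a small sketch of agrep() applied to some of the name variants from Ram's question. The max.distance of 0.1 is an arbitrary starting point, not a recommendation. When given as a fraction, it is scaled by the pattern length, so here it allows up to 3 edits. agrep() looks for an approximate match of the pattern anywhere inside each string, so names with extra text, such as a locality, can still match.

```r
# Some of the Ashirwad variants from Ram's question, plus one unrelated name
hospitals <- c("ASHIRWAD HEART HOSPITAL ( GHATKOPAR )",
               "Ashirwad Heart Hospital",
               "Ashirwad Heart Hospita-Ghatkopar",
               "SANKARA EYE HOSPITAL")

# Indices of entries containing an approximate match to the pattern.
# max.distance = 0.1 is a fraction of the pattern length (an arbitrary threshold);
# ignore.case = TRUE makes upper- and lower-case spellings equivalent.
hits <- agrep("Ashirwad Heart Hospital", hospitals,
              max.distance = 0.1, ignore.case = TRUE)
hospitals[hits]
```

Here hits is 1, 2, 3: the three Ashirwad variants match, while the Sankara entry does not.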
>
> On Tue, 25 Aug, 2020, 8:09 pm [email protected], <[email protected]> 
> wrote:
>
>> Hi,
>>
>> I have collected hospital data from multiple sources. However, each 
>> source uses a different name for the same hospital. I am trying to clean 
>> the list so that it has no duplicates. I am using R and couldn't resolve 
>> this with stringdist_join. I'd appreciate any suggested approach.
>>
>> For example, the same hospital is listed under each of the following sets 
>> of names (Example 1 is in Guntur, A.P.). Can we mark (or eliminate) the 
>> duplicates?
>>
>> Example 1
>> SANKARA EYE HOSPITAL(GUNTUR) 
>> SANKARA EYE HOSPITAL 
>> SANKARA EYE HOSPITAL ( A UNIT OF SRI KANCHI KAMA KOTI MEDICAL TRUST)   
>>
>>
>> Example 2
>> ASHIRWAD HEART HOSPITAL ( GHATKOPAR ) 
>> Ashirwad Heart Hospital 
>> ASHIRWAD HEART HOSPITAL ( GHATKOPAR ) 
>> Ashirwad Heart Hospita-Ghatkopar   
>>
>> Thanks
>> Ram
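The suggestions in this thread can be combined on Ram's examples. Normalise each name, compute pairwise edit distances, cluster the names, and flag every name after the first member of a cluster as a duplicate. This is a minimal base-R sketch: adist() stands in for stringdist, and hclust()/cutree() handle the grouping. normalise_name() is a hypothetical helper, and the 0.35 cut-off is an assumption tuned on these seven names only.

```r
# Ram's example names (Examples 1 and 2 combined)
names_raw <- c("SANKARA EYE HOSPITAL(GUNTUR)",
               "SANKARA EYE HOSPITAL",
               "SANKARA EYE HOSPITAL ( A UNIT OF SRI KANCHI KAMA KOTI MEDICAL TRUST)",
               "ASHIRWAD HEART HOSPITAL ( GHATKOPAR )",
               "Ashirwad Heart Hospital",
               "ASHIRWAD HEART HOSPITAL ( GHATKOPAR )",
               "Ashirwad Heart Hospita-Ghatkopar")

# Hypothetical normaliser: lower case, drop "( ... )" qualifiers,
# turn punctuation into spaces, collapse repeated whitespace
normalise_name <- function(x) {
  x <- tolower(x)
  x <- gsub("\\([^)]*\\)", " ", x)
  x <- gsub("[^a-z ]", " ", x)
  trimws(gsub("\\s+", " ", x))
}
clean <- normalise_name(names_raw)

# Pairwise Levenshtein distance (base R adist), scaled by the longer name's length
len <- nchar(clean)
scaled <- adist(clean) / outer(len, len, pmax)

# Single-linkage clustering; 0.35 is a hypothetical cut-off tuned on this sample only
groups <- cutree(hclust(as.dist(scaled), method = "single"), h = 0.35)

# Keep the first name in each group and flag the rest as duplicates
is_duplicate <- duplicated(groups)
data.frame(name = names_raw, group = groups, duplicate = is_duplicate)
```

On these names, the three Sankara entries collapse into one group and the four Ashirwad entries into another. The Ghatkopar variant shows why the cut-off needs care. Because the locality is attached with a hyphen rather than brackets, it survives normalisation and costs 10 edits against "ashirwad heart hospital". On a real list, you would want to inspect candidate pairs before choosing a threshold.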
>>
>> -- 
>> Datameet is a community of Data Science enthusiasts in India. Know more 
>> about us by visiting http://datameet.org
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "datameet" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/datameet/19ee8101-84ec-42b0-974a-43035b5902f1n%40googlegroups.com
>>
> -- 
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/datameet/CAKxLuZeB5_2K4Td%3DP8-_AjFob9Wp2Vc9jic649HD%2BV1itEpYfg%40mail.gmail.com
>
>

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/datameet/ccf8287d-4b7e-4fe3-8efd-b15614f7f056n%40googlegroups.com.

Re: [datameet] Help with R logic - near similar name

Reply via email to