Thanks!

I will have a look and update here.

On Tue, Aug 14, 2018 at 3:53 PM Nikhil VJ <[email protected]> wrote:

> Hi Pradeep,
>
> If you have all the words in one column of an Excel sheet, then the
> *OpenRefine* tool can help you "iron out" the differences. It will show you
> a cluster of similar-looking cells, and you can decide which one to go with
> (you can even type in a new standardised value if all the options are wrong).
> It will then overwrite all those cells with the one standardised value.
> The rest of your data remains intact. No need for sorting, filtering, etc.
>
> You can read a basic walkthrough for this specific use case here:
> http://datameet.org/2018/06/13/openrefine-bus-stop/
>
> It uses multiple algorithms to detect similar words, much like what
> search engines and dictionaries do when you make a typo. You can change the
> algorithm options and run new scans to catch the hard-to-find ones. If a
> cluster is a false positive, you can just ignore it and no changes will be
> made to those values.
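
The cluster-then-standardise workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OpenRefine's actual implementation: it groups values by plain edit (Levenshtein) distance, and the names `levenshtein`, `cluster`, `standardise` and the `max_distance` option are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def cluster(values, max_distance=1):
    """Greedily group values that lie within max_distance of any cluster member."""
    clusters = []
    for value in values:
        key = value.strip().lower()
        for members in clusters:
            if any(levenshtein(key, m.strip().lower()) <= max_distance
                   for m in members):
                members.append(value)
                break
        else:
            clusters.append([value])
    return clusters


def standardise(column, members, chosen):
    """Overwrite every cell belonging to one accepted cluster with the chosen value."""
    members = set(members)
    return [chosen if cell in members else cell for cell in column]


column = ["Thakkkar", "Thakkar", "Ramesh", "Rajesh", "Pradeep", "Pradip"]
strict = cluster(column, max_distance=1)   # first scan
loose = cluster(column, max_distance=2)    # rescan with a looser option
cleaned = standardise(column, ["Thakkkar", "Thakkar"], "Thakkar")
```

In this sketch the first scan groups Ramesh with Rajesh, which is a false positive: you simply never call `standardise` on that cluster, so those cells stay as they are. Pradeep and Pradip only end up together after the looser rescan.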
>
>
> --
> Cheers,
> Nikhil VJ
> +91-966-583-1250
> Pune, India
> Website <http://nikhilvj.co.in>
> DataMeet Pune chapter <https://datameet-pune.github.io/>
> Self-designed learner at Swaraj University <
> http://www.swarajuniversity.org>
> Payment / Contribute <https://nikhilvj.benow.in/pay>
>
> On Tue, Aug 14, 2018 at 8:07 AM, Venkata Pingali <[email protected]>
> wrote:
>
>> Soundex is not enough. We went through Metaphone and
>> Double Metaphone as well. The last showed the best
>> performance when combined with simple ways to reduce
>> the search space (e.g., only comparing names that start with the
>> same letter).
>>
>> But it still had too many false positives and negatives. We ended up
>> using a much simpler approach of manually labeling the top N most
>> frequent names.
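
A rough sketch of the two ideas mentioned above, reducing the search space by first letter and manually labelling the most frequent names. This is an assumption about how such a pipeline could look, not the code that was actually used; the function names and the `labels` mapping are hypothetical, and the phonetic comparison step itself is left out.

```python
from collections import Counter, defaultdict
from itertools import combinations


def top_n_names(names, n):
    """Return the n most frequent names (case-insensitive), for manual labelling."""
    counts = Counter(name.strip().lower() for name in names if name.strip())
    return [name for name, _ in counts.most_common(n)]


def candidate_pairs(names):
    """Yield only pairs of distinct names that share a first letter (blocking)."""
    blocks = defaultdict(set)
    for name in names:
        key = name.strip().lower()
        if key:
            blocks[key[0]].add(key)
    for block in blocks.values():
        yield from combinations(sorted(block), 2)


def apply_labels(names, labels):
    """Replace each name that has a manual label; leave the rest untouched."""
    return [labels.get(name.strip().lower(), name) for name in names]


names = ["Pradeep", "Pradip", "pradeep", "Ramesh", "Rajesh", "Sweta"]
to_label = top_n_names(names, 1)
pairs = list(candidate_pairs(names))
# The labels dict is filled in by hand after reviewing the frequent names.
fixed = apply_labels(names, {"pradip": "Pradeep"})
```

Blocking keeps the number of comparisons manageable, but it would miss variants that differ in the first letter, which is one reason a hand-made label table for the top N names can be the more dependable option.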
>>
>>
>>
>> On Tue, Aug 14, 2018 at 7:58 AM, Pradeep Bhatt <[email protected]>
>> wrote:
>>
>>> Hi All,
>>>
>>> What is the best way to know if two words are phonetically similar?
>>>
>>> e.g. *Some similar words*
>>>
>>> Pradeep - Pradip
>>> Thakkkar - Thakkar
>>> Rathod - Rathor
>>> Swetha - Sweta
>>> bhen - ben
>>> Sumandev - Sumandeb
>>>
>>> *Non-similar*
>>> Ramesh - Rajesh
>>>
>>> This is needed for spelling mistakes introduced when transliterating from
>>> Indian languages to English.
>>>
>>> Does Soundex work well for Indian names?
>>>
>>> Regards,
>>> Pradeep
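
For reference, the pairs listed above can be checked against a plain American Soundex. The following is a minimal standard-library sketch; the function name `soundex` and the `pairs` list are just for illustration, and it handles ASCII letters only.

```python
# Letter groups of American Soundex; vowels, y, h and w carry no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


pairs = [("Pradeep", "Pradip"), ("Thakkkar", "Thakkar"), ("Rathod", "Rathor"),
         ("Swetha", "Sweta"), ("bhen", "ben"), ("Sumandev", "Sumandeb"),
         ("Ramesh", "Rajesh")]
for a, b in pairs:
    print(a, soundex(a), b, soundex(b), soundex(a) == soundex(b))
```

On these seven pairs the sketch gives matching codes for every "similar" pair except Rathod/Rathor (R330 vs R360), and different codes for Ramesh/Rajesh. That is far too small a sample to say how Soundex behaves on Indian names in general.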
>>>
>>>
>>>
>>> --
>>> Datameet is a community of Data Science enthusiasts in India. Know more
>>> about us by visiting http://datameet.org
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "datameet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
