Hi Pradeep,

If you have all the words in one column of an Excel sheet, the *OpenRefine* tool can help you "iron out" the differences. It shows you clusters of similar-looking cells, and for each cluster you decide which value to go with (you can even type in a new standardised value if all the options are wrong). It then overwrites all those cells with the one standardised value. The rest of your data remains intact, and there is no need for sorting, filtering, etc.
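To give a feel for the cluster-then-overwrite idea, here is a minimal Python sketch. It is not OpenRefine's code or its actual algorithm: `rough_key`, `find_clusters` and `standardise` are hypothetical names, and the key rules (collapse repeated letters, drop the "h" of aspirated consonants, treat v as b, ignore vowels after the first letter) are assumptions made up for the sample names in this thread.

```python
import re
from collections import Counter, defaultdict


def rough_key(name):
    """Hypothetical phonetic-style key, invented for this illustration only."""
    s = re.sub(r"[^a-z]", "", name.lower())       # keep letters only
    s = re.sub(r"(.)\1+", r"\1", s)               # collapse repeats: thakkkar -> thakar
    s = re.sub(r"([bcdgjkpt])h", r"\1", s)        # drop aspiration: bh -> b, th -> t
    s = s.replace("v", "b")                       # assume v and b are interchangeable
    return s[:1] + re.sub(r"[aeiou]", "", s[1:])  # keep first letter, drop later vowels


def find_clusters(values):
    """Group values by key; keep only groups with more than one distinct spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[rough_key(value)].add(value)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


def standardise(values, chosen=None):
    """Overwrite each value with its cluster's standard spelling.

    `chosen` maps a key to a spelling picked by a human; otherwise the most
    frequent spelling in the cluster is used.
    """
    chosen = chosen or {}
    counts = defaultdict(Counter)
    for value in values:
        counts[rough_key(value)][value] += 1
    canonical = {key: chosen.get(key, c.most_common(1)[0][0])
                 for key, c in counts.items()}
    return [canonical[rough_key(value)] for value in values]


if __name__ == "__main__":
    names = ["Pradeep", "Pradip", "Pradeep", "Thakkkar", "Thakkar", "Thakkar",
             "Ramesh", "Rajesh", "Rathod", "Rathor"]
    print(find_clusters(names))
    print(standardise(names))
```

With these made-up rules, Pradeep/Pradip and Thakkkar/Thakkar land in the same cluster and Ramesh/Rajesh stay apart, but Rathod/Rathor are also kept apart. That is the kind of pair where you would switch to a different algorithm or fix the value by hand.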
You can read a basic walkthrough for this specific use case here:
http://datameet.org/2018/06/13/openrefine-bus-stop/

It uses multiple algorithms to detect similar words, much like what search engines and dictionaries do when you make a typo. You can modify the algorithm options and run new scans to catch the hard-to-find ones. If there is a false positive, you can just ignore it and no changes will be made to those values.

--
Cheers,
Nikhil VJ
+91-966-583-1250
Pune, India
Website <http://nikhilvj.co.in>
DataMeet Pune chapter <https://datameet-pune.github.io/>
Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
Payment / Contribute <https://nikhilvj.benow.in/pay>

On Tue, Aug 14, 2018 at 8:07 AM, Venkata Pingali <[email protected]> wrote:

> Soundex is not enough. We went through metaphone and
> double-metaphone as well. The last showed the best
> performance when combined with simple ways to reduce
> the search space (e.g., names that start with the same
> alphabet).
>
> But it still had too many false positives and negatives. We ended up
> using a much simpler approach of manually labeling Top N most
> frequent names.
>
> On Tue, Aug 14, 2018 at 7:58 AM, Pradeep Bhatt <[email protected]>
> wrote:
>
>> Hi All,
>>
>> What is the best way to know if two words are phonetically similar?
>>
>> e.g. *Some similar* words
>>
>> Pradeep - Pradip
>> Thakkkar - Thakkar
>> Rathod - Rathor
>> Swetha - Sweta
>> bhen - ben
>> Sumandev - Sumandeb
>>
>> *Non-similar*
>> Ramesh - Rajesh
>>
>> This is needed for spelling mistakes introduced when translating from
>> Indian languages to English.
>>
>> Does Soundex work well for Indian names?
>>
>> Regards,
>> Pradeep
>>
>> --
>> Datameet is a community of Data Science enthusiasts in India. Know more
>> about us by visiting http://datameet.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "datameet" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
