Thanks! I will have a look and update here.

On Tue, Aug 14, 2018 at 3:53 PM Nikhil VJ <[email protected]> wrote:

> Hi Pradeep,
>
> If you have all the words in one column of an Excel sheet, the *OpenRefine*
> tool can help you "iron out" the differences. It shows you clusters of
> similar-looking cells, and you decide which value to go with (you can
> even type in a new standardised value if all the options are wrong). It
> then overwrites all those cells with the one standardised value. The
> rest of your data remains intact. No need for sorting, filtering, etc.
>
> You can read a basic walkthrough for this specific use case here:
> http://datameet.org/2018/06/13/openrefine-bus-stop/
>
> It uses multiple algorithms to detect similar words, much as search
> engines and dictionaries do when you make a typo. You can modify the
> algorithm options and run new scans to catch the hard-to-find ones. If
> there is a false positive, you can just ignore it and no changes will be
> made to those values.
>
> --
> Cheers,
> Nikhil VJ
> +91-966-583-1250
> Pune, India
> Website <http://nikhilvj.co.in>
> DataMeet Pune chapter <https://datameet-pune.github.io/>
> Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
> Payment / Contribute <https://nikhilvj.benow.in/pay>
>
> On Tue, Aug 14, 2018 at 8:07 AM, Venkata Pingali <[email protected]> wrote:
>
>> Soundex is not enough. We went through metaphone and
>> double-metaphone as well. The last showed the best
>> performance when combined with simple ways to reduce
>> the search space (e.g., names that start with the same
>> letter).
>>
>> But it still had too many false positives and negatives. We ended up
>> using a much simpler approach of manually labeling the Top N most
>> frequent names.
>>
>> On Tue, Aug 14, 2018 at 7:58 AM, Pradeep Bhatt <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> What is the best way to know if two words are phonetically similar?
>>>
>>> e.g. *Similar* words
>>>
>>> Pradeep - Pradip
>>> Thakkkar - Thakkar
>>> Rathod - Rathor
>>> Swetha - Sweta
>>> bhen - ben
>>> Sumandev - Sumandeb
>>>
>>> *Non-similar*
>>> Ramesh - Rajesh
>>>
>>> This is needed for spelling mistakes introduced when translating from
>>> Indian languages to English.
>>>
>>> Does Soundex work well for Indian names?
>>>
>>> Regards,
>>> Pradeep
>>>
>>> --
>>> Datameet is a community of Data Science enthusiasts in India. Know more
>>> about us by visiting http://datameet.org
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "datameet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
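
For anyone who wants to try the Soundex question from this thread on their own data, here is a minimal sketch in Python. It assumes the standard American Soundex rules (first letter kept, three digits, `h` and `w` ignored, vowels separating repeated codes); the function name `soundex` and the `PAIRS` list are made up for illustration, and this is not the code used by OpenRefine or by anyone in the thread.

```python
import string

# Standard American Soundex consonant groups (assumed rule set).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key for an ASCII word ('' if empty)."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")  # code of the previous significant letter
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and y map to "" and reset prev
        if code and code != prev:
            key += code
        prev = code
    return (key + "000")[:4]  # pad with zeros, truncate to 4 characters


# Example pairs taken from the original question in this thread.
PAIRS = [
    ("Pradeep", "Pradip"),
    ("Thakkkar", "Thakkar"),
    ("Rathod", "Rathor"),
    ("Swetha", "Sweta"),
    ("bhen", "ben"),
    ("Sumandev", "Sumandeb"),
    ("Ramesh", "Rajesh"),
]

if __name__ == "__main__":
    for a, b in PAIRS:
        ka, kb = soundex(a), soundex(b)
        verdict = "same key" if ka == kb else "different keys"
        print(f"{a} ({ka}) / {b} ({kb}): {verdict}")
```

On these seven pairs the sketch gives the same key for five of the six "similar" pairs and different keys for Ramesh / Rajesh, but it splits Rathod (R330) from Rathor (R360). Seven pairs say very little about real data, so treat this only as a quick way to test the idea on your own name list.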
