I need to determine syllable stress for the top 60,000+ lemmas in a 14-billion-word web-based corpus that I'm creating. This will give users another way to search the corpus, in addition to word, lemma, PoS, synonyms, customized wordlists, etc.
Using the Carnegie Mellon Pronouncing Dictionary and 3-4 online dictionaries, I'm able to get the data for about 47,000 of these 60,000 lemmas, e.g.

http://www.speech.cs.cmu.edu/cgi-bin/cmudict?in=mechanical&stress=-s
M AH0 K AE1 N IH0 K AH0 L

https://www.merriam-webster.com/dictionary/mechanical
mi-'ka-ni-kəl

But this still leaves about 13,000 (mostly lower-frequency) lemmas with no information on word stress. I suppose I could go through these one by one and indicate stress myself, but I'm wondering if anyone is aware of another tool that could do this. (BTW, I've also tried http://www.speech.cs.cmu.edu/tools/lextool.html, but it doesn't show syllable stress.)

Thanks in advance.

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
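
For reference, the CMU-style transcription above can be reduced to a per-syllable stress pattern with a few lines of Python. This is only a minimal sketch: it assumes the input is a space-separated ARPAbet string in which each vowel ends in a stress digit (0 = unstressed, 1 = primary, 2 = secondary), and the function names `stress_pattern` and `primary_stress_syllable` are made up for illustration.

```python
import re

def stress_pattern(arpabet: str) -> str:
    """Return the stress digits of an ARPAbet transcription, one per vowel.

    Assumes CMUdict-style phones such as "AE1", where the trailing digit
    marks stress (0 = none, 1 = primary, 2 = secondary).
    """
    return "".join(re.findall(r"[A-Z]+([012])", arpabet))

def primary_stress_syllable(arpabet: str):
    """Return the 1-based index of the primary-stressed syllable, or None."""
    index = stress_pattern(arpabet).find("1")
    return index + 1 if index >= 0 else None

# The "mechanical" entry quoted in the post:
print(stress_pattern("M AH0 K AE1 N IH0 K AH0 L"))           # 0100
print(primary_stress_syllable("M AH0 K AE1 N IH0 K AH0 L"))  # 2
```

A pattern string such as `0100` could then be stored alongside each lemma and searched like any other field. This only covers lemmas that already have a CMU-style transcription; it does not predict stress for the remaining ones.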