We have just released the new 14-billion-word iWeb 
corpus<https://corpus.byu.edu/iweb/>, which complements other BYU 
corpora<https://corpus.byu.edu/> such as COCA, COHA, NOW, BYU-BNC, GloWbE, 
Wikipedia, and EEBO.

At 14 billion words, iWeb is 25 times as large as the 560-million-word 
COCA corpus. iWeb also has a much wider range of web-based materials than 
does COCA, since it is based on 22 million web pages in nearly 100,000 
carefully selected websites (based on 
Alexa.com<https://www.alexa.com/topsites>, from Amazon).

New in iWeb is the ability to browse through the top 60,000 words in the 
corpus, and to search this list by word form, part of speech, rank (#1-60,000), 
and even pronunciation.

Most importantly, you can then see detailed information on each of the top 
60,000 words in the corpus – definition, frequency information, synonyms and 
other related words (from WordNet, word families, MRC, etc.), collocates (in a 
much improved format), related “topics” (perhaps much more useful than 
collocates), “clusters” (new in iWeb), relevant websites, and sample 
concordance/KWIC lines. There are extensive hyperlinks on each page, which 
allow you to quickly and easily move from one word to a number of related words.

In addition, for each of these 60,000 words, there are “quick links” to related 
data from other websites – pronunciation, additional definitions, images, 
videos, and translations (for more than 100 languages).

iWeb also allows you to quickly and easily create “virtual corpora” on nearly 
any topic, and these virtual corpora can then be searched as their own 
“stand-alone” corpora, or compared to other virtual corpora that you have 
created.

Finally, in terms of “standard” corpus searches, we note that (due to 
improvements in the corpus architecture) iWeb is faster than any of the other 
BYU corpora, and in most cases it is also much faster than other large, 10-20 
billion word online corpora.

For a short overview of the corpus (in graphical format, with an emphasis on 
the new features), please see:


We hope that this new corpus is useful to you in your teaching, learning, and 
research.


Mark Davies

BYU Corpora


Mark Davies

Professor of Linguistics / Brigham Young University


** Corpus design and use // Linguistic databases **

** Historical linguistics // Language variation **

** English, Spanish, and Portuguese **


Corpora mailing list
