Those who are interested in how well the predictions of AI / LLMs compare to actual corpus data may find the following useful:
https://www.english-corpora.org/ai-llms/

The data includes seven in-depth comparisons of LLMs and corpora (about 90 pages of discussion and examples) for the following topics:

- word frequency <https://www.english-corpora.org/ai-llms/words.pdf>
- phrase frequency <https://www.english-corpora.org/ai-llms/phrases.pdf>
- collocates <https://www.english-corpora.org/ai-llms/collocates.pdf>
- comparing words (via collocates) <https://www.english-corpora.org/ai-llms/compare-words.pdf>
- genre-based variation <https://www.english-corpora.org/ai-llms/genres.pdf>
- historical variation <https://www.english-corpora.org/ai-llms/historical.pdf>
- dialectal variation <https://www.english-corpora.org/ai-llms/dialects.pdf>

Best,
Mark Davies <https://www.mark-davies.org/>
English-Corpora.org
_______________________________________________
Corpora mailing list: https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
