[OT] All your medians are belong to me

Andrei Alexandrescu via Digitalmars-d Mon, 21 Nov 2016 09:42:17 -0800

Hey folks, I'm working on a paper for fast median computation and https://issues.dlang.org/show_bug.cgi?id=16517 came to mind. I see the Google ngram corpus has occurrences of n-grams per year. Is data aggregated for all years available somewhere? I'd like to compute e.g. "the word (1-gram) with the median frequency across all English books" so I don't need the frequencies per year, only totals.

Of course I can download the entire corpus and then do some processing, but that would take a long time.
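
To make the processing concrete, here is a minimal sketch in Python of what I have in mind. It assumes each corpus line is tab-separated as ngram, year, match_count, volume_count; that layout is an assumption and should be checked against the actual files. The function names are made up for illustration, and the sketch uses a plain full sort rather than any fast selection algorithm.

```python
from collections import Counter


def total_counts(lines):
    """Sum per-year match counts into one total per n-gram.

    Assumes each line is 'ngram<TAB>year<TAB>match_count<TAB>volume_count'.
    """
    totals = Counter()
    for line in lines:
        ngram, _year, match_count, _volumes = line.rstrip("\n").split("\t")
        totals[ngram] += int(match_count)
    return totals


def median_frequency_word(totals):
    """Return the (word, total) pair at the lower median of the totals.

    Ties are broken by the word itself so the result is deterministic.
    """
    ranked = sorted(totals.items(), key=lambda kv: (kv[1], kv[0]))
    return ranked[(len(ranked) - 1) // 2]
```

Since total_counts accepts any iterable of lines, an open file handle can be passed directly and the per-year rows never need to be held in memory at once; only the per-word totals are.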

Also, if you can think of any large corpus that would be pertinent for median computation, please let me know!



Thanks,

Andrei
