Re: [OT] All your medians are belong to me

Patrick Schluter via Digitalmars-d Mon, 21 Nov 2016 12:40:55 -0800

On Monday, 21 November 2016 at 18:39:26 UTC, Andrei Alexandrescu wrote:

On 11/21/2016 01:18 PM, jmh530 wrote:
I would just generate a bunch of integers randomly and use that, but I
don't know if you specifically need to work with strings.
I have that, too, but was looking for some real data as well. It would be a nice addition. -- Andrei

I don't really know what kind of data you would need, but there are the European Union's Language Technology Resources corpuses, made available for the research community. There are several different data sets in different formats (documents, alignments, XML) and in all European languages that can be used for experiments and real-world use. The data is public domain and is free to use. The DGT-TM dataset is compiled by myself and updated yearly. It consists of around 12 billion characters, or 1.8 billion words, or 111 million segments, in 28 languages.


https://ec.europa.eu/jrc/en/language-technologies
