Hi Milos,

Thank you for providing the pretrained word vectors. I am specifically
interested in the Arabic version.
I have a question regarding hamza handling. I noticed that when
searching for أحمد [Ahmad, or >Hmd in Buckwalter], the results were empty,
whereas searching for احمد without the hamza did return results. Did you
normalize all hamzas to a plain alef?
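To make the question concrete, this is a minimal Python sketch of the kind of normalization I suspect might have been applied: alef forms carrying a hamza or madda are mapped to plain alef before lookup, so أحمد falls back to احمد. The mapping table and the names `normalize_alef` and `lookup` are my own hypothetical illustration, not anything documented for the Sketch Engine models.

```python
# Hypothetical alef normalization table (an assumption, not the documented
# Sketch Engine preprocessing): hamza/madda/wasla alef forms -> plain alef.
ALEF_VARIANTS = {
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda above -> alef
    "\u0671": "\u0627",  # alef wasla -> alef
}
_ALEF_TABLE = str.maketrans(ALEF_VARIANTS)


def normalize_alef(word: str) -> str:
    """Replace every hamza/madda/wasla alef form in `word` with plain alef."""
    return word.translate(_ALEF_TABLE)


def lookup(word: str, vocab: set) -> "str | None":
    """Return the query as typed if it is in `vocab`, else its
    alef-normalized form if that is in `vocab`, else None."""
    if word in vocab:
        return word
    normalized = normalize_alef(word)
    return normalized if normalized in vocab else None
```

With a vocabulary that only contains the normalized spelling, `lookup("أحمد", {"احمد"})` returns `"احمد"` instead of failing.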

Thank you,
Ayah

On Fri, Feb 2, 2018 at 9:07 AM, Miloš Jakubíček <milos.jakubi...@sketchengine.co.uk> wrote:

> Dear all,
>
> this is to announce the public availability of word embedding models computed
> for the large corpora that we have in Sketch Engine. At the moment, we have
> processed corpora for the following languages:
>
> English, Arabic, Chinese, Czech, Danish, French, German, Italian, Korean,
> Portuguese, Russian, Spanish
>
> See https://embeddings.sketchengine.co.uk/ where you can find an online
> interface for executing word similarity queries (such as the infamous
> king-man+woman) and download the datasets. They can be used freely for
> non-commercial purposes; for commercial use, do not hesitate to get
> back to me to work out a mutually suitable model of collaboration.
>
> We are continuing to build further models as our spare computing capacity
> allows, and will keep publishing them. If you are interested in a
> particular language that is missing at this moment, let me know and I can
> try to prioritise (no guarantees though).
>
> The embeddings were computed using fastText with various parameters and
> on various corpus attributes (word, lemma, lemma+PoS combination, lowercased
> forms, etc.).
>
> We have had an increasing number of requests to obtain corpora from Sketch
> Engine for these purposes, and this release is our response, intended to
> support research in this area.
>
> Cheers,
> Milos Jakubicek
>
> CEO, Lexical Computing
> Brno, CZ | Brighton, UK
> http://www.lexicalcomputing.com
> http://www.sketchengine.co.uk
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora@uib.no
> https://mailman.uib.no/listinfo/corpora