Hi Milos,

Thank you for providing the pretrained word vectors. I am specifically interested in the Arabic version, and I have a question about how hamza is handled. When I search for أحمد (Ahmad, or >Hmd in Buckwalter transliteration), the results are empty, whereas searching for احمد, written without the hamza, returns results. Did you normalize all hamza forms to plain alef?
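To make the question concrete, here is a minimal sketch of the kind of normalization I have in mind. It is only my assumption about what such a step might look like, not a description of how the Sketch Engine models were actually built. The names HAMZA_ALEF_MAP, normalize_alef and lookup_candidates are hypothetical, and the sketch only covers alef with hamza above, alef with hamza below and alef with madda.

```python
import unicodedata

# Assumed mapping (hypothetical, not confirmed for the Sketch Engine models):
# alef variants carrying hamza or madda are folded to bare alef.
HAMZA_ALEF_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda above -> alef
})


def normalize_alef(text):
    """Fold hamza/madda-carrying alef forms to bare alef."""
    # NFC first, so alef followed by a combining hamza becomes a single
    # precomposed code point that the mapping above can match.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(HAMZA_ALEF_MAP)


def lookup_candidates(word):
    """Return the word as typed, plus its normalized form if it differs."""
    normalized = normalize_alef(word)
    return [word] if normalized == word else [word, normalized]


if __name__ == "__main__":
    for candidate in lookup_candidates("\u0623\u062d\u0645\u062f"):
        print(candidate)
```

If the models were trained on text normalized this way, querying with the second candidate would explain why only the form without hamza returns results.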
Thank you,
Ayah

On Fri, Feb 2, 2018 at 9:07 AM, Miloš Jakubíček <milos.jakubi...@sketchengine.co.uk> wrote:

> Dear all,
>
> this is to announce public availability of word embedding model calculated
> for large corpora that we have in Sketch Engine. At this moment, we have
> processed corpora for following languages:
>
> English, Arabic, Chinese, Czech, Danish, French, German, Italian, Korean,
> Portuguese, Russian, Spanish
>
> See https://embeddings.sketchengine.co.uk/ where you can find an online
> interface for executing word similarity queries (such as the infamous
> king-man+woman) and download the datasets. They can be used freely for
> non-commercial purposes, for the commercial ones do not hesitate to get
> back to me to work out a mutually suitable model of collaboration.
>
> We continue building further models as our spare computing capacity
> allows, and will continue publishing them. If you are interested in a
> particular language that is missing at this moment, let me know and I can
> try to prioritise (no guarantees though).
>
> The embeddings were calculated using FastText with various parameters and
> on various corpus attributes (word, lemma, lemma+PoS combination, lowercase
> etc.)
>
> We have had increasing amount of requests to obtain corpora from Sketch
> Engine for these purposes, so this is our response to that to support
> research in this area.
>
> Cheers,
> Milos Jakubicek
>
> CEO, Lexical Computing
> Brno, CZ | Brighton, UK
> http://www.lexicalcomputing.com
> http://www.sketchengine.co.uk
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora@uib.no
> https://mailman.uib.no/listinfo/corpora