Dear Ayah,

I asked my colleagues, and apparently yes: the tagger removes all diacritics.
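For anyone querying the Arabic model, here is a minimal Python sketch of the kind of normalization that would explain this behaviour. It assumes, without confirmation from the tagger's documentation, that short-vowel marks (tashkeel) are stripped and that the hamza and madda forms of alef are mapped to bare alef. `normalize_arabic` is a hypothetical helper for illustration, not part of Sketch Engine.

```python
import re

# Hypothetical helper: an assumed approximation of the tagger's behaviour,
# not the actual Sketch Engine normalization rules.

# Alef forms carrying hamza or madda are mapped to bare alef (U+0627).
ALEF_VARIANTS = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above
    "\u0623": "\u0627",  # alef with hamza above
    "\u0625": "\u0627",  # alef with hamza below
    "\u0671": "\u0627",  # alef wasla
})

# Tashkeel: tanween, short vowels, shadda and sukun (U+064B..U+0652),
# plus superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def normalize_arabic(text: str) -> str:
    """Strip short-vowel marks and map hamza/madda alef forms to bare alef."""
    text = DIACRITICS.sub("", text)
    return text.translate(ALEF_VARIANTS)
```

Under this assumption, running a query through the same normalization before looking it up turns أحمد into احمد, which is the form that actually appears in the model's vocabulary.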

Best,
Milos

Milos Jakubicek

CEO, Lexical Computing
Brno, CZ | Brighton, UK
http://www.lexicalcomputing.com
http://www.sketchengine.co.uk

On 2 February 2018 at 18:11, Ayah Zirikly <aya.zeri...@gmail.com> wrote:

> Hi Milos,
>
> Thank you for providing the pretrained word vectors. I am specifically
> interested in the Arabic version.
> I have a question regarding hamza handling. I noticed that when
> searching for أحمد [Ahmad, or >Hmd in Buckwalter] the results were empty,
> whereas searching for احمد without the hamza did return results. Did you
> normalize all hamzas to a bare alef?
>
> Thank you,
>  Ayah
>
> On Fri, Feb 2, 2018 at 9:07 AM, Miloš Jakubíček <
> milos.jakubi...@sketchengine.co.uk> wrote:
>
>> Dear all,
>>
>> this is to announce the public availability of word embedding models
>> calculated for the large corpora that we have in Sketch Engine. At the moment,
>> we have processed corpora for the following languages:
>>
>> English, Arabic, Chinese, Czech, Danish, French, German, Italian, Korean,
>> Portuguese, Russian, Spanish
>>
>> See https://embeddings.sketchengine.co.uk/ where you can find an online
>> interface for executing word similarity queries (such as the infamous
>> king-man+woman) and download the datasets. They can be used freely for
>> non-commercial purposes; for commercial use, do not hesitate to get
>> back to me to work out a mutually suitable model of collaboration.
>>
>> We are building further models as our spare computing capacity
>> allows, and will keep publishing them. If you are interested in a
>> particular language that is missing at this moment, let me know and I can
>> try to prioritise (no guarantees though).
>>
>> The embeddings were calculated using fastText with various parameters and
>> on various corpus attributes (word, lemma, lemma+PoS combination, lowercased
>> forms, etc.)
>>
>> We have had an increasing number of requests to obtain corpora from Sketch
>> Engine for these purposes, so this is our response, intended to support
>> research in this area.
>>
>> Cheers,
>> Milos Jakubicek
>>
>> CEO, Lexical Computing
>> Brno, CZ | Brighton, UK
>> http://www.lexicalcomputing.com
>> http://www.sketchengine.co.uk
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora@uib.no
>> https://mailman.uib.no/listinfo/corpora
>>
>>
>
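The word similarity queries mentioned in the announcement (such as king-man+woman) are commonly answered with vector arithmetic plus cosine similarity, the so-called 3CosAdd method. Below is a minimal pure-Python sketch. It assumes the downloaded files use fastText's standard `.vec` text format (a `count dim` header line, then one `word v1 ... vD` line per word); that format and the helper names `load_vec` and `analogy` are assumptions for illustration, not a description of how the Sketch Engine online interface works.

```python
import math
from typing import Dict, Iterable, List

Vector = List[float]


def load_vec(lines: Iterable[str]) -> Dict[str, Vector]:
    """Parse fastText-style .vec text: a 'count dim' header line,
    then one 'word v1 ... vD' line per vocabulary item."""
    it = iter(lines)
    _count, dim = map(int, next(it).split())
    vectors: Dict[str, Vector] = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank, malformed or truncated lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def _unit(v: Vector) -> Vector:
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def analogy(vectors: Dict[str, Vector], a: str, minus: str, plus: str,
            topn: int = 1) -> List[str]:
    """Words closest (by cosine) to a - minus + plus, excluding the three
    query words, e.g. analogy(vecs, "king", "man", "woman")."""
    units = {w: _unit(v) for w, v in vectors.items()}
    target = _unit([x - y + z for x, y, z in
                    zip(units[a], units[minus], units[plus])])
    scored = sorted(
        ((sum(t * u for t, u in zip(target, vec)), w)
         for w, vec in units.items() if w not in (a, minus, plus)),
        reverse=True,
    )
    return [w for _, w in scored[:topn]]
```

On a downloaded file, something like `with open("model.vec", encoding="utf-8") as f: vecs = load_vec(f)` should work, where `model.vec` is a placeholder name. Holding a full vocabulary in Python lists is slow and memory-hungry, so for real use gensim's `KeyedVectors.load_word2vec_format(..., binary=False)` followed by `most_similar(positive=[...], negative=[...])` is the more practical route.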