Dear all,

We are pleased to announce the release of LX-DSemVectors, a set of semantic vector
models (also known as word embeddings) for Portuguese, whose performance on standard test sets is on a par with that of comparable state-of-the-art models for English.

We are also announcing the release of the corresponding test sets for Portuguese.


The word embeddings were trained on a corpus of 1.7 billion tokens.
The collection of models, covering a range of parametrizations,
and the collection of test sets are available at:

https://github.com/nlx-group/lx-dsemvectors

Further versions of the word embeddings and further test sets will be released
through this channel as they are completed.


LX-DSemVectors is described in the following paper:

Rodrigues, João, António Branco, Steven Neale and João Silva, 2016,
"LX-DSemVectors: Distributional Semantics Models for the Portuguese Language", Lecture Notes in Artificial Intelligence, 9727, Berlin, Springer, pp. 259-270.

http://www.di.fc.ul.pt/~ahb/pubs/2016RodriguesBrancoNealeSilva.pdf



These and other data sets and tools for Portuguese developed by our team
can also be accessed from:
http://lxcenter.di.fc.ul.pt/datasets/en/index.html



Greetings,
João A. Rodrigues
University of Lisbon

NLX-Natural Language and Speech Group
http://nlxgroup.di.fc.ul.pt/








