Dear all,

This is to announce the release of LX-DSemVectors, a set of semantic vector
models (aka word embeddings) for Portuguese, whose performance on standard
test sets is on a par with that of similar state-of-the-art models for English.

We are also announcing the release of the corresponding test sets for Portuguese.

The word embeddings were developed on a very large corpus of 1.7 billion tokens.
The collection of models, containing a range of possible parametrizations,
and the collection of test sets are made available at:

Further versions of the word embeddings and further test sets will be released
through this channel as they are completed.

The LX-DSemVectors are described in this recent paper:

Rodrigues, João, António Branco, Steven Neale and João Silva, 2016,
"LX-DSemVectors: Distributional Semantics Models for the Portuguese Language",
Lecture Notes in Artificial Intelligence, 9727, Berlin, Springer, pp. 259-270.

These and other data sets and tools for Portuguese developed by our team
can also be accessed from:

João A. Rodrigues
University of Lisbon

NLX-Natural Language and Speech Group
