[Apertium-stuff] Fwd: [INFOLING] Linguistic resources: Public-domain Spanish corpus of 120 million words

Hèctor Alòs i Font Mon, 10 Jan 2011 13:55:43 -0800

May be interesting.

-------- Original Message --------
Subject: [INFOLING] Linguistic resources: Public-domain Spanish corpus of 120 million words
Date: Mon, 10 Jan 2011 21:29:22 +0100
From: INFOLING <[email protected]>
Reply-To: INFOLING <[email protected]>
To: [email protected]


INFOLING. Global information on Hispanic linguistics:
http://infoling.org/

Moderators: Carlos Subirats (UAB), Mar Cruz (UB)
Editors: Paloma Garrido (U. Rey Juan Carlos), Laura Romero (UB)
Programming and development: Marc Ortega (UAB)
Review editors: Alexandra Álvarez (U. Los Andes, Venezuela), Yvette
Bürki (U. Bern), María Luisa Calero (U. Córdoba, Spain)
Advisors: Isabel Verdaguer (UB), Gerd Wotjak (U. Leipzig)
Contributors: Antonio Ríos (UAB), Danica Salazar (UB)

With the support of:

   - Editorial Octaedro: http://www.octaedro.com/
   - Arco Libros: http://www.arcomuralla.com/Arco/Shop/default.asp


ISSN: 1576-3404
© Infoling 1996-2010. All rights reserved.


------------------------------
*Linguistic resources:*
Public-domain Spanish corpus of 120 million words
*URL:* http://www.lsi.upc.edu/~nlp/wikicorpus/
*Information from:* Infoling List <[email protected]>
------------------------------

*Description*

Wikicorpus, v. 1.0: Spanish, English, and Catalan portions of Wikipedia.

The Wikicorpus is a trilingual corpus (Spanish, English, Catalan) that
contains large portions of Wikipedia (based on a 2006 dump) and has been
automatically enriched with linguistic information. In its present version,
it contains over 750 million words.

The corpora have been annotated with lemma and part-of-speech information
using the open-source library FreeLing. They have also been sense-annotated
with the state-of-the-art Word Sense Disambiguation algorithm UKB. Because UKB
assigns WordNet senses, and WordNet has been aligned across languages via
the InterLingual Index, this annotation opens the way to large-scale
explorations in lexical semantics that were not possible before.
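Because the senses are aligned through the InterLingual Index, grouping lemmas by sense links translation equivalents across the three portions. The following is a minimal Python sketch of that idea, not the project's own tooling. It assumes (this announcement does not confirm it) one token per line with four whitespace-separated fields (token, lemma, PoS tag, sense, with "0" meaning no sense assigned) and sense IDs shared across languages. The function names `parse_line` and `index_senses`, the sample lines, and the sense ID are all hypothetical.

```python
from collections import defaultdict

# ASSUMED input layout (check the real Wikicorpus files before relying on it):
# one token per line, four whitespace-separated fields
#   token  lemma  PoS-tag  sense
# where sense "0" means no WordNet sense was assigned, and sense IDs are
# ILI-aligned, i.e. the same ID denotes the same concept in every language.


def parse_line(line):
    """Return (token, lemma, tag, sense) for a token line, or None otherwise."""
    line = line.strip()
    if not line or line.startswith("<"):
        return None  # blank line or document marker such as <doc ...> / </doc>
    fields = line.split()
    if len(fields) != 4:
        return None  # not in the assumed four-field layout
    return tuple(fields)


def index_senses(lines_by_lang):
    """Map each sense ID to the set of (language, lemma) pairs annotated with it."""
    index = defaultdict(set)
    for lang, lines in lines_by_lang.items():
        for line in lines:
            parsed = parse_line(line)
            if parsed is None:
                continue
            _token, lemma, _tag, sense = parsed
            if sense != "0":  # skip tokens without a sense annotation
                index[sense].add((lang, lemma))
    return index


# Hypothetical sample lines; tags and the sense ID are illustrative only.
sample = {
    "es": ['<doc id="1" title="Canis lupus familiaris">', "perros perro NCMP000 02084071-n"],
    "en": ["dogs dog NNS 02084071-n"],
    "ca": ["els el DA0MP0 0", "gossos gos NCMP000 02084071-n", "</doc>"],
}

index = index_senses(sample)
print(sorted(index["02084071-n"]))
# [('ca', 'gos'), ('en', 'dog'), ('es', 'perro')]
```

Since the index is keyed on the sense rather than the surface form, one lookup retrieves the Spanish, English, and Catalan lemmas together. With the full corpus you would stream each file line by line instead of holding the lines in lists.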

We also provide an open-source, Java-based parser for Wikipedia pages,
developed for the construction of the corpus.

*Subject area:* Corpus linguistics

*Information on the Infoling website:*
http://www.infoling.org/informacion/RecursoL29.html


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
