Good day,

This is to announce the expansion of the collection of open Large Language
Models (LLMs) for the Portuguese language with the following models:

- the family of *encoders* is enlarged with the new *Albertina 1.5B*
  https://huggingface.co/PORTULAN/albertina-1b5-portuguese-ptpt-encoder

- the family of *decoders* now includes *Gervásio 7B*
  https://huggingface.co/PORTULAN/gervasio-7b-portuguese-ptpt-decoder


This ecosystem now encompasses over ten LLMs that were specifically
developed for the Portuguese language, covering both its European variant,
spoken in Portugal (PTPT), and its American variant, spoken in Brazil
(PTBR), and that can be run on consumer-grade hardware.

The Albertina family includes encoders with *100M*, *900M* and *1.5B* 
parameters.

The Gervásio family, in turn, integrates a decoder with *7B* parameters.
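
For those who wish to try the models right away, here is a minimal sketch
of loading them with the Hugging Face transformers library. The model
identifiers are taken from the URLs above; the example sentences and
generation settings are merely illustrative, and the sketch assumes the
standard transformers Auto* classes apply to these checkpoints:

    from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM

    # Albertina 1.5B encoder: contextual embeddings for PTPT text
    # (identifier from the URL above).
    enc_name = "PORTULAN/albertina-1b5-portuguese-ptpt-encoder"
    enc_tokenizer = AutoTokenizer.from_pretrained(enc_name)
    encoder = AutoModel.from_pretrained(enc_name)

    inputs = enc_tokenizer("A língua portuguesa é falada em vários continentes.",
                           return_tensors="pt")
    embeddings = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden)

    # Gervásio 7B decoder: text generation from a prompt
    # (identifier from the URL above; illustrative prompt).
    dec_name = "PORTULAN/gervasio-7b-portuguese-ptpt-decoder"
    dec_tokenizer = AutoTokenizer.from_pretrained(dec_name)
    decoder = AutoModelForCausalLM.from_pretrained(dec_name)

    prompt = dec_tokenizer("A capital de Portugal é", return_tensors="pt")
    output = decoder.generate(**prompt, max_new_tokens=20)
    print(dec_tokenizer.decode(output[0], skip_special_tokens=True))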


All these models are *fully open*: they are open source and openly
distributed, for free and with no registration required, under an open
license that covers both research and commercial purposes.

They are also *fully documented*, including reports on their evaluation
scores, which indicate they are top-performing solutions among fully open
models of their class for Portuguese.


These models, their companion datasets and their documentation, for both
PTPT and PTBR, can all be found at https://huggingface.co/PORTULAN


Regards,

António Branco

University of Lisbon
NLX Natural Language and Speech Group
Faculdade de Ciências, Departamento de Informática



