We are very happy to announce the twenty-second release of annotated
treebanks in Universal Dependencies, v2.16, available at
https://universaldependencies.org/.
Universal Dependencies is a project that seeks to develop
cross-linguistically consistent treebank annotation for many languages
with the goal of facilitating multilingual parser development,
cross-lingual learning, and parsing research from a language typology
perspective (de Marneffe et al., 2021; Nivre et al., 2020). The
annotation scheme is based on (universal) Stanford dependencies (de
Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags
(Petrov et al., 2012), and the Interset interlingua for morphosyntactic
tagsets (Zeman, 2008). The general philosophy is to provide a universal
inventory of categories and guidelines to facilitate consistent
annotation of similar constructions across languages, while allowing
language-specific extensions when necessary.
The *319* treebanks in v2.16 are annotated according to version 2 of the
UD guidelines and represent the following *179* languages: Abaza,
Abkhaz, Afrikaans, Akkadian, Akuntsu, Albanian, Alemannic, Amharic,
Ancient Greek, Ancient Hebrew, Apurina, Arabic, Armenian, Assyrian,
Azerbaijani, Bambara, Basque, Bavarian, Beja, Belarusian, Bengali,
Bhojpuri, Bokota, Bororo, Breton, Bulgarian, Buryat, Cantonese,
Cappadocian, Catalan, Cebuano, Chinese, Chukchi, Classical Armenian,
Classical Chinese, Coptic, Croatian, Czech, Danish, Dutch, Egyptian,
English, Erzya, Esperanto, Estonian, Faroese, Finnish, French, Frisian
Dutch, Galician, Georgian, German, Gheg, Gothic, Greek, Guajajara,
Guarani, Gujarati, Gwichin, Haitian Creole, Hausa, Hebrew, Highland
Puebla Nahuatl, Hindi, Hittite, Hungarian, Icelandic, Ika, Indonesian,
Irish, Italian, Japanese, Javanese, Kaapor, Kangri, Karelian, Karo,
Kazakh, Khoekhoe, Khunsari, Kiche, Komi Permyak, Komi Zyrian, Korean,
Kurmanji, Kyrgyz, Latgalian, Latin, Latvian, Ligurian, Lithuanian,
Livvi, Low Saxon, Luxembourgish, Macedonian, Madi, Maghrebi Arabic
French, Makurap, Malayalam, Maltese, Manx, Marathi, Mbya Guarani, Middle
French, Moksha, Munduruku, Naga, Naija, Nayini, Neapolitan, Nenets,
Nheengatu, North Sami, Northwest Gbaya, Norwegian, Occitan, Odia, Old
Church Slavonic, Old East Slavic, Old English, Old French, Old Irish,
Old Turkish, Ottoman Turkish, Pashto, Paumari, Persian, Pesh, Phrygian,
Polish, Pomak, Portuguese, Romanian, Russian, Sanskrit, Scottish Gaelic,
Serbian, Sindhi, Sinhala, Skolt Sami, Slovak, Slovenian, Soi, South
Levantine Arabic, Spanish, Spanish Sign Language, Swedish, Swedish Sign
Language, Tagalog, Tamil, Tatar, Teko, Telugu, Telugu English, Thai,
Tswana, Tupinamba, Turkish, Turkish English, Turkish German, Ukrainian,
Umbrian, Upper Sorbian, Urdu, Uyghur, Uzbek, Veps, Vietnamese, Warlpiri,
Welsh, Western Armenian, Western Sierra Puebla Nahuatl, Wolof, Xavante,
Xibe, Yakut, Yoruba, Yupik and Zaar. The 179 languages belong to *35*
families: Afro-Asiatic, Arawakan, Arawan, Austro-Asiatic, Austronesian,
Basque, Bororoan, Chibchan, Chukotko-Kamchatkan, Code switching,
Constructed, Creole, Dravidian, Eskimo-Aleut, Indo-European, Japanese,
Kartvelian, Khoe-Kwadi, Korean, Macro-Je, Mande, Mayan, Mongolic,
Na-Dene, Niger-Congo, Northwest Caucasian, Pama-Nyungan, Sign Language,
Sino-Tibetan, Tai-Kadai, Tungusic, Tupian, Turkic, Uralic and
Uto-Aztecan. Depending on the language, the treebanks range in size from
fewer than 1,000 tokens to over 3 million tokens. We expect the next
release to be available in November 2025.
The sizes of the following 48 treebanks have changed significantly since
the last release:
Abkhaz AbNC : 6363 → 9652
Alemannic UZH : 0 → 1444
Ancient Hebrew PTNK : 39036 → 90770
Azerbaijani TueCL : 663 → 912
Bokota ChibErgIS : 0 → 2713
Bororo BDT : 6993 → 160356
Classical Armenian CAVaL : 88009 → 99663
Coptic Bohairic : 0 → 32724
Czech PDT : 1506486 → 0
Czech PDTC : 0 → 3440052
Egyptian UJaen : 14650 → 21927
English CHILDES : 0 → 226470
English GUM : 211920 → 233926
English LinES : 94217 → 106305
Esperanto Cairo : 0 → 177
Esperanto Prago : 0 → 839
French ALTS : 0 → 43832
Georgian GNC : 0 → 18747
Greek Cretan : 0 → 4351
Greek Lesbian : 0 → 3333
Haitian Creole Adolphe : 0 → 71734
Ika ChibErgIS : 0 → 3706
Khoekhoe KDT : 0 → 29007
Korean KSL : 66989 → 108072
Korean LittlePrince : 0 → 13656
Kyrgyz TueCL : 1001 → 1250
Latin CIRCSE : 18968 → 24899
Middle French PROFITEROLE : 12025 → 68454
Naga Suansu : 0 → 3123
Neapolitan RB : 10 → 199
Nenets Tundra : 0 → 651
Nheengatu CompLin : 19278 → 21813
Northwest Gbaya Autogramm : 2417 → 2693
Occitan CorAG : 0 → 37585
Occitan TTB : 0 → 25619
Odia ODTB : 0 → 1029
Old English Cairo : 0 → 171
Ottoman Turkish DUDU : 813 → 10287
Pashto Sikaram : 995 → 2515
Pesh ChibErgIS : 2508 → 4275
Russian Taiga : 197001 → 1758939
Sindhi Isra : 0 → 15741
Swedish LinES : 90961 → 102538
Turkish English BUTR : 0 → 393
Turkish TueCL : 0 → 904
Ukrainian ParlaMint : 51997 → 84189
Uzbek TueCL : 0 → 939
Xavante XDT : 1740 → 2234
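For readers who want to process the listing above automatically, the following is a minimal Python sketch, not part of the UD tooling. It assumes each line has the form "treebank : old → new"; the names parse_changes and classify are hypothetical helpers introduced only for this illustration, and the labels ("new", "removed", "grown", "shrunk") are our own reading of the numbers, not official UD terminology.

```python
import re

# One line of the listing: "<treebank> : <old size> → <new size>".
# The space before the colon is optional.
LINE_RE = re.compile(r"^(?P<name>.+?)\s*:\s*(?P<old>\d+)\s*→\s*(?P<new>\d+)\s*$")


def parse_changes(text):
    """Return a list of (treebank, old_size, new_size) tuples."""
    changes = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            changes.append((match["name"], int(match["old"]), int(match["new"])))
    return changes


def classify(old, new):
    """Label a size change (labels are illustrative, not UD terminology)."""
    if old == new:
        return "unchanged"
    if old == 0:
        return "new"
    if new == 0:
        return "removed"
    return "grown" if new > old else "shrunk"


# A few lines copied from the listing above.
sample = """\
Abkhaz AbNC : 6363 → 9652
Czech PDT : 1506486 → 0
Czech PDTC : 0 → 3440052
Middle French PROFITEROLE: 12025 → 68454
"""

for name, old, new in parse_changes(sample):
    print(f"{name}: {classify(old, new)} ({new - old:+d})")
```

Pasting the full listing into sample would give the same summary for all 48 treebanks.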
In total, the new release contains *2,263,318* sentences, *36,437,487*
surface tokens and *37,158,675* syntactic words.
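The difference between surface tokens and syntactic words can be illustrated with a small Python sketch. It assumes the treebank files are in the CoNLL-U format, where a multiword token has a range ID such as "1-2", its syntactic words have plain integer IDs, and empty nodes have decimal IDs such as "8.1". The counting convention used here (a surface token is either a multiword token or a word not covered by one) is our assumption and not necessarily the exact procedure behind the totals above; count_conllu is a hypothetical helper name.

```python
def count_conllu(text):
    """Count (sentences, surface tokens, syntactic words) in CoNLL-U text."""
    sentences = tokens = words = 0
    covered_until = 0    # highest word ID covered by the current multiword token
    in_sentence = False
    for line in text.splitlines():
        line = line.strip()
        if not line:                 # a blank line ends the current sentence
            in_sentence = False
            covered_until = 0
            continue
        if line.startswith("#"):     # sentence-level comment
            continue
        if not in_sentence:
            in_sentence = True
            sentences += 1
        token_id = line.split("\t", 1)[0]
        if "-" in token_id:          # multiword token, e.g. "1-2"
            tokens += 1
            covered_until = int(token_id.split("-")[1])
        elif "." in token_id:        # empty node, neither token nor word
            continue
        else:                        # ordinary syntactic word
            words += 1
            if int(token_id) > covered_until:
                tokens += 1
    return sentences, tokens, words


def row(token_id, form):
    """Build a 10-column CoNLL-U line with only ID and FORM filled in."""
    return "\t".join([token_id, form] + ["_"] * 8)


# Two toy sentences; the first contains two multiword tokens.
first = [("1-2", "vámonos"), ("1", "vamos"), ("2", "nos"),
         ("3-4", "al"), ("3", "a"), ("4", "el"), ("5", "mar")]
second = [("1", "Sí")]
sample_conllu = "\n\n".join(
    "\n".join(row(i, f) for i, f in sent) for sent in (first, second)
) + "\n\n"

print(count_conllu(sample_conllu))  # (2, 4, 6)
```

In the toy input, "vámonos" and "al" each count as one surface token but two syntactic words, which is why the word total exceeds the token total, as it does in the release figures.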
Daniel Zeman, Joakim Nivre, Mitchell Abrams, Elia Ackermann, Jephtey
Adolphe, Noëmi Aepli, Hamid Aghaei, Željko Agić, Amir Ahmadi, Lars
Ahrenberg, Chika Kennedy Ajede, Arofat Akhundjanova, Furkan Akkurt,
Gabrielė Aleksandravičiūtė, Ika Alfina, Avner Algom, Khalid Alnajjar,
Chiara Alzetta, Antonios Anastasopoulos, Erik Andersen, Matthew Andrews,
Lene Antonsen, Tatsuya Aoyama, Katya Aplonova, Angelina Aquino, Carolina
Aragon, Glyd Aranes, Maria Jesus Aranzabe, Bilge Nas Arıcan, Þórunn
Arnardóttir, Gashaw Arutie, Jessica Naraiswari Arwidarasti, Masayuki
Asahara, Katla Ásgeirsdóttir, Deniz Baran Aslan, Cengiz Asmazoğlu, Luma
Ateyah, Furkan Atmaca, Mohammed Attia, Aitziber Atutxa, Liesbeth
Augustinus, Mariana Avelãs, Elena Badmaeva, Jana Bajorat, Keerthana
Balasubramani, Miguel Ballesteros, Esha Banerjee, Sebastian Bank, Bryan
Khelven da Silva Barbosa, Verginica Barbu Mititelu, Starkaður Barkarson,
Rodolfo Basile, Victoria Basmov, Colin Batchelor, John Bauer, Seyyit
Talha Bedir, Shabnam Behzad, Juan Belieni, Alevtina Bémová, Kepa
Bengoetxea, İbrahim Benli, Yifat Ben Moshe, Marie Benzerrak, Ansu Berg,
Gözde Berk, Riyaz Ahmad Bhat, Erica Biagetti, Eckhard Bick, Agnė
Bielinskienė, Esma Fatıma Bilgin Taşdemir, Helin Binici, Kristín
Bjarnadóttir, Verena Blaschke, Rogier Blokland, Nina Böbel, Victoria
Bobicev, Loïc Boizou, Stavros Bompolas, Johnatan Bonilla, Emanuel Borges
Völker, Carl Börstell, Cristina Bosco, Gosse Bouma, Sam Bowman, Adriane
Boyd, Anouck Braggaar, António Branco, Myriam Bras, Kristina Brokaitė,
Lanni Bu, Eva Buráňová, Aljoscha Burchardt, Carmen Cabeza, Natalia
Cáceres Arandia, Marisa Campos, Marie Candito, Bernard Caron, Gauthier
Caron, Catarina Carvalheiro, Rita Carvalho, Lauren Cassidy, Maria Clara
Castro, Sérgio Castro, Tatiana Cavalcanti, Gülşen Cebiroğlu Eryiğit,
Flavio Massimiliano Cecchini, Giuseppe G. A. Celano, Anila Çepani,
Slavomír Čéplö, Neslihan Cesur, Savas Cetin, Özlem Çetinoğlu, Fabricio
Chalub, Liyanage Chamila, Claudine Chamoreau, Shweta Chauhan, Yifei
Chen, Ethan Chi, Taishi Chika, Yongseok Cho, Jinho Choi, Bermet
Chontaeva, Jayeol Chun, Juyeon Chung, Alessandra T. Cignarella, Silvie
Cinková, Aurélie Collomb, Çağrı Çöltekin, Miriam Connor, Claudia
Corbetta, Daniela Corbetta, Francisco Costa, Marine Courtin, Benoît
Crabbé, Mihaela Cristescu, Vladimir Cvetkoski, Netanel Dahan, Ingerid
Løyning Dale, Philemon Daniel, Khensa Daoudi, Bijayalaxmi Dash, Satya
Ranjan Dash, Elizabeth Davidson, Leonel Figueiredo de Alencar, Mathieu
Dehouck, Martina de Laurentiis, Marie-Catherine de Marneffe, Ahmet
Demir, Valeria de Paiva, Mehmet Oguz Derin, Elvis de Souza, Arantza Diaz
de Ilarraza, Roberto Antonio Díaz Hernández, Carly Dickerson, Ariani Di
Felippo, Arawinda Dinakaramani, Elisa Di Nuovo, Bamba Dione, Peter
Dirix, Hoa Do, Kaja Dobrovoljc, Caroline Döhmer, Adrian Doyle, Timothy
Dozat, Kira Droganova, Magali Sanches Duran, Puneet Dwivedi, Christian
Ebert, Hanne Eckhoff, Masaki Eguchi, Sandra Eiche, Roald Eiselen,
Marhaba Eli, Ali Elkahky, Binyam Ephrem, Olga Erina, Tomaž Erjavec,
Louise Esher, Soudabeh Eslami, Farah Essaidi, Aline Etienne, Wograine
Evelyn, Sidney Facundes, Richárd Farkas, Ján Faryad, Federica Favero,
Jannatul Ferdaousi, Marília Fernanda, Hector Fernandez Alcalde, Amal
Fethi, Jennifer Foster, Barbara Francioni, Theodorus Fransen, Cláudia
Freitas, Kazunori Fujita, Katarína Gajdošová, Daniel Galbraith, Edith
Galy, Federica Gamba, Marcos Garcia, José María García-Miguel, Moa
Gärdenfors, Tanja Gaustad, Efe Eren Genç, Fabrício Ferraz Gerardi, Kim
Gerdes, Luke Gessler, Filip Ginter, Gustavo Godoy, Iakes Goenaga, Koldo
Gojenola, Memduh Gökırmak, Yoav Goldberg, Gili Goldin, Xavier Gómez
Guinovart, Berta González Saavedra, Bernadeta Griciūtė, Matias Grioni,
Loïc Grobol, Normunds Grūzītis, Bruno Guillaume, Kirian Guiller, Céline
Guillot-Barbance, Tunga Güngör, Vladimir Gurevich, Nizar Habash, Hinrik
Hafsteinsson, Michael Hahn, Jan Hajič, Jan Hajič jr., Eva Hajičová, Mika
Hämäläinen, Linh Hà Mỹ, Na-Rae Han, Muhammad Yudistira Hanifmuti,
Takahiro Harada, Sam Hardwick, Kim Harris, Naïma Hassert, Dag Haug, Jiří
Havelka, Johannes Heinecke, Oliver Hellwig, Felix Hennig, Barbora
Hladká, Jaroslava Hlaváčová, Florinel Hociung, Diana Hoefels, Petter
Hohle, Nick Howell, Yidi Huang, Marivel Huerta Mendez, Jena Hwang,
Takumi Ikeda, Inessa Iliadou, Anton Karl Ingason, Radu Ion, Elena
Irimia, Ọlájídé Ishola, Artan Islamaj, Kaoru Ito, Federica Iurescia,
Jessica K. Ivani, Sandra Jagodzińska, Siratun Jannat, Tomáš Jelínek,
Apoorva Jha, Katharine Jiang, Sylvanus Job, Mayank Jobanputra, Anders
Johannsen, Hildur Jónsdóttir, Fredrik Jørgensen, Zhuoxuan Ju, Markus
Juutinen, Hüner Kaşıkara, Nadezhda Kabaeva, Sylvain Kahane, Hiroshi
Kanayama, Jenna Kanerva, Neslihan Kara, Ritván Karahóǧa, Jiří Kárník,
Andre Kåsen, Tolga Kayadelen, Sarveswaran Kengatharaiyer, Václava
Kettnerová, Lilit Kharatyan, Jesse Kirchner, Elena Klementieva, Elena
Klyachko, Petr Kocharov, Arne Köhn, Abdullatif Köksal, Veronika
Kolářová, Kamil Kopacewicz, Timo Korkiakangas, Mehmet Köse, Alexey
Koshevoy, Nelda Kote, Natalia Kotsyba, Barbara Kovačić, Jolanta
Kovalevskaitė, Emmanuelle Kowner, Simon Krek, Parameswari Krishnamurthy,
Sandra Kübler, Lucie Kučová, Adrian Kuqi, Oğuzhan Kuyrukçu, Aslı Kuzgun,
Sookyoung Kwak, Kris Kyle, Käbi Laan, Veronika Laippala, Lorenzo
Lambertino, Israel Landau, Tatiana Lando, Septina Dian Larasati, Pierre
Larrivée, Alexei Lavrentiev, John Lee, Phương Lê Hồng, Alessandro Lenci,
Saran Lertpradit, Herman Leung, Maria Levina, Lauren Levine, Cheuk Ying
Li, Josie Li, Keying Li, Yixuan Li, Yuan Li, KyungTae Lim, Bruna Lima
Padovani, Yi-Ju Jessica Lin, Krister Lindén, Yang Janet Liu, Zoey Liu,
Nikola Ljubešić, Irina Lobzhanidze, Olga Loginova, Markéta Lopatková,
Lucelene Lopes, Edita Luftiu, Arsenii Lukashevskyi, Stefano Lusito,
Anne-Marie Lutgen, Andry Luthfi, Mikko Luukko, Olga Lyashevskaya, Teresa
Lynn, Vivien Macketanz, Menel Mahamdi, Jean Maillard, Ilya Makarchuk,
Aibek Makazhanov, Francesco Mambrini, Michael Mandl, Christopher
Manning, Ruli Manurung, Büşra Marşan, Cătălina Mărănduc, David Mareček,
Katrin Marheinecke, Stella Markantonatou, Héctor Martínez Alonso, Lorena
Martín Rodríguez, André Martins, Cláudia Martins, Jan Mašek, Hiroshi
Matsuda, Yuji Matsumoto, Alessandro Mazzei, Ryan McDonald, Sarah
McGuinness, Maitrey Mehta, Pierre André Ménard, Gustavo Mendonça, Hilla
Merhav, Tatiana Merzhevich, Paul Meurer, Niko Miekka, Marie Mikulová,
Emilia Milano, Aleksandra Miletić, Aaron Miller, Junghyun Min, Yael
Minerbi, Jiří Mírovský, Karina Mischenkova, Anna Missilä, Cătălin
Mititelu, Maria Mitrofan, Yusuke Miyao, Biswakalpita Mohapatra,
AmirHossein Mojiri Foroushani, Judit Molnár, Amirsaeid Moloodi,
Simonetta Montemagni, Amir More, Laura Moreno Romero, Giovanni Moretti,
Shinsuke Mori, Tomohiko Morioka, Shigeki Moro, Bjartur Mortensen, Bohdan
Moskalevskyi, Kadri Muischnek, Robert Munro, Yugo Murawaki, Nikolett
Mus, Kaili Müürisep, Pinkey Nainwani, Mariam Nakhlé, Juan Ignacio
Navarro Horñiacek, Anna Nedoluzhko, Gunta Nešpore-Bērzkalne, Manuela
Nevaci, Lương Nguyễn Thị, Huyền Nguyễn Thị Minh, Yoshihiro Nikaido,
Vitaly Nikolaev, Rattima Nitisaroj, Victor Norrman, Alireza Nourian,
Michal Novák, Maria das Graças Volpe Nunes, Hanna Nurmi, Stina Ojala,
Atul Kr. Ojha, Hulda Óladóttir, Adédayọ̀ Olúòkun, Mai Omura, Emeka
Onwuegbuzia, Noam Ordan, Petya Osenova, Robert Östling, Annika Ott,
Lilja Øvrelid, Masanori Oya, Şaziye Betül Özateş, Merve Özçelik, Arzucan
Özgür, Balkız Öztürk Başaran, Teresa Paccosi, Petr Pajas, Alessio
Palmero Aprosio, Jarmila Panevová, Anastasia Panova, Thiago Alexandre
Salgueiro Pardo, Shantipriya Parida, Hyunji Hayley Park, Niko Partanen,
Elena Pascual, Marco Passarotti, Agnieszka Patejuk, Guilherme
Paulino-Passos, Giulia Pedonese, Oggi Peeters, Angelika Peljak-Łapińska,
Siyao Peng, Siyao Logan Peng, Rita Pereira, Sílvia Pereira,
Cenel-Augusto Perez, Natalia Perkova, Guy Perrier, Slav Petrov, Daria
Petrova, Andrea Peverelli, Jason Phelan, Claudel Pierre-Louis, Jussi
Piitulainen, Yuval Pinter, Clara Pinto, Rodrigo Pintucci, Tommi A
Pirinen, Emily Pitler, Magdalena Plamada, Barbara Plank, Alistair Plum,
Thierry Poibeau, Larisa Ponomareva, Martin Popel, Clamença Poujade,
Lauma Pretkalniņa, Rigardt Pretorius, Sophie Prévost, Prokopis
Prokopidis, Adam Przepiórkowski, Robert Pugh, Tiina Puolakainen,
Christoph Purschke, Sampo Pyysalo, Peng Qi, Andreia Querido, Andriela
Rääbis, Ella Rabinovich, Alexandre Rademaker, Mutee-u Rahman, Mizanur
Rahoman, Taraka Rama, Loganathan Ramasamy, Carlos Ramisch, Joana Ramos,
Fam Rashel, Mohammad Sadegh Rasooli, Vinit Ravishankar, Livy Real, Petru
Rebeja, Siva Reddy, Mathilde Regnault, Georg Rehm, Arij Riabi, Ivan
Riabov, Michael Rießler, Erika Rimkutė, Larissa Rinaldi, Laura Rituma,
Putri Rizqiyah, Luisa Rocha, Eiríkur Rögnvaldsson, Ivan Roksandic,
Norton Trevisan Roman, Mykhailo Romanenko, Natalia Romanova, Rudolf
Rosa, Valentin Roșca, Paulette Roulon, Davide Rovati, Ben Rozonoyer,
Olga Rudina, Jack Rueter, Paolo Ruffolo, Kristján Rúnarsson, Rozana
Rushiti, Shoval Sadde, Pegah Safari, Aleksi Sahala, Kalyanamalini Sahoo,
Saraswati Sahoo, Shadi Saleh, Alessio Salomoni, Tanja Samardžić,
Konstantinos Sampanis, Stephanie Samson, Xulia Sánchez-Rodríguez,
Manuela Sanguinetti, Ezgi Sanıyar, Dage Särg, Marta Sartor, Albina
Sarymsakova, Mitsuya Sasaki, Baiba Saulīte, Agata Savary, Yanin
Sawanakunanon, Shefali Saxena, Kevin Scannell, Salvatore Scarlata,
Emmanuel Schang, Nathan Schneider, Sebastian Schuster, Lane Schwartz,
Djamé Seddah, Wolfgang Seeker, Sven Sellmer, Mojgan Seraji, Magda
Ševčíková, Petr Sgall, Syeda Shahzadi, Mo Shen, Atsuko Shimada, Gyu-Ho
Shin, Hiroyuki Shirasu, Yana Shishkina, Muh Shohibussirri, Maria
Shvedova, Jean Sibille, Janine Siewert, Einar Freyr Sigurðsson, João
Silva, Aline Silveira, Natalia Silveira, Sara Silveira, Maria Simi, Radu
Simionescu, Katalin Simkó, Mária Šimková, Haukur Barri Símonarson, Kiril
Simov, Dmitri Sitchinava, Ted Sither, Aaron Smith, Isabela
Soares-Bastos, Per Erik Solberg, Dolores Sollberger, Barbara
Sonnenhauser, Shafi Sourov, Nina Speransky, Rachele Sprugnoli, Vivian
Stamou, Steinþór Steingrímsson, Antonio Stella, Jan Štěpánek, Barbora
Štěpánková, Abishek Stephen, Milan Straka, Omer Strass, Emmett
Strickland, Jana Strnadová, Alane Suhr, Yogi Lesmana Sulestio, Umut
Sulubacak, Hakyung Sung, Shingo Suzuki, Daniel Swanson, Zsolt Szántó,
Chihiro Taguchi, Dima Taji, Luigi Talamo, Fabio Tamburini, Mary Ann C.
Tan, Takaaki Tanaka, Dipta Tanaya, Mirko Tavoni, Nursena Teker, Samson
Tella, Isabelle Tellier, Marinella Testori, Guillaume Thomas, Tarık Emre
Tıraş, Thea Tollersrud, Sara Tonelli, Liisi Torga, Lucas Toribio,
Marsida Toska, Trond Trosterud, Anna Trukhina, Reut Tsarfaty, Kira
Tulchynska, Utku Türk, Francis Tyers, Sveinbjörn Þórðarson, Vilhjálmur
Þorsteinsson, Sumire Uematsu, Roman Untilov, Zdeňka Urešová, Larraitz
Uria, Hans Uszkoreit, Andrius Utka, Elena Vagnoni, Sowmya Vajjala,
Socrates Vak, Socrates Vakirtzian, Rob van der Goot, Martine Vanhove,
Daniel van Niekerk, Gertjan van Noord, Viktor Varga, Uliana Vedenina,
Giulia Venturi, Marianne Vergez-Couret, Barbora Vidová Hladká, Eric
Villemonte de la Clergerie, Veronika Vincze, Anishka Vissamsetty,
Natalia Vlasova, Eleni Vligouridou, Aya Wakasa, Joel C. Wallenberg, Lars
Wallin, Abigail Walsh, John Wang, Jonathan North Washington, Leonie
Weissweiler, Maximilan Wendt, Paul Widmer, Shira Wigderson, Sri Hartati
Wijono, Vanessa Berwanger Wille, Seyi Williams, Miriam Winkler, Shuly
Wintner, Mats Wirén, Christian Wittern, Alena Witzlack-Makarevich,
Tsegay Woldemariam, Tak-sum Wong, Alina Wróblewska, Qishen Wu, Mary
Yako, Kayo Yamashita, Naoki Yamazaki, Chunxiao Yan, Xiulin Yang, Koichi
Yasuoka, Marat M. Yavrumyan, Arife Betül Yenice, Enes Yılandiloğlu,
Olcay Taner Yıldız, Zhuoran Yu, Arlisa Yuliawati, Zdeněk Žabokrtský,
Shorouq Zahra, Amir Zeldes, He Zhou, Hanzhi Zhu, Yilun Zhu, Anna
Zhuravleva, Rayan Ziane, Artūrs Znotiņš
References
Marie-Catherine de Marneffe, Christopher Manning, Joakim Nivre, Daniel
Zeman. 2021. Universal Dependencies. In Computational Linguistics 47:2,
pp. 255–308.
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič,
Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis
Tyers, Daniel Zeman. 2020. Universal Dependencies v2: An Evergrowing
Multilingual Treebank Collection. In Proceedings of LREC.
Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D.
Manning. 2006. Generating typed dependency parses from phrase structure
parses. In Proceedings of LREC.
Marie-Catherine de Marneffe and Christopher D. Manning. 2008. The
Stanford typed dependencies representation. In COLING Workshop on
Cross-framework and Cross-domain Parser Evaluation.
Marie-Catherine de Marneffe, Timothy Dozat, Natalia Silveira, Katri
Haverinen, Filip Ginter, Joakim Nivre, and Christopher Manning. 2014.
Universal Stanford Dependencies: A cross-linguistic typology. In
Proceedings of LREC.
Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg,
Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo
Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman. 2016. Universal
Dependencies v1: A Multilingual Treebank Collection. In Proceedings of LREC.
Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal
part-of-speech tagset. In Proceedings of LREC.
Daniel Zeman. 2008. Reusable Tagset Conversion Using Tagset Drivers. In
Proceedings of LREC.