[ideoL] The size of the lexicon in preliterate societies

Davius Sanctex Sat, 14 Jun 2003 17:52:44 -0700

Hello everyone,

A topic that I find particularly interesting, and that has some connection to 
conlanging [since all the conlangers I know can read and write ;-)], is how our 
knowledge of language is affected by our knowledge of writing.


Piaget explains, for example, that the members of the non-literate Golah community in 
Liberia were completely unaware that their language consisted of words. The only unit 
they were conscious of was the sentence (which carries a complete meaning), and in 
daily life they never referred to the existence of words or combinable lexical forms. 
The metalinguistic knowledge found in many pre-state societies matches fairly closely 
that of children aged 5 or 6, so, as several studies suggest, it is knowledge of 
writing that triggers awareness of linguistic structures in adults. 
Likewise, these societies often distinguish only between meaning and sound. Among the 
Eipo of the highlands of Irian Jaya, the same word /yupe/ designates a sound, a word, 
a sentence, or the language of a people, while /nonge/ 'main part of a thing, fleshy 
part of a fruit' designates what a word can mean, and there are few distinctions 
beyond these. Nevertheless, when the Eipo meet another community that speaks a 
language related to their own, they are quickly able to predict the form of cognates 
(the same happens among us: even those of us who do not speak Italian have little 
difficulty using phonetic correspondences to guess the Italian equivalent of many of 
our words).

I had no data on the inventory of words that may be in active use in a pre-state 
community, but Andrew Pawley, who has apparently worked on Kalam (a very interesting 
language that was discussed on this list some time ago), sent a very interesting 
message on the question of the size of the lexicon in pre-state societies (the 
message is in English; sorry, I didn't have the time or inclination to translate it!)

Davius Sanctex

 ____________________

From: Andy Pawley 
Cc: [EMAIL PROTECTED]
Sent: Wednesday, June 11, 2003 5:50 AM
Subject: [Papuanlanguages] On the size of the lexicon in preliterate languages


Dear Jim (if I may)


I understand your concern to be getting an idea of the size of the 'indigenous' 
lexicon in languages of preliterate societies.


I can tell you something about estimates based on the better dictionaries for 
'preliterate' languages of the Austronesian family and the Trans New Guinea (the 
largest Papuan) family.


But first some methodological considerations. We can't make useful comparisons without 
agreeing on the basic units to be counted. Defining terms such as 'lexical unit' and 
'lexeme' is, as you indicate, crucial to estimating the size of the lexicon.


Like D.A. Cruse in his book Lexical Semantics, I regard the basic lexical unit as the 
pairing of a form with a single sense.  Just counting 'lexical entries' or 'headwords' 
is highly unsatisfactory -- different dictionaries may organise entries on radically 
different principles so that counts of entries or headwords will not be commensurate. 
A polysemous root like run, take or head consists of many sense units and each such 
unit has to be learnt separately.  A family of sense units forms a lexeme. One can in 
turn recognise a family of lexemes (related by derivation, compounding, etc.) which 
some dictionaries will include in a single entry and others will not. 


Given that the 10 most polysemous verb roots in English total 552 senses between them 
in the Macquarie Dictionary (many more in the OED, but that includes obsolete senses), 
and the top 200 verb roots total over 3000 senses, you can see that a count of sense 
units will yield a much larger lexicon than a count of lexemes. Comparison is 
further complicated by the fact that different languages seem to have different 
amounts of polysemy. (It is true that there is some fuzziness in boundaries between 
sense units but there are tests for polysemy that work most of the time.)
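
The three levels of counting described above can be illustrated with a minimal Python 
sketch. The toy entries, the class names (Entry, Lexeme) and the function count_levels 
are invented here for illustration and are not taken from any of the dictionaries 
mentioned; the point is only that one small word list gives three different "sizes" 
depending on the unit counted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Lexeme:
    """A form plus its family of senses; each (form, sense) pair is one lexical unit."""
    form: str
    senses: List[str]


@dataclass
class Entry:
    """A dictionary entry grouping a family of related lexemes under one headword."""
    headword: str
    lexemes: List[Lexeme] = field(default_factory=list)


# Toy data, invented for illustration only (not from any real dictionary).
toy_dictionary = [
    Entry("run", [
        Lexeme("run", ["move fast on foot", "operate (a machine)", "manage (a business)"]),
        Lexeme("runner", ["person who runs"]),   # derived word
        Lexeme("run out", ["be used up"]),       # phrasal unit
    ]),
    Entry("head", [
        Lexeme("head", ["part of the body", "leader of a group"]),
    ]),
]


def count_levels(dictionary: List[Entry]) -> Tuple[int, int, int]:
    """Return (entries, lexemes, sense units) for the same dictionary."""
    entries = len(dictionary)
    lexemes = sum(len(e.lexemes) for e in dictionary)
    sense_units = sum(len(lx.senses) for e in dictionary for lx in e.lexemes)
    return entries, lexemes, sense_units


entries, lexemes, sense_units = count_levels(toy_dictionary)
print(f"entries={entries} lexemes={lexemes} sense_units={sense_units}")
```

A dictionary that lists "runner" and "run out" as separate headwords would report more 
entries for exactly the same vocabulary, which is why entry counts from different 
dictionaries are not commensurate.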


There are other considerations. Just counting single-word lexical units will result in 
an estimate that is far too low. In most, probably all, languages much of the lexicon 
consists of compounds and phrasal units.  Estimating the size of the multi-word 
lexicon as opposed to the single-word lexicon can't be done by a simple general 
formula because languages vary considerably in how much use they make of compounding 
and phrasal units.


Defining the boundary between inflection and derivation and whether to count inflected 
forms is another issue. I think most of us agree that we should not count regular 
inflected forms but we should count irregular ones.  Another variable is the treatment 
of dialect variants. Some dictionaries represent a single regional dialect, others 
include material from a number of dialects.  And so on.   


Anyway, my own experience of attempting to compile comprehensive dictionaries is 
limited to one Austronesian language (Wayan Fijian) and one Trans New Guinea language 
(Kalam). I've been toiling at both for over 30 years, off and on. 


Wayan is a dialect of the Western Fijian language spoken by a farming and fishing 
community of about 1500 people.  The Wayan-English dictionary (1000 pages) contains 
around 35,000 sense units, of which probably not more than 3 percent would be 
loanwords from non-Fijian languages. I haven't done a sampling of lexemes but at a 
guess there are around 20,000 to 25,000. For sure, I have missed many thousands of 
multiword units and probably some thousands of derived words, as well as many foreign 
words and phrases that are more or less integrated into Wayans' speech repertoires. 


Kalam is spoken by a farming people on the fringes of the New Guinea Highlands. At 
first European contact (in the 1950s and 60s) there were about 13,000 Kalam, though 
these divided into several regional dialects.  The Kalam-English dictionary is smaller 
than the Wayan one, containing about 15,000 sense units. Why is it smaller? Mainly I 
think because Kalam doesn't have such a rich verbal derivational system as Wayan and 
because, unlike Wayan, it cannot derive verb roots from nouns and vice versa.


In her 1998 PhD thesis on problems in Tongan lexicography Melenaite Taumoefolau made 
counts of the number of entries in the largest dictionaries of Polynesian languages 
(Maori, Hawaiian, Tongan, Samoan). As I recall it, these ranged from 19,000 to 23,000. 
These figures don't tell us the number of basic lexical units (in my sense) but they 
indicate that these four dictionaries probably each contain on the order of 30,000 to 
50,000 lexical units.
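
As a rough check on how the figures quoted in this message relate to each other, the 
following Python sketch works out the implied ratios. The ratios themselves are not 
stated in the message; they are derived here from its rounded figures, and the helper 
name ratio_range is made up for this example. The Polynesian range in particular only 
restates the estimate given above and is not an independent measurement.

```python
def ratio_range(units_low, units_high, base_low, base_high):
    """Return the (smallest, largest) units-per-base ratio consistent with the ranges."""
    return units_low / base_high, units_high / base_low


# Wayan: about 35,000 sense units against a guessed 20,000 to 25,000 lexemes.
wayan = ratio_range(35_000, 35_000, 20_000, 25_000)

# Polynesian dictionaries: 19,000 to 23,000 entries, estimated 30,000 to 50,000 units.
polynesian = ratio_range(30_000, 50_000, 19_000, 23_000)

# English: 552 senses across the 10 most polysemous verb roots (Macquarie figure).
english_top10 = 552 / 10

print(f"Wayan senses per lexeme: {wayan[0]:.2f} to {wayan[1]:.2f}")
print(f"Polynesian units per entry: {polynesian[0]:.2f} to {polynesian[1]:.2f}")
print(f"English top-10 verb roots, senses per root: {english_top10:.1f}")
```

On these numbers Wayan comes out at roughly 1.4 to 1.75 sense units per lexeme, far 
below the average for the most polysemous English verb roots, which is expected since 
that English figure covers only the extreme top of the distribution.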


All of which suggests that your historical linguist friends who said 50,000 were 
talking more sense (no pun intended) than those talking 3000.


Of some interest are the inventories for specialised semantic domains. Kalam has over 
1200 terms for plant taxa, Wayan has 600-700. The Kalam have a richer flora (Waya is a 
small island) and make wider use of it than contemporary Wayans, who are more 
westernised.  Comparative ethnobotanical data indicate that preliterate language 
communities generally have over 1000 terms for plants, provided they live in a place 
with a rich flora.  The Wayans exploit a rich marine environment and distinguish over 
400 fish taxa, 140 mollusc taxa and about 40 crustacean taxa. Other studies show that 
Pacific Island fishing communities consistently distinguish well over 300 fish taxa, 
except for small very remote islands where there are fewer fish.  The Kalam on the 
other hand are great on land animals and distinguish some 230 bird taxa, over 40 
mammals (mainly marsupials), 35 frogs and over 100 creepy crawly taxa. I would expect 
other New Guinea Highland peoples to pattern pretty much like Kalam.


I'll post this note on the Austronesian Languages and Papuan Languages lists to see if 
any of my colleagues there have opinions.


Andy Pawley
Linguistics Dept, RSPAS
Australian National University  






 


  Malcolm (or whoever is taking this)

  For some time I have been trying to establish ball park figures for the size of the 
lexicon of unwritten languages, i.e. languages that will not be full of learned 
European loans etc. and I have been getting estimates from historical linguists that 
range beyond a single order of magnitude (3,000 to 50,000). If there is a reliable 
source out there that covers this, could you let me know? Otherwise, could this be 
asked around? I do appreciate how difficult this is to estimate, especially given the 
problem of defining lexemes but some form of general order of magnitude would be 
useful.

  Jim


