[conlang] Digest Number 9282

conlang Sat, 11 May 2013 06:49:04 -0700

There are 6 messages in this issue.

Topics in this digest:

1a. Re: Typical lexicon size in natlangs    
    From: R A Brown
1b. Re: Typical lexicon size in natlangs    
    From: Padraic Brown
1c. Re: Typical lexicon size in natlangs    
    From: Jim Henry
1d. Re: Typical lexicon size in natlangs    
    From: Tristan

2a. Conaccents.    
    From: Leonardo Castro
2b. Re: Conaccents.    
    From: Nina-Kristine Johnson

Messages
________________________________________________________________________
1a. Re: Typical lexicon size in natlangs
    Posted by: "R A Brown" [email protected] 
    Date: Sat May 11, 2013 12:15 am ((PDT))

On 11/05/2013 06:42, H. S. Teoh wrote:
> On Fri, May 10, 2013 at 09:28:33PM -0700, Padraic Brown
> wrote:
>> --- On Fri, 5/10/13, H. S. Teoh wrote:
>>
>>> What's the typical lexicon size of a natlang? What's
>>>  the smallest known lexicon size of a natlang? The
>>> largest?
>>
>> This line of inquiry was discussed here a couple months
>> ago. I was chided for suggesting that in order to
>> discover which language has the largest lexicon you
>> actually have to count all the words in the lexicon.
>> And also for suggesting that "all the words" means ALL
>>  THE WORDS. Was also chided for "linguistic dick
>> comparing" or some such puerile term. All of these are
>>  interesting and valid questions. Though I'm not
>> certain anything really got answered. :/ You might
>> check the Archive for the discussion.
>
> I remember that discussion. But it was only about
> English, though.

Yes, it was - and more specifically it was about the claim
that English has a larger lexicon than any other language -
hence the "linguistic dick comparing" thing, especially
when English was defined to include the Saxon of the early
Germanic settlers in this island.

> I'm wondering what the situation looks like
> cross-linguistically.
>
> But yes, it's a vast complicated question because what
> constitutes a "word" is far from clear-cut, and which
> words to include/exclude are also far from obvious.

Yes - all true.  All languages will borrow vocabulary.  At
what point is a borrowed word not considered foreign?

Also with languages like Old English we can never know _ALL
the words_; all we know are those lexical items that got
written down in works that either survived or got copied.
In any 'dead language' we will have only a fraction of the
total lexicon.

> Your proposal was to count ALL THE WORDS, which means
> everything from Old English up to the present. But why
> stop there? After all, Old English words came from
> proto-Germanic, and if we're going to include words that
>  are no longer in use, we should include everything up to
>  PIE too.

But as we get further and further back in time the data
becomes less and less complete.  With PIE, indeed, we have
no written record and any lexicon will be a reconstruction
at best of only part of its total lexicon.  But as you
suggest, _ALL the words_ is the problem and IMHO fairly
meaningless.

[snip]

> Which is why I conceded that there is no universal or
> even consistent answer to my questions. But suppose we
> arbitrarily adopt one metric, whatever that may be, no
> matter how arbitrary or silly its definition may be.

But if it's arbitrary and/or silly, won't the result be
fairly meaningless?

[snip]
>
> Interesting things to look might be, how does the lexicon
> size of a technologically-advanced society compare with,
> say, a third world society?

Obviously the technologically advanced one will have a whole
lot of vocabulary the other will not have.  But all living
languages can and do expand vocabulary when the need arises.
People often ridicule a 'more primitive' language because
it borrows terms from English, without considering that
English has plundered the lexicon of Latin & Greek to coin
terms!

The trouble with the lexicon of languages no longer spoken
is that we have only incomplete data - sometimes very
incomplete; and with languages actually being spoken now,
the problem is that their lexicon is open-ended.  There have
probably been some words added to some languages even since
I began typing this    ;)

> What about the number of core words?

What constitutes 'core words'?  If we have something like
the Swadesh list, then won't all natlangs have more or less
the same?

> What about the sizes of closed word classes across
> languages?

The classes don't correspond across languages and, indeed,
is any class really closed in a living, spoken language?

> Or more importantly, how do conlang lexicon sizes
> compare with natlang lexicon sizes? (I suspect the answer
> to this last one is likely "far too small" in almost all
> cases. But I could be wrong.)

I think you are not wrong. It seems to me the best way a
conlang can develop a vocabulary comparable to a natlang is
actually to get the conlang used by a reasonably large group
of users.  I'm no expert in Esperanto, but I am sure that
after more than a century of actual use it has a far greater
lexicon than that bequeathed to it by Zamenhof.  If a
language lives, it must surely grow.

I suspect field linguists have given estimates of the size
of lexicon of the languages of pre-industrial societies and
surviving hunter-gatherer societies. That might be useful
for what you require.

-- 
Ray
==================================
http://www.carolandray.plus.com
==================================
"language  began with half-musical unanalysed expressions
for individual beings and events."
[Otto Jespersen, Progress in Language, 1895]

Messages in this topic (10)
________________________________________________________________________
1b. Re: Typical lexicon size in natlangs
    Posted by: "Padraic Brown" [email protected] 
    Date: Sat May 11, 2013 4:51 am ((PDT))

--- On Sat, 5/11/13, H. S. Teoh <[email protected]> wrote:

> I remember that discussion. But it was only about English, though. I'm
> wondering what the situation looks like cross-linguistically.
> 
> But yes, it's a vast complicated question because what constitutes a
> "word" is far from clear-cut, and which words to include/exclude are
> also far from obvious. Your proposal was to count ALL THE WORDS, 

Yes. This approach has the advantages of: avoiding the issue of inclusion /
exclusion and also being the most accurate possible answer. It just seems
most logical to me that the best answer to your question especially is a
set of the most accurate numbers.

If you go to the bank and ask for your account balances and they tell you
"well, you've got a lot of money in there!"; or if you go to the doctor
and she says "well, just take a bunch of these pills"; don't you think
most people would want to know how much money or how many pills? No
different here.

> which means everything from Old English up to the present. But why
> stop there? After all, Old English words came from proto-Germanic, and
> if we're going to include words that are no longer in use, we should
> include everything up to PIE too. 

Right. And now we've arrived at a time where English and French and
Spanish and Sanskrit are all THE SAME LANGUAGE. Somewhere along the way,
a line does have to be drawn. I am just leery of the line drawers -- once
they get going, as one could see in the earlier discussion, they have
great difficulty in stopping. Before too long, not only have they cut off
Primitive Germanic (the trimmage of which dóes make sense), but they've
hacked away everything down to a list of "only the words a person utters 
on a Thursday after midnight with a bad case of pharyngitis". Makes 
counting a lot easier! ;)

> It's not as though Old English speakers suddenly one day decided that 
> now their dialect of Old Germanic was no longer a dialect but its own 
> language proper, and therefore on that day whatever words were in use by 
> them should be codified into an official English lexicon that excludes 
> all Old Germanic words in other dialects.
> Ditto for going all the way back to PIE (and beyond!). 
> OTOH, if we're going to cut it off at Old English, then why not cut it 
> off at Middle English, or Modern?

Perhaps one could approach the question nodally: we recognise that English
and German are different languages; at some time in the remote past we
also recognise that English and German per se did not exist (they were one
and the same language we call Primitive Germanic). Find the node where
the English family finally splits off from its nearest relatives and call
it a day. This, I think, answers the "how far back do we look" question,
by choosing a slightly arbitrary but sensible date beyond which we don't
even think of the language as "English" (or "Dutch" or "French") anymore.

> Which is why I conceded that there is no universal or even consistent
> answer to my questions. 

Perhaps not. But if you want an answer, you do have to come up with a
reasonable way of finding as accurate an answer as possible. Sooner or
later, one must piss or get off the pot.

> But suppose we arbitrarily adopt one metric, whatever that may be, no 
> matter how arbitrary or silly its definition may be. Then how does the 
> cross-linguistic situation look? What are the relative lexicon sizes of 
> various natlangs?

How high can you count? Or rather, how much time do you want to spend
in counting? ;))

> Interesting things to look might be, how does the lexicon size of a
> technologically-advanced society compare with, say, a third world
> society? 

Interesting indeed, and for several reasons. Immediately coming to mind
are societies (and therefore languages) that lack the full monty of
technical and scientific lexicon, but the people of which make use of
products of that science and technology and therefore have some of those
words in use. On the other hand, some / many third world societies will
keep in active use objects (and therefore words) for things and activities
that the more technologically advanced society no longer actively uses.

> What about the number of core words? What about the sizes of closed word 
> classes across languages? Or more importantly, how do conlang lexicon 
> sizes compare with natlang lexicon sizes? 

Oo. There's a poser. The immediate question here, of course, is "what
conlang actually has a complete and definitive lexicon". I.e., the 
conlanger has recorded all possible words for the conlang? I know my
lexicons are chock full of "virtual words"! -- all those words that would
actually be on the list, because the concept exists and is named by the
culture, but I just haven't physically written the word into the list.

> (I suspect the answer to this last one is likely "far too small" in 
> almost all cases. But I could be wrong.)

At least with conlangs one can, generally speaking, find a definitive
corpus to count. I am less sure the comparison would yield anything
fruitful, though. It would be like comparing the body of knowledge of
a kindergartener with a recently graduated PhD. One is just starting out
on the path of learning and still has a lot of potential for learning
more, but does that really tell us anything of use?

> T

Padraic

Messages in this topic (10)
________________________________________________________________________
1c. Re: Typical lexicon size in natlangs
    Posted by: "Jim Henry" [email protected] 
    Date: Sat May 11, 2013 5:10 am ((PDT))

On Sat, May 11, 2013 at 3:15 AM, R A Brown <[email protected]> wrote:
> On 11/05/2013 06:42, H. S. Teoh wrote:
> But if it's arbitrary and/or silly, won't the result be
> fairly meaningless?

>> What about the number of core words?

> What constitutes 'core words'?  If we have something like
> the Swadesh list, then won't all natlangs have more or less
> the same?

George Corley, I think it was, suggested a less arbitrary way to
filter out the archaic words and specialized jargon than simply
declaring a certain date cut-off or marking certain semantic domains
off-limits.  He suggested taking a large corpus of recent texts and
looking for the set of most frequent words that constitute 90% (or
80%, or whatever) of those texts.  That would give you an idea of the
core vocabulary of a specific language -- the set of words that many
or most speakers use frequently -- without using arbitrary
cross-linguistic standards like the Swadesh List.  You can set the
figure to 95% or even 99%, as long as you use the same figure for all
the languages whose corpora you're comparing.

Of course, that still leaves some arbitrary decisions about marking
word boundaries in your corpus before you parse it.  And for some
languages, a larger corpus will be available than for others.  But I
think it should give a less arbitrary, more comparable method of
comparing different languages than simply counting entries in
dictionaries, when the lexicographers working with different languages
may have been using very different design principles and had different
resources available to them.
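
As a rough illustration of the coverage idea above, here is a minimal
Python sketch. It is not George Corley's or anyone's actual procedure;
the function names `tokenize` and `core_vocabulary` are hypothetical,
and the tokenizer simply assumes that a "word" is a lowercased run of
letters (optionally joined by apostrophes), which is exactly the kind
of arbitrary word-boundary decision mentioned above.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a "word" is a lowercased run of letters, optionally
    # joined by apostrophes. Real corpora need language-specific rules.
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def core_vocabulary(tokens, coverage=0.9):
    """Return the most frequent word types that together account for
    at least `coverage` (a fraction between 0 and 1) of all tokens."""
    counts = Counter(tokens)
    needed = coverage * sum(counts.values())
    core, covered = [], 0
    for word, n in counts.most_common():
        if covered >= needed:
            break
        core.append(word)
        covered += n
    return core


if __name__ == "__main__":
    sample = "the cat and the dog and the bird saw the fox"
    # Size of the "core" at 50% coverage of this toy corpus.
    print(len(core_vocabulary(tokenize(sample), coverage=0.5)))
```

To compare languages, one would run this with the same `coverage`
figure on a corpus for each language and compare the lengths of the
returned lists, keeping in mind that corpus size and the tokenizing
rules will both affect the result.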

-- 
Jim Henry
http://www.pobox.com/~jimhenry/
http://www.jimhenrymedicaltrust.org

Messages in this topic (10)
________________________________________________________________________
1d. Re: Typical lexicon size in natlangs
    Posted by: "Tristan" [email protected] 
    Date: Sat May 11, 2013 6:25 am ((PDT))

> > Interesting things to look might be, how does the lexicon
> > size of a technologically-advanced society compare with,
> > say, a third world society?

> Obviously the technologically advanced one will have a whole
> lot of vocabulary the other will not have.

And the other will have a whole lot of words in common use that the
technologically advanced one will not.

Which put me in mind of the metric of its speakers' median speaking
lexicon. Simpler to measure, and can perhaps be approximated without
undue effort. If words are taken as unanalyzable meanings, that is idioms
included, I would hazard a guess that it would be fairly consistent
across languages. The theory being that lexicon expands to fill the
available memory, be it from naming our technological explosion or from
naming the distinctions in the natural environment.

Tristan

-- 
All original matter is hereby placed immediately under the public domain.

Messages in this topic (10)
________________________________________________________________________
________________________________________________________________________
2a. Conaccents.
    Posted by: "Leonardo Castro" [email protected] 
    Date: Sat May 11, 2013 5:32 am ((PDT))

Do you folks have your own conaccents? Do they apply only to your own
conlangs or to natlangs too?

Have you already created any conaccent to be used by yourself in a
natlang (maybe your native one)?

Até mais!

Leonardo

Messages in this topic (2)
________________________________________________________________________
2b. Re: Conaccents.
    Posted by: "Nina-Kristine Johnson" [email protected] 
    Date: Sat May 11, 2013 5:38 am ((PDT))

For mine (Ehenív), I'm told it sounds *Asian*, even if I used elements of
Eastern European languages.

I've asked several friends and they all say the same thing. One actually
explained it to me (the specifics) and it made sense. I'd post it here, but
I don't know how comfortable he is with me exposing him to the CONLANG
World.

Sometimes it sounds like my own accent, but it certainly has an *Asian* sound.
Sadly, my Ehenív accent sounds better than my own, real accent. And I work
in tech support: people have to hear my *pashko* voice!

Cheers!

On 11 May 2013 05:32, Leonardo Castro <[email protected]> wrote:

> Do you folks have your own conaccents? Do they apply only to your own
> conlangs or to natlangs too?
>
> Have you already created any conaccent to be used by yourself in a
> natlang (maybe your native one)?
>
> Até mais!
>
> Leonardo
>

Messages in this topic (2)
