There are 15 messages in this issue.
Topics in this digest:
1.1. Re: Typical lexicon size in natlangs
From: Sam Stutter
1.2. Re: Typical lexicon size in natlangs
From: Gary Shannon
1.3. Re: Typical lexicon size in natlangs
From: David McCann
1.4. Re: Typical lexicon size in natlangs
From: R A Brown
1.5. Re: Typical lexicon size in natlangs
From: Rich Harrison
1.6. Re: Typical lexicon size in natlangs
From: Alex Fink
1.7. Re: Typical lexicon size in natlangs
From: Gary Shannon
1.8. Re: Typical lexicon size in natlangs
From: Herman Miller
1.9. Re: Typical lexicon size in natlangs
From: Nicole Valicia Thompson-Andrews
1.10. Re: Typical lexicon size in natlangs
From: Jörg Rhiemeier
1.11. Re: Typical lexicon size in natlangs
From: Herman Miller
1.12. Re: Typical lexicon size in natlangs
From: Jörg Rhiemeier
1.13. Re: Typical lexicon size in natlangs
From: MorphemeAddict
2a. Re: "Ice age superlanguage" -- linguistics journalism at its finest
From: Jörg Rhiemeier
3a. Re: Yet Another Simple Self-Segregating Morphology
From: neo gu
Messages
________________________________________________________________________
1.1. Re: Typical lexicon size in natlangs
Posted by: "Sam Stutter" [email protected]
Date: Sun May 12, 2013 5:49 am ((PDT))
Just thinking aloud here:
It strikes me that the question people ask when they say "how many words does a
natlang have?" is actually "how deep and wide is the semantic field a natlang
is capable of differentiating?". When you get down to it, the fact that French
has two words ("le" and "la") instead of the English "the" is utterly
irrelevant and doesn't really say anything about that language's semantic
capabilities - only the nature of its grammar. If you do want to count them as
separate words then should you count the conjugations of Spanish verbs
(particularly the irregular ones)? Words with affixes? Is "he runs" different
from "I ran" in a measurement of "words"? Where does a semantic difference
"male running presently" and "myself running in the past" become a grammatical
difference?
Let's say we use some sort of metalanguage to describe every possible thing,
event or quality in the universe. Yes, such a language is impossible, but for
theoretical purposes we can use it to say: "how many of these things, events
and qualities does the natlang in question describe?"
The answer, of course, is all of them, because even if your natlang has no word
for "airplane" you can probably describe it adequately well using words you
already have. English has to do this with familial relations and body parts all
the time. I guess the only way you could do this, without resorting to asking
"is it in the dictionary?" is to attempt to measure a language's capability to
describe things succinctly - hence "big metal bird mounted by humans" is less
succinct than "aeroplane". Then again, "airplane" isn't in itself particularly
succinct, being formed from two words which already existed.
Well, you might say, "'big metal bird mounted by humans' isn't one word, unlike
'airplane', which is". But then how do you define a word other than by the way
it's written? The only difference I can hear between "sand witch" and
"sandwich" is the laws of English intonation. Conversely, the words "witch" and
"which" are quite obviously different words but are pronounced identically, and
words which are spelt identically are often obviously different words, such as
"read" and "read". In Swedish, intonation is used to differentiate "identical
words" with different semantic meanings - how can we be 100% objectively sure
that this isn't what English is doing?
We could measure succinctness by number of syllables - which unfairly penalizes
languages with smaller phonemic inventories - or by time taken to speak - which
is also not particularly useful, given differences in regional speech patterns,
idiolects and the like. Besides, the NP "the turquoise jet" in English could
probably be said much more quickly than "jet" could be said in many other
languages, which, when it comes to measuring semantic depth, unfairly penalizes
languages which don't have words for "turquoise" and "jet", since all they
would be doing is exactly the same as English does, only their basic starting
points are much "longer".
You could come from an experimental angle and measure people's capability to
describe a set of objects or, theoretically, every object, action or quality in
the universe. Then you might ask them "the way you described that - how many
words did you use?", using speaker's intuition. Then again, most English
speakers will be ignorant of the fact that English enables them to distinguish
between various native plants without resorting to Latin or adjectives, or will
be unable to distinguish between certain objects, etc (like birds, metals,
etc). Any experimental measurement of semantic depth and breadth would be
stymied by a populace's ignorance. A language spoken by a rainforest tribe with
little contact with the outside world will have plenty of words for
differentiating local plants and animals and the like, but almost no words for
chemical elements, machines and internet etiquette. Furthermore, with rude
words or socially prohibited words, people will be loath to record them, and
where two words describe the same thing, one might accidentally record a
semantic difference where none is present.
In the end, a language needs to describe some basic things quite succinctly -
man, woman, fire, water, etc. And that's about it. There's nothing more, as I
see it, which can be said, measured or compared. If a language has these basic
terms then we can state that it has what it needs to be a natural human
language. If it doesn't then it most likely isn't a natural human language.
Lexical measurement may seem like a possible science or even a reasonable
fudge, but, like attempting to measure the coastline of the British Isles, the
distance you end up with will always be "infinity".
Sam Stutter
[email protected]
"No e na'l cu barri"
On 11 May 2013, at 22:36, H. S. Teoh <[email protected]> wrote:
> On Sat, May 11, 2013 at 01:53:10PM -0700, Gary Shannon wrote:
>> On Sat, May 11, 2013 at 12:08 PM, Alex Fink <[email protected]> wrote:
>>>
>>> For instance, it's a number bandied around that knowing 500 hanzi
>>> will allow you to read 90% of the characters in a Chinese newspaper
>>> -- but usually by people who don't appreciate the fact that this
>>> includes all the grammatical and closed-class words, and a swathe
>>> of basic lexis, but probably not the interesting word or two in the
>>> headline you care about.
>>
>> For example, if you know the most common 28 words in English you can
>> read 50% of everything written. But what does THAT mean if 50% means
>> that you can read only 50% of each sentence?
>>
>> Or, if you get really ambitious you can learn 732 words and read 90%
>> of everything written in English. If you want to be able to read 99.9%
>> of everything written in English you will need to learn 2090 words.
>> (These figures are from my own million-word corpus taken from 20th
>> century fiction and non-fiction on Gutenberg.com.)
>>
>> So what does it really mean to say you can read 90% by knowing 732
>> words?
>>
>> Maybe the only meaningful measure of lexicon size is how many words
>> you must know to cover some specified x% of the whole of the written
>> corpus. That's a very different number for Toki Pona than it is for
>> English. That way you could talk meaningfully about a specific
>> language's "90% coverage lexicon", and its "98% coverage lexicon", and
>> so on.
> [...]
>
> The problem with these percentages is that they obscure a basic fact of
> information theory: the most information is conveyed by the most unusual
> or outstanding bits. The stuff that's repeated almost everywhere has
> very low information content. So if I can understand 50% of the most
> common words in a given text, but most of that 50% is just grammatical
> words, then I actually *don't* understand 50% of the information
> conveyed by that text, but far less, probably only 5% or so. OTOH, if
> of that 50% that I understand 40% are content words, then I may have a
> far better understanding of the information conveyed by the text, even
> if I'm ignorant of most of the grammatical particles and constructions.
>
> For example, given the English sentence:
>
> Last week in an upscale neighbourhood in downtown Manhattan a
> woman was brutally murdered by a suspected sex offender, thought
> to be dangerously armed.
>
> If I only know the most common grammatical words, then it would read
> like this to me:
>
> **** **** in an ******* ************* in ******** ********* a
> woman was ******** ******** by a ********* *** ********, *******
> to be *********** *****.
>
> The text is essentially opaque. But if I *didn't* know common words like
> "in", "an", "by", etc., but do recognise some of the keywords, what I
> comprehend might be something like:
>
> Last week ** ** ******* neighbourhood ** ******** Manhattan *
> woman *** ******** murdered ** * ********* *** offender, *******
> ** ** *********** armed.
>
> I can understand the gist of the text far better, even if the specific
> details are incomprehensible to me. Note also that in the latter case I
> only recognized 8 words, yet understood more than the first case, where
> 10 words were recognized but almost zero information was conveyed.
>
>
> T
>
> --
> Answer: Because it breaks the logical sequence of discussion.
> Question: Why is top posting bad?
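A minimal sketch of that masking effect in Python (the tokenizer and the two
toy vocabularies are assumptions chosen only to mirror the example above, with
"woman" known in both cases as in the original):
    import re

    def mask(text, known):
        # Replace every word not in the known vocabulary with asterisks.
        def hide(m):
            w = m.group(0)
            return w if w.lower() in known else "*" * len(w)
        return re.sub(r"[A-Za-z]+", hide, text)

    sentence = ("Last week in an upscale neighbourhood in downtown Manhattan "
                "a woman was brutally murdered by a suspected sex offender, "
                "thought to be dangerously armed.")

    common_words = {"in", "an", "a", "woman", "was", "by", "to", "be"}
    keywords = {"last", "week", "neighbourhood", "manhattan", "woman",
                "murdered", "offender", "armed"}

    print(mask(sentence, common_words))   # nearly opaque
    print(mask(sentence, keywords))       # the gist survives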
Messages in this topic (28)
________________________________________________________________________
1.2. Re: Typical lexicon size in natlangs
Posted by: "Gary Shannon" [email protected]
Date: Sun May 12, 2013 8:09 am ((PDT))
That's a fascinating analysis of the problem. Here's another question
one might ask, specifically about conlangs rather than natlangs: At
what point can I consider my conlang vocabulary "complete"?
You point out that one language might say "big metal bird mounted by
humans" where another could say "airplane", or even "jet". But for
conlangers, the real question, or at least the practical, pragmatic
question is: if my conlang has no word for "airplane", can my conlang
say "big metal bird mounted by humans". Yes, I can describe a thing,
rather than name it, but to do so I must have a lexicon adequate to
the job of describing that thing.
The question for conlangers might better be: At what point do I have
an adequate "defining vocabulary" (
http://en.wikipedia.org/wiki/Defining_vocabulary ), because once you
have something like Longman's defining vocabulary you can write an
arbitrarily large dictionary entirely in the conlang.
So I would phrase the conlanger's version of the question thusly: For
my conlang X how many (and which) words do I need in order to be able
to compile a dictionary entirely in conlang X? I'm going to call that
the "Critical Mass Lexicon", because at that point, the lexicon can
support its own growth without using an L1 crutch.
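As a very rough sketch of how one might test for that critical mass (in
Python; the toy dictionary and the tokenizer are placeholders, not a real
defining vocabulary): every word used in a definition should itself be a
headword.
    import re

    def undefined_words(dictionary):
        # Words used in definitions that have no entry of their own.
        headwords = set(dictionary)
        used = set()
        for definition in dictionary.values():
            used.update(re.findall(r"[a-z]+", definition.lower()))
        return used - headwords

    toy = {
        "bird":  "animal that can fly",
        "metal": "hard shiny material",
        "plane": "big metal bird that people ride in",
    }

    # A non-empty result means the lexicon has not yet reached critical mass.
    print(undefined_words(toy))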
--gary
On Sun, May 12, 2013 at 5:49 AM, Sam Stutter <[email protected]> wrote:
> Just thinking aloud here:
>
[snip some very interesting stuff]
Messages in this topic (28)
________________________________________________________________________
1.3. Re: Typical lexicon size in natlangs
Posted by: "David McCann" [email protected]
Date: Sun May 12, 2013 8:51 am ((PDT))
On Sun, 12 May 2013 13:49:28 +0100
Sam Stutter <[email protected]> wrote:
> It strikes me that the question people ask when they say "how many
> words does a natlang have?" is actually "how deep and wide is the
> semantic field a natlang is capable of differentiating?". When you
> get down to it, the fact that French has two words ("le" and "la")
> instead of the English "the" is utterly irrelevant and doesn't really
> say anything about that language's semantic capabilities - only the
> nature of its grammar. If you do want to count them as separate words
> then should you count the conjugations of Spanish verbs (particularly
> the irregular ones)? Words with affixes? Is "he runs" different from
> "I ran" in a measurement of "words"? Where does a semantic difference
> "male running presently" and "myself running in the past" become a
> grammatical difference?
Which is why we normally distinguish between words and lexemes:
run, runs, ran, running are different words but a single lexeme.
How many lexemes do you need? The Modern Chinese Course produced by the
government for foreigners in the 1960s taught a total of 3000
characters, but that would be just a beginning. My Novial dictionary
has 5700 lexemes, but there are a lot of missing items (addict,
diary, militia, screen…) My Esperanto dictionary has just under 9000,
including some proper nouns.
Messages in this topic (28)
________________________________________________________________________
1.4. Re: Typical lexicon size in natlangs
Posted by: "R A Brown" [email protected]
Date: Sun May 12, 2013 9:54 am ((PDT))
On 12/05/2013 16:51, David McCann wrote:
[snip]
> Which is why we normally distinguish between words and
> lexemes:
Or rather between 'word forms' and 'lexemes'
> run, runs, ran, running are different words but a single
> lexeme.
They are indeed the single lexeme RUN; but _run, runs, ran,
running_ are four different word forms. Also, as Trask
observes, "... outside of grammar, the term 'word' is used
in yet other senses, as in 'phonological word' and
'orthographic word'."
Therefore saying things like "English has more words than
any other language" or "How many words do natlangs typically
have?" is meaningless IMO without actually defining
what you mean by 'word' in that statement or question.
> How many lexemes do you need?
That surely depends on why you are creating a particular
conlang. Tolkien certainly didn't give either Quenya or
Sindarin a "full" vocabulary so that we could use them, if
we wished, as auxlangs. But he did create sufficient
vocabulary for the Lord of the Rings and other works.
=========================================================
On 12/05/2013 16:09, Gary Shannon wrote:
[snip]
>
> The question for conlangers might better be: At what
> point do I have an adequate "defining vocabulary" (
> http://en.wikipedia.org/wiki/Defining_vocabulary ),
> because once you have something like Longman's defining
> vocabulary you can write an arbitrarily large dictionary
> entirely in the conlang.
Yes, that I think is a sensible approach if you're aiming for
a conlang which could actually be used for everyday
communication.
But, if your language did get used, you need not worry; when
new vocabulary is required it will appear. When computer
science and information technology developed in the second
half of the last century, did Esperantists throw up their
hands and bewail the fact Zamenhof hadn't given the language
adequate vocabulary? Of course not! The language grew as,
indeed, all living languages must.
I imagine that applied also to Ido and any other auxlang
that has a reasonable body of users.
--
Ray
==================================
http://www.carolandray.plus.com
==================================
"language ⦠began with half-musical unanalysed expressions
for individual beings and events."
[Otto Jespersen, Progress in Language, 1895]
Messages in this topic (28)
________________________________________________________________________
1.5. Re: Typical lexicon size in natlangs
Posted by: "Rich Harrison" [email protected]
Date: Sun May 12, 2013 10:02 am ((PDT))
This has been a very interesting discussion with a lot of fresh thoughts.
Gary wrote:
> So I would phrase the conlanger's version of the question thusly: For
> my conlang X how many (and which) words do I need in order to be able
> to compile a dictionary entirely in conlang X? I'm going to call that
> the "Critical Mass Lexicon", because at that point, the lexicon can
> support its own growth without using an L1 crutch.
I wonder if skilled users of aUI and Toki Pona could write dictionaries
entirely in their conlangs? If so, the minimal size of a Critical Mass Lexicon
might be a lot smaller than many people assume.
Messages in this topic (28)
________________________________________________________________________
1.6. Re: Typical lexicon size in natlangs
Posted by: "Alex Fink" [email protected]
Date: Sun May 12, 2013 10:47 am ((PDT))
On Sun, 12 May 2013 13:02:37 -0400, Rich Harrison <[email protected]> wrote:
>This has been a very interesting discussion with a lot of fresh thoughts.
>
>Gary wrote:
>
>> So I would phrase the conlanger's version of the question thusly: For
>> my conlang X how many (and which) words do I need in order to be able
>> to compile a dictionary entirely in conlang X? I'm going to call that
>> the "Critical Mass Lexicon", because at that point, the lexicon can
>> support its own growth without using an L1 crutch.
>
>I wonder if skilled users of aUI and Toki Pona could write dictionaries
>entirely in their conlangs? If so, the minimal size of a Critical Mass Lexicon
>might be a lot smaller than many people assume.
Hm, this could be a use for Wierzbicka's Natural Semantic Metalanguage. I
continue to be unconvinced it has anything at all to do with human cognition,
or more than a one-sided relationship to semantic simplicity -- but the various
NSM definitions of words that have been presented would seem to show that it's
capable enough as a (longwinded) critical mass lexicon.
Alex
Messages in this topic (28)
________________________________________________________________________
1.7. Re: Typical lexicon size in natlangs
Posted by: "Gary Shannon" [email protected]
Date: Sun May 12, 2013 11:04 am ((PDT))
NSM is fascinating, theoretically, but who wants a dictionary that
takes 3 pages to define "mouse"? I think Wierzbicka's semantic
primitives make up far too lean a set to be of practical use for
day-to-day users. I think that a practical number might lie somewhere
between NSM's 100 or so primes and Longman's 2,000. Maybe something in
the neighborhood of Ogden's 800 "Basic English" words, although his
list, as it stands, would not be suitable because of all the fudging
he does with meanings.
Still, something like 800 or 1000 defining words might be a usable
conlang target to shoot for.
--gary
On Sun, May 12, 2013 at 10:47 AM, Alex Fink <[email protected]> wrote:
> On Sun, 12 May 2013 13:02:37 -0400, Rich Harrison <[email protected]> wrote:
>
>>This has been a very interesting discussion with a lot of fresh thoughts.
>>
>>Gary wrote:
>>
>>> So I would phrase the conlanger's version of the question thusly: For
>>> my conlang X how many (and which) words do I need in order to be able
>>> to compile a dictionary entirely in conlang X? I'm going to call that
>>> the "Critical Mass Lexicon", because at that point, the lexicon can
>>> support its own growth without using an L1 crutch.
>>
>>I wonder if skilled users of aUI and Toki Pona could write dictionaries
>>entirely in their conlangs? If so, the minimal size of a Critical Mass
>>Lexicon might be a lot smaller than many people assume.
>
> Hm, this could be a use for Wierzbicka's Natural Semantic Metalanguage. I
> continue to be unconvinced it has anything at all to do with human cognition,
> or more than a one-sided relationship to semantic simplicity -- but the
> various NSM definitions of words that have been presented would seem to show
> that it's capable enough as a (longwinded) critical mass lexicon.
>
> Alex
Messages in this topic (28)
________________________________________________________________________
1.8. Re: Typical lexicon size in natlangs
Posted by: "Herman Miller" [email protected]
Date: Sun May 12, 2013 11:12 am ((PDT))
On 5/12/2013 8:49 AM, Sam Stutter wrote:
> You could come from an experimental angle and measure people's
> capability to describe a set of objects or, theoretically, every
> object, action or quality in the universe. Then you might ask them
> "the way you described that - how many words did you use?", using
> speaker's intuition. Then again, most English speakers will be
> ignorant of the fact that English enables them to distinguish between
> various native plants without resorting to latin or adjectives, or
> will be unable to distinguish between certain objects, etc (like
> birds, metals, etc). Any experimental measurement of semantic depth
> and breadth would be stymied by a populace's ignorance. A language
> spoken by a rainforest tribe with little contact with the outside
> world will have plenty of words for differentiating local plants and
> animals and the like, but almost no words for chemical elements,
> machines and internet etiquette. Furthermore, with rude words or
> words which are socially prohibited people will be loathe to record
> them and, where two words describe the same thing one might
> accidentally record a semantic difference where none is present.
I suspect the number would be "tens of thousands" for most languages, at
least more than "thousands" and less than "millions", but that
determining a more precise number will turn out to require more research
and documentation than what's available for all but a few of the best
documented languages.
One first approximation might be the number of words in a typical
native-language dictionary (i.e. something more like the scope of the
American Heritage dictionary, not the OED). Occasionally you might use
words outside the dictionary, but more likely there are quite a few
words in the dictionary that you won't ever need, and many that you
won't even recognize. But you're not going to find dictionaries like
that for many languages, and the size of a published dictionary may be
limited for practical reasons.
> In the end, a language needs to describe some basic things quite
> succinctly - man, woman, fire, water, etc. And that's about it.
> There's nothing more, as I see it, which can be said, measured or
> compared. If a language has these basic terms then we can state that
> it has what it needs to be a natural human language. If it doesn't
> then it most likely isn't a natural human language. Lexical
> measurement may seem like a possible science or, even, a reasonable
> fudge but, like attempting to measure the coastline of the British
> Isles, the distance you end up with will always be "infinity".
Well, in the case of measuring a coastline, you could measure it at
varying resolutions and chart the results on a graph.
So, the idea that's been suggested of counting the most frequent words
that make up 80% or 90% of text in the language seems like a reasonable
approach. You could compare those like coastline measurements at
different resolutions, and at least get a lower bound for the number of
words in a language.
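A rough sketch of that lower-bound measurement in Python (the tokenization and
the corpus file name are placeholder assumptions): count how many of the most
frequent word forms are needed to reach each coverage level.
    import re
    from collections import Counter

    def coverage_lexicon_sizes(text, thresholds=(0.5, 0.8, 0.9, 0.999)):
        # How many of the most frequent word forms cover each fraction of
        # the running text? Word forms, not lexemes, so this is crude.
        counts = Counter(re.findall(r"[a-z']+", text.lower()))
        total = sum(counts.values())
        remaining = sorted(thresholds)
        sizes = {}
        running = 0
        for rank, (word, n) in enumerate(counts.most_common(), start=1):
            running += n
            while remaining and running / total >= remaining[0]:
                sizes[remaining.pop(0)] = rank
        return sizes

    with open("corpus.txt", encoding="utf-8") as f:   # placeholder file name
        print(coverage_lexicon_sizes(f.read()))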
Messages in this topic (28)
________________________________________________________________________
1.9. Re: Typical lexicon size in natlangs
Posted by: "Nicole Valicia Thompson-Andrews" [email protected]
Date: Sun May 12, 2013 12:51 pm ((PDT))
Thanks for the link. That question makes more sense.
Mellissa Green
@GreenNovelist
-----Original Message-----
From: Constructed Languages List [mailto:[email protected]] On
Behalf Of Gary Shannon
Sent: Sunday, May 12, 2013 8:09 AM
To: [email protected]
Subject: Re: Typical lexicon size in natlangs
That's a fascinating analysis of the problem. Here's another question
one might ask, specifically about conlangs rather than natlangs: At
what point can I consider my conlang vocabulary "complete"?
You point out that one language might say "big metal bird mounted by
humans" where another could say "airplane", or even "jet". But for
conlangers, the real question, or at least the practical, pragmatic
question is: if my conlang has no word for "airplane", can my conlang
say "big metal bird mounted by humans". Yes, I can describe a thing,
rather than name it, but to do so I must have a lexicon adequate to
the job of describing that thing.
The question for conlangers might better be: At what point do I have
an adequate "defining vocabulary" (
http://en.wikipedia.org/wiki/Defining_vocabulary ), because once you
have something like Longman's defining vocabulary you can write an
arbitrarily large dictionary entirely in the conlang.
So I would phrase the conlanger's version of the question thusly: For
my conlang X how many (and which) words do I need in order to be able
to compile a dictionary entirely in conlang X? I'm going to call that
the "Critical Mass Lexicon", because at that point, the lexicon can
support its own growth without using an L1 crutch.
--gary
On Sun, May 12, 2013 at 5:49 AM, Sam Stutter <[email protected]> wrote:
> Just thinking aloud here:
>
[snip some very interesting stuff]
Messages in this topic (28)
________________________________________________________________________
1.10. Re: Typical lexicon size in natlangs
Posted by: "Jörg Rhiemeier" [email protected]
Date: Sun May 12, 2013 1:24 pm ((PDT))
Hallo conlangers!
On Sunday 12 May 2013 20:04:37 Gary Shannon wrote:
> NSM is fascinating, theoretically, but who wants a dictionary that
> takes 3 pages to define "mouse"? I think Wierzbicka's semantic
> primitives make up far too lean a set to be of practical use for
> day-to-day users. I think that a practical number might lie somewhere
> between NSM's 100 or so primes and Longman's 2,000. Maybe something in
> the neighborhood of Ogden's 800 "Basic English" words, although his
> list, as it stands, would not be suitable because of all the fudging
> he does with meanings.
>
> Still, something like 800 or 1000 defining words might be an usable
> conlang target to shoot for.
All this is an issue which I will have to deal with if I ever
come to develop my Quetch project in earnest. To recall, the
idea behind Quetch is to test out whether Heinlein's idea of
a "speedtalk", an oligosynthetic language with a huge phoneme
inventory and exclusively unisegmental morphemes, actually
succeeds in producing shorter utterances than natlangs, or
whether the brevity of the morphemes is cancelled out by the
need for many multi-member compounds for concepts natlangs
have simple words for (indeed, I expect the latter!).
I am considering starting off with the root vocabulary list of
Toki Pona (~120 items, depending on what is the latest revision
of the language), but it may be necessary to use more, or use
a *different* list of root concepts.
Alas, I currently have enough to do with other projects, and
Quetch is of very low priority right now.
--
... brought to you by the Weeping Elf
http://www.joerg-rhiemeier.de/Conlang/index.html
"Bêsel asa Éam, a Éam atha cvanthal a cvanth atha Éamal." - SiM 1:1
Messages in this topic (28)
________________________________________________________________________
1.11. Re: Typical lexicon size in natlangs
Posted by: "Herman Miller" [email protected]
Date: Sun May 12, 2013 1:32 pm ((PDT))
On 5/12/2013 2:04 PM, Gary Shannon wrote:
> NSM is fascinating, theoretically, but who wants a dictionary that
> takes 3 pages to define "mouse"? I think Wierzbicka's semantic
> primitives make up far too lean a set to be of practical use for
> day-to-day users. I think that a practical number might lie somewhere
> between NSM's 100 or so primes and Longman's 2,000. Maybe something in
> the neighborhood of Ogden's 800 "Basic English" words, although his
> list, as it stands, would not be suitable because of all the fudging
> he does with meanings.
>
> Still, something like 800 or 1000 defining words might be an usable
> conlang target to shoot for.
Defining animals like "mouse" is tricky. There are hundreds of small
rodents in the world; which ones are mice and which ones are rats?
Which ones are different enough to get their own names, like "gerbil"?
It might be more relevant to see how many pages it takes for NSM to
define words like "guitar".
Then again, definitions for words like "guitar" might not even be as
straightforward as they initially seem. It's a stringed instrument with
six strings, which is played by placing fingers behind the frets and
plucking the strings, right? Well, there are twelve-stringed guitars,
with six courses of two strings each (tuned in unison or octaves). Some
guitars have seven strings. If you put six strings on a banjo, that
doesn't make it a guitar. Is a charango a kind of guitar? A ukulele?
What would be useful is to have a checklist of things like musical
instruments, organized in groups like checklists of birds or mammals.
Then you could pick examples from the list to illustrate the definitions
of words in your language. You could get into some tricky dialect
problems like the classification of cookies and biscuits, but even
checklists of birds run into issues with names (e.g. "Gray-headed
Chickadee" vs. "Siberian Tit").
Messages in this topic (28)
________________________________________________________________________
1.12. Re: Typical lexicon size in natlangs
Posted by: "Jörg Rhiemeier" [email protected]
Date: Sun May 12, 2013 1:51 pm ((PDT))
Hallo conlangers!
On Sunday 12 May 2013 22:32:27 Herman Miller wrote:
> [...]
>
> Defining animals like "mouse" is tricky. There's hundreds of small
> rodents in the world; which ones are mice and which ones are rats?
> Which ones are different enough to get their own names, like "gerbil"?
This is indeed a problem, and that in a field that is easier
to classify than many others, as there is a natural taxonomy
one could follow. For many other things, there isn't.
> It might be more relevant to see how many pages it takes for NSM to
> define words like "guitar".
>
> Then again, definitions for words like "guitar" might not even be as
> straightforward as they initially seem. It's a stringed instrument with
> six strings, which is played by placing fingers behind the frets and
> plucking the strings, right? Well, there are twelve-stringed guitars,
> with six courses of two strings each (tuned in unison or octaves). Some
> guitars have seven strings. If you put six strings on a banjo, that
> doesn't make it a guitar. Is a charango a kind of guitar? A ukulele?
>
> What would be useful is to have a checklist of things like musical
> instruments, organized in groups like checklists of birds or mammals.
This has been attempted:
http://en.wikipedia.org/wiki/Hornbostel%E2%80%93Sachs
You still get very broad classes, e.g. a guitar and a violin are
both "necked box lutes" in class 321.322. Oh well.
> Then you could pick examples from the list to illustrate the definitions
> of words in your language. You could get into some tricky dialect
> problems like the classification of cookies and biscuits, but even
> checklists of birds run into issues with names (e.g "Gray-headed
> Chickadee" vs. "Siberian Tit").
Oh yes! Things like these are why I indeed expect Quetch to sink
like a stone. Even if all morphemes are unisegmental, an
11-member compound is 11 segments long, i.e. quite a mouthful.
And it is not unlikely that I will need such juggernauts for some
everyday concepts.
It is this kind of problem that brought down the 17th-century
"philosophical" languages, after all. They had to make use of
arbitrary taxonomies and resort to unwieldy compounds for many
ordinary concepts. Indeed, Quetch, where I have already decided
that compounds will be head-initial, would look very much like
such a language.
--
... brought to you by the Weeping Elf
http://www.joerg-rhiemeier.de/Conlang/index.html
"Bêsel asa Éam, a Éam atha cvanthal a cvanth atha Éamal." - SiM 1:1
Messages in this topic (28)
________________________________________________________________________
1.13. Re: Typical lexicon size in natlangs
Posted by: "MorphemeAddict" [email protected]
Date: Sun May 12, 2013 2:52 pm ((PDT))
This is why I'd like to see a dictionary with three levels:
1) Pure, basic NSM semantic primes - the (~63) 'semantic atoms';
2) An intermediate, much larger group (500-2000?) of 'semantic molecules'
defined in terms of level one;
3) Everything else, defined in terms of levels 1 and 2.
This way the explications would be shorter, and also easier to understand
than if all explications were given in level one, since they'd draw on a
richer, more natural semantic field.
Wierzbicka herself talks about this in Mental Lexicon and The Common
Language of All People, both free downloads at
http://www.griffith.edu.au/humanities-languages/school-languages-linguistics/research/natural-semantic-metalanguage-homepage/downloads
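A small sketch of how that layering could be checked mechanically (in Python;
the word lists and explications below are invented placeholders, not real NSM
primes or explications): level-2 entries may only draw on level 1, and level-3
entries on levels 1 and 2.
    import re

    # Invented placeholder data, not actual NSM material.
    levels = {
        1: {"a", "something", "someone", "small", "that", "this", "lives",
            "can", "do", "things", "and", "kind", "of", "part", "with"},
        2: {"animal": "something that lives and can do things",
            "hand": "a part of someone that this someone can do things with"},
        3: {"mouse": "a small kind of animal"},
    }

    def allowed_vocab(level):
        # Level 2 may use only level 1; level 3 may use levels 1 and 2.
        vocab = set(levels[1])
        if level >= 3:
            vocab |= set(levels[2])
        return vocab

    def violations():
        bad = {}
        for lvl in (2, 3):
            for word, explication in levels[lvl].items():
                used = set(re.findall(r"[a-z]+", explication.lower()))
                extra = used - allowed_vocab(lvl)
                if extra:
                    bad[word] = extra
        return bad

    print(violations())   # {} when every explication stays within its layers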
stevo
On Sun, May 12, 2013 at 2:04 PM, Gary Shannon <[email protected]> wrote:
> NSM is fascinating, theoretically, but who wants a dictionary that
> takes 3 pages to define "mouse"? I think Wierzbicka's semantic
> primitives make up far too lean a set to be of practical use for
> day-to-day users. I think that a practical number might lie somewhere
> between NSM's 100 or so primes and Longman's 2,000. Maybe something in
> the neighborhood of Ogden's 800 "Basic English" words, although his
> list, as it stands, would not be suitable because of all the fudging
> he does with meanings.
>
> Still, something like 800 or 1000 defining words might be an usable
> conlang target to shoot for.
>
> --gary
>
> On Sun, May 12, 2013 at 10:47 AM, Alex Fink <[email protected]> wrote:
> > On Sun, 12 May 2013 13:02:37 -0400, Rich Harrison <[email protected]>
> wrote:
> >
> >>This has been a very interesting discussion with a lot of fresh thoughts.
> >>
> >>Gary wrote:
> >>
> >>> So I would phrase the conlanger's version of the question thusly: For
> >>> my conlang X how many (and which) words do I need in order to be able
> >>> to compile a dictionary entirely in conlang X? I'm going to call that
> >>> the "Critical Mass Lexicon", because at that point, the lexicon can
> >>> support its own growth without using an L1 crutch.
> >>
> >>I wonder if skilled users of aUI and Toki Pona could write dictionaries
> entirely in their conlangs? If so, the minimal size of a Critical Mass
> Lexicon might be a lot smaller than many people assume.
> >
> > Hm, this could be a use for Wierzbicka's Natural Semantic Metalanguage.
> I continue to be unconvinced it has anything at all to do with human
> cognition, or more than a one-sided relationship to semantic simplicity --
> but the various NSM definitions of words that have been presented would
> seem to show that it's capable enough as a (longwinded) critical mass
> lexicon.
> >
> > Alex
>
Messages in this topic (28)
________________________________________________________________________
________________________________________________________________________
2a. Re: "Ice age superlanguage" -- linguistics journalism at its finest
Posted by: "Jörg Rhiemeier" [email protected]
Date: Sun May 12, 2013 6:34 am ((PDT))
Hallo conlangers!
On Saturday 11 May 2013 21:12:47 BPJ wrote:
> 2013-05-08 21:42, Matthew Turnbull wrote:
> > OK, so having now read the paper, the author does mention three main
> > sources of potential error in their data. (1) historical linguists may be
> > more likely to say two words are cognate, based solely on the fact that
> > they expect them to be (2) some words may be more likely to have similar
> > sounding phonetic representations, despite no underlying cognancy simply
> > because of some innate human preference to use simple words for
> > frequently used concepts, and a similar judgement of simplicity across
> > time and space and
>
> But cognates don't have to be similar-sounding,
> they have to exemplify systematic correspondences!
Indeed. A famous example is the Armenian numeral _erku_: it
doesn't sound much like _duô_, _two_ and their ilk, yet it is
perfectly cognate!
And chance resemblances occur also between related languages.
Lat. _dêus_ vs. Gk. _theos_ is a famous example; these two
words are from different PIE roots (*deiwos and *dhesos,
respectively), which did not stop people searching for traces
of the lost Atlantean languages from comparing *both* to
Nahuatl _teotl_, which of course really has nothing to do
with either.
> If they don't understand that basic fact of comparative
> linguistics they are bound to compare apples and oranges!
Amen. This is a problem with most macrocomparison works.
They like to pretend they are using the comparative method,
but their sound correspondence charts (if they give any at
all in the first place) are usually full of many-to-many
matches, and the "cognate sets" also often show incredible
semantic latitude. With such relaxation of the method, it
is possible to "prove" anything while actually proving
*nothing*.
For instance, there are two major reconstruction attempts
for Nostratic on the market (one by Illich-Svitych and
Dolgopolsky, one by Bomhard); they contradict each other,
use different correspondence sets (each involving some
500 etymologies, but different ones, though there is
indeed a considerable overlap) and thus cannot both be
right; which probably means that both are wrong. A method
that yields such false positives just cannot be reliable!
And these two reconstructions still count among the *better*
of the long-range comparison works currently in circulation.
Greenberg's Amerind, for instance, is much worse!
Yet, I am open-minded. There certainly are very deep
relationships between human languages which have not been
discovered yet. Indo-European and Uralic, for instance,
look quite similar for "unrelated" languages, to the point
that phyletic relationship looks like one of the likeliest
hypotheses to account for the similarities, and their
homelands were not too far apart from each other, so it is
conceivable that both have a common ancestor that was spoken
perhaps 10,000 or 12,000 years ago somewhere north of the
Caspian Sea. Alas, it is hard to establish. The evidence
is good, but not conclusive.
--
... brought to you by the Weeping Elf
http://www.joerg-rhiemeier.de/Conlang/index.html
"Bêsel asa Éam, a Éam atha cvanthal a cvanth atha Éamal." - SiM 1:1
Messages in this topic (24)
________________________________________________________________________
________________________________________________________________________
3a. Re: Yet Another Simple Self-Segregating Morphology
Posted by: "neo gu" [email protected]
Date: Sun May 12, 2013 1:47 pm ((PDT))
On Tue, 7 May 2013 14:47:49 -0700, Gary Shannon <[email protected]> wrote:
>Syllables are CV where C may be a single consonant or a simple cluster, and
>V is a pure vowel, never a diphthong or vowel cluster of any kind.
>
>Roots are made up of some number of syllables, all of which share the same
>vowel. Thus "midi" and "tolosko" are valid roots, but "madi" and "taluska"
>are not.
>
>When roots are joined, if they use different vowels then the roots are
>simply joined: mo + kala = mokala; moko + la = mokola. All compounds can
>then be decomposed in only one way.
>
>If the vowels in the two roots are the same then some different vowel is
>used to glue the two roots together. Thus: mo + koto = mo + a + koto =
>moakoto; lama + da =lamaida (but la + mada = laimada). Compounds, again,
>can only be decomposed in one way.
>
>That's all there is to it. Comments?
>
>--gary
If you didn't have clusters, you might get away with writing the vowel only
once, e.g. kalamidi => klamdi or kalmid.
Speaking of SSMs, how about this?
Every word begins with a V- prefix, probably for syntactical function.
There may be medial CV- inflectional prefixes (C is a single consonant).
Content words end with 1 or more CC(VC)*V roots, e.g. sti, pkalo, mbelitu.
Function words have at least one medial but no roots.
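And as a rough sketch of Gary's original scheme in Python (two-root compounds
only; the particular choice of glue vowel - 'a', or 'i' when the shared vowel
is 'a' - is an assumption, since the original only requires "some different
vowel"):
    VOWELS = "aeiou"

    def root_vowel(root):
        # Every root uses a single vowel quality throughout.
        return next(c for c in root if c in VOWELS)

    def join(r1, r2):
        if root_vowel(r1) != root_vowel(r2):
            return r1 + r2
        glue = "a" if root_vowel(r1) != "a" else "i"   # assumed glue choice
        return r1 + glue + r2

    def decompose(word):
        # A vowel immediately after another vowel can only be a glue vowel.
        for i in range(1, len(word)):
            if word[i] in VOWELS and word[i - 1] in VOWELS:
                return word[:i], word[i + 1:]
        # Otherwise the boundary is where the root vowel changes, just
        # before the consonant(s) carrying the first differing vowel.
        first = root_vowel(word)
        for i, c in enumerate(word):
            if c in VOWELS and c != first:
                j = i
                while j > 0 and word[j - 1] not in VOWELS:
                    j -= 1
                return word[:j], word[j:]
        return (word,)   # a single root

    tests = [("mo", "kala", "mokala"), ("moko", "la", "mokola"),
             ("mo", "koto", "moakoto"), ("lama", "da", "lamaida"),
             ("la", "mada", "laimada")]
    for r1, r2, compound in tests:
        assert join(r1, r2) == compound
        assert decompose(compound) == (r1, r2)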
Messages in this topic (8)