[conlang] Digest Number 9281

conlang Fri, 10 May 2013 22:44:40 -0700

There are 15 messages in this issue.

Topics in this digest:

1.1. Re: the LCC5 relay is up    
    From: Roger Mills
1.2. Re: the LCC5 relay is up    
    From: neo gu
1.3. Re: the LCC5 relay is up    
    From: neo gu
1.4. Re: the LCC5 relay is up    
    From: Zach Wellstood
1.5. Re: the LCC5 relay is up    
    From: Jim Henry
1.6. Re: the LCC5 relay is up    
    From: Padraic Brown

2a. Re: another indexing sketch    
    From: neo gu
2b. Re: another indexing sketch    
    From: H. S. Teoh
2c. Re: another indexing sketch    
    From: MorphemeAddict

3a. Typical lexicon size in natlangs    
    From: H. S. Teoh
3b. Re: Typical lexicon size in natlangs    
    From: Nicole Valicia Thompson-Andrews
3c. Re: Typical lexicon size in natlangs    
    From: Jim Henry
3d. Re: Typical lexicon size in natlangs    
    From: Nicole Valicia Thompson-Andrews
3e. Re: Typical lexicon size in natlangs    
    From: Padraic Brown
3f. Re: Typical lexicon size in natlangs    
    From: H. S. Teoh

Messages
________________________________________________________________________
1.1. Re: the LCC5 relay is up
    Posted by: "Roger Mills" [email protected] 
    Date: Fri May 10, 2013 9:39 am ((PDT))

It has struck me that one of the major differences between early and recent
relays is that the early ones were restricted (IIRC) to members of Conlang-L,
while later ones let people in from all over. Don't mean to be elitist here, 
but isn't that the source of some of the problems? People unfamiliar with the 
rules/techniques???

Messages in this topic (29)
________________________________________________________________________
1.2. Re: the LCC5 relay is up
    Posted by: "neo gu" [email protected] 
    Date: Fri May 10, 2013 11:12 am ((PDT))

On Fri, 10 May 2013 09:39:05 -0700, Roger Mills <[email protected]> wrote:

>It has struck me that one of the major differences between early and recent 
>relays, is that the early ones were restricted (IIRC) to members of Conlang-L, 
>while later ones let people in from all over. Don't mean to be elitist here, 
>but isn't that the source of some of the problems? People unfamiliar with the 
>rules/techniques???

I think that only causes an increased dropout rate. I note that Relay 19 was 
being run by Amanda, an experienced person, and is still not finished.

Messages in this topic (29)
________________________________________________________________________
1.3. Re: the LCC5 relay is up
    Posted by: "neo gu" [email protected] 
    Date: Fri May 10, 2013 7:08 pm ((PDT))

Anyways, I think the LCC5 Relay should be titled, "A Song of Horses, Turtles, 
Carrots, Plums, and Goats".

Messages in this topic (29)
________________________________________________________________________
1.4. Re: the LCC5 relay is up
    Posted by: "Zach Wellstood" [email protected] 
    Date: Fri May 10, 2013 7:19 pm ((PDT))

That's about the most disjointed and accurate title...

On Fri, May 10, 2013 at 10:08 PM, neo gu <[email protected]> wrote:

> Anyways, I think the LCC5 Relay should be titled, "A Song of Horses,
> Turtles, Carrots, Plums, and Goats".
>

-- 
raa'lalí 'aa! - [sirisaá! <http://en.wikipedia.org/wiki/Conlang>]

Messages in this topic (29)
________________________________________________________________________
1.5. Re: the LCC5 relay is up
    Posted by: "Jim Henry" [email protected] 
    Date: Fri May 10, 2013 7:34 pm ((PDT))

On Fri, May 10, 2013 at 10:08 PM, neo gu <[email protected]> wrote:
> Anyways, I think the LCC5 Relay should be titled, "A Song of Horses, Turtles, 
> Carrots, Plums, and Goats".

Or maybe "The Bad Advice Relay"?

-- 
Jim Henry
http://www.pobox.com/~jimhenry/
http://www.jimhenrymedicaltrust.org

Messages in this topic (29)
________________________________________________________________________
1.6. Re: the LCC5 relay is up
    Posted by: "Padraic Brown" [email protected] 
    Date: Fri May 10, 2013 10:08 pm ((PDT))

--- On Thu, 5/9/13, Adam Walker <[email protected]> wrote:

> > There is always a silver lining, and no mistake! More badgers!
> 
> I made mention of you and your badgers a couple of times during the 
> events surrounding the reading out of the LCC5 relay.

O dearie me! Dare I ask what you could have possibly said -- and how many
folks ran screaming from the room at the mention? ;))

> there were those in attendance who are not on the Conlang-L or are new
> there too, who professed astoundment.

Probably while thinking to themselves "he's crazier than ever I thought!"

> Once one has fireproof turtles that morph into lions by way of dragons, 
> all that one lacks is the badgers!

:)

> Adam

Padraic

Messages in this topic (29)
________________________________________________________________________
________________________________________________________________________
2a. Re: another indexing sketch
    Posted by: "neo gu" [email protected] 
    Date: Fri May 10, 2013 9:54 am ((PDT))

It occurs to me that in the examples, the Apr28 sentences don't actually 
indicate past or future. Probably, BEF-O would be added for past time and AFT-O 
for future time here, but other temporal adverbs could be used.

BEF-O PN John-i i-leave AFT i-eat.
"John left after eating."

Also, BFR should be BEF, for consistency.

On Mon, 29 Apr 2013 15:13:02 -0400, neo gu <[email protected]> wrote:

>Apr28 Temporal Morphosyntax
>
>The basis for all temporal relations is the set of 4 temporal conjunctions: 
>AFT ("after"), BFR ("before"), DUR ("during"), and TMP ("when"). They all take 
>link arguments referring to the adjunct clause to which the time of the host 
>clause is relative. The adjunct clause begins with the nominalizer NOM, whose 
>link is used, unless the clause immediately follows the conjunction (in which 
>case the link is 0 and NOM is omitted). AFT and BFR can also take another link 
>argument referring to the degree of temporal displacement, expressed as a 
>phrase where the noun denotes temporal units. 
>
>PN Mary arrive AFT-k i-eat PN John-i [2] hour-k.
>"Mary arrived 2 hours after John ate."
>
>NOM-i PN Mary eat j-leave PN John-j DUR-i
>"John left while Mary was eating."
>
>There are also the temporal pronominals O ("now") and T ("then") which appear 
>on the conjunction in place of the clause link and the temporal pronominal SD 
>("the same day") which appears in place of the degree link. These can be used 
>to construct temporal adverbs:
>
>TMP-O          now
>TMP-T          at that time
>BEF-O          in the past
>BEF-O-SD       earlier today
>AFT-T-SD       later the same day
>AFT-O [1] day  tomorrow
>
>When the clause link is 0, the conjunction DUR can be replaced by the aspect 
>prefix PRG on the adjunct verb.
>
>PN John leave, PN Mary PRG-eat
>"John left while Mary was eating."
>
>When both the clause link and the degree link are 0, the conjunctions AFT and 
>BEF can be replaced by the aspect prefixes RET and PRO, respectively.
>
>PN Mary arrive, PN John RET-eat.
>"Mary arrived after John ate."
>
>Verbs marked for aspect can also be used outside of temporal adjuncts.
>
>TMP-T PN Mary PRG-eat. "(At that time,) Mary was eating."
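The substitution rules in the Apr28 sketch can be checked mechanically. What follows is a hypothetical Python sketch, not part of neo gu's description: the `TemporalAdjunct` record, its field names, and the use of `None` for a 0 degree link are assumptions, and BEF is used instead of BFR, following the correction at the top of this message.

```python
from dataclasses import dataclass
from typing import Optional

# Aspect prefix that may replace each conjunction (from the Apr28 sketch).
ASPECT_FOR = {"DUR": "PRG", "AFT": "RET", "BEF": "PRO"}


@dataclass
class TemporalAdjunct:
    conj: str                     # AFT, BEF, DUR or TMP
    clause_link: str = "0"        # "0" when the clause directly follows the conjunction
    degree: Optional[str] = None  # degree link, e.g. "[2] hour" or "SD"; None = link 0


def aspect_prefix(adj: TemporalAdjunct) -> Optional[str]:
    """Return the aspect prefix that can stand in for the conjunction, or None."""
    if adj.clause_link != "0":
        return None  # every substitution needs a 0 clause link
    if adj.conj == "DUR":
        return ASPECT_FOR["DUR"]  # DUR -> PRG
    if adj.conj in ("AFT", "BEF") and adj.degree is None:
        return ASPECT_FOR[adj.conj]  # AFT -> RET, BEF -> PRO, only with no degree link
    return None  # TMP, or AFT/BEF carrying a degree link


def temporal_adverb(conj: str, pron: str, same_day: bool = False) -> str:
    """Spell a temporal adverb: O ("now") or T ("then") fills the clause link,
    and SD ("the same day") optionally fills the degree link."""
    if pron not in ("O", "T"):
        raise ValueError("pronominal must be O or T")
    return f"{conj}-{pron}" + ("-SD" if same_day else "")


print(aspect_prefix(TemporalAdjunct("AFT")))                     # RET
print(aspect_prefix(TemporalAdjunct("AFT", degree="[2] hour")))  # None
print(temporal_adverb("BEF", "O", same_day=True))                # BEF-O-SD
```

The sketch only covers the adjunct rules; aspect-marked verbs used on their own (as in TMP-T PN Mary PRG-eat) need no conjunction at all.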

Messages in this topic (5)
________________________________________________________________________
2b. Re: another indexing sketch
    Posted by: "H. S. Teoh" [email protected] 
    Date: Fri May 10, 2013 4:53 pm ((PDT))

On Fri, May 10, 2013 at 12:54:16PM -0400, neo gu wrote:
> It occurs to me that in the examples, the Apr28 sentences don't
> actually indicate past or future. Probably, BEF-O would be added for
> past time and AFT-O for future time here, but other temporal adverbs
> could be used.
[...]

In a number of natlangs, "yesterday" and "tomorrow" are sometimes reused
to mean past or future. My L1 does this to some extent, for example. (Of
course, context makes it clear which meaning is actually intended.)

I borrowed this idea in Tatari Faran: _hara_ is an adverb meaning
"tomorrow", and _nara_ is an adverb meaning "yesterday". They also
double as future/past tense markers:

        huu sa       tapa hara     pasanan da       bata
        huu sa       tapa hara     pasanan na       bata
        1SG CVY:MASC walk tomorrow town    RCP:MASC FIN
        I'll go to town tomorrow. (Or, I'll go to town (indefinite
        future).)

        huu sa       tapa nara      pasanan da       bata
        huu sa       tapa nara      pasanan na       bata
        1SG CVY:MASC walk yesterday town    RCP:MASC FIN
        I went to town yesterday. (Or, I went to town (indefinite
        past).)

To disambiguate between "tomorrow" and future (and likewise "yesterday"
and past), TF uses the idioms _baran hara_ (lit. tomorrow morning) to
mean "tomorrow" and _mubun nara_ (lit. last night) to mean "yesterday".
(Yes, these paraphrases were st... adapted from my L1. :-P) They are
used when the speaker wishes to make clear that the precise day was
meant, not just the generic future/past.

A language doesn't always have to map precisely to semantics. :)
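The two-level choice described above (bare adverb for the generic future or past, idiom for the precise day) amounts to a small lookup. A hypothetical Python sketch: the function names are made up, and putting the idiom in the same slot as the bare adverb in `clause` is an assumption, since the examples only show _hara_ and _nara_ in that position.

```python
# Bare adverbs double as future/past markers; the idioms pin down the exact day.
GENERIC = {"future": "hara", "past": "nara"}
EXACT_DAY = {"future": "baran hara", "past": "mubun nara"}  # lit. tomorrow morning / last night


def time_word(direction: str, exact_day: bool = False) -> str:
    """Return the TF time adverb for 'future' or 'past'."""
    return (EXACT_DAY if exact_day else GENERIC)[direction]


def clause(direction: str, exact_day: bool = False) -> str:
    # Word order follows the glossed examples:
    # 1SG CVY:MASC walk <time> town RCP:MASC FIN
    # ASSUMPTION: the idioms occupy the same slot as bare hara/nara.
    words = ["huu", "sa", "tapa", time_word(direction, exact_day), "pasanan", "da", "bata"]
    return " ".join(words)


print(clause("past"))  # huu sa tapa nara pasanan da bata
```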

--T

Messages in this topic (5)
________________________________________________________________________
2c. Re: another indexing sketch
    Posted by: "MorphemeAddict" [email protected] 
    Date: Fri May 10, 2013 7:58 pm ((PDT))

This is similar to what I did with my Esperanto relex. All the past-tense
morphemes, including the verb tense ending, the participles, and adverbs such as
"already", share a syllable (or just its consonant, for the verb endings), and the
same is true for the present and future tenses.

                    present          past                future
adverb              now = su         already = jo        soon = tca
verb ending         -as = -s         -is = -j            -os = -tc
ACT-PART + adverb   -ant- = -nsu-    -int- = -njo-       -ont- = -ntca-
PASS-PART + adverb  -at- = -rsu-     -it- = -rjo-        -ot- = -rtca-
day                 today = sufawe   yesterday = jofawe  tomorrow = tcafawe
name of the tense   sucro            jocro               tcacro

'c' = /S/ (she), 'tc' = /tS/ (chair), 'j' = /Z/ (pleasure), 'w' = /D/ (this)
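Because every form in a column shares one tense syllable, the whole table can be regenerated from three syllables. A Python sketch; reading -n- as the ACT-PART marker, -r- as PASS-PART, -fawe as "day" and -cro as "tense" is inferred from the table rather than stated in the message.

```python
# Tense syllables: "now", "already", "soon".
TENSE_SYLLABLE = {"present": "su", "past": "jo", "future": "tca"}


def tense_forms(syllable: str) -> dict:
    """Derive the forms that share one tense syllable (inferred pattern).
    Assumes each syllable ends in a single vowel, as su, jo and tca do."""
    consonant = syllable[:-1]  # verb endings keep only the consonant: su -> s, tca -> tc
    return {
        "adverb": syllable,
        "verb ending": "-" + consonant,
        "active participle": "-n" + syllable + "-",   # ACT-PART + syllable
        "passive participle": "-r" + syllable + "-",  # PASS-PART + syllable
        "day": syllable + "fawe",                     # today / yesterday / tomorrow
        "tense": syllable + "cro",                    # present / past / future tense
    }


for tense, syl in TENSE_SYLLABLE.items():
    print(tense, tense_forms(syl))
```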

stevo

On Fri, May 10, 2013 at 7:51 PM, H. S. Teoh <[email protected]> wrote:

> In a number of natlangs, "yesterday" and "tomorrow" are sometimes reused
> to mean past or future.
> [...]

Messages in this topic (5)
________________________________________________________________________
________________________________________________________________________
3a. Typical lexicon size in natlangs
    Posted by: "H. S. Teoh" [email protected] 
    Date: Fri May 10, 2013 6:18 pm ((PDT))

What's the typical lexicon size of a natlang?
What's the smallest known lexicon size of a natlang? The largest?

I realize that these questions are vague and do not have precise
answers, due to conflicting definitions of what constitutes a word, how
to count inflected forms, whether to include/exclude rare words only
used in certain circles, etc., etc., but I'm interested to know how
natlangs compare given a particular choice of metric.

Equally interesting is the question of how many "core words" there are
in a typical natlang; i.e., those words that constitute the vocabulary
of everyday, non-technical conversation between native speakers, and
whether any general statements can be made about words *outside* this
core. Are they mostly derived words? Words of (mostly?) regular
formation?
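Fixing a metric is what makes counts comparable. As a rough sketch (not a method proposed in the thread), the Python below counts distinct word forms versus distinct lemmas in a token list, and measures a "core" as the smallest number of most frequent forms that together cover a given share of running text. The `lemma_of` mapping and the 90% default threshold are illustrative assumptions.

```python
from collections import Counter


def lexicon_sizes(tokens, lemma_of):
    """Return (distinct word forms, distinct lemmas).
    Forms missing from lemma_of are treated as their own lemma."""
    forms = set(tokens)
    lemmas = {lemma_of.get(form, form) for form in forms}
    return len(forms), len(lemmas)


def core_size(tokens, coverage=0.9):
    """Smallest number of most frequent forms whose combined count reaches
    `coverage` of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= coverage * total:
            return n
    return len(counts)


tokens = "the cat saw the cats and the cat ran".split()
lemma_of = {"cats": "cat", "saw": "see", "ran": "run"}
print(lexicon_sizes(tokens, lemma_of))  # (6, 5)
print(core_size(tokens, 0.5))           # 2
```

Even on a nine-word text, counting inflected forms separately (6) versus by lemma (5) changes the answer, which is the metric problem in miniature.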

This latter question occurred to me because I was thinking about the
difference between basic words like pronouns or "hand", "ear", "eyes",
"talk", etc., that are often of opaque lineage (unless you're
linguistically-capable) and irregular form, vs. obscure words like
scientific names for species, that have a fixed pattern of formation
from stems of a given source (e.g., Latin & Greek), and are basically
completely regular. It seems to make sense, since if a rare word had
irregular inflections, nobody would remember it, and it would tend to be
regularised over time via analogy, whereas irregular forms of a common
word are more likely to be retained because everybody uses them all the
time.

In the same vein, it would seem to me that rare words confined to
specific areas (e.g. scientific terminology) should tend to have very
regular derivations, in order to maximize comprehensibility to other
people in that area, since something like "homo sapiens" is more likely
to be understood by people who already know about the Latin/Greek
conventions of scientific terminology, than a totally opaque,
unanalysable coinage like, say, "fim". But OTOH you have terms like
"quark", which isn't exactly analysable either (but such words seem rare
among the non-core words). Are there any such general trends that can be
said about non-core words?

The reason I'm asking all this, of course, is because I'm wondering
what's the best way to develop a conlang's lexicon. With core words,
it's rather easy to flesh it out -- you just make a list of things you
need to talk about in everyday life, then decide how such things should
be expressed in the conlang.  But once you get into the more obscure
region of field-specific words and expressions, it becomes uncomfortably
easy to just start relexing English (unconsciously), because that's just
how you think about those things. It's not as simple to come up with
conlang-specific ways of saying "computer", "keyboard", "program", etc.,
when it's non-obvious how a native speaker would perceive such concepts
and from which roots he would coin the new terminology. It would seem
that the unconscious tendency for a conlanger is to just make 1-to-1
mappings of the English words, carrying over any extralexical
connotations and associations (like using the word for "rodent" to refer
to the (computer) mouse in slang, when such an association between the
device and the animal may not exist in the context of the conlang).

If one could reasonably say that such field-specific words tend to have
regular pedigree, then one could avoid this trap by designating a set of
basic concepts specific to the conlang, from which a native speaker
would draw if asked to coin a new word for a concept that doesn't yet
exist in the language. The resulting coinage would then be much more
naturalistic than simply relexifying English terminology in a 1-to-1
manner. There could even be conlang-specific idioms with these coinages
that arise from associations not present in English.
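One way to make this concrete is to keep an explicit inventory of conlang-native roots and coin new terms only by a fixed rule over them. A hypothetical Python sketch: every root, gloss, and the modifier-first, head-last compounding rule are invented for illustration and belong to no real conlang.

```python
# Hypothetical root inventory: conlang-native concepts a speaker would draw on.
ROOTS = {"count": "mek", "machine": "tor", "letter": "sil", "press": "dan", "board": "pa"}


def coin(*concepts: str) -> str:
    """Coin a transparent compound by concatenating roots, modifier first
    and head last (an assumed rule). Fails loudly when no native root exists,
    instead of silently importing an English association."""
    missing = [c for c in concepts if c not in ROOTS]
    if missing:
        raise KeyError(f"no native root for: {', '.join(missing)}")
    return "".join(ROOTS[c] for c in concepts)


print(coin("count", "machine"))          # mektor   ('computer' as "counting machine")
print(coin("letter", "press", "board"))  # sildanpa ('keyboard' as "letter-press board")
```

Asking for "rodent" raises an error: the mouse-as-rodent association only enters the lexicon if someone deliberately adds a root for it.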

(Case in point: I learned from my Russian web-pal that the verb "to
type" in English actually translates to different verbs in Russian
depending on whether it's to type at a typewriter, or type at a modern
computer keyboard. Such distinctions seem all too easy to miss when
fleshing out a conlang's lexicon -- I know I'd have just added a single
new verb for "to type" if asked to add computer terminology to a
conlang, had I not learned this little tidbit about Russian to alert me
to other possibilities.)

(And, to wrap up, the original question about typical lexicon sizes is
basically to have a rough way of gauging when one should start thinking
about such issues as the above, in the process of fleshing out one's
conlang's lexicon.)

T

-- 
Give a man a fish, and he eats once. Teach a man to fish, and he will sit 
forever.

Messages in this topic (6)
________________________________________________________________________
3b. Re: Typical lexicon size in natlangs
    Posted by: "Nicole Valicia Thompson-Andrews" [email protected] 
    Date: Fri May 10, 2013 6:35 pm ((PDT))

I need to know the typical size as well. I haven't gotten there yet, but I have
a list of words I got from a field linguistics website that I want to
translate, along with words from this guide I'm working through. I also have
words I started working on in the past that I need to reorganize. My
question along those lines is what do I do when I get to an English word
that has no Yardish word. For example, Yemorans don't have apples, so do I
just leave those words out, with a note explaining that any word not on this list
has no Yardish equivalent?
Also, do I put city spellings for words like noydle, which is also noidle?

In other words, do I create two lexicons, one with the regular spellings
and one with the spellings by city? If so, I'm thinking I'd just do a cut
and paste and change the spellings.
I'm thinking the definitions wouldn't change per city.

Mellissa Green

@GreenNovelist

-----Original Message-----
From: Constructed Languages List [mailto:[email protected]] On
Behalf Of H. S. Teoh
Sent: Friday, May 10, 2013 6:17 PM
To: [email protected]
Subject: Typical lexicon size in natlangs

[...]

Messages in this topic (6)
________________________________________________________________________
3c. Re: Typical lexicon size in natlangs
    Posted by: "Jim Henry" [email protected] 
    Date: Fri May 10, 2013 7:32 pm ((PDT))

On Sat, May 11, 2013 at 12:35 AM, Nicole Valicia Thompson-Andrews
<[email protected]> wrote:
> question along those lines is what do I do when I get to an English word
> that has no Yardish word. For example, Yemorans don't have apples, so do I
> just leave those words out, with a note explain any word not on this list
> has no Yardish equivalent?

You don't necessarily need to say that; it should go without saying
given the context of your language.  Your Yardish dictionary should
list words for the native plants and animals of their world, but
unless they're in contact with humans and have words for plants and
animals from Earth, they don't need to have words for those things,
and you shouldn't need to spell that out in every case.

> Also, do I put city spellings for words like noydle, which is also noidle?

Do you mean local dialect variations?

> In other words, do I create two lexicons, one with the regular spellings,
> and one with the spellings by city? I'm thinking if so, I'd just do a cut
> and past and change the spellings.
> I'm thinking the definitions wouldn't change per city.

I'd suggest that you (perhaps arbitrarily) pick some dialect that will
be represented as "primary" in your lexicon, and write your main
definitions under headwords in the form they have in that dialect.
Then, for variant forms, just have a cross-reference saying "dialect
variant of <other word>", e.g.,

tsalim - n., barrel or crate

tsaalem - Terepsan dialect for "tsalim" (q.v.)

Of course, some dialect words will be semantically distinct from words
in the prestige dialect, so you'd give those their own independent
definitions.  For instance, the Sulitsan dialect might use "tsalhima"
for most barrels and crates, but a local word "ipsanu" which occurs
nowhere else just for barrels of beer.
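This layout maps onto a simple record per headword: primary-dialect words carry definitions, pure variants carry only a cross-reference, and semantically distinct dialect words get their own definition. A Python sketch; the `Entry` fields and function names are assumptions, and the sample words are the invented examples above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    headword: str
    definition: Optional[str] = None  # set for primary words and distinct dialect words
    dialect: Optional[str] = None     # None = primary ("prestige") dialect
    variant_of: Optional[str] = None  # primary headword, if this is only a variant form


def render(entry: Entry) -> str:
    """Format one dictionary line in the style of the examples."""
    if entry.variant_of is not None:
        return f'{entry.headword} - {entry.dialect} dialect for "{entry.variant_of}" (q.v.)'
    label = f" ({entry.dialect} dialect)" if entry.dialect else ""
    return f"{entry.headword} - {entry.definition}{label}"


def definition(lexicon: Dict[str, Entry], word: str) -> Optional[str]:
    """Follow variant cross-references to the entry holding the definition."""
    entry = lexicon[word]
    while entry.variant_of is not None:
        entry = lexicon[entry.variant_of]
    return entry.definition


LEXICON = {e.headword: e for e in [
    Entry("tsalim", "n., barrel or crate"),
    Entry("tsaalem", dialect="Terepsan", variant_of="tsalim"),
    Entry("tsalhima", dialect="Sulitsan", variant_of="tsalim"),
    Entry("ipsanu", "n., barrel of beer", dialect="Sulitsan"),
]}

for entry in LEXICON.values():
    print(render(entry))
```

With this layout a single list holds both the primary spellings and the city spellings, so each definition is written only once.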

-- 
Jim Henry
http://www.pobox.com/~jimhenry/
http://www.jimhenrymedicaltrust.org

Messages in this topic (6)
________________________________________________________________________
3d. Re: Typical lexicon size in natlangs
    Posted by: "Nicole Valicia Thompson-Andrews" [email protected] 
    Date: Fri May 10, 2013 7:38 pm ((PDT))

Thanks. Yes, local dialects. I put city because the spellings are dependent
on the city or village you come from.

Mellissa Green

@GreenNovelist

-----Original Message-----
From: Constructed Languages List [mailto:[email protected]] On
Behalf Of Jim Henry
Sent: Friday, May 10, 2013 7:32 PM
To: [email protected]
Subject: Re: Typical lexicon size in natlangs

[...]

Messages in this topic (6)
________________________________________________________________________
3e. Re: Typical lexicon size in natlangs
    Posted by: "Padraic Brown" [email protected] 
    Date: Fri May 10, 2013 9:28 pm ((PDT))

--- On Fri, 5/10/13, H. S. Teoh <[email protected]> wrote:

> What's the typical lexicon size of a natlang?
> What's the smallest known lexicon size of a natlang? The largest?

This line of inquiry was discussed here a couple months ago. I was
chided for suggesting that in order to discover which language has the
largest lexicon you actually have to count all the words in the lexicon.
And also for suggesting that "all the words" means ALL THE WORDS. Was also 
chided for "linguistic dick comparing" or some such puerile term. All of 
these are interesting and valid questions. Though I'm not certain anything 
really got answered. :/ You might check the Archive for the discussion.

Padraic

> I realize that these questions are vague and do not have
> precise
> answers, due to conflicting definitions of what constitutes
> a word, how
> to count inflected forms, whether to include/exclude rare
> words only
> used in certain circles, etc., etc., but I'm interested to
> know how
> natlangs compare given a particular choice of metric.
> 
> Equally interesting is the question of how many "core words"
> there are
> in a typical natlang; i.e., those words that constitute the
> vocabulary
> of everyday, non-technical conversation between native
> speakers, and
> whether any general statements can be made about words
> *outside* this
> core. Are they mostly derived words? Words of (mostly?)
> regular
> formation?
> 
> This latter question occurred to me because I was thinking
> about the
> difference between basic words like pronouns or "hand",
> "ear", "eyes",
> "talk", etc., that are often of opaque lineage (unless
> you're
> linguistically-capable) and irregular form, vs. obscure
> words like
> scientific names for species, that have a fixed pattern of
> formation
> from stems of a given source (e.g., Latin & Greek), and
> are basically
> completely regular. It seems to make sense, since if a rare
> word had
> irregular inflections, nobody would remember it, and it
> would tend to be
> regularised over time via analogy, whereas irregular forms
> of a common
> word are more likely to be retained because everybody uses
> them all the
> time.
> 
> In the same vein, it would seem to me that rare words
> confined to
> specific areas (e.g. scientific terminology) should tend to
> have very
> regular derivations, in order to maximize comprehensibility
> to other
> people in that area, since something like "homo sapiens" is
> more likely
> to be understood by people who already know about the
> Latin/Greek
> conventions of scientific terminology, than a totally
> opaque,
> unanalysable coinage like, say, "fim". But OTOH you have
> terms like
> "quark", which isn't exactly analysable either (but such
> words seem rare
> among the non-core words). Are there any such general trends
> that can be
> said about non-core words?
> 
> The reason I'm asking all this, of course, is because I'm
> wondering
> what's the best way to develop a conlang's lexicon. With
> core words,
> it's rather easy to flesh it out -- you just make a list of
> things you
> need to talk about in everyday life, then decide how such
> things should
> be expressed in the conlang.  But once you get into the
> more obscure
> region of field-specific words and expressions, it becomes
> uncomfortably
> easy to just start relexing English (unconsciously), because
> that's just
> how you think about those things. It's not as simple to come
> up with
> conlang-specific ways of saying "computer", "keyboard",
> "program", etc.,
> when it's non-obvious how a native speaker would perceive
> such concepts
> and from which roots he would coin the new terminology. It
> would seem
> that the unconscious tendency for a conlanger is to just
> make 1-to-1
> mappings of the English words, carrying over any
> extralexical
> connotations and associations (like using the word for
> "rodent" to refer
> to the (computer) mouse in slang, when such an association
> between the
> device and the animal may not exist in the context of the
> conlang).
> 
> If one could reasonably say that such field-specific words tend to have
> regular pedigree, then one could avoid this trap by designating a set of
> basic concepts specific to the conlang, from which a native speaker
> would draw if asked to coin a new word for a concept that doesn't yet
> exist in the language. The resulting coinage would then be much more
> naturalistic than simply relexifying English terminology in a 1-to-1
> manner. There could even be conlang-specific idioms with these coinages
> that arise from associations not present in English.
> 
> (Case in point: I learned from my Russian web-pal that the verb "to
> type" in English actually translates to different verbs in Russian
> depending on whether it's to type at a typewriter, or type at a modern
> computer keyboard. Such distinctions seem all too easy to miss when
> fleshing out a conlang's lexicon -- I know I'd have just added a single
> new verb for "to type" if asked to add computer terminology to a
> conlang, had I not learned this little tidbit about Russian to alert me
> to other possibilities.)
> 
> (And, to wrap up, the original question about typical lexicon sizes is
> basically to have a rough way of gauging when one should start thinking
> about such issues as the above, in the process of fleshing out one's
> conlang's lexicon.)
> 
> 
> T
> 
> -- 
> Give a man a fish, and he eats once. Teach a man to fish, and he will sit forever.
> 

Messages in this topic (6)
________________________________________________________________________
3f. Re: Typical lexicon size in natlangs
    Posted by: "H. S. Teoh" [email protected] 
    Date: Fri May 10, 2013 10:44 pm ((PDT))

On Fri, May 10, 2013 at 09:28:33PM -0700, Padraic Brown wrote:
> --- On Fri, 5/10/13, H. S. Teoh <[email protected]> wrote:
> 
> > What's the typical lexicon size of a natlang?
> > What's the smallest known lexicon size of a natlang? The largest?
> 
> This line of inquiry was discussed here a couple months ago. I was
> chided for suggesting that in order to discover which language has the
> largest lexicon you actually have to count all the words in the
> lexicon.  And also for suggesting that "all the words" means ALL THE
> WORDS. Was also chided for "linguistic dick comparing" or some such
> puerile term. All of these are interesting and valid questions. Though
> I'm not certain anything really got answered. :/ You might check the
> Archive for the discussion.

I remember that discussion, but it was only about English. I'm
wondering what the situation looks like cross-linguistically.

But yes, it's a vast, complicated question, because what constitutes a
"word" is far from clear-cut, and which words to include or exclude is
also far from obvious. Your proposal was to count ALL THE WORDS, which
means everything from Old English up to the present. But why stop there?
After all, Old English words came from proto-Germanic, and if we're
going to include words that are no longer in use, we should include
everything up to PIE too. It's not as though Old English speakers
suddenly one day decided that now their dialect of Old Germanic was no
longer a dialect but its own language proper, and therefore on that day
whatever words were in use by them should be codified into an official
English lexicon that excludes all Old Germanic words in other dialects.
Ditto for going all the way back to PIE (and beyond!).  OTOH, if we're
going to cut it off at Old English, then why not cut it off at Middle
English, or Modern?

Which is why I conceded that there is no universal or even consistent
answer to my questions. But suppose we adopt one metric, whatever that
may be, however arbitrary or silly its definition. Then how does the
cross-linguistic situation look? What are the relative lexicon sizes of
various natlangs?

Interesting things to look at might be: how does the lexicon size of a
technologically advanced society compare with, say, a third-world
society? What about the number of core words? What about the sizes of
closed word classes across languages? Or, more importantly, how do
conlang lexicon sizes compare with natlang lexicon sizes? (I suspect the
answer to this last one is likely "far too small" in almost all cases.
But I could be wrong.)

T

-- 
Lawyer: (n.) An innocence-vending machine, the effectiveness of which
depends on how much money is inserted.

Messages in this topic (6)

