There are 15 messages in this issue.
Topics in this digest:
1a. Re: Conaccents.
From: Leonardo Castro
1b. Re: Conaccents.
From: H. S. Teoh
1c. Re: Conaccents.
From: Nina-Kristine Johnson
2a. Re: LCC5 photos at pics.conlang.org
From: Jim Henry
3a. Re: Typical lexicon size in natlangs
From: Alex Fink
3b. Re: Typical lexicon size in natlangs
From: MorphemeAddict
3c. Re: Typical lexicon size in natlangs
From: Gary Shannon
3d. Re: Typical lexicon size in natlangs
From: H. S. Teoh
3e. Re: Typical lexicon size in natlangs
From: Logan Kearsley
4a. Re: "Ice age superlanguage" -- linguistics journalism at its finest
From: BPJ
5.1. Re: the LCC5 relay is up
From: Padraic Brown
5.2. Re: the LCC5 relay is up
From: Alex Fink
5.3. Re: the LCC5 relay is up
From: H. S. Teoh
6a. Re: another indexing sketch
From: neo gu
7.1. Re: Edeinal: Language of the Edeinos
From: Logan Kearsley
Messages
________________________________________________________________________
1a. Re: Conaccents.
Posted by: "Leonardo Castro" [email protected]
Date: Sat May 11, 2013 7:53 am ((PDT))
Nice! And is your conlang spoken with different accents in your
conworld (if you have one)?
Até mais!
Leonardo
2013/5/11 Nina-Kristine Johnson <[email protected]>:
> For mine (Ehenív), I'm told it sounds *Asian*, even if I used elements of
> Eastern European languages.
>
> I've asked several friends and they all say the same thing. One actually
> explained it to me (the specifics) and it made sense. I'd post it here, but
> I don't know how comfortable he is with me exposing him to the CONLANG
> World.
>
> Sometimes it sounds like my own accent, but it certainly has an *Asian* sound.
> Sadly, my Ehenív accent sounds better than my own real accent. And I work
> in tech support: people have to hear my *pashko* voice!
>
> Cheers!
>
>
> On 11 May 2013 05:32, Leonardo Castro <[email protected]> wrote:
>
>> Do you folks have your own conaccents? Do they apply only to your own
>> conlangs or to natlangs too?
>>
>> Have you already created any conaccent to be used by yourself in a
>> natlang (maybe your native one)?
>>
>> Até mais!
>>
>> Leonardo
>>
Messages in this topic (5)
________________________________________________________________________
1b. Re: Conaccents.
Posted by: "H. S. Teoh" [email protected]
Date: Sat May 11, 2013 8:35 am ((PDT))
On Sat, May 11, 2013 at 09:32:36AM -0300, Leonardo Castro wrote:
> Do you folks have your own conaccents? Do they apply only to your own
> conlangs or to natlangs too?
>
> Have you already created any conaccent to be used by yourself in a
> natlang (maybe your native one)?
[...]
Conaccent? Are you talking about prosody? Or dialectal differences in a
conlang?
Tatari Faran definitely has its own flavor of prosody... I've yet to
work it all out, but so far, I've found some rather interesting
patterns. For example, if only one NP is present:
tara' sa tapa bata.
3SG CVY walk FIN
[tâɾaʔ sa tapá bata]
He is walking.
TF is pitch-accented, so I'm transcribing with IPA pitches/tones here.
Note that the verb _tapa_, which has lexical stress on the 2nd syllable,
is assigned high pitch in this case, followed by the finalizer _bata_
which is always pronounced with low pitch. The first NP _tara' sa_ also
"receives stress", meaning that lexical stress within its constituents
is expressed as high pitch.
Now if an additional NP were added to the clause:
tara' sa tapa misanan dei bata.
3SG CVY walk village RCP FIN
[tâɾaʔ sā tapà misânan dej bata]
He is walking to the village.
Here, the presence of the second NP before the finalizer changes the
intonation pattern: the second NP "receives stress", but now the verb
_tapa_ is pronounced with low pitch -- its lexical stress is not
expressed. In fact, it's almost as though it now receives "low pitch
stress", such that even the case particle _sa_, which is never stressed,
is now assigned a mid-level pitch.
However, if an adverb is present, then the verb's lexical stress is
expressed again:
tara' sa tapa tsat misanan dei bata.
3SG CVY walk fast village RCP FIN
[tâɾaʔ sa tapá t͡sat misânan dej bata]
He is walking quickly to the village.
If there are two NPs following the verb, the prosody changes again:
tara' sa tapa buta' kei misanan dei bata.
3SG CVY walk hut ORG village RCP FIN
[tâɾaʔ sā tapà butáʔ keī misânan dej bata]
He is walking from the hut to the village.
There's a feature here I don't quite know how to represent in the IPA:
the high pitch in the NP _misanan dei_ is pronounced higher than in the
NP _buta' kei_. One might say that this sentence has 3 peaks: at the
beginning of the sentence with the first NP, falling into a valley at
the verb _tapa_, then rising to a (lower) peak in _buta' kei_, then to a
higher peak in _misanan dei_, then falling back to a low-pitch valley in
the finalizer _bata_.
Interestingly enough -- and this is what I've only recently noticed --
this prosodic contour means that the NP immediately before the finalizer
receives more stress than the NP preceding it, which makes it preferable
to place an NP you want to emphasize in that position. So in
the example above, "to the village" is emphasized; if we were to swap
the two NPs following the verb, then it would be "from the hut" that
would be emphasized. This would be the more unusual word order, since
generally speaking, one would tend to emphasize the destination of an
action more than its origin. IOW, prosody in TF has an effect on word
order preference! I was quite happy to discover this emergent effect.
This isn't all there is to TF prosody, of course. Adding adjectives into
the mix also changes the way NPs are stressed. Furthermore, there are a
small number of words that have inherently low lexical pitch (I called
them enclitics, but I'm not so sure that's the correct term anymore).
These words alter the prosody by forcing the pitch to be low even when
the NP they occur in "receives stress". Some words like _tse_ ("you
(sg)") go so far as to even force adjacent case particles to become high
pitch, even though they would never do so otherwise. This makes for
unusual reversals of the usual prosodic contours, which may have
consequences on NP ordering within the clause (I haven't fully explored
the consequences yet).
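The word-level pattern described above could be boiled down to a toy rule set. This is an editor's illustrative sketch, not Teoh's own formalisation: the function name and pitch labels are made up, and sub-word effects (such as the mid-level pitch on the case particle _sa_) are not modelled.

```python
def assign_pitch(first_np, verb, post_nps, adverb=None):
    """Toy sketch of the TF pitch pattern described above.
    Returns (word, pitch) pairs: 'H' high, 'H+' the extra-high peak
    before the finalizer, 'L' low; None where the description above
    leaves the pitch unspecified."""
    contour = [(first_np, 'H')]              # first NP "receives stress"
    if post_nps and adverb is None:
        contour.append((verb, 'L'))          # verb's lexical stress suppressed
    else:
        contour.append((verb, 'H'))          # stressed: no following NP, or adverb present
    if adverb is not None:
        contour.append((adverb, None))       # the adverb's own pitch isn't described
    for i, np in enumerate(post_nps):
        # with two post-verbal NPs, the one before the finalizer peaks higher
        peak = 'H+' if len(post_nps) > 1 and i == len(post_nps) - 1 else 'H'
        contour.append((np, peak))
    contour.append(('bata', 'L'))            # finalizer: always low pitch
    return contour
```

On the examples above this reproduces the contours: verb high with no following NP, low when an NP follows, high again when the adverb _tsat_ intervenes, and the pre-finalizer NP peaking highest in the two-NP clause.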
T
--
The early bird gets the worm. Moral: ewww...
Messages in this topic (5)
________________________________________________________________________
1c. Re: Conaccents.
Posted by: "Nina-Kristine Johnson" [email protected]
Date: Sat May 11, 2013 2:02 pm ((PDT))
"Nice! And is your conlang spoken with different accents in your
conworld (if you have one)?"--Leonardo
Well, if by *World* you mean like Tolkien, fantasy stuff... no.
But I am making a low-budget YouTube movie in this language (I'm a total
amateur!). I have some scenes filmed already, and it's going well.
The *World* in this movie is present-day Earth, and it plays with "What if
English were not the dominant language?" (Ehenív takes the place of
English--English is a minority language).
Yes, I have a bit of a superiority complex. LOL
Cheers!,
N. Kristine
Messages in this topic (5)
________________________________________________________________________
________________________________________________________________________
2a. Re: LCC5 photos at pics.conlang.org
Posted by: "Jim Henry" [email protected]
Date: Sat May 11, 2013 10:16 am ((PDT))
On Thu, May 9, 2013 at 4:13 PM, Jim Henry <[email protected]> wrote:
> http://pics.conlang.org/v/LCC5/
> I still have about thirty more pictures to add; I'll probably finish
> them tomorrow if not today.
I've finished adding my photos. David added a group shot, and I added
the guestbook image that Sai sent out.
--
Jim Henry
http://www.pobox.com/~jimhenry/
http://www.jimhenrymedicaltrust.org
Messages in this topic (4)
________________________________________________________________________
________________________________________________________________________
3a. Re: Typical lexicon size in natlangs
Posted by: "Alex Fink" [email protected]
Date: Sat May 11, 2013 12:08 pm ((PDT))
On Sat, 11 May 2013 08:10:48 -0400, Jim Henry <[email protected]> wrote:
>George Corley, I think it was, suggested a less arbitrary way to
>filter out the archaic words and specialized jargon than simply
>declaring a certain date cut-off or marking certain semantic domains
>off-limits. He suggested taking a large corpus of recent texts and
>looking for the set of most frequent words that constitute 90% (or
>80%, or whatever) of those texts. That would give you an idea of the
>core vocabulary of a specific language -- the set of words that many
>or most speakers use frequently -- without using arbitrary
>cross-linguistic standards like the Swadesh List. You can set the
>figure to 95% or even 99%, as long as you use the same figure for all
>the languages whose corpora you're comparing.
Indeed, that's probably one of the less arbitrary ways to actually generate a
number in practice. But in view of other figures I've seen, I suspect any of
those thresholds will probably yield drastic undercounting compared to the kind
of numbers you'd like for speakers' mental lexicon size (though maybe 99%
begins to get close). For instance, it's a number bandied around that knowing
500 hanzi will allow you to read 90% of the characters in a Chinese newspaper
-- but usually by people who don't appreciate the fact that this includes all
the grammatical and closed-class words, and a swathe of basic lexis, but
probably not the interesting word or two in the headline you care about.
In fact, I wonder how much variation there would be from language to language
in the rate at which this number of words varies with the cutoff -- e.g. the
exponent if you fit the growth to a power law. That seems like it should be
even less subject to irrelevant factors.
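As an illustration, the coverage-to-vocabulary curve and a crude exponent fit could be computed along these lines. This is a sketch under assumptions: the function names are made up, stdlib only, and fitting log vocabulary size against log(1 - cutoff) is just one possible parametrisation of "the rate at which this number of words varies with the cutoff".

```python
import math
from collections import Counter

def vocab_for_coverage(tokens, cutoff):
    """Smallest number of distinct words whose combined token count
    reaches the given coverage cutoff (0 < cutoff <= 1)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for n, (word, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered / total >= cutoff:
            return n
    return len(counts)

def fit_power_law(tokens, cutoffs=(0.5, 0.8, 0.9, 0.95)):
    """Least-squares slope of log(vocab size) against log(1 - cutoff):
    one crude way to read off an exponent for the growth."""
    xs = [math.log(1 - c) for c in cutoffs]
    ys = [math.log(vocab_for_coverage(tokens, c)) for c in cutoffs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
```

On a Zipf-distributed corpus the slope comes out negative, since the vocabulary needed blows up as the cutoff approaches 100%.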
>Of course, that still leaves some arbitrary decisions about marking
>word boundaries in your corpus before you parse it. And for some
>languages, a larger corpus will be available than for others. But I
>think it should give a less arbitrary, more comparable method of
>comparing different languages than simply counting entries in
>dictionaries, when the lexicographers working with different languages
>may have been using very different design principles and had different
>resources available to them.
Another concern with corpus methods is that, if you want to disregard
inflectional morphology when deciding what counts as the same word (and surely
you do?), you still need a good stemmer for the language in question. But if
you accept that, it's not possible to completely avoid judgment calls on
what's inflection and what's derivation; or even if there are no judgment
calls, it might rely too much on semantic understanding for the software to be
able to do it. (Drifting completely away from objectivity, my own proclivity
would be to answer this question fuzzily. E.g. once you know the English word
"build", you don't get the nominal sense of "building" completely for free, but
neither is it a wholly separate word of its own; perhaps it should count as one
half of a word?)
Alex
Messages in this topic (15)
________________________________________________________________________
3b. Re: Typical lexicon size in natlangs
Posted by: "MorphemeAddict" [email protected]
Date: Sat May 11, 2013 1:44 pm ((PDT))
This makes me wonder how many words/characters one needs to know beyond the
basic structural/core words that occur in any or all contexts, essentially
the grammar words. How many characters in Chinese (Mandarin/Putonghua)
represent 'empty', structural/grammar words? And what are they?
stevo
Messages in this topic (15)
________________________________________________________________________
3c. Re: Typical lexicon size in natlangs
Posted by: "Gary Shannon" [email protected]
Date: Sat May 11, 2013 1:53 pm ((PDT))
On Sat, May 11, 2013 at 12:08 PM, Alex Fink <[email protected]> wrote:
>
> For instance, it's a number bandied around that knowing 500 hanzi will allow
> you
> to read 90% of the characters in a Chinese newspaper -- but usually by people
> who don't appreciate the fact that this includes all the grammatical and
> closed-class
> words, and a swathe of basic lexis, but probably not the interesting word or
> two
> in the headline you care about.
For example, if you know the most common 28 words in English you can
read 50% of everything written. But what does THAT mean if 50% means
that you can read only 50% of each sentence?
Or, if you get really ambitious you can learn 732 words and read 90%
of everything written in English. If you want to be able to read 99.9%
of everything written in English you will need to learn 2090 words.
(These figures are from my own million-word corpus taken from 20th
century fiction and non-fiction on Gutenberg.com.)
So what does it really mean to say you can read 90% by knowing 732 words?
Maybe the only meaningful measure of lexicon size is how many words
you must know to cover some specified x% of the whole of the written
corpus. That's a very different number for Toki Pona than it is for
English. That way you could talk meaningfully about a specific
language's "90% coverage lexicon", and its "98% coverage lexicon", and
so on.
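Gary's "90% coverage lexicon" is easy to make concrete given a tokenised corpus; a minimal sketch (the function name is made up, and all the tokenisation and stemming decisions discussed upthread are left to the caller):

```python
from collections import Counter

def coverage_lexicon(tokens, coverage=0.9):
    """The smallest set of most-frequent words whose tokens account
    for the given share of the corpus -- Gary's 'x% coverage lexicon'."""
    counts = Counter(tokens)
    total = sum(counts.values())
    lexicon, covered = [], 0
    for word, c in counts.most_common():
        if covered / total >= coverage:
            break
        lexicon.append(word)
        covered += c
    return lexicon
```

At coverage=0.9 on Gary's million-word corpus this would be expected to return roughly his 732 words; on a Toki Pona corpus, far fewer.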
--gary
Messages in this topic (15)
________________________________________________________________________
3d. Re: Typical lexicon size in natlangs
Posted by: "H. S. Teoh" [email protected]
Date: Sat May 11, 2013 2:37 pm ((PDT))
On Sat, May 11, 2013 at 01:53:10PM -0700, Gary Shannon wrote:
> On Sat, May 11, 2013 at 12:08 PM, Alex Fink <[email protected]> wrote:
> >
> > For instance, it's a number bandied around that knowing 500 hanzi
> > will allow you to read 90% of the characters in a Chinese newspaper
> > -- but usually by people who don't appreciate the fact that this
> > includes all the grammatical and closed-class words, and a swathe
> > of basic lexis, but probably not the interesting word or two in the
> > headline you care about.
>
> For example, if you know the most common 28 words in English you can
> read 50% of everything written. But what does THAT mean if 50% means
> that you can read only 50% of each sentence?
>
> Or, if you get really ambitious you can learn 732 words and read 90%
> of everything written in English. If you want to be able to read 99.9%
> of everything written in English you will need to learn 2090 words.
> (These figures are from my own million-word corpus taken from 20th
> century fiction and non-fiction on Gutenberg.com.)
>
> So what does it really mean to say you can read 90% by knowing 732
> words?
>
> Maybe the only meaningful measure of lexicon size is how many words
> you must know to cover some specified x% of the whole of the written
> corpus. That's a very different number for Toki Pona than it is for
> English. That way you could talk meaningfully about a specific
> language's "90% coverage lexicon", and its "98% coverage lexicon", and
> so on.
[...]
The problem with these percentages is that they obscure a basic fact of
information theory: the most information is conveyed by the most unusual
or outstanding bits. The stuff that's repeated almost everywhere has
very low information content. So if I can understand 50% of the most
common words in a given text, but most of that 50% is just grammatical
words, then I actually *don't* understand 50% of the information
conveyed by that text, but far less, probably only 5% or so. OTOH, if,
of that 50% that I understand, 40% are content words, then I may have a
far better understanding of the information conveyed by the text, even
if I'm ignorant of most of the grammatical particles and constructions.
For example, given the English sentence:
Last week in an upscale neighbourhood in downtown Manhattan a
woman was brutally murdered by a suspected sex offender, thought
to be dangerously armed.
If I only know the most common grammatical words, then it would read
like this to me:
**** **** in an ******* ************* in ******** ********* a
woman was ******** ******** by a ********* *** ********, *******
to be *********** *****.
The text is essentially opaque. But if I *didn't* know common words like
"in", "an", "by", etc., but do recognise some of the keywords, what I
comprehend might be something like:
Last week ** ** ******* neighbourhood ** ******** Manhattan *
woman *** ******** murdered ** * ********* *** offender, *******
** ** *********** armed.
I can understand the gist of the text far better, even if the specific
details are incomprehensible to me. Note also that in the latter case I
only recognized 8 words, yet understood more than the first case, where
10 words were recognized but almost zero information was conveyed.
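The information-theoretic point can be made concrete with per-token surprisal (-log2 of a word's relative frequency). A hedged sketch, not a claim about any particular corpus; the function name is made up.

```python
import math
from collections import Counter

def token_vs_information_share(tokens, k):
    """Compare the share of *tokens* vs. the share of *information*
    (surprisal, -log2 p, summed over tokens) carried by the k most
    frequent words in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    # surprisal of each word, estimated from its relative frequency
    surprisal = {w: -math.log2(c / total) for w, c in counts.items()}
    top = {w for w, _ in counts.most_common(k)}
    token_share = sum(counts[w] for w in top) / total
    info_total = sum(counts[w] * surprisal[w] for w in counts)
    info_share = sum(counts[w] * surprisal[w] for w in top) / info_total
    return token_share, info_share
```

On a skewed toy corpus the two most frequent words can cover 80% of the tokens while carrying well under half of the summed surprisal, which is the "50% of the words but only 5% of the information" effect in miniature.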
T
--
Answer: Because it breaks the logical sequence of discussion.
Question: Why is top posting bad?
Messages in this topic (15)
________________________________________________________________________
3e. Re: Typical lexicon size in natlangs
Posted by: "Logan Kearsley" [email protected]
Date: Sat May 11, 2013 4:39 pm ((PDT))
On 11 May 2013 15:36, H. S. Teoh <[email protected]> wrote:
> [...]
>
> The problem with these percentages is that they obscure a basic fact of
> information theory: the most information is conveyed by the most unusual
> or outstanding bits. The stuff that's repeated almost everywhere has
> very low information content. So if I can understand 50% of the most
> common words in a given text, but most of that 50% is just grammatical
> words, then I actually *don't* understand 50% of the information
> conveyed by that text, but far less, probably only 5% or so. OTOH, if
> of that 50% that I understand 40% are content words, then I may have a
> far better understanding of the information conveyed by the text, even
> if I'm ignorant of most of the grammatical particles and constructions.
That's not a problem for counting how much vocabulary a particular
language has. It *is* a problem for counting how much of a particular
language's vocabulary you need to know, which might be more
enlightening anyway.
The last few weeks of my semantics class* that just ended in April
were largely concerned with how to determine how much and exactly
which vocabulary it is most essential to teach/learn for various
purposes- "general service", general academic discourse, reading
subject-field specific texts, etc. (most of the class was TESOL
students, so this was rather an important topic for them).
The vast majority of research on the topic is of the "which words make
up some percentage of the text, ordered by frequency" variety, with a
very little bit of supporting "how much comprehension do you get for a
particular level of coverage", and very very little "what counts as a
word" (which is surprisingly variable among different studies, and
contributes to different vocabulary researchers getting somewhat
different results).
There is a general understanding that there's some group of words that
especially needs to be taught/studied explicitly because they're
important for comprehension but not frequent enough to be picked up
casually, but no real general agreement as to what those are or how
best to determine them. Frustratingly, there is no cutoff point at
which learning more vocabulary starts to massively improve
comprehension, or at which learning more vocabulary suddenly stops
paying off- the graphs that come out of the few existing
vocabulary-level vs. comprehension studies have annoyingly gentle
curves.
I suspect that measuring the information content of different words in
a language would not really get you drastically different results from
just counting frequencies and coverages, since information content
should be roughly inversely proportional to frequency. But as far as I
know, that's never actually been done, so who knows: measuring
information coverage rather than just straight token count coverage might
turn up some interesting things.
-l.
Messages in this topic (15)
________________________________________________________________________
________________________________________________________________________
4a. Re: "Ice age superlanguage" -- linguistics journalism at its finest
Posted by: "BPJ" [email protected]
Date: Sat May 11, 2013 12:12 pm ((PDT))
2013-05-08 21:42, Matthew Turnbull wrote:
> OK, so having now read the paper, the author does mention three main
> sources of potential error in their data. (1) historical linguists may be
> more likely to say two words are cognate, based solely on the fact that
> they expect them to be (2) some words may be more likely to have similar
> sounding phonetic representations, despite no underlying cognancy simply
> because of some innate human preference to use simple words for frequently
> used concepts, and a similar judgement of simplicity across time and space
> and
But cognates don't have to be similar-sounding,
they have to exemplify systematic correspondences!
If they don't understand that basic fact of comparative
linguistics they are bound to compare apples and oranges!
/bpj
Messages in this topic (23)
________________________________________________________________________
________________________________________________________________________
5.1. Re: the LCC5 relay is up
Posted by: "Padraic Brown" [email protected]
Date: Sat May 11, 2013 12:43 pm ((PDT))
--- On Thu, 5/9/13, H. S. Teoh <[email protected]> wrote:
> Or maybe the simplest solution is just to allot more time to
> each leg of
> the relay. :) Then most people can finish early and people
> needing more time can make use of the extra time.
Well, the simplest solution is just to remind relay participants that We
Know Where You Live. After 48 hours are up, we send out the lads with
their nifty collapsible iron truncheons for one final plea for swift
passage of the torch. If the torch ain't forthcoming, well, the details
are best left to the imagination!...
Seriously, allowing more time for each leg is exactly what has led to the
incredible seven month relay they've got going now!
For the record, I don't much care for the idea of a computer run relay
master. Half the fun is the communal effort among relay master and
translators to get the job done. If we're going to just have a computer do
the work of master, the same computer might as well just do the translating
as well! That way, no one will have to expend any actual energy on the
relay, and the relay will be all done in 4.3 seconds and we can all see
the results instantly. Badgers and all!
Padraic
> T
Messages in this topic (32)
________________________________________________________________________
5.2. Re: the LCC5 relay is up
Posted by: "Alex Fink" [email protected]
Date: Sat May 11, 2013 1:13 pm ((PDT))
May as well make here some of the points that Sai and I were going to make on
our relay text <http://conlang.org/language-creation-conference/lcc5/10-unlws/>
at the reveal, until we got pulled aside by Tamara and crew and missed it.
Structurally, in the text that came to us, the content of the advice had been
rationalised into having only two dangers, the baleful fruits and the _yuska_,
both individually potentially quite severe, but with a safe middle way between
them -- see, the _yuska_ is also wary of the fruits, and you can use this to
your advantage. As such, we chose to express the "middle way" structure very
explicitly in the advice portion of our text: the long vertical line in the
middle, referring to the son, plots a course equidistant from the line
referring to the fruits, on the left, and the _yuska_, on the right. Each
danger got one if-then cartouche structure to itself, and a third one at the
bottom described their interaction. (In retrospect the "we say the fruits are
sweet" bit was stylistically misplaced given this; it shoulda been lower.)
We were tickled at our recursive expression of "X's family / tribe" = "the set
F = (X plus parents of F plus children of F)", and I'm impressed that Tony
extracted that!
We reduced the _yuska_ to a "big cat"; and it also seems that we were the
first language to dislodge the goats. UNLWS doesn't go in for breaking its
back over biological precision, so our basic word here is a lumper; it might be
overscientifically rendered 'caprid', or familiarly 'critter with ram-like
horns'.
The main UNLWS mechanism for indicating asides ("by the way...") is using
lighter lines, thus the section of our text drawn in pencil. It is also joined
to the rest of the text with the "conversational" variant of the grouping
glyph, indicating that it was actually added later. But that's about all
there is in the way of ordering cues here. I tried to forget the original ordering of the
text when making my retranslation but wasn't very successful at doing so. Of
course it all got reduced to paste immediately afterwards anyhow...
Alex
Bonus: here is our English rendering of Cedh's BN text:
_(Some of) father's advice_
As a son of mine, you might come in closer to me. You have decided that you
must depart this family of yours, and you intend to travel to that mountain in
the foreign lands. I begin to think that I should maybe act as a counsellor
for you, and I mean to give you advice for your ears until you depart.
* It may happen that you will encounter a _yuska_. You must not tell it my
advice, although it may say it means to strike you down. It will frighten your
horse, but not kill it.
* You may see apples near the place where the _yuska_ is lying in wait, even
though apples might not even grow there. You will see the apples which appear
like bright and edible moons.
* You must remember that the moon always surpasses an apple in brightness. You
must remember that a normal apple is always red. You must at all costs prevent
the fake apples from reaching your horse, because those very apples can indeed
bring about pain and death.
* If you stay near the baleful apples, then you won't leave there as a corpse.
My son, you must remember this: if you stay near the baleful apples, then the
_yuska_ won't attack you.
Yes, we always say of the fake apples that they are sweet, because they keep us
safe from people. If these rare apples are indeed there, then you may be
certain that you will be kept safe from everyone.
My son, you might listen to this, anyhow: the goat begins to weep, because your
horse will likewise be departing.
(These asterisks stand for a BN section-starting discourse particle that seemed
lighter than anything written English had available.)
Messages in this topic (32)
________________________________________________________________________
5.3. Re: the LCC5 relay is up
Posted by: "H. S. Teoh" [email protected]
Date: Sat May 11, 2013 1:41 pm ((PDT))
On Sat, May 11, 2013 at 12:43:31PM -0700, Padraic Brown wrote:
> --- On Thu, 5/9/13, H. S. Teoh <[email protected]> wrote:
>
> > Or maybe the simplest solution is just to allot more time to each
> > leg of the relay. :) Then most people can finish early and people
> > needing more time can make use of the extra time.
>
> Well, the sìmplest solution is just to remind relay participants that
> We Know Where You Live. After 48 hours are up, we send out the lads
> with their nifty collapsible iron truncheons for one, final, plea for
> swift passage of the torch. If the torch ain't forthcoming, well, the
> details are best left to the imagination!...
>
> Seriously, allowing more time for each leg is exactly what has led to
> the incredible seven month relay they've got going now!
But those seven months consist mostly of idle time! Or at least, waiting
time. If we extend the time allowance to, say, 72 hours, but with strict
enforcement, then there should be no problem.
> For the record, I don't much care for the idea of a computer run relay
> master. Half the fun is the communal effort among relay master and
> translators to get the job done. If we're going to just have a
> computer do the work of master, the same computer might as well just
> do the translating as well! That way, no one will have to expend any
> actual energy on the relay, and the relay will be all done in 4.3
> seconds and we can all see the results instantly. Badgers and all!
[...]
Actually, we can already do that: translate.google.com. Just insert some
random text, say in English, then have it translate to, oh, German. Then
copy the German text, paste it into the first box, then select German as
input language, and another language as target, say Russian. Or perhaps
run through the entire list of supported languages in alphabetic order.
The resulting text will definitely be hilariously mangled from the
original. Here's an example:
Original text (purportedly a translation from Klingon, no less):
Take 1 liter of wheat and place it in a jar. Moisten with water,
and pour off the excess. Turn the jar over, and cover it with
nylon mesh to keep the grain from falling out. Each day, pour
fresh water on the grains and then pour it off, to keep them
moist. After a few days, the wheat will grow soft, and will
begin to sprout. Remove the sprouted wheat from the jar, place
it in a bowl, and mash thoroughly. Add salt. Take 120
milliliters of flaxseed and place in water. Let it sit for a
whole day. Remove the flaxseed from the water and mash
thoroughly. Place the mashed wheat and flaxseed in a bowl and
mix well. Make cakes from the combined mixture, in the form of
patties 7 centimeters in diameter. On a bright, cloudless day,
place the cakes outside, so the sun can heat them. When the
cakes have become dry and solid, place them in a canister. The
cakes are now ready to eat.
Here's the result after passing through a random selection of Google
Translate languages (about 25-30 or so):
Professor and corn in a bowl. When cool, water was added.
Zaljubiti.Gubitak excess weight, daily food. A few days later
nylon mesh, shade, moisture, and welcome to grow, and blue.
Music and footwear factories, add salt, 120 ml of water and
linoleum. Every day. Clean water, soaked clothing. 7 cm, and a
bowl of pasta and a room at pat.Plotis Nyampur organizations and
flaxseed. The solar hot water zelite.Kolac clean dry bread
kanace English tragic day.
I've no idea what turned wheat into professor, but obviously it suggests
that wheat has hitherto unsuspected superior intellect. Also, music and
footwear factories are obviously very important to the proper
application of the original recipe. The flaxseed was apparently
considered redundant, though I'm not sure about having linoleum in my
food... And soaked clothing, probably from careless spillage during the
elaborate cooking process, no doubt. I'm not sure where "Plotis Nyampur
organizations" came from -- perhaps subversive parties plotted to steal
the flaxseed? And it will definitely be a tragic day at the end of it
all, judging by how unrecognizably mangled the result was. :-P
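The round-tripping procedure described above could be sketched roughly like this. Note that translate() here is only a stand-in (the thread doesn't specify any real API); a small lossy word map fakes the degradation that accumulates with each hop:

```python
# Sketch of the translation "telephone game" described above.
# translate() is a stand-in: a real version would call a machine
# translation service; here a lossy word map fakes the blurring
# that accumulates with each hop.

LOSSY = {
    "wheat": "grain", "grain": "corn",   # each hop loses a distinction
    "jar": "pot", "pot": "bowl",
}

def translate(text, src, dst):
    """Pretend translation from src to dst: lossy word-for-word mapping."""
    return " ".join(LOSSY.get(word, word) for word in text.split())

def round_trip(text, langs):
    """Chain the text through every language in turn and back to the first."""
    chain = langs + [langs[0]]
    for src, dst in zip(chain, chain[1:]):
        text = translate(text, src, dst)
    return text

print(round_trip("place the wheat in a jar", ["en", "de", "ru"]))
# After three lossy hops: "place the corn in a bowl"
```

With a real translation service behind translate(), 25-30 hops produce exactly the kind of wreckage shown above.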
Um...
Oh you're talking about *conlangs*? Oh, I see... well, better start
lobbying for the inclusion of conlangs in Google Translate!
:-P
On a more serious note, though: I think automation is a good thing to
have as a fallback mechanism in case people start going MIA. If all
parties were present when they're supposed to be, then the automation
shouldn't even kick in -- the relay master should be able to click on a
button to send a reminder to the current person, for example, or to skip
over to the next person. Only when people suddenly fall off the face of
the 'Net, will the automaton come along and clean up the mess and prod
the process onwards.
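The fallback scheme sketched above -- automation that only kicks in when someone goes MIA -- might look something like this; the 72-hour deadline and the action names are just illustrative assumptions:

```python
# Minimal sketch of deadline-driven relay automation: the machine
# stays out of the way unless the current participant's time is up.
import datetime as dt

DEADLINE = dt.timedelta(hours=72)  # illustrative; matches the 72h floated above

def leg_action(started_at, submitted, now):
    """Decide what the automation should do for the current relay leg."""
    if submitted:
        return "advance"            # torch passed on; nothing to automate
    if now - started_at < DEADLINE:
        return "wait"               # still within the time allowance
    return "remind-then-skip"       # participant MIA: prod them, then move on

start = dt.datetime(2013, 5, 11, 12, 0)
print(leg_action(start, False, start + dt.timedelta(hours=80)))
# -> remind-then-skip
```

The relay master's manual "remind" and "skip" buttons would simply invoke the same transitions ahead of the deadline.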
T
--
Never ascribe to malice that which is adequately explained by incompetence. --
Napoleon Bonaparte
Messages in this topic (32)
________________________________________________________________________
________________________________________________________________________
6a. Re: another indexing sketch
Posted by: "neo gu" [email protected]
Date: Sat May 11, 2013 3:48 pm ((PDT))
On Fri, 10 May 2013 16:51:53 -0700, H. S. Teoh <[email protected]> wrote:
>On Fri, May 10, 2013 at 12:54:16PM -0400, neo gu wrote:
>> It occurs to me that in the examples, the Apr28 sentences don't
>> actually indicate past or future. Probably, BEF-O would be added for
>> past time and AFT-O for future time here, but other temporal adverbs
>> could be used.
>[...]
>
>In a number of natlangs, "yesterday" and "tomorrow" are sometimes reused
>to mean past or future. My L1 does this to some extent, for example. (Of
>course, context makes it clear which meaning is actually intended.)
>
>I borrowed this idea in Tatari Faran: _hara_ is an adverb meaning
>"tomorrow", and _nara_ is an adverb meaning "yesterday". They also
>double as future/past tense markers:
>
> huu sa tapa hara pasanan da bata
> huu sa tapa hara pasanan na bata
> 1SG CVY:MASC walk tomorrow town RCP:MASC FIN
> I'll go to town tomorrow. (Or, I'll go to town (indefinite
> future).)
>
> huu sa tapa nara pasanan da bata
> huu sa tapa nara pasanan na bata
> 1SG CVY:MASC walk yesterday town RCP:MASC FIN
> I went to town yesterday. (Or, I went to town (indefinite
> past).)
>
>To disambiguate between "tomorrow" and future (and likewise "yesterday"
>and past), TF uses the idioms _baran hara_ (lit. tomorrow morning) to
>mean "tomorrow" and _mubun nara_ (lit. last night) to mean "yesterday".
>(Yes, these paraphrases were st... adapted from my L1. :-P) They are
>used when the speaker wishes to make clear that the precise day was
>meant, not just the generic future/past.
I'll have to remember that the next time I attempt a more naturalistic conlang,
maybe adapting it for the protolang.
>A language doesn't always have to map precisely to semantics. :)
But some are supposed to. Also, what counts as semantics might not always
be the same for different languages.
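The hara/nara ambiguity quoted above can be captured in a toy lookup -- the bare adverbs allow either a specific-day or a generic-tense reading, while the idioms force the specific-day one (readings here are my shorthand, not TF glosses):

```python
# Toy model of the ambiguity described above: bare hara/nara allow
# either a specific-day or a generic-tense reading, while the idioms
# baran hara / mubun nara force the specific-day reading.
READINGS = {
    "hara": {"tomorrow", "FUT"},
    "nara": {"yesterday", "PST"},
    "baran hara": {"tomorrow"},
    "mubun nara": {"yesterday"},
}

def is_ambiguous(adverb):
    return len(READINGS[adverb]) > 1

print(sorted(READINGS["hara"]))   # ['FUT', 'tomorrow'] -- ambiguous
print(is_ambiguous("baran hara")) # False -- the idiom pins it down
```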
Messages in this topic (6)
________________________________________________________________________
________________________________________________________________________
7.1. Re: Edeinal: Language of the Edeinos
Posted by: "Logan Kearsley" [email protected]
Date: Sat May 11, 2013 7:01 pm ((PDT))
On 23 April 2013 01:09, Garth Wallace <[email protected]> wrote:
> On Mon, Apr 22, 2013 at 4:02 PM, H. S. Teoh <[email protected]> wrote:
>> On Mon, Apr 22, 2013 at 01:18:10PM -0600, Logan Kearsley wrote:
>>
>>> While it is clearly wildly different in many other ways, lots of your
>>> descriptions of the Ferochromon remind me of various aspects of the
>>> Unicorn Jelly cosmi. I wonder if that says something about general
>>> psychological tendencies when humans try to make up worlds. The
>>> Ebisedi with their proclivity for describing everything in threes
>>> would probably feel quite at home in Tryslmaistan (the primary cosmos
>>> of Unicorn Jelly) as physics there genuinely does favor triplets of
>>> everything (three electric-equivalent charges, space segmented in a
>>> triangular-faceted latticework, etc.).
>>
>> Heh. I guess it's a much more realistic take on my childhood question of
>> "what if there were 3 electric charges", than the way the Ferochromon
>> turned out to be. :-P
>
> I don't know about that. IIRC, the humans (who originally come from
> this universe and are therefore made of conventional dipolar matter)
> are only able to exist in the tripolar universe by consuming a
> particular herb. How diet can affect subatomic properties isn't really
> addressed.
It's not addressed directly in the comic, but it is explained in the
extensive supporting materials.
Basically, the charges of human biomolecules in Tryslmaistan are
mapped onto two of the three "electanic" charges; with one charge
missing, there are extra repulsive forces present that distort protein
structures, among other things. The native vlaxifurm organism
conveniently contains lots of 3rd-charge "ions" in biologically-fixed
form that the human body can absorb, which surround proteins to
neutralize the excess charge and keep all of our biomolecules stable.
On 23 April 2013 09:12, H. S. Teoh <[email protected]> wrote:
> On Tue, Apr 23, 2013 at 12:49:28AM -0600, Logan Kearsley wrote:
>> On 22 April 2013 22:34, H. S. Teoh <[email protected]> wrote:
> [...]
>> > Yeah, I realize that real physics *has* a 3-way color charge in
>> > quarks. I ruled that out early on in the development of the
>> > Ferochromon because the Yukawa potential makes color interactions
>> > obvious only at subatomic scales, and I wanted a macroscopic 3-way
>> > charge. :)
>>
>> Just having a Yukawa potential doesn't inherently restrict it to
>> subatomic scales; the range of the force is inversely proportional to
>> the mass of the force carriers. If gluons were significantly lighter,
>> strong nuclear force could extend over macroscopic distances.
>
> I see. Didn't know that before. :)
The range restriction is determined by the uncertainty principle- how
long a virtual exchange particle can exist is inversely proportional
to its energy content (i.e., mass), and how far it can travel (marking
the maximum range of the force it carries) is determined by the limit
on its lifetime times the speed of light. It is also interesting to
note that the Coulomb potential for electromagnetism is actually
identical to a Yukawa potential in which the mass term is set to 0.
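To put a number on "range inversely proportional to the mass of the force carriers": the range is roughly the carrier's reduced Compton wavelength, hbar/(mc). Using standard physical constants (my addition, not from the thread):

```python
# Range of a Yukawa-type force: lambda ~ hbar/(m c), i.e. inversely
# proportional to the carrier mass, as stated above.
HBAR_C = 197.327   # MeV*fm, standard value of hbar*c
M_PION = 139.57    # MeV/c^2, charged pion mass (nuclear-force carrier scale)

def yukawa_range_fm(mass_mev):
    """Force range in femtometres for a carrier of the given mass."""
    return HBAR_C / mass_mev

print(f"{yukawa_range_fm(M_PION):.2f} fm")  # ~1.41 fm: firmly subatomic
# As mass -> 0 the range diverges: the massless-photon (Coulomb) limit.
```

So for a macroscopic three-way charge you'd indeed want your "gluons" to be much, much lighter than the pion scale.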
>> > Graham's number exploded my perception of infinity several times
>> > over, and it's still a *finite* number! And in fact, a rather
>> > smallish one as far as huge numbers are concerned (look up Jonathan
>> > Bowers' "infinity scrapers" sometime, if you feel like having your
>> > perception of infinity blown off the face of the universe, many
>> > times over).
>>
>> I remember that experience myself. It was rather exciting.
>>
>> > But all of this is still in the realm of the *finite*. Infinity? Do
>> > we even remotely have any idea just how unbelievably huge the first
>> > countable infinity is? And here we're talking about an *uncountable*
>> > infinity? Occam's Razor already kicked in before we started
>> > counting past 1, and now we're positing an uncountable number of
>> > universes? I just find that a *little* hard to swallow. :-)
>>
>> Hm. Well, given my background in programming, I tend to think there
>> are only three natural numbers of things that require no explanation-
>> zero (because it's impossible), one, or infinite. Any other number,
>> you better have a darn good reason for.
>>
>> So, either there's only one universe, or there's an uncountably
>> infinite number, or physics got some 'splainin' to do.
>
> Hold on there, I think you may have missed my point. There is a
> difference between a *countable* infinity and an *uncountable* one. A
> countable infinity is one in which the constituent elements may be fully
> enumerated in a sequence (obviously, infinite in length). An uncountable
> infinity, OTOH, is a far vaster quantity, in that it is *inherently*
> impossible to enumerate its elements in any linear sequence.
Ah. I did miss your point, but my own opinion still stands unchanged.
If we restrict possible universes to those with our same basic
physical laws but different values for free parameters in the theory,
you still get an uncountable number, because those free parameters are
potentially real valued.
If you want a countably infinite number of universes, there needs to
be some explanation as to why free parameters are *actually* rational.
And if you *can* have real valued parameters and thus a
more-than-countable number of universes, then there's another
application of the "zero, one, many" principle- why stop at the lowest
rung of the ladder of alephs? There should be no infinities, the
smallest infinity, or as big of an infinity as you want, or else a
darn good explanation for why physics shouldn't work that way.
I lack any intuitive feeling that the total size of all existence
ought to be something that I can actually comprehend in any way.
> Yeah, I'm quite aware that the inverse square laws seem to be directly
> tied to the dimensionality of space.
>
> In fact, it has been proven that not only are planetary orbits
> impossible above 3D; the Schroedinger equation of the hydrogen atom in
> dimensions above 3 has no non-zero minima, which means that not only
> will your energy levels change radically; they will vanish altogether!
Really! Do you have references for that? I have done the exploration
with classical mechanics, but I never was adventurous enough to
experiment with generalizing the Schroedinger equation to higher
dimensions.
> So atoms as we know them cannot exist above 3D. "Peeling" a 3D being
> off into a higher-dimensional space would imply instant disintegration,
> not just into the constituent atoms, but the atoms themselves would
> disintegrate into fundamental particles. Whatever *can* exist in a
> higher dimensional space must therefore be of a *fundamentally*
> different nature than anything we're familiar with in our 3D world.
Perhaps we can save things if we fall back on force models that do not
depend on the dimensionality of space; gluons, for example, behave
oddly because of self-interaction, forming spring-like flux tubes that
transmit force independent of spatial dimensions. Perhaps similar
mechanisms could restrict the effective dimensionality of force
interactions in higher dimensional spaces to make comprehensible
structures possible.
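For what it's worth, the classical half of this (the "exploration with classical mechanics" Logan mentions) reduces to one standard textbook calculation, not taken from the thread: in d spatial dimensions Gauss's law gives an attractive potential falling as 1/r^(d-2), so the effective radial potential for angular momentum L is

```latex
V_{\mathrm{eff}}(r) \;=\; \frac{L^2}{2 m r^2} \;-\; \frac{k}{r^{\,d-2}}
```

A circular orbit is stable only if V_eff has a genuine minimum, which requires the attractive term to fall off more slowly than the 1/r^2 centrifugal barrier: d - 2 < 2, i.e. stable orbits exist only for d < 4, matching the claim above that planetary orbits are impossible above 3D.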
On 24 April 2013 18:18, Alex Fink <[email protected]> wrote:
> Sorry, Jasyn, I've been meaning to read your material about Edeinos, doubly
> so in view of the interest it has generated among the good folks of the list,
> but I still haven't gotten a circular tuit. Hopefully I will yet!
>
> Ditto I have only glanced at the upriver parts of this conversation.
> Creating a consistent physics is something I completely quail at: to do it
> justice, getting the things that one wants to be emergent to actually be
> plausibly _emergent_ and not just bodged on or handwaved in, seems like the
> work of dozens of lifetimes (... as opposed to creating a believably lived-in
> language, which seems only the work of dozens of lifetimes).
That's why I focus my efforts on fiddling with free parameters. I have
occasionally pondered New Math (and also pondered the use of cellular
automata as physical models- but that's like trying to derive the
existence of people with interesting stories to tell directly from the
properties of quantum foam!), but I too rather quail at it.
> Though: [dons ranty hat] Electric charge is not to colour charge as two is to
> three, but as *one* is to three! Electric charges belong to a
> one-dimensional representation of their gauge group, colour charges to a
> three-dimensional representation of theirs.
You are right. My analogy is flawed. Red plus blue does not equal
antigreen. Red plus blue plus green does, however, still manage to
add up to white/zero- you don't need negative charges to result in
neutral total charge, which is how I'd expect a tripolar system to
work.
> The two-ness that is manifested in the dyad of positive and negative charges
> is the fact that there are two signs of real number, positive and negative:
> i.e. that the maximal compact subgroup of the multiplicative group of the
> reals has cardinality two. So if in Ferochromatic (or like) physics you
> wanted thàt two to become a three, you'd want something like a field with
> three topological connected components, and I have nò idea where you get that
> kind of thing if you're unwilling to abandon the reals and anything to do
> with them. (Could you have a physics whose underlying numbers were the
> 3-adics?) [doffs]
I rather wonder if you couldn't get by with a two-dimensional
distribution of charges in which the basic charges that fundamental
particles can have are *not* additive inverses of each other, but
rather are distributed at 120 degrees from each other in the plane (so
that, by vector addition, red plus blue really is antigreen).
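The 120-degree scheme is easy to check numerically; this little sketch (the particular angles chosen for red/green/blue are just illustrative placements) confirms that red plus blue is exactly antigreen, and that all three sum to a neutral total:

```python
import math

# Three planar charge vectors spaced 120 degrees apart.
def charge(angle_deg):
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

red, green, blue = charge(90), charge(210), charge(330)

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def close(u, v):
    return all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(u, v))

# red + blue lands exactly opposite green -- "antigreen":
print(close(add(red, blue), (-green[0], -green[1])))  # True

# and red + green + blue is the neutral "white" total:
print(close(add(add(red, green), blue), (0.0, 0.0)))  # True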
That doesn't get you the kind of round-robin interacting system that
Unicorn Jelly has, though. I had a thought that maybe one could
develop a physics in which charges are represented by quaternions that
multiply with each other rather than add... but then my train of
thought exploded and I quailed at actually working something like that
out. That's what I get for deciding not to get that double-major in
physics after all.
And now I'm wondering if the fact that there are three quark colors is
related to the dimensionality of space or not....
And... back to conlanging!
>> > Right, so one could conceivably have a conlang in which predicative
>> > adjectives must be used on a generic noun, or maybe reflexive noun,
>> > that refers back to the subject. So you couldn't say "the car is
>> > red", but you have to say "the car is a red one" with "one" being a
>> > suitable generic noun.
>>
>> Yup. But, to make it more interesting, rather than just allowing free
>> choice of a suitable noun, I'd want a whole system of standardized
>> thematically appropriate classifier-nouns (rather like Tatari Faran
>> finalizers, now that I think about it).
>
> Except that the TF finalizers are not nouns; they carry predicative
> meaning (albeit said meaning has since been bleached, so they play a
> mostly grammatical role now -- except for a scant few exceptions where
> they still retain a bit of adverbial meaning).
Right, right. I meant the "rather like Tatari Faran" to apply to the
"standardized thematically appropriate" part, rather than the "-nouns"
part.
>> > These two observations put together seems to imply that adjectives
>> > behave almost exactly like monovalent verbs!
>>
>> This would not be at all unusual; Russian again provides the ANADEW,
>> as actual honest-to-goodness past tense verbs (which conveniently have
>> the same inflectional endings as predicate adjectives) are in fact
>> historically derived from short-form predicate participles.
>
> Really! Heh, I didn't know that. :) But it totally makes sense in
> retrospect. The -∅, -а, -о endings *are* indeed shared between the two.
> Why didn't I notice that before?!
I don't know. That is an interesting question for psycholinguistics, I
expect; verbs and non-verb predicates just fall into different classes
that one doesn't expect to necessarily have any relation. I for one
noticed that they were the same right off, but I just thought it was
an interesting coincidence, probably having something to do with sound
symbolism in gender agreement or some such. I was rather pleasantly
surprised when I found out that particular historical fact!
>> This of course causes one to wonder in which direction the convergence
>> started out historically- did adjectives getting reanalyzed as verbs
>> cause verbal inflections to change by analogy, in parallel to the
>> evolution of modern Russian, or did adjectives getting used like verbs
>> cause adjective inflections to change by analogy with an existing verb
>> paradigm?
> [...]
>
> Hmm, that's a good question! Who knows, maybe adjectives in proto-TF
> were *not* paired with finalizers? Perhaps that's a recent innovation
> inspired by analogy with verbal clauses!
That would make sense given the explanation for the origin of
finalizers. Or, to make things more interesting, perhaps *some*
adjectives do have etymologically archaic finalizers derived from
adverbs frequently used with them (like obsolete words for "very",
etc.), which helped along the analogical process for the larger body
of remaining adjectives.
-l.
Messages in this topic (33)