There are 20 messages in this issue.

Topics in this digest:

1a. Re: Is there a CONLANG term for this?    
    From: Jim Henry
1b. Re: Is there a CONLANG term for this?    
    From: Michael Everson
1c. Re: Is there a CONLANG term for this?    
    From: Padraic Brown
1d. Re: Is there a CONLANG term for this?    
    From: Hugo Cesar de Castro Carneiro
1e. Re: Is there a CONLANG term for this?    
    From: Gary Shannon
1f. Re: Is there a CONLANG term for this?    
    From: Peter Cyrus
1g. Re: Is there a CONLANG term for this?    
    From: Charlie Brickner
1h. Re: Is there a CONLANG term for this?    
    From: Padraic Brown

2.1. Re: embodiment in language    
    From: David McCann

3a. A Self-Segmenting Orthography    
    From: Gary Shannon
3b. Re: A Self-Segmenting Orthography    
    From: MorphemeAddict
3c. Re: A Self-Segmenting Orthography    
    From: Jörg Rhiemeier
3d. Re: A Self-Segmenting Orthography    
    From: Logan Kearsley
3e. Re: A Self-Segmenting Orthography    
    From: MorphemeAddict
3f. Re: A Self-Segmenting Orthography    
    From: Christophe Grandsire-Koevoets
3g. Re: A Self-Segmenting Orthography    
    From: Christophe Grandsire-Koevoets

4a. Re: New Blog Post: Moten Part IV    
    From: neo gu
4b. Re: New Blog Post: Moten Part IV    
    From: Christophe Grandsire-Koevoets

5. Quick poll about the usage of "Googler"    
    From: Sai

6. Real Research on the Origins/Types of Linguistic Universals    
    From: Logan Kearsley


Messages
________________________________________________________________________
1a. Re: Is there a CONLANG term for this?
    Posted by: "Jim Henry" [email protected] 
    Date: Sun Dec 18, 2011 8:32 am ((PST))

On 12/17/11, Gary Shannon <[email protected]> wrote:
> But what about a language with these properties:
>
> 1) Constructed rather than spontaneous.
> 2) Created by blending both grammar AND lexicon of two related languages.
> 3) With a full and complex grammar rather than the simplified grammar
> of a pidgin.
> 4) With the goal of being mutually intelligible to both of the source
> languages.

Interlingua was pretty much this, only with more than two source
languages. If I recall correctly, it was based on the three or four
most prestigious Romance languages, with some lexical input from
English.  That's why it has irregularities and complexities that most
other auxlangs eschew, because all the source languages have them
(multiple verb classes, irregular "to be", a few other things).

> In other words, a carefully engineered re-unification of the two
> source languages. Imagine, for example, re-unifying Spanish and
> Italian, or Dutch and German. And by that I don't mean a generic
> Germanic conlang, or a generic Romance conlang, but a very specific
> targeting of two natlangs to be the source of every piece of grammar
> and lexicon that is used.

I suppose one way to do it is to restrict your root vocabulary to
words which are cognate in both/all of the source languages, and try
to derive all other words from those roots.  It could lead to some
interesting compounds or set phrases, where words for more basic
concepts are derived from words for more complex ones, because the
source languages both retained the ancestral terms for the complex
concepts, but one of them happened to replace its word for the simpler
concept by borrowing.

> What would be a good term for such a midway language? A morph? A
> bridge language? A blend-lang? A linguistic halfway house?

I'd go for "blendlang".

> Then imagine using the result as one of the source languages to create
> a second generation blend. For example, having blended Dutch and
> German into one hybrid, and English and Frisian into another,
> different hybrid, blend those two hybrids with each other. You could
> go right back up the IE family tree, not trying to recreate the

It could be fun, but probably not potentially useful like a more
restricted blendlang.  A blend of two closely related but not mutually
comprehensible languages would be easy to learn and maybe readable at
sight for speakers of both the source languages.  But once you start
blending in more distantly related languages, that ready
comprehensibility would quickly be lost, and ease of learning would
also fall off rapidly.  But this is starting to get into AUXLANG
territory, so I'll shut up now.

-- 
Jim Henry
http://www.pobox.com/~jimhenry/





Messages in this topic (9)
________________________________________________________________________
1b. Re: Is there a CONLANG term for this?
    Posted by: "Michael Everson" [email protected] 
    Date: Sun Dec 18, 2011 9:11 am ((PST))

On 18 Dec 2011, at 02:27, Gary Shannon wrote:

> We have a number of different terms for contact languages such as
> "pidgin", "creole", "jargon", etc.
> 
> But what about a language with these properties:
> 
> 1) Constructed rather than spontaneous.

Isn't this all conlangs?

> 2) Created by blending both grammar AND lexicon of two related languages.

"Hybrid" fits here.

Michael Everson * http://www.evertype.com/





________________________________________________________________________
1c. Re: Is there a CONLANG term for this?
    Posted by: "Padraic Brown" [email protected] 
    Date: Sun Dec 18, 2011 1:49 pm ((PST))

--- On Sat, 12/17/11, Gary Shannon <[email protected]> wrote:

> We have a number of different terms
> for contact languages such as
> "pidgin", "creole", "jargon", etc.
> 
> But what about a language with these properties:
> 
> 1) Constructed rather than spontaneous.
> 2) Created by blending both grammar AND lexicon of two
> related languages.
> 3) With a full and complex grammar rather than the
> simplified grammar
> of a pidgin.
> 4) With the goal of being mutually intelligible to both of
> the source languages.

Yes, there is a term for such things:

AUXLANG.

(:

Without property #4, you're simply describing any of the many binary 
mixlangs -- the Latin + Germanic or Slavic + Austronesian hybrids.

> In other words, a carefully engineered re-unification of
> the two source languages. 

This is a slightly different take on the above. Instead of mixing two
quite different languages, you're simply mixing two closely related ones.
I'd still classify it a mixlang, because it's still a hybrid.

> Imagine, for example, re-unifying Spanish and
> Italian, or Dutch and German. And by that I don't mean a generic
> Germanic conlang, or a generic Romance conlang, but a very specific
> targeting of two natlangs to be the source of every piece of grammar
> and lexicon that is used.

This is basically what those of us who've mixed two languages into a
hybrid have done.

> What would be a good term for such a midway language? A
> morph? A
> bridge language? A blend-lang? A linguistic halfway house?

Mestizolang. Mixlang.

> Then imagine using the result as one of the source
> languages to create
> a second generation blend. For example, having blended
> Dutch and
> German into one hybrid, and English and Frisian into
> another,
> different hybrid, blend those two hybrids with each other.
> You could
> go right back up the IE family tree, not trying to recreate
> the
> ancestral languages, but trying to create NEW unifications
> of the
> MODERN languages. By going all the way up the tree to the
> root you
> could eventually create a NEW proto-Indo-European. Not a
> reconstruction of the original ancient PIE, mind you, but a
> MODERN
> balanced blend of all modern IE languages. What fun!

An interesting project indeed! I'd still call it the same thing.

> The key is to not try to over-reach. Trying to blend
> Serbian with
> Irish Gaelic in one step would be just silly. 

Not really silly -- just a different design principle.

> It would take probably 6
> or 8 levels of blending before you even go to the blended
> Balto-Slavic, and that's still a long way from Irish
> Gaelic.

I'm not sure you can work back in time with this method, but I do think you
could end up with an "Average Indo-European" if you did a "tree blend"
scheme. Just like with the World Cup, you start out with a huge pool of
languages and dialects. Carefully pair up near relatives (like English and
Frisian; High and Low German; Swedish and Norwegian; and do the same with
all the other families). As you reduce the number of languages in each
family you'd at least end up with "Average Germanic" and "Average Romance".
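
The pairing scheme above can be sketched in a few lines of Python. This is only an illustration of the bracket order, not a blending method: the names tree_blend and label_blend are made up for the example, the list order is assumed to stand in for the family tree (neighbours are near relatives), and the "blend" here just builds a label.

```python
from typing import Callable, List

def tree_blend(languages: List[str], blend: Callable[[str, str], str]) -> str:
    """Reduce a list of languages to one by repeated pairwise blending."""
    if not languages:
        raise ValueError("need at least one language")
    current = list(languages)
    while len(current) > 1:
        next_round = []
        # Pair up neighbours: (0, 1), (2, 3), ...
        for i in range(0, len(current) - 1, 2):
            next_round.append(blend(current[i], current[i + 1]))
        if len(current) % 2 == 1:
            next_round.append(current[-1])  # odd one out gets a bye
        current = next_round
    return current[0]

def label_blend(a: str, b: str) -> str:
    # Placeholder for the real work: it only records which pair was merged.
    return f"({a}+{b})"

germanic = ["English", "Frisian", "High German", "Low German",
            "Swedish", "Norwegian"]
print(tree_blend(germanic, label_blend))
```

The hard part, of course, is what goes inside blend; the loop only fixes the order in which the pairs meet.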

The only difficulty I'd see is when it comes to mixing those two together.
They're related, but not closely. It's easy enough to average "man", "mann"
and "maðr" together -- but how do you mix the averaged "mant" with the
other averaged "omo"? Unless you looked for archaic roots like "were" and
"vir"!

> And just to keep it simple, maybe the blending should only happen
> between living languages so that Latin and Sanskrit don't qualify.

Well, that takes care of looking for archaic roots!

> Only their modern descendants are used.
> 
> But of course, that's a much more ambitious project than
> merely
> blending two languages. 

Ambitious only in length of scope -- once you've got the basic mechanics
down, it's just a matter of doing it twenty or thirty more times!

> It is, however, a good candidate for
> "distributed processing". If one team were to be working on
> the
> Central Indic branch while another was working on the
> Western Romance
> branch, then eventually they could pool their results into
> the grand
> reunification. Hehe. Now there's a conlang project of truly
> monumental
> proportions.
> 
> Has anyone tried a conlang of this sort? It just sounds
> fascinating to
> me. I wonder what "Modern PIE" would sound and look like.

Indeed!

Padraic

> --gary





________________________________________________________________________
1d. Re: Is there a CONLANG term for this?
    Posted by: "Hugo Cesar de Castro Carneiro" [email protected] 
    Date: Sun Dec 18, 2011 8:03 pm ((PST))

I agree with all other people who answered your question. The best name for
it would be a "mixlang" or "blendlang";

This reminds me of a project of mine.

I'm trying to reconstruct the proto-languages that descended from Vulgar
Latin. As Vulgar Latin was a dialect continuum, there is no true
proto-language; because of this, it is a conlang and not a real
reconstruction.
I'm trying to re-create the vocabulary of these proto-languages by taking
the cognates of a given word in the modern languages and trying to "undo"
the sound changes, like reverse engineering.

For example, I'm trying to reconstruct Proto-Hispano-Romance,
Proto-Gallo-Ibero-Romance and Proto-Gallo-Romance (and maybe
Proto-Western-Romance too).

To create the first one I am trying to reconstruct Old
Galician-Portuguese by eliminating nasal vowels and making the nasal
codas pronounced, allowing open vowels before nasal codas, undoing the
global raising that happened to the unstressed vowels, and undoing some
other things that don't exist anymore in Portuguese (nor in Galician). I am
also trying to reconstruct Old Spanish, by turning /je/ into /E/ and /we/
into /O/ and so on.
Through this I recreated words like òvo /"Ovo/ for egg (huevo /"weBo/ in
Spanish, reconstructed as /"OBo/, and ovo /"ovu/ in Portuguese,
reconstructed as /"Ovo/ because its plural form is irregular: ovos
/"OvuS/), èlmo /"Elmo/ for helmet (yelmo /"jelmo/ in Spanish and elmo
/"E5mu/ in Portuguese) and so on.
Some words required much more complex procedures. /jt/ became /tS/ in Old
Spanish, so milk, which is leite /"lejt1/ in Portuguese and leche
/"letSe/ in Spanish, would become something like leite /"lejte/. Fight is
also an interesting word: it is luta /"lut@/ in Portuguese and lucha
/"lutSa/ in Spanish. Through the Spanish "ch" I discovered that it should
be reconstructed as luita /"lujta/ (this "i" would have been elided over
time), which is confirmed by the Catalan word lluita /"Lujt@/.
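
The "undo" step for the Spanish forms above can be sketched as ordered string replacements over the X-SAMPA transcriptions. This is a minimal sketch, assuming only the three correspondences mentioned in this post (/je/ to /E/, /we/ to /O/, /tS/ to /jt/); the names SPANISH_UNDO and undo_changes are invented for the example, and blind replacement like this would over-apply in a real lexicon where a change depends on its environment.

```python
# Reverse sound-change rules over X-SAMPA strings, applied in order.
# Each pair is (modern Spanish sequence, reconstructed sequence).
SPANISH_UNDO = [
    ("je", "E"),   # diphthong /je/ back to open /E/
    ("we", "O"),   # diphthong /we/ back to open /O/
    ("tS", "jt"),  # affricate /tS/ back to the older /jt/
]

def undo_changes(form, rules):
    """Apply each (modern, older) replacement in turn to one transcription."""
    for modern, older in rules:
        form = form.replace(modern, older)
    return form

# huevo, yelmo, leche, lucha as transcribed in the post
for word in ['"weBo', '"jelmo', '"letSe', '"lutSa']:
    print(word, "->", undo_changes(word, SPANISH_UNDO))
```

The four outputs match the Spanish-side reconstructions given above; the Portuguese side would need its own rule list.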

But there are still some problematic cases. To drink is beber /"b1be4/ in
Portuguese and it is also beber /"beBe4/ in Spanish, but its reconstructed
form is bever /"beBe4/, which is confirmed by the Catalan word beure
/"bew4@/ and the French word boire /"bwaR/ (and also by the English word
*bever*age ;-))

Anyway, there are also words which are not cognate, like the word for home
- PT lar and ES hogar, or the word for stay/become - PT ficar and ES quedar.
These words are still a big problem for me.

And I still have to look these words up in the Asturian-Leonese languages.

In other words, to create these "reconstructions" you need to analyze all
(or at least a reasonable number of) languages in these sets, you must
consider words outside the set in order to reconstruct it correctly
(the "bever" case), and there is still a group of words with no direct
cognate, so it can really be a difficult task (but still very interesting ;-))


On Sun, Dec 18, 2011 at 12:27 AM, Gary Shannon <[email protected]> wrote:

> We have a number of different terms for contact languages such as
> "pidgin", "creole", "jargon", etc.
>
> But what about a language with these properties:
>
> 1) Constructed rather than spontaneous.
> 2) Created by blending both grammar AND lexicon of two related languages.
> 3) With a full and complex grammar rather than the simplified grammar
> of a pidgin.
> 4) With the goal of being mutually intelligible to both of the source
> languages.
>
> In other words, a carefully engineered re-unification of the two
> source languages. Imagine, for example, re-unifying Spanish and
> Italian, or Dutch and German. And by that I don't mean a generic
> Germanic conlang, or a generic Romance conlang, but a very specific
> targeting of two natlangs to be the source of every piece of grammar
> and lexicon that is used.
>
> What would be a good term for such a midway language? A morph? A
> bridge language? A blend-lang? A linguistic halfway house?
>
> Then imagine using the result as one of the source languages to create
> a second generation blend. For example, having blended Dutch and
> German into one hybrid, and English and Frisian into another,
> different hybrid, blend those two hybrids with each other. You could
> go right back up the IE family tree, not trying to recreate the
> ancestral languages, but trying to create NEW unifications of the
> MODERN languages. By going all the way up the tree to the root you
> could eventually create a NEW proto-Indo-European. Not a
> reconstruction of the original ancient PIE, mind you, but a MODERN
> balanced blend of all modern IE languages. What fun!
>
> The key is to not try to over-reach. Trying to blend Serbian with
> Irish Gaelic in one step would be just silly. It would take probably 6
> or 8 levels of blending before you even go to the blended
> Balto-Slavic, and that's still a long way from Irish Gaelic. And just
> to keep it simple, maybe the blending should only happen between
> living languages so that Latin and Sanskrit don't qualify. Only their
> modern descendants are used.
>
> But of course, that's a much more ambitious project than merely
> blending two languages. It is, however, a good candidate for
> "distributed processing". If one team were to be working on the
> Central Indic branch while another was working on the Western Romance
> branch, then eventually they could pool their results into the grand
> reunification. Hehe. Now there's a conlang project of truly monumental
> proportions.
>
> Has anyone tried a conlang of this sort? It just sounds fascinating to
> me. I wonder what "Modern PIE" would sound and look like.
>
> --gary
>





________________________________________________________________________
1e. Re: Is there a CONLANG term for this?
    Posted by: "Gary Shannon" [email protected] 
    Date: Sun Dec 18, 2011 9:41 pm ((PST))

Your project is very interesting to me as a student of Spanish. The
biggest difference, of course, is that my idea was not to reconstruct
backwards in time, but to re-mix, forward in time, so as not to
reconstruct proto-hispano-romance, but to speculate on what some
future mixture of, for example, Spanish and Portuguese might become if
a dozen native speakers of each were marooned on a distant planet for
ten generations with no outside contact (and no books in either
language to provide a stabilizing influence).

I would like to hear more about your project, though. It sounds fascinating.

--gary

On Sun, Dec 18, 2011 at 8:02 PM, Hugo Cesar de Castro Carneiro
<[email protected]> wrote:
> I agree with all other people who answered your question. The best name for
> it would be a "mixlang" or "blendlang";
>
> This reminds me of a project of mine.
>
> I'm trying to reconstruct the proto-languages that descended from Vulgar
> Latin. As Vulgar Latin was a dialect continuum there is no true
> proto-language, because of this it is a conlang and not a real
> reconstruction.

[[snipping muchas cosas interesantes]]





________________________________________________________________________
1f. Re: Is there a CONLANG term for this?
    Posted by: "Peter Cyrus" [email protected] 
    Date: Mon Dec 19, 2011 1:54 am ((PST))

Look up "LENGUAS DE LA PENINSULA IBERICA", Editoriales Ibarantia, Palencia

On Mon, Dec 19, 2011 at 6:41 AM, Gary Shannon <[email protected]> wrote:
> Your project is very interesting to me as a student of Spanish. The
> biggest difference, of course, is that my idea was not to reconstruct
> backwards in time, but to re-mix, forward in time, so as not to
> reconstruct proto-hispano-romance, but to speculate on what some
> future mixture of, for example, Spanish and Portuguese might become if
> a dozen native speakers of each were marooned on a distant planet for
> ten generations with no outside contact (and no books in either
> language to provide a stabilizing influence).
>
> I would like to hear more about your project, though. It sounds fascinating.
>
> --gary
>
> On Sun, Dec 18, 2011 at 8:02 PM, Hugo Cesar de Castro Carneiro
> <[email protected]> wrote:
>> I agree with all other people who answered your question. The best name for
>> it would be a "mixlang" or "blendlang";
>>
>> This reminds me of a project of mine.
>>
>> I'm trying to reconstruct the proto-languages that descended from Vulgar
>> Latin. As Vulgar Latin was a dialect continuum there is no true
>> proto-language, because of this it is a conlang and not a real
>> reconstruction.
>
> [[snipping muchas cosas interesantes]]





________________________________________________________________________
1g. Re: Is there a CONLANG term for this?
    Posted by: "Charlie Brickner" [email protected] 
    Date: Mon Dec 19, 2011 5:06 am ((PST))

On Mon, 19 Dec 2011 02:02:57 -0200, Hugo Cesar de Castro Carneiro 
<[email protected]> wrote:

>Anyway, there are also words which are not cognate, like the word for home
>- PT lar and ES hogar,....

Is it really helpful to consider the meaning "home" for 'hogar' in your 
analysis, 
when that is not its original meaning?

The original meaning is (according to "Pequeño Larousse") "El sitio donde se 
enciende lumbre", with synonyms 'horno', 'fuego', et al.  I would imagine that 
the connotation "home" derives from the denotation "hearth".

The word only figuratively means 'casa'.

Charlie





________________________________________________________________________
1h. Re: Is there a CONLANG term for this?
    Posted by: "Padraic Brown" [email protected] 
    Date: Mon Dec 19, 2011 5:32 am ((PST))

--- On Mon, 12/19/11, Charlie Brickner <[email protected]> wrote:

> From: Charlie Brickner <[email protected]>
> Subject: Re: [CONLANG] Is there a CONLANG term for this?
> To: [email protected]
> Date: Monday, December 19, 2011, 8:06 AM
> On Mon, 19 Dec 2011 02:02:57 -0200,
> Hugo Cesar de Castro Carneiro 
> <[email protected]>
> wrote:
> 
> >Anyway, there are also words which are not cognate, like
> the word for home
> >- PT lar and ES hogar,....
> 
> Is it really helpful to consider the meaning "home" for
> 'hogar' in your analysis, 
> when that is not its original meaning?
> 
> The original meaning is (according to "Pequeño Larousse")
> "El sitio donde se 
> enciende lumbre", with synonyms 'horno', 'fuego', et
> al.  I would imagine that 
> the connotation "home" derives from the denotation
> "hearth".
> 
> The word only figuratively means 'casa'.

Yep. Just like Caesar said: fogar est d' unde est fogo.

> Charlie

Padraic





________________________________________________________________________
________________________________________________________________________
2.1. Re: embodiment in language
    Posted by: "David McCann" [email protected] 
    Date: Sun Dec 18, 2011 8:43 am ((PST))

On Sat, 17 Dec 2011 16:05:22 -0500
Douglas Koller <[email protected]> wrote:
>  
> Indeed, though I certainly concur with Sam's definition. This is
> related to "reserve" for me; "grace" is doing "reserve" well :)
>  
> As I understand "God's grace" ... it's more
> than just God's being a nice guy, beaming benevolence and
> doing-kindness-to-people-edness.

The English word is of course borrowed from the Latin 'gratia', which
is a close match for the Greek 'charis'. The original sense is the
physical one, the quality of pleasing by appearance or conduct; the
Greek is the most general, enabling one to talk of the 'charis' of
a piece of jewelry.

The metaphorical sense then refers to 'beautiful' behaviour: grace is
benevolence and kindness done or received. This sense is commoner in
Latin and Greek than in English. The Greek also covers 'gratitude':
tois theois charis hoti... 'thank the Gods that...'

The English 'there but for the grace of God go I'  has nothing to do
with original sin: it just means that my avoidance of the situation is
felt to be so much more than I deserve that it feels like an act of
divine providence.





Messages in this topic (113)
________________________________________________________________________
________________________________________________________________________
3a. A Self-Segmenting Orthography
    Posted by: "Gary Shannon" [email protected] 
    Date: Sun Dec 18, 2011 12:04 pm ((PST))

Here's a random idea for a writing system.

There is an alphabet of 25 or 30 characters that share the same visual
aspect ratio being roughly the same proportions as the Roman capitals.

Then there is a second set of perhaps only 8 or 10 characters that are
visually distinct from the first group, perhaps by being very slender,
or by having ascenders or descenders.

The roots of the language are spelled with letters from the first set.
But no word is complete until some suffix is attached to specify what
aspect of the root that word represents. Thus we might have a root
that deals generally with human relationships of an agreeable nature.
From this root, with appropriate suffixes, we might derive specific
words like "friend", "friendship", "friendly", "befriend",
"cooperate", "partner", "companion", etc., and by using negating and
intensifying characters in the suffix, such words as "love", "spouse",
"enmity", "enemy", "hatred", and so on.

With 8 characters in the suffix alphabet, and suffixes of 1 to 3
characters in length, there could be up to 8+64+512 = 584 specific
words derived from each root.

With 30 characters in the root alphabet and roots of one or two
characters in length there could be as many as 930 roots.

That would give a total maximum lexicon of 930*584 = 543,120 words,
where every word was at least two and at most five characters in
length.
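
The counts above can be checked with a short Python sketch. The helper name count_strings is made up for the example; it simply sums k^n over the allowed lengths.

```python
def count_strings(alphabet_size, min_len, max_len):
    """Number of distinct strings with length between min_len and max_len."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))

suffixes = count_strings(8, 1, 3)    # 8 + 64 + 512
roots = count_strings(30, 1, 2)      # 30 + 900
lexicon = roots * suffixes           # every root with every suffix

print(suffixes, roots, lexicon)      # prints: 584 930 543120
```

With 10 suffix characters instead of 8, the same function gives 1,110 suffixes per root, so the lexicon ceiling is quite sensitive to the size of the smaller alphabet.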

Now any unbroken stream of symbols could be separated into individual
words by virtue of the fact that every word begins with a character
from the root alphabet and ends with a character from the suffix
alphabet. For example, using the uppercase roman letters as the root
characters and the lowercase letters a,b,c,d,e,f,g,h as the tails, or
suffix alphabet, the stream: HNbYusKUeBqoPNi can unambiguously be
parsed into the words: HNb Yus KUe Bqo PNi.

In addition, the words could be unambiguously parsed into root and
suffix just as easily.
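
Here is a rough Python sketch of that parse, under stated assumptions: uppercase ASCII letters stand in for the root alphabet and lowercase ASCII letters for the suffix alphabet. Note that the sample stream above uses suffix letters outside a-h (u, s, q, o, i), so the sketch accepts any lowercase letter rather than only those eight. The name segment is just for illustration.

```python
import re

# One word = a root of 1-2 root characters + a suffix of 1-3 suffix characters.
WORD = re.compile(r"([A-Z]{1,2})([a-z]{1,3})")

def segment(stream):
    """Split an unbroken stream into (root, suffix) pairs, left to right."""
    words = []
    pos = 0
    while pos < len(stream):
        m = WORD.match(stream, pos)
        if m is None:
            raise ValueError(f"cannot parse a word at position {pos}")
        words.append((m.group(1), m.group(2)))
        pos = m.end()
    return words

parsed = segment("HNbYusKUeBqoPNi")
print(" ".join(root + suffix for root, suffix in parsed))
# prints: HNb Yus KUe Bqo PNi
```

Because the two character classes never overlap, the parse needs no dictionary and no lookahead beyond the current word; an over-long root or suffix is reported as an error instead of being silently split.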

Of course the alphabet should look cooler than the Roman alphabet, so
using Roman letters for this is not a serious suggestion. The two
groups might be distinguished from each other by aspect ratio, by
angular vs. rounded, (Runic vs Tai Lue), by stylistic differences
(Tibetan vs Greek), or by any other obvious visual distinction. In
fact the individual characters might be selected from a variety of
different Unicode alphabets so that a new font wouldn't even have to
be created for it.

Now the really interesting aspect of this is that the assignment of
the character sequences to the roots could be completely arbitrary,
without regard to the spoken sound of the word. That being the case,
the system of writing would be independent of any specific phonology.
A string of characters could just as easily represent a single English
word as a single Italian word, assuming that two specific words with
some degree of semantic commonality exist in those two languages. Or
the root+suffix could represent a concept for which a single English
or Italian word does not exist, but for which close approximations, or
multi-word circumlocutions could be made.

So the writing would be arbitrary, but not pictographic or
ideographic, and not connected in any way to the phonology of any
language. It should probably be used with a non-inflecting language since
we wouldn't want to waste lexical space on inflections. That would
just needlessly reduce the maximum potential lexicon size. (Or perhaps
a third class of alphabetic characters, or even diacritics, used only
to represent inflections.)

--gary





Messages in this topic (7)
________________________________________________________________________
3b. Re: A Self-Segmenting Orthography
    Posted by: "MorphemeAddict" [email protected] 
    Date: Sun Dec 18, 2011 12:22 pm ((PST))

This is similar to how I process written Japanese. I start a new phrase
every time I encounter a punctuation mark or a change from kana to kanji.
There may be more such rules, but I can't think of them right now.

stevo

On Sun, Dec 18, 2011 at 3:04 PM, Gary Shannon <[email protected]> wrote:

> Here's a random idea for a writing system.
>
> There is an alphabet of 25 or 30 characters that share the same visual
> aspect ratio being roughly the same proportions as the Roman capitals.
>
> Then there is a second set of perhaps only 8 or 10 characters that are
> visually distinct from the first group, perhaps by being very slender,
> or by having ascenders or descenders.
>
> The roots of the language are spelled with letters from the first set.
> But no word is complete until some suffix is attached to specify what
> aspect of the root that word represents. Thus we might have a root
> that deals generally with human relationships of an agreeable nature.
> From this root, with appropriate suffixes, we might derive specific
> words like "friend", "friendship", "friendly", "befriend",
> "cooperate", "partner", "companion", etc., and by using negating and
> intensifying characters in the suffix, such words as "love", "spouse",
> "enmity", "enemy", "hatred", and so on.
>
> With 8 characters in the suffix alphabet, and suffixes of 1 to 3
> characters in length, there could be up to 8+64+512 = 584 specific
> words derived from each root.
>
> With 30 characters in the root alphabet and roots of one or two
> characters in length there could be as many as 930 roots.
>
> That would give a total maximum lexicon of 930*584 = 543,120 words,
> where every word was at least two and at most five characters in
> length.
>
> Now any unbroken stream of symbols could be separated into individual
> words by virtue of the fact that every word begins with a character
> from the root alphabet and ends with a character from the suffix
> alphabet. For example, using the uppercase roman letters as the root
> characters and the lowercase letters a,b,c,d,e,f,g,h as the tails, or
> suffix alphabet, the stream: HNbYusKUeBqoPNi can unambiguously be
> parsed into the words: HNb Yus KUe Bqo PNi.
>
> In addition, the words could be unambiguously parsed into root and
> suffix just as easily.
>
> Of course the alphabet should look cooler than the Roman alphabet, so
> using Roman letters for this is not a serious suggestion. The two
> groups might be distinguished from each other by aspect ratio, by
> angular vs. rounded, (Runic vs Tai Lue), by stylistic differences
> (Tibetan vs Greek), or by any other obvious visual distinction. In
> fact the individual characters might be selected from a variety of
> different Unicode alphabets so that a new font wouldn't even have to
> be created for it.
>
> Now the really interesting aspect of this is that the assignment of
> the character sequences to the roots could be completely arbitrary,
> without regard to the spoken sound of the word. That being the case,
> the system of writing would be independent of any specific phonology.
> A string of characters could just as easily represent a single English
> word as a single Italian word, assuming that two specific words with
> some degree of semantic commonality exist in those two languages. Or
> the root+suffix could represent a concept for which a single English
> or Italian word does not exist, but for which close approximations, or
> multi-word circumlocutions could be made.
>
> So the writing would be arbitrary, but not pictographic or
> ideographic, and not connected in any way to the phonology of any
> language. It should probably be used with a non-inflecting language since
> we wouldn't want to waste lexical space on inflections. That would
> just needlessly reduce the maximum potential lexicon size. (Or perhaps
> a third class of alphabetic characters, or even diacritics, used only
> to represent inflections.)
>
> --gary
>





________________________________________________________________________
3c. Re: A Self-Segmenting Orthography
    Posted by: "Jörg Rhiemeier" [email protected] 
    Date: Sun Dec 18, 2011 12:41 pm ((PST))

Hallo conlangers!

On Sunday 18 December 2011 21:21:34 MorphemeAddict wrote:

> This is similar to how I process written Japanese. I start a new phrase
> every time I encounter a punctuation mark or a change from kana to kanji.
> There may be more such rules, but I can't think of them right now.

Yes.  In a Japanese text, you can easily tell what are word
roots and what are suffixes because the former are written in
kanji and the latter in hiragana, and the two scripts are visually
quite different, with hiragana being much simpler than most kanji
and full of curved strokes which do not occur often in kanji (at
least not in carefully written and in printed ones).  Also, as
Japanese doesn't have prefixes, you can be sure about meeting a
word boundary wherever a kanji follows a hiragana.  And thirdly,
most words *do* have suffixes in texts (I can't think of any
situation where a bare stem occurs), so where a kanji follows
another kanji, the two kanji probably form a compound (such
compounds are very common in Japanese).
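
The boundary rule described above (a break wherever a kanji follows a hiragana) can be sketched in Python as follows. This is only a heuristic sketch: the function names are invented for the example, the character tests cover just the basic Hiragana block (U+3041 to U+309F) and the main CJK Unified Ideographs block (U+4E00 to U+9FFF), and the sample sentence is my own illustration. It produces chunks of stem plus trailing kana, not a full word segmentation, and it does nothing useful with katakana or kana-only words.

```python
def is_hiragana(ch):
    return "\u3041" <= ch <= "\u309f"

def is_kanji(ch):
    return "\u4e00" <= ch <= "\u9fff"

def split_at_kana_to_kanji(text):
    """Start a new chunk wherever a kanji directly follows a hiragana."""
    chunks = []
    current = ""
    for ch in text:
        if current and is_hiragana(current[-1]) and is_kanji(ch):
            chunks.append(current)
            current = ""
        current += ch
    if current:
        chunks.append(current)
    return chunks

print(split_at_kana_to_kanji("私は日本語を勉強します"))
# prints: ['私は', '日本語を', '勉強します']
```

Runs of consecutive kanji such as 日本語 stay together, which matches the point about compounds.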

--
... brought to you by the Weeping Elf
http://www.joerg-rhiemeier.de/Conlang/index.html
"Bêsel asa Êm, a Êm atha cvanthal a cvanth atha Êmel." - SiM 1:1





________________________________________________________________________
3d. Re: A Self-Segmenting Orthography
    Posted by: "Logan Kearsley" [email protected] 
    Date: Sun Dec 18, 2011 4:22 pm ((PST))

On 18 December 2011 13:40, Jörg Rhiemeier <[email protected]> wrote:
> Hallo conlangers!
>
> On Sunday 18 December 2011 21:21:34 MorphemeAddict wrote:
>
>> This is similar to how I process written Japanese. I start a new phrase
>> every time I encounter a punctuation mark or a change from kana to kanji.
>> There may be more such rules, but I can't think of them right now.
>
> Yes.  In a Japanese text, you can easily tell what are word
> roots and what are suffixes because the former are written in
> kanji and the latter in hiragana, and the two scripts are visually
> quite different, with hiragana being much simpler than most kanji
> and full of curved strokes which do not occur often in kanji (at
> least not in carefully written and in printed ones).  Also, as
> Japanese doesn't have prefixes, you can be sure about meeting a
> word boundary wherever a kanji follows a hiragana.  And thirdly,
> most words *do* have suffixes in texts (I can't think of any
> situation where a bare stem occurs), so where a kanji follows
> another kanji, the two kanji probably form a compound (such
> compounds are very common in Japanese).

Oh my goodness! Why couldn't a grad student've told me this when I was
trying to write a Japanese tokenizer?

I suspect these rules are far from perfect, 'cause I was recently
informed that the state-of-the-art in automatic tokenization of
Japanese only got around 80% correct, but it's nice to know that
there's some sort of system.

Are there any similar (or not similar) useful rules for tokenizing
written Chinese?

-l.





Messages in this topic (7)
________________________________________________________________________
3e. Re: A Self-Segmenting Orthography
    Posted by: "MorphemeAddict" [email protected] 
    Date: Sun Dec 18, 2011 6:19 pm ((PST))

On Sun, Dec 18, 2011 at 7:22 PM, Logan Kearsley <[email protected]> wrote:

> On 18 December 2011 13:40, Jörg Rhiemeier <[email protected]> wrote:
> > Hallo conlangers!
> >
> > On Sunday 18 December 2011 21:21:34 MorphemeAddict wrote:
> >
> >> This is similar to how I process written Japanese. I start a new phrase
> >> every time I encounter a punctuation mark or a change from kana to kanji.
> >> There may be more such rules, but I can't think of them right now.
> >
> > Yes.  In a Japanese text, you can easily tell what are word
> > roots and what are suffixes because the former are written in
> > kanji and the latter in hiragana, and the two scripts are visually
> > quite different, with hiragana being much simpler than most kanji
> > and full of curved strokes which do not occur often in kanji (at
> > least not in carefully written and in printed ones).  Also, as
> > Japanese doesn't have prefixes, you can be sure about meeting a
> > word boundary wherever a kanji follows a hiragana.  And thirdly,
> > most words *do* have suffixes in texts (I can't think of any
> > situation where a bare stem occurs), so where a kanji follows
> > another kanji, the two kanji probably form a compound (such
> > compounds are very common in Japanese).
>
> Oh my goodness! Why couldn't a grad student've told me this when I was
> trying to write a Japanese tokenizer?
>
> I suspect these rules are far from perfect, 'cause I was recently
> informed that the state-of-the-art in automatic tokenization of
> Japanese only got around 80% correct, but it's nice to know that
> there's some sort of system.
>
Many common words, such as postpositions, conjunctions and particles, as
well as certain nouns and verbs, are written in hiragana, and maybe that's
why the tokenizer did so poorly.


> Are there any similar (or not similar) useful rules for tokenizing
> written Chinese?
>
I wish.

stevo





Messages in this topic (7)
________________________________________________________________________
3f. Re: A Self-Segmenting Orthography
    Posted by: "Christophe Grandsire-Koevoets" [email protected] 
    Date: Mon Dec 19, 2011 1:50 am ((PST))

On 18 December 2011 21:04, Gary Shannon <[email protected]> wrote:

>
> So the writing would be arbitrary, but not pictographic or
> ideographic, and not connected in any way to the phonology of any
> language. It should probably be used with a non-inflecting language since
> we wouldn't want to waste lexical space on inflections. That would
> just needlessly reduce the maximum potential lexicon size. (Or perhaps
> a third class of alphabetic characters, or even diacritics, used only
> to represent inflections.)
>
>
This is not unlike the writing system for Azak, one of my first conlangs
(you can read about it here:
http://rainbow.conlang.free.fr/Conlang/MesConlangs/Azak/ecriture.html,
although I must warn you that it's in French. There are pictures of the
writing system though :) ). Basically, in Azak stems are written using an
angular alphabet, while suffixes are written with a syllabary (a weird one,
mostly VC, with a few VCVC signs). There are no prefixes, and all words have
at least one suffix, so segmenting is obvious and spaces are not necessary.
-- 
Christophe Grandsire-Koevoets.

http://christophoronomicon.blogspot.com/
http://www.christophoronomicon.nl/





Messages in this topic (7)
________________________________________________________________________
3g. Re: A Self-Segmenting Orthography
    Posted by: "Christophe Grandsire-Koevoets" [email protected] 
    Date: Mon Dec 19, 2011 1:59 am ((PST))

On 18 December 2011 21:40, Jörg Rhiemeier <[email protected]> wrote:

> Hallo conlangers!
>
> On Sunday 18 December 2011 21:21:34 MorphemeAddict wrote:
>
> > This is similar to how I process written Japanese. I start a new phrase
> > every time I encounter a punctuation mark or a change from kana to kanji.
> > There may be more such rules, but I can't think of them right now.
>
> Yes.  In a Japanese text, you can easily tell what are word
> roots and what are suffixes because the former are written in
> kanji and the latter in hiragana, and the two scripts are visually
> quite different, with hiragana being much simpler than most kanji
> and full of curved strokes which do not occur often in kanji (at
> least not in carefully written and in printed ones).  Also, as
> Japanese doesn't have prefixes, you can be sure about meeting a
> word boundary wherever a kanji follows a hiragana.


That's not completely true. Japanese has at least one prefix, the honorific
o-/go-, which is most often written in hiragana. So this complicates things
a bit.


>  And thirdly,
> most words *do* have suffixes in texts (I can't think of any
> situation where a bare stem occurs), so where a kanji follows
> another kanji, the two kanji probably form a compound (such
> compounds are very common in Japanese).
>
>
In formal texts that may be mostly true (but not completely: numbers, for
instance, are often put in front of the verb without a suffix), but as soon
as you're looking at more informal texts it's not necessarily the case.
Song texts for instance are full of instances of bare stems without
suffixes, and informal speech often does away with particles.
Also, you have words that just aren't written in kanji, so you can have
hiragana following other hiragana but being part of a different word. And
then you have katakana :) .

Japanese orthography isn't as self-segmenting as people often think. As
with everything natlangy, it's messy :P .
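
The counterexamples above can be reproduced with a naive script-based
splitter. The sketch below is only an assumption about how such a splitter
might look (the names CHUNK and naive_chunks are hypothetical, and the
Unicode ranges are simplified): a chunk is a run of kanji plus any hiragana
that follows it, or a run of hiragana on its own.

```python
import re

KANJI = r"\u4e00-\u9fff"      # simplified kanji range (assumption)
HIRAGANA = r"\u3041-\u309f"   # hiragana block

# A chunk is kanji followed by optional hiragana, or hiragana alone.
CHUNK = re.compile(rf"[{KANJI}]+[{HIRAGANA}]*|[{HIRAGANA}]+")


def naive_chunks(text):
    """Return the chunks found by the naive script-change rule."""
    return CHUNK.findall(text)


# Honorific prefix o- written in hiragana is split off from its noun.
print(naive_chunks("お茶を飲む"))        # ['お', '茶を', '飲む']

# Several words written only in hiragana collapse into one chunk.
print(naive_chunks("これはとてもいい"))  # ['これはとてもいい']

# Katakana is matched by neither class here, so the loanword is lost.
print(naive_chunks("コーヒーを飲む"))    # ['を', '飲む']
```

Each case fails in a different way, which matches the point that script
changes alone are not enough to find word boundaries.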
-- 
Christophe Grandsire-Koevoets.

http://christophoronomicon.blogspot.com/
http://www.christophoronomicon.nl/





Messages in this topic (7)
________________________________________________________________________
________________________________________________________________________
4a. Re: New Blog Post: Moten Part IV
    Posted by: "neo gu" [email protected] 
    Date: Sun Dec 18, 2011 2:35 pm ((PST))

On Sun, 18 Dec 2011 05:50:36 +0100, Christophe Grandsire-Koevoets 
<[email protected]> wrote:

>On 16 December 2011 21:26, neo gu <[email protected]> wrote:
>
>> Some first thoughts: I had trouble following the examples at first
>> due to not having read the other blog posts (at least not recently).
>> Let's see; there are 36 constructions: 3 auxiliary tenses, 2
>> auxiliaries, 2 non-finite forms, and 3 cases, right? I'm partial to
>> having tables summarize things, in this case, the interpretation of
>> the 36 things, or maybe just a 12-entry table, omitting tense. It
>> might make the connection between interpretation* and case
>> clearer, if possible.
>
> You're definitely right, so I did something I rarely do: I updated an
> already published post. I added a 12-entry table at the end of the
> description of the various periphrastic conjugations. Go take a look
> at it and see if it might make things easier to understand. As far as
> I can see, it does shed some light on the modalities, but the rest
> seems rather random (hint: it's not *completely* random :) ).

Thanks, it does help.

>
>> * I'm not sure what the appropriate word is here.
>
>I just call it "meaning assignment".

In the next part, you write

"Bdan ipelda|n todvaj ige: (I) want to keep seeing you (the verb ipe|laj 
is in the imperfective aspect, while the auxiliary atom is in the 
desiderative mood)."

I think _atom_ in the description is inconsistent with _ige_ in the 
example.

That part isn't perfectly clear yet; I'll have to read it again.

>--
>Christophe Grandsire-Koevoets.
>
>http://christophoronomicon.blogspot.com/
>http://www.christophoronomicon.nl/





Messages in this topic (10)
________________________________________________________________________
4b. Re: New Blog Post: Moten Part IV
    Posted by: "Christophe Grandsire-Koevoets" [email protected] 
    Date: Mon Dec 19, 2011 1:27 am ((PST))

On 18 December 2011 23:35, neo gu <[email protected]> wrote:

> >
> > You're definitely right, so I did something I rarely do: I updated an
> > already published post. I added a 12-entry table at the end of the
> > description of the various periphrastic conjugations. Go take a look
> > at it and see if it might make things easier to understand. As far as
> > I can see, it does shed some light on the modalities, but the rest
> > seems rather random (hint: it's not *completely* random :) ).
>
> Thanks, it does help.
>
>
You're welcome :) .


> >
> >> * I'm not sure what the appropriate word is here.
> >
> >I just call it "meaning assignment".
>
> In the next part, you write
>
> "Bdan ipelda|n todvaj ige: (I) want to keep seeing you (the verb ipe|laj
> is in the imperfective aspect, while the auxiliary atom is in the
> desiderative mood)."
>
> I think _atom_ in the description is inconsistent with _ige_ in the
> example.
>
>
Why? It's a concatenation of two periphrastic forms: _ipe|laj_ is in the
imperfective aspect (accusative infinitive + _atom_), and to put the whole
thing in the desiderative mood the auxiliary _atom_ is put in the
desiderative mood (genitive participle + _agem_). It's not that much
different from English forms like "I have been thinking" except that the
order of elements is opposite.


> That part isn't perfectly clear yet; I'll have to read it again.
>
>
The compound periphrastic forms can easily become complicated. That's why
they are not used that often.
-- 
Christophe Grandsire-Koevoets.

http://christophoronomicon.blogspot.com/
http://www.christophoronomicon.nl/





Messages in this topic (10)
________________________________________________________________________
________________________________________________________________________
5. Quick poll about the usage of "Googler"
    Posted by: "Sai" [email protected] 
    Date: Sun Dec 18, 2011 4:10 pm ((PST))

https://plus.google.com/103112149634414554669/posts/BSavuCLE5w7

If you're on Google+, please vote there. ;-)

- Sai





Messages in this topic (1)
________________________________________________________________________
________________________________________________________________________
6. Real Research on the Origins/Types of Linguistic Universals
    Posted by: "Logan Kearsley" [email protected] 
    Date: Sun Dec 18, 2011 9:49 pm ((PST))

Speculative Grammarian has proved to be Actually Useful, because it
prompted me to look up the word 'subjacency', which led me to these:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.138.5171&rep=rep1&type=pdf
http://cnl.psych.cornell.edu/papers/EandC-cogsci2000.pdf

Both papers have essentially the same content, by the same authors;
the second one just has more statistics.

Someone actually got a bunch of test subjects, had them learn some
extremely simplified "artificial languages" (just semantically empty
character strings which could be classified as grammatical or
ungrammatical), and figured out which grammar was easier to process,
thus identifying subjacency constraints as being the type of universal
that is derived from incidental (i.e., not specifically linguistic in
basis) properties of human cognition (and unsupervised learning
algorithms in general, actually).

-l.





Messages in this topic (1)





