Re "Wanderers, Kings, Merchants: The Story of India through Its Languages
by Peggy Mohan":
look forward to reading it. (History or historical events aside, does it
have to do with how language(s) became an indicator of
power/privilege/status? If humankind has anything in common, this could be
one general observation...)

It does to some extent, particularly with regard to the history of
languages and related things in India.

Re "If people are used to writing in their own language on computers, then
that language is more likely to survive":
I don't disagree, but ---
[Forewarning: one might not like my reply to the following, but I ask for
it to be interpreted with a scientific mindset rather than with emotions
and sentiments related to language identity and particular cultural
practices.]
"Languages" (as in, particular language varieties, not "language" as in
language-at-large) come and go, are born and die (and, sometimes, come back),
like trends, styles, cultures ("culture" as in a particular way of living,
a set of habits...). Esp. for users of varieties that have undergone
oppression/suppression, I understand that there is or can be much meaning
to many users in having the varieties be alive or in use. It is important
to the users. It is a symbol of their existence.

Yes, that is important.

But having witnessed how language has been abused (e.g. with research greed
by some CL/NLPers), I sometimes think one might have gone too far with how
much identity one attaches to any particular language.

Coming from the background that I do, your statement above seems very
similar to saying that we might have gone too far with political
correctness or about opposing racism/misogyny/etc. Like most Europeans or
Americans, you seem to have no (or very little) idea about the toll that
discrimination -- even unintended discrimination -- takes on a very large
part of the human population.

In early 2009, I had written a rant in a blog post and the title of the
post was "English is Language Independent". I had made roughly the same
points which the now famous Bender Rule makes. I have been writing about
it, although that blog is now defunct. I did not, however, make any
proposal, as the Bender Rule does. I just pointed out the problem, so I am
in no way undermining the importance of that.

And one anonymous comment on this blog post was this: "Why don't you work
on a good project. I don't see the prejudice persisting for long once you
do that."

It is like saying about gender or racism: "Why don't you make
accomplishments equal to us? Once you do that, I don't see the prejudice
persisting for long."

Not to be picky here, but "I have heard some native speakers *users* of
some "Dravidian languages" say that there are some (I guess minor) problems
with Unicode for their languages": using the terms "native" and "speakers"
to refer to "users" (or as I sometimes use among knowers of language:
"languagers") has been an unhealthy baggage from our past practices in the
language space. I can't comment on the issue of "Dravidian languages" and
Unicode, but it seems one might need more info/details on the complaints to
act further.

Well, yes, the word "native" has indeed a very dark history. How could I
not know it? But, the manner in which linguists use the term "native
speaker" is very different. At least I think so. Note that I am only
mentioning linguists here, not "CL/NLPer". By the way, I thought I had
coined the term "NLPer" in my blog long ago, but I may be wrong about that.

I am pretty sure one can find out these details in some online forum or
some academic publication etc. I have not so far done that, but it is a
good idea to do that. I will try.

Re "psycholinguistic validity from computational validity":
in (cognitive/psycholinguistic) modeling, there is / can be not much
difference between the two. (When one has enough experiences with modeling
or with language/psycholinguistic phenomena, it's not hard to see that
results from computational modeling could also hold elsewhere. The art then
is to be able to connect the two "realms". But then again, it depends on
the claims, of course.)

Yes, of course, it depends on the claim. The two realms can definitely be
connected. We don't disagree about that. In fact, I think we don't really
disagree about that many things, it seems. Even so, isn't it possible to
implement the same thing in many different ways when it comes to
computation? That is not the case with the brain/mind, of course. Here, I
am again making a distinction between computation and mathematics. Perhaps
you don't agree with that? In that case, perhaps we mean different things
by the term "computation".

Re science and engineering:
I am not sure if engineering has to be "just about" heuristics or short
cuts. There is good engineering and there is bad engineering, for the sake
of my arguments here. In the context of ML with language/textual data, one
ought to be careful with "computing based on values of surface
elements/strings".
Much of what the CL/NLP community/communities have been doing for the past
few decades has been "computing based on values of surface
elements/strings". This practice deserves serious re-evaluation (there are
lots of grey areas here and opportunities to compare processing across
finer granularities (without all the preprocessing
hacks/"heuristics"/"engineering") for various tasks and data types/formats,
without "words", "sentences", "linguistic structure(s)", "grammar" et al.).
I don't think of it as "it's engineering!", but some bad practices/culture
having been promoted as such and normalized (for a couple of decades?).
Good engineering can also be fine, thoughtful, and robust.

I completely agree. But sometimes I do work on things which, theoretically,
seem ridiculous to me, but they may be practically useful. At least half
(perhaps more) of my motive to work on language processing is to address
somehow, to any extent, the issue of linguistic empowerment. I am prepared
to compromise theoretically for that purpose.

Re "I don't think there is anything wrong with what you call grammar
hacking from the engineering point of view":
I do (think there is something wrong), because:
i. "all grammars leak" (from Sapir, also in Manning and Schütze (1999));

I know. I tell this to students every time I teach NLP or any related
subject.

ii. "words" (whatever they are) are too coarse-grained for computing.

I already agreed to that, but if they help in my benign motives, I am
prepared to use them.
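
To make the granularity point concrete, here is a minimal sketch in Python of one finer-grained alternative to "words": overlapping character n-grams taken over the raw string, with no tokenization step at all. The function name char_ngrams and the choice of n=3 are assumptions made only for illustration; nothing in this exchange prescribes them.

```python
def char_ngrams(text, n=3):
    """Overlapping character n-grams over the raw string, spaces included.

    No notion of "word" is used: the string is treated as a plain
    sequence of characters (n=3 is an arbitrary illustrative choice).
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


grams = char_ngrams("John loves Mary", 3)
print(grams[:4])  # ['Joh', 'ohn', 'hn ', 'n l']
```

Whether such units are preferable to "words" for a given task is an empirical question; the sketch only shows that processing can start from the surface string without a word-level preprocessing step.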

Re "it [language(s)] is still likely to have an 'organic' structure":
couldn't that structure (one not associated with
"words"/"sentences"/"grammar") be one from math or computing? Or one that
is a by-product of a combination of these factors?

It certainly could.

Some CL/NLPers have made various claims concerning "structures" in the
past, borrowing the concepts from "linguistic structure(s)", from
"grammar". There was a lot of chiming along, many often have neglected the
fact that grammar could effect the impression of "structures" through
"words" etc. or that it all in turn patterns some of our thoughts/judgments
sub-/unconsciously. And the loop goes on.
(See also:
https://twitter.com/adawan919/status/1532335891448057858)

Well, if you prefer the term "patterns" to "grammar" or "structure", I am
completely fine with that. As I said earlier, I am moving towards the
language games view of language, even for this exchange. We can't avoid
that if we are talking in a human/natural language. The only way to avoid
that, if there is a way, is to use only mathematical notation, but I don't
think we have reached that stage so far with the study of language.

Re "in English "John loves Mary" is in fact a very different thing than
"Mary loves John"":
one has to re-evaluate to which extent this matters in whichever form of
computing/computation one is engaged in and how often this "canonical form"
that you are implicitly referring to really occurs in data as well as how
this actually surfaces in data.
One should look at the data in front of one, not the framework/theory in
one's mind. (I believe in achieving better designs/systems through testing
from both a data-centric as well as an algorithm-centric perspective.
Hardware counts too!)

I mostly agree, but I am not sure whether you are saying that "John loves
Mary" may not perhaps be different from "Mary loves John"?
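
To make the question concrete, here is a minimal sketch in Python. It assumes whitespace tokenization and lowercasing, both chosen purely for illustration, and the helper names bag_of_words and bigrams are hypothetical. An unordered count of tokens cannot tell the two sentences apart, whereas an order-sensitive representation (here, adjacent token pairs) can.

```python
from collections import Counter


def bag_of_words(text):
    """Unordered token counts (whitespace tokenization is an assumption)."""
    return Counter(text.lower().split())


def bigrams(text):
    """Adjacent token pairs, which keep the order of the tokens."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


a = "John loves Mary"
b = "Mary loves John"

same_bag = bag_of_words(a) == bag_of_words(b)
same_bigrams = bigrams(a) == bigrams(b)
print(same_bag)      # True: the order of the tokens is discarded
print(same_bigrams)  # False: the order of the tokens is preserved
```

So whether the two sentences count as "different" for a given computation depends on whether the chosen representation keeps order at all, which is separate from how often such forms actually occur in data.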

Re "it is unfair to blame Linguistics for that":
My focus in "[t]he "non-native speakers of X" has been a plague in
Linguistics" was on the "native" part. That has been my understanding, at
least to a great extent "nativeness" was so promoted/reinforced, esp.
within the school of generative Linguistics in the 2nd half of the 20th
century, when it comes to "linguistic judgments". I thought the propagation
stemmed from there. Who/What else do you think started it?

I think the word "native" was used in a derogatory/condescending way
throughout the English speaking world, even before the "birth of Modern
Linguistics". It was, in fact, the more polite word. One other common word
was "savage". I remember being shocked to find (very long ago) in Jane Eyre
the phrase "savages living on the banks of the Ganges". Savages? On the
bank of the Ganges?

But these usages are much older (than "Modern Linguistics").

The matter of "linguistic judgements" or "grammaticality" is very different
from that, regardless of what one's opinions are about the existence of
grammar.

All in all, your replies remind me of many of the reviewers' responses
"typical" of (i.e. I often got from) the *CL circle (of those who remained
in the past decade or so).

I don't know about that. I thought I had very unconventional views of NLP,
but I could be wrong about that, at least relatively.

If I may guess:
i. you don't have an academic background in Linguistics (esp. general
Linguistics,

That is true. I have learned about language(s) mostly on my own. So, if you
want me to show my degree in Linguistics, I have none, except a PhD in
Computational Linguistics. I was the second person to get a PhD
specifically in CL in India.

note that there is a difference between linguistics of particular languages
and that in a more general/theoretical manner (not about (p-)language
grammatical particularities),

Of course there is. What makes you think I don't know that? The fact is, my
knowledge of "(p)-language" is relatively very limited. Even my knowledge
of the syntax of Hindi (my "mother tongue"), in a formal sense, is very
limited. I mostly know about language in general.

ii. you learn about language(s) through mostly non-academic books or
through your own language experience(s) (which counts too, I am not
invalidating it/them here),

I have no idea what makes you say that. Am I supposed to list the
Linguistics books I have read, in addition to showing my Linguistics
degree? (Sorry if that sounds bitter, but it has happened in the past, not
literally, but effectively).

I can only say -- and it is strange to even have to say this -- that I
definitely know more about language in general and linguistic theory than
-- at least -- most graduates and postgraduates (including PhDs) of
Linguistics in India.

iii. you never had phonetics and phonology, nor

In my replies on this thread I have not mentioned anything related to
phonetics or phonology. So it must be from somewhere else that you have
this impression(?). Is it from some of the papers co-authored by me? I
think there are some which could give that impression. Explaining why that
could be so, will take this discussion somewhere else. I don't think it is
relevant here at all.

Is your point that I don't know about phonetics or phonology? If it is, I
would prefer not to answer that.

iv. do you realize how you can practice without "words"

I do. But I am also prepared to use "words" wherever they help. As I wrote
earlier, I have worked without "words" sometimes and have argued against
them.

--- did I get any right?

You got -- sort of -- the first and the third, assuming you were asking me
to show my Linguistics degree and whether I had formally taken courses on
Phonetics and Phonology.

I wanted to note this because --- and please do not take offense, it is not
meant personally for I respect your expertise and appreciate our exchanges
--- for a while, I didn't know where (else) to submit my findings. It wasn't
until I got all the rejections with rather shallow comments about language
(or language and computing) that I realized the "solidarity" one has built
with people with a background similar to yours might have been the driving
force of how some computational (general) linguists (as in, "general
language scientists" who also do computational work --- there are only a
few of us) got chased out of the arena. The "typical" excuses for practices
of this "culture" have been "engineering", "useful", "it works" --- but
without any/much grounding/interest in good generalizations. One puts
excess focus on processing but not on evaluation or interpretation. I think
it's time for a "culture" change in this regard.

To your other reply below (in triple quotes):

Sorry, but I didn't understand what you mean by triple quotes. I could not
find any triple quotes in your comments.

re "language policy":
not everything has to be or can be regulated. Policies can help with
promoting/reinforcing/rectifying a particular situation/initiative.

I agree.

Forcing people e.g. to use language in one particular way or to use "one
language" only (whatever "language" "means"*)?

Again, I agree. However, like most Europeans and Americans, you seem to
have no (or very little) idea how people are already being forced to use
some language or another. And it is hardly a new phenomenon, but it has
become much more serious now due mainly to colonization and all its
effects. Do you have any idea how much hundreds of millions of Indians
suffer simply from being forced to use English? My primary motive in my
whole life, for good reasons, has been to counter linguistic
discrimination, mainly due to the imposition of English on any Indian who
has any ambition at all. I think that alone makes me sufficiently qualified
to "work on language". This is analogous to any other kind of
discrimination or prejudice.

I do not think that would be a good direction to go. For any regions, we
have seen both good/better and bad/worse policies throughout the course of
history. One would really have to evaluate the proposed policy in question
carefully.

I never said anything about forcing people to speak one language. That's
why I said I don't know what exactly the connection is with the language
policy. But it sure has a very strong connection, because the problem in
the first place is due to language policy, written and unwritten. Do you
know that there are and have been schools in the world, including India,
where students are punished if they are caught speaking in their mother
tongue (or first, "native" language)? I still remember feeling recognition
and the impression it had on me when I read the famous novel about life in
Wales, How Green was My Valley. I had just become fluent in English then.

Depending on the situation, some may best change things through the
economy, some through government support, some through education and/or
grassroots-type initiatives, some a combination of all these and more....

I agree.

*I have an answer to this... please wait for my next pub or so.

I would love to read it. I am eternally hungry for any fresh look at
language in general. Not so much for particular languages or varieties.
That is to me, to some extent, boring.

Re "it is very much like conservation of ecology or of species. I don't
think it (the latter) will be considered unwarranted prescriptivism":
see my 2nd response above.

The same for my reply to that comment.

Also, with language documentation, one can just document data without
promoting grammar. (That's probably the less unethical thing one can do
with language or language data.)

Again I agree. I never said anything about promoting grammar. I don't like
to read grammar books. It's painful to me, compared to almost any other
topic under the sun, except perhaps finance, commerce, and the intricacies
of legal procedures.

For the sake of completeness, I should clarify -- as it seems to matter --
that I have "never had Syntax or even Semantics or Pragmatics". I am mostly
self-taught, not just in Linguistics, but also in Computer Science and
almost everything else. Do you really think it matters in the context of
this discussion?

Again, for the sake of completeness, I should mention that for decades, I
have been reading all kinds of books that had anything to do with language,
mostly in general, but also about Hindi or Indian languages, not to mention
English. These have included what you call academic books on language in
general and about Linguistics. I still keep reading, as I know very well
that, being self-taught, I have some gaps in my knowledge of Linguistics
and Computer Science. My undergraduate degree was in Mechanical Engineering
(from 1990), but I hardly remember anything in that area. I have similarly
been reading all kinds of books for decades about computers and Computer
Science.

I am unable to see how any of this matters in the context of this
discussion.

By the way, I like the metaphor you use for language: It being like a
graphical user interface for the brain. That reminds me of the views of
Daniel Dennett about consciousness. He constantly compares elements making
up consciousness to graphical user interfaces on computers. Not that I
completely agree with him about consciousness, but I still find the
metaphor quite good, perhaps as an approximation.
_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
