Hi Ada,

As these threads consist of a discussion rather than a set of scientific statements (the former being motivated by responding to a stimulus, while the latter consists in defining/motivating a scientific position that is supposed to stand apart from any specific discussion), I forbid you to use any of my writings posted on the Corpora list on any of your websites.
Of course, I still authorise the Corpora list to keep archives (as these are maintained along with the full discussion context).

Regards,
Gilles Sérasset

> On 31 Oct 2023, at 19:19, Ada Wan <[email protected]> wrote:
>
> Dear all
>
> I am about to post CorporaList threads which I have responded to on my own website, as it seems some of my replies are not yet showing on the public website (https://list.elra.info/mailman3/hyperkitty/list/[email protected]/). If any of you should have any objections to this (because you don't want your replies to be seen), please let me know asap.
>
> Thanks and best
> Ada
>
>
> On Mon, Oct 30, 2023 at 9:31 PM Ada Wan <[email protected]> wrote:
> [Disregard if not interested]
>
> Dear all
>
> Thanks for your emails. The issue of where the misunderstanding might lie is clearer to me now, esp. given Gilles' example with his niece.
> (@Anil: perhaps you are right in observing a possible change of style in my correspondence; I may well have been running out of patience at this point (considering I have been in rebuttal mode since at least 2019 [1]?! So it's a good thing that morphology is coming to an end!). In the beginning, I had expected the professionals, whom I took to be experienced in "language"/data matters (and the subscribers of the CorporaList), to be the first to appreciate my results, but it seems to have turned out the other way around. Those who have been exposed to fewer "language tales" [2] can be quicker to get it. But anyway, please allow me to explain again below.)
>
> Most importantly, in the niece example, there are 2 things that should be distinguished from one another:
> i. what the niece uttered [i.e. data/observation (do also note how the data is collected: recorded or transcribed?)], and
> ii. what one's interpretation/analysis of her utterance is [i.e. interpretation/analysis of observation].
>
> In "grammarese" formulation, the case in question is as follows: Gilles' niece conjugated an irregular verb with a regular verb conjugation pattern.[3]
>
> Gilles suspects that (linguistic) morphology exists (and/or is universal?) because the pattern of the niece's utterance resembled one of the patterns (sometimes formulated from "rules" [4]) often studied in the literature on morphology.
>
> Re "she clearly showed me that her way of learning languages did not consist in reading/listening to huge amounts of utterances ...":
> even if the niece had only been exposed to 10 utterances, 8 of which exhibit a certain pattern and 2 of which are more irregular/outlier-like, the chances of her applying, in the rarer/unobserved cases, habits in line with the more frequently observed pattern can be high. Would you not agree that's rather reasonable?
> There are or may be un-/subconscious *patterns*, sure. But I do not argue against these, for such patterns do not have to be formulated in terms of "stems"/"roots"/"affixes", and, more importantly, most of these patterns surface more often in books than in real life anyway. So the belief that a morphological paradigm is to be formulated in a certain way is pretty much a matter of the preference of a (group of) researcher(s).
>
> Re "but she was able to learn some word formation rules from very few examples":
> what she "learned" might just be some patterns, at least according to your/our analysis here. That is, she might not yet have had much exposure to "rules", but Gilles might have. (Hence his conviction of the reality of morphology may be stronger.)
>
> Re "In my humble opinion, this proves that morphology exists, if not in the LLM matrixes, at least in the human brain":
> I don't disagree that one's mind can be clouded by archaic ideals or theories. But shouldn't a better theory exist outside of the mind of a person or a group of scientists as well?
>
> If one accounts for text data in its entirety, i.e. without disregarding or adding in whitespace, and evaluates over bigger spans (as mentioned in the rebuttal here [5]), the notion of morphology is actually irrelevant to a comprehensive study of (language) data. Wouldn't you agree?
> With your plane and bird analogy, you could be asking: if you do insist on cherry-picking from data, shouldn't your analyses still matter? Well, if they don't generalize well, they may end up mattering only to you.
>
> Re "... (or issued from a colonialist point of view of Aves on the task at hand…) and asking them to renounce this oh so obsolete bad habit":
> I suppose it also depends on which side of history one would like to be on.
>
> I understand that it can be much harder for those who have lived in a country where "language" activities (and/or the concept of "language") have been officially and explicitly supported/promoted. This "privilege" now puts many of us at a disadvantage when it comes to unlearning.
>
> Re "ML based language models":
> I don't know what your understanding of these is, but the logic behind them (e.g. a probabilistic processing/interpretation of sequences) is often not far from how "humans" are known to "process language(s)", which is why many modeling experiments can bridge "both spheres" (though I believe many of those experienced in modeling would buy less into this "human 'versus' machine" narrative).
>
> @Gilles: I am also curious what your takeaway is from Quine's "Word and Object" (e.g. at https://mitpress.mit.edu/9780262670012/word-and-object/) in relation to our conversation here.
>
> @Anil: the computational phenomenology is already in "Fairness in Representation" (note that the insights were obtained from a collection of many models, i.e. most of them are epiphenomena). So I think what I have in mind is orthogonal to what you described. Crimes and other misconduct have also been around for millennia; are these things we want to keep?
> That having been clarified, do you have other objections to my contributions?
>
> I hope I have addressed your concerns sufficiently. If not, please let me know.
>
> Thanks and best
> Ada
>
>
> [1] The results that ended up being published in Fairness in Representation <https://openreview.net/forum?id=-llS6TiOew> (ICLR 2022) had been rejected about 5 times, and those in "Statistical (Un-)typology" (even with "greedy" research incentives so as to fit in) about another 5 times, from May 2019 to April 2022, in addition to other attempts/withdrawals. After that, all I have been dealing with is retaliation. In fact, some of my things just got stolen and I had to report it to the police, so please pardon my delay in replying.
> [2] At one point, I thought perhaps it'd be best to have no disciplines. Then I realized not all disciplines are like "language", "linguistics", or "structural linguistics".
> That having been expressed, could having "no disciplines" still be a good thing? Possibly, but that is another debate for another time, perhaps.
> [3] But let's bear in mind: what one would consider a "regular verb" (vs "irregular verb") is nothing but some sequence/utterance seen/heard more frequently than others.
> [4] Esp. in the history of "transformational grammar", which was popular around the mid-20th century. "Grammar rules" might have been around for longer, but branding things as belonging to the domain of "morphology", as a module of a bigger "structure"/"structural framework" of "linguistic analysis", has become popular only in the past half century or so, due to "transformational grammar" / "structural linguistics".
> But please do note that even in "structural linguistics", many patterns are explained away in terms of (the ranking of) constraints (i.e. no "transformation"). There are few or no reasons to posit "deep structure(s)", from/through which, in the case of morphological analyses, "stems"/"roots" are often held to be the bases of inflection. That is, aside from "grammar rules" taught in e.g. schools and those inside researchers' minds, there is actually little, if any, evidence for the existence of "rules". [N.B. this may be considered advanced for those without a theoretical background in Linguistics.]
> [5] https://openreview.net/forum?id=-llS6TiOew
>
>
>
> On Thu, Oct 26, 2023 at 6:05 PM Anil Singh <[email protected]> wrote:
> I have also been carefully reading the exchanges. Although I was planning not to add to this exchange, at this point I am tempted to reply.
>
> Ada's early emails were adding something to the discussion and debate, but at this point they are simply saying 'I am right, you are wrong', without giving any explanation or evidence.
>
> I was also thinking of the same kind of example as the one Gilles gave. Until Ada provides some very good reasoning and evidence, it is hard for me to completely agree with her, although, as I said earlier, I do agree with her on many, perhaps most, things.
>
> Ada, I sincerely respect your learning and competence. However, you said earlier that you are proposing an alternative computational phenomenology. That would be really interesting. Wouldn't it be better to first propose it and argue, in more specific terms and with more convincing arguments and evidence, that it is the right one, or at least 'more right' than the existing ones (there is more than one)? Given that there is already Information Theory, it has to go beyond the byte, which is an accidental unit of computation, and the character, which is also not well-defined, sometimes even within a single writing system. To give one such example, perhaps not the best one, I had always thought of the Indic-script dependent vowel (maatraa) as a character, but I recently found that languages like Java and Python do not treat such written symbols as characters, so when I tried to get the length of an Indic-script string, the built-in string length functions gave only the number of consonant symbols and independent vowels in the string. We got wrong results using these functions, and I only discovered this by accident. The reason, of course, is that these functions and programming languages treat such dependent vowels as diacritics, which is also correct in some ways. I did not realize this earlier because in India we often use a Latin-script-based notation called WX for Indic scripts in NLP, due to the encoding and input-method-related problems that I referred to in one of my earlier replies. The WX notation, however, does not distinguish between dependent and independent vowels and treats both as the same character, which, to the best of my knowledge, is how most of us, if not all, think of them in India. On the other hand, the consonant modifier 'halant' is not used in WX but is used in Indic scripts, and its presence might also cause disagreements about what the string length is. In other words, the character as a unit does not work in your terms. In fact, who knows how many errors in processing Indic-script text have made their way into computational results because of this simple fact. And perhaps they still do, because it took me a long time to realize this, which at first led to consternation: in text processing, if you can't rely on the string length function, what can you rely on?
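Which "length" a function reports depends on the unit it counts. A minimal sketch, assuming Python 3 and its standard unicodedata module (the helper name count_units and the two sample words are chosen here for illustration only), measures the same Devanagari strings three ways: Unicode code points, which is what Python's built-in len() returns, so every dependent vowel sign and every halant is a code point of its own; base letters only, skipping code points whose Unicode general category is a mark (Mn, Mc, Me), which roughly matches the consonants-plus-independent-vowels count described above; and UTF-8 bytes.

```python
import unicodedata


def count_units(text: str) -> dict:
    """Count the length of `text` under three different notions of 'character'."""
    # Python's len() counts Unicode code points, so dependent vowel signs
    # (maatraas) and the halant (virama) each count as one.
    code_points = len(text)
    # Skip combining marks (general categories Mn, Mc, Me); in the samples
    # below, what remains are the consonant letters.
    base_letters = sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))
    # The byte as a unit: every Devanagari code point takes 3 bytes in UTF-8.
    utf8_bytes = len(text.encode("utf-8"))
    return {"code_points": code_points, "base_letters": base_letters, "utf8_bytes": utf8_bytes}


# Written with escapes so the exact code points are unambiguous.
samples = {
    "kitaab": "\u0915\u093f\u0924\u093e\u092c",   # KA, VOWEL SIGN I, TA, VOWEL SIGN AA, BA
    "kshamaa": "\u0915\u094d\u0937\u092e\u093e",  # KA, VIRAMA, SSA, MA, VOWEL SIGN AA
}

for name, word in samples.items():
    print(name, word, count_units(word))
```

For the first sample (किताब, "book") this reports 5 code points, 3 base letters and 15 bytes; the second (क्षमा) gives the same three numbers, with the halant counted as a code point but not as a base letter. A fourth notion, the user-perceived character (extended grapheme cluster, Unicode Standard Annex #29), is not available in the standard library; the third-party regex module exposes it through the \X pattern.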
>
> As for phonemes, major ML researchers like Vincent Ng don't believe the phoneme to be a real unit of language. The argument is that we don't need phonemes for applications like speech recognition.
>
> If not the byte and the character, what are we left with in terms of computational phenomenology? At the very least, there has to be a well-argued and well-evidenced alternative in order to persuade others to agree with your views. I would be very much interested in thinking about such an alternative, even if at present I don't think you are right in all your views. After all, when it comes to throwing away millennia of work on language science, expecting very strong reasoning and evidence for an alternative is not unrealistic.
>
> On Thu, Oct 26, 2023 at 8:44 PM Gilles Sérasset via Corpora <[email protected]> wrote:
> Hi Ada,
>
> When my niece was 3 years old, she said to her little brother “Maman, elle venira plus tard…” (Mum will come back later, in “incorrect” French).
>
> She made a “mistake” here by using “venira” (an incorrect future form of the verb venir, “to come”) instead of the “correct” “viendra”. It was wrong, but perfectly predictable using the most productive morphological rules of French future formation.
>
> She was 3 years old, so I doubt she really understood what morphology is; nevertheless, with this mistake, she clearly showed me that her way of learning languages did not consist in reading/listening to huge amounts of utterances, but that she was able to learn some word formation rules from very few examples. And indeed, humans are still able to learn complex things perfectly with very little explanation and/or very few examples (something that is totally beyond ML based language models).
>
> In my humble opinion, this proves that morphology exists, if not in the LLM matrixes, at least in the human brain. Hence modelling such rules (and even using them to analyse or produce) is a valid approach, independently of any other (also valid) approaches.
>
> To put it another way:
>
> There have been many scientific proofs that humans would never be able to fly… And these proofs were valid under their own hypotheses.
>
> Indeed, planes do not flap their wings… they use other means to perform a task that was performed by birds.
>
> Nevertheless, I have never witnessed any plane (or pilot) trying to convince birds that their way of flying is obsolete (or issued from a colonialist point of view of Aves on the task at hand…) and asking them to renounce this oh so obsolete bad habit.
>
> Regards,
>
> Gilles
>
> _______________________________________________
> Corpora mailing list -- [email protected]
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to [email protected]
>
>
> --
> - Anil
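Gilles' point that "venira" follows from the most productive rule of French future formation can be stated in a few lines of Python. This is a minimal sketch under stated assumptions: it encodes the textbook rule (infinitive, minus the final -e of -re verbs, plus -ai, -as, -a, -ons, -ez, -ont), uses a small, hypothetical table of irregular stems, and ignores spelling changes such as appeler → appellera; the names future, regular_future_stem and IRREGULAR_STEMS are illustrative. Switching the irregular table off reproduces the niece's form.

```python
# Regular French future: infinitive (minus the final -e of -re verbs) + ending.
FUTURE_ENDINGS = {
    "je": "ai", "tu": "as", "il": "a", "elle": "a",
    "nous": "ons", "vous": "ez", "ils": "ont", "elles": "ont",
}

# Hypothetical, deliberately tiny list of irregular future stems (far from complete).
IRREGULAR_STEMS = {"venir": "viendr", "être": "ser", "avoir": "aur", "aller": "ir", "faire": "fer"}


def regular_future_stem(infinitive: str) -> str:
    """Apply the productive rule: prendre -> prendr-, parler -> parler-, finir -> finir-."""
    return infinitive[:-1] if infinitive.endswith("re") else infinitive


def future(pronoun: str, infinitive: str, use_irregular: bool = True) -> str:
    """Conjugate in the simple future; with use_irregular=False only the regular rule applies."""
    stem = IRREGULAR_STEMS.get(infinitive) if use_irregular else None
    if stem is None:
        stem = regular_future_stem(infinitive)
    return stem + FUTURE_ENDINGS[pronoun]


print(future("elle", "venir"))                       # viendra (irregular stem)
print(future("elle", "venir", use_irregular=False))  # venira (the niece's form)
print(future("nous", "parler"))                      # parlerons
```

The sketch states the rule the way an analyst would write it down. Whether anything like it is stored in a child's mind, as Gilles argues, or is better described as a frequency-driven pattern, as Ada argues, is exactly what the thread disputes, and the code does not settle it.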
