Hi Ada, Of course what counts as a morpheme or as a lexical expression, and what inventory of compositional rules one assumes, is subject to one's theory, to language change, and also to ad-hoc playful reinterpretation (which is how I would see your "s or p?" example). But these notions need not be 100% precise in order to delineate a research area such as MWE. There can be a gray area as to what counts as an MWE and what doesn't. For example, many MWE researchers would probably not count a prefix verb like *beguile* as an MWE, simply because it fulfills all the criteria of wordhood in traditional Western NLP. But if we assume a wide definition such as "lexical expression consisting of more than one morpheme", it too would fall under the MWE label. In fact, it exhibits the same competition between a lexical/idiomatic reading and a compositional reading that is typical of more complex MWEs: "kick the bucket" could mean to kick the bucket or to die; "beguile" could mean to affect with guile or to deceive.
I would support a name change from MWE to CLE or similar, because I agree that "word" is not a very useful notion cross-linguistically. (Then again, the notion of MWE might still work okay if we assume Martin Haspelmath's retro-definition <https://dlc.hypotheses.org/2621> of "word".) Cheers, Kilian On Sat, 11 Feb 2023 at 18:29, Ada Wan <[email protected]> wrote: > Hi Archna > > Thanks for your reply. > > Your justification of the continued use of "MWEs"/"words" is based on > history and shared understanding (from 09Feb2023: "since the term has been > used for a long while, there is a bit of a shared understanding of this > term, including about these stipulations"); both of these criteria are > achievable with alternate formulations. > > Re "the category of items, of which idioms is a subset, has been referred > to as multiwords for a long time": "MWE" does not have that long a > history. What is the earliest use of "MWEs" that you have in your > records? And even if terms have been used for a long while, it doesn't mean > that we cannot change them for the better, esp. when they have been > inappropriately adopted or found outdated. What objections do you have to > "lexical expressions", for example? > > The issue/problem with "word" is that, aside from it not being necessary > or sufficient in the study of language or in computing, there is also an > implicit, shared understanding that it is arbitrary, redundant, and > indeterminate. (This applies also to the notion of wordhood within one > language.) The indeterminacy part is evident in your not having provided me > with a definition of "words" thus far as well. Furthermore, as you > confirmed earlier, "the notion of wordhood may not be applicable to every > single language and in the same way"; so how could "words" be robust > enough for computational processing? 
> > Re emojis: here are some examples of emoji combinations that show a sense > of idiosyncrasy when they (co-)occur: > 🤩 for "star-struck" (from > https://unicode.org/emoji/charts/full-emoji-list.html) > Or from > https://www.elitedaily.com/lifestyle/funny-emoji-combinations-tiktok: > 👉 👈 (feeling shy/simping) > 🚪🏃‍♀️💨 (time to leave) > 🍿🤏😯 (when drama is happening/when something is going down) > 👁👄👁 (blank stare) > 🕳👨‍🦯 (I didn't see anything) > 👩🤏👩‍🦲 (wig snatched) > 🐂💩 (bullsh*t) > > My concern is with "wordhood" in the "language space" > (science/engineering/technology) in general, not just with lexical > expressions. I do think, however, that SIGLEX could play an important > role in effecting some positive changes in this regard. > > ---------- > > Hi Kilian > > Let's suppose that what we have thus far known as "grammar" (the one that > has been based on or related to "words" or "sentences", i.e. > morphology/syntax (and some phonology)) can be decomposed into (sequential) > ordering and linguistic attitudes/normativity [1]. I do think > judgments/attitudes play a role in language as it exists in the social > world and can affect, or even determine, how registers/styles etc. are > defined, but I also think that a more rigorous science of (the > remaining aspects of) language is possible if we separate such > attitudes/prescriptivism from a more descriptive stance (e.g. in the > direction of information sciences). > > Once we remove the attitudes/normativity part from the science of > language, lexical and contextual information as well as function/use > remain. > > The reason why I hesitated in referring to MWEs as "complex" is that > (lexical) "complexity" can be broken down into vocabulary and length, with > use/frequency accounting for the pragmatic/functional one. Hence every > expression (or any character string) is lexical. > The element of idiosyncrasy/idiomaticity is really a pragmatic one (e.g. 
> in the rarity/archaic-ness/uniqueness of the use of the > expressions/segment/span or character n-grams). > So "sing" can be seen as a lexical expression, just like "bing" or "ping". > Let's not forget that (even according to traditional grammatical analyses) > various linguistic effects can happen to expressions when they undergo > frequent use over an extended period of time. E.g. "ping me" may be seen > thus far as relatively more idiomatic than "sing me a song", but that's due > to the former expression being more specialized, less general, or rarer in > use. Also, e.g. in a conversation, if one said "sing me" and the other > didn't quite catch the first bit of the phrase, they might ask "[s] or > [p]?" or "'s' or 'p'?". And one can well imagine that if this comes into > more frequent use, "s" and "p" can be regarded as what we'd now interpret > as "idiomatic". Hence "sing" does not have to be seen as a "single > morpheme". > > [1] I have tweeted this before on 28Jan2023: > https://twitter.com/adawan919/status/1619401653962297344?cxt=HHwWgMDS0a3OovksAAAA > In a way, I am reinterpreting "(non)-compositionality" as > normalization/frequency effects via the decomposed view of "grammar" above. > > > ----------------------------------------------------------------------------------------- > > > > > *Hence, my proposal (not just for MWE workshop folks but perhaps for all > who might be interested) would be: > https://docs.google.com/document/d/1n4QRn0CxbVMj6kbLWo-byT3S26ODJOjicU-ZYw7jRG0/edit?usp=sharing* > > *Comments welcome. 
* > Thanks and best > Ada > > On Sat, Feb 11, 2023 at 2:01 PM Kilian Evang <[email protected]> > wrote: > >> Hi Ada, >> >> The problem I have with the term "expression" without further >> qualification is that to my mind it includes any kind of linguistic sign, >> including ones like "to pay a visit to my dear aunt Ruth", which can clearly >> be interpreted compositionally. So I think we do have to specify "lexical" >> to delineate what we are studying in the MWE community. "Lexical item" or, >> sure, "lexical expression". Either would also include signs, of course. I >> do also feel we have to add "complex" or similar, because otherwise it >> includes single-morpheme lexical expressions like "sing". >> >> Cheers, >> Kilian >> >> On Fri, 10 Feb 2023 at 23:32, Ada Wan <[email protected]> wrote: >> >>> Hi Archna >>> >>> "Idioms"/"Idiomatic expressions" are established terms in the study of >>> language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed >>> phrases", is mentioned in, inter alia, [3], which was the earliest citation >>> in the earliest work on MWEs in the ACL Anthology [4]. If I understand >>> correctly, "MWEs" was a term coined in order to establish a practice >>> based on "words" (if anyone should view this differently, please do correct >>> me here). >>> >>> You're right, the task I suggested can be seen as orthogonal to >>> distinguishing between lexical and non-lexical expressions. I >>> think it's important to also document the contexts surrounding expressions, >>> instead of just picking expressions out and studying them in an isolated >>> manner. It was just a suggestion for those who might be interested in >>> building a multilingual parallel lexical database as well as those who >>> might want to get a more holistic understanding of language while weaning >>> themselves off "words", now that it's become even more obvious how >>> superfluous the term/concept is. >>> >>> [1] See e.g. 
https://en.wikipedia.org/wiki/Phraseme >>> [2] "Idiomatic expression" is just another formulation of "idiom" (see >>> https://www.thefreedictionary.com/idiomatic+expression). >>> According to Collins English Dictionary (accessed via >>> https://www.thefreedictionary.com/idiom), "idiom" stems from the 16th-century >>> Latin idiōma, denoting "peculiarity of language". >>> [3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. >>> Language, 70:491–538. https://doi.org/10.2307/416483 >>> (Many older references on "idioms" by linguists can be found therein.) >>> [4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond, >>> Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword >>> expressions: linguistic precision and reusability. In Proceedings of the >>> Third International Conference on Language Resources and Evaluation >>> (LREC’02), Las Palmas, Canary Islands, Spain. European Language Resources >>> Association (ELRA). >>> >>> ------------------------------ >>> >>> Hi Kilian >>> >>> Sorry about my oversight on "item". I do think "item" could be better >>> than "term" in this case, but it does carry a sense of "a single element", >>> a more discrete "singleton". It's ok to combine it with "complex" to >>> mitigate the sense of "singleton", but then "complex" as you suggested is >>> dependent on morphology, which can be problematic. >>> >>> Re "lexical": sure. (I think there have been so many different >>> views/traditions/conventions among linguists and computational linguists in >>> the past that we don't necessarily have to agree on how we or our >>> definitions/methods might differ or might have differed, as long as we have >>> the same goal now?) >>> >>> One argument for "expressions" would be that they could include a sign >>> (e.g. a hand sign in motion). >>> >>> So how about updating "MWEs" to: >>> i. "lexical expressions", or >>> ii. "lexical expressions (of one character or more when written)*", or >>> iii. 
[i] or [ii] without "lexical", or >>> iv. others? >>> >>> * I'm trying to incorporate how expressions with emojis would/should be >>> treated too. >>> >>> ------------------------------ >>> >>> What do you all think? >>> >>> Thanks and best >>> Ada >>> >>> On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora < >>> [email protected]> wrote: >>> >>>> Forwarded message from Archna below >>>> >>>> ---------- Forwarded message --------- >>>> From: Archna Bhatia <[email protected]> >>>> Date: Thu, 9 Feb 2023 at 19:58 >>>> Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on >>>> Multiword Expressions (MWE 2023) >>>> To: Ada Wan <[email protected]>, Kilian Evang <[email protected] >>>> > >>>> Cc: Mike Scott <[email protected]>, [email protected] < >>>> [email protected]>, [email protected] < >>>> [email protected]> >>>> >>>> >>>> Thanks, Ada. I think using the terms “fixed” and “idiomatic” makes the >>>> category appear more restrictive, and would need qualifications such as >>>> “fixed” is a relative term here, etc. With “multiwords/multiword >>>> expressions” also, there are stipulations (the notion of wordhood may not >>>> be applicable to every single language and in the same way), but since the >>>> term has been used for a long while, there is a bit of a shared >>>> understanding of this term, including about these stipulations. I am open >>>> to better terminology. Using just “expressions”, however, seems too vague >>>> and loses some generalizations about the idiosyncrasies that “multiword >>>> expressions” demonstrate. Not every expression is the same; “multiword >>>> expressions” show characteristics different from other expressions. I >>>> understand there is some fluidity also there when trying to distinguish >>>> between multiwords and non-multiword expressions. >>>> >>>> There are so many angles that one could look at language from. 
I don’t >>>> see anything wrong with the view that studies expressions covering all >>>> aspects as you suggest without distinguishing between expressions based on >>>> notions of wordhood. The task you suggest will help in developing >>>> understanding about language and how languages are similar or different and >>>> how they are used. I don’t think it disqualifies efforts that distinguish >>>> between “multiword expressions” and non-multiword expressions though, and >>>> the idiosyncrasies are not limited to morphology/syntax; idiosyncrasies are >>>> found in other linguistic aspects too when characterizing “multiword >>>> expressions”. >>>> >>>> ~ Archna >>>> >>>> On Feb 9, 2023, at 11:17 AM, Ada Wan <[email protected]> wrote: >>>> >>>> Hi Archna, hi Kilian, hi all >>>> >>>> Thanks for your replies. >>>> >>>> TLDR on my part: I'd be fine going with "expressions" (instead of >>>> "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax" >>>> (apart from the ordering of elements and/or sequential patterns) is >>>> necessary in the analyses of such. >>>> >>>> ----- >>>> >>>> More specifically: >>>> >>>> [@Archna] Re "fixed/idiomatic expressions": I don't think it matters >>>> much whether they are "fixed" or "idiomatic". A "fixed expression" is one >>>> that is usually more impervious to (lexical) change. One can measure this >>>> quality in a longitudinal study, e.g. in relation to other aspects of >>>> language change etc. Re how "fixed" is "fixed": it's relative, much like >>>> many other aspects of language studies. By "idiomatic", one could mean that >>>> there is an element of idiosyncrasy (as "idiom"/"idioma"). >>>> >>>> The message that I am trying to get across is that "word" is a >>>> superfluous category in the study of language. Would you mind please >>>> justifying why you need "words"? >>>> >>>> The same goes for morphology, actually. 
In essence, morphological >>>> analyses involve selective decomposition, not decomposition of all >>>> decomposable units. Hence if one is only accounting for variations within >>>> an expression as a ((sub-)character) sequence involving "morphemes" >>>> (assuming these can be rigorously defined) and discounting the changes in other parts >>>> of the sequence, that would be an incomplete analysis of the expression. >>>> Instead, one can just refer to expressions as "expressions", e.g. as >>>> sequences/strings of various lengths/vocabs in (sub-)characters; such an >>>> account is also more flexible and accommodating of diverse >>>> languages/registers/modalities. >>>> >>>> A study of "expressions" can cover all other aspects: not just >>>> lexical but also functional ones. One doesn't need to incorporate/impose >>>> any ad hoc notions of "wordhood" in these studies. >>>> >>>> Suggestion: I believe there are many more interesting tasks in this >>>> area than trying to find/define "words" within expressions, or to >>>> "parse" them according to some structuralist assumptions (i.e. >>>> morphologically/syntactically). For example, the community could start >>>> (some multi-year project) building an international multilingual parallel >>>> (note: not everything would be parallelizable) database of all expressions >>>> and terminologies that have ever existed, with contextual (historical/cultural/social) >>>> information, and start verifying their sources and status of current use. >>>> (As an ethical consideration, though, one should take care not to reinforce values that shouldn't >>>> be further emphasized / transferred to posterity. So if something is in the grey area now, document clearly >>>> what the current attitudes towards a certain value are, so posterity can >>>> look back and evaluate with respect to their point of view.) 
>>>> >>>> Counter questions to Archna: >>>> What are the motivations behind your suggestion to access/interpret >>>> language using "words"? How do you define "words" and justify the >>>> sufficiency/necessity of morphology/syntax in relation to the study of >>>> these expressions, esp. when the morphological decomposition of these >>>> expressions is arbitrary and helps little (or not at all) with explanation >>>> or prediction? >>>> >>>> Re "complex lexical terms", @Kilian: I'm just wondering what kinds of >>>> "terms" would not be considered lexical. >>>> (I was tempted to add "lexical" to "expressions" as well, but thought that >>>> might be a bit redundant.) It depends on how one defines "terms", of >>>> course. And how "complex" are expressions really? They are just more >>>> calcified units after all, aren't they? (Why do we/some always seem to want >>>> to add the term "complex" to everything? Things that aren't "complex" are >>>> also worthy of study!) >>>> >>>> Curious what you think... 
>>>> >>>> Thanks and best >>>> Ada >>>> >>>> Why I'm advocating #noWords: >>>> Fairness in Representation for Multilingual NLP: Insights from >>>> Controlled Experiments on Conditional Language Modeling >>>> https://openreview.net/forum?id=-llS6TiOew >>>> https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view >>>> (It took me a while for everything to sink in.) >>>> >>>> >>>> On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora < >>>> [email protected]> wrote: >>>> >>>>> I must say I'm perfectly happy with "multi-word expression", or >>>>> "multi-word unit". >>>>> >>>>> I feel sympathy with Archna's post (and incidentally wish Archna >>>>> didn't have to go through a friend!) 
>>>>> Cheers -- Mike >>>>> >>>>> -- >>>>> >>>>> Mike Scott >>>>> lexically.net >>>>> Lexical Analysis Software and Aston University >>>>> >>>>> _______________________________________________ >>>>> Corpora mailing list -- [email protected] >>>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ >>>>> To unsubscribe send an email to [email protected] >>>>> >>>> >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "MWE Workshop 2023 Organizers" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. 
>>>> To view this discussion on the web visit >>>> https://groups.google.com/d/msgid/mweworkshop2023/CAB7Mis_GSyFjZOVw_XWp431VMJJBo0BnPqjFsqqTP_sEE58Ezw%40mail.gmail.com >>>> . >>>> For more options, visit https://groups.google.com/d/optout >>>> . >>>> >>>> -- >>>> Archna Bhatia, Ph.D. >>>> Research Scientist, Institute for Human & Machine Cognition >>>> 15 SE Osceola Ave, Ocala, FL 34471 >>>> (352) 387-3061 >>>> >>>
