Hi Ada,

The problem I have with the term "expression" without further qualification is that to my mind it includes any kind of linguistic sign, including ones like "to pay a visit to my dear aunt Ruth" which can clearly be interpreted compositionally. So I think we do have to specify "lexical" to delineate what we are studying in the MWE community. "Lexical item" or, sure, "lexical expression". Either would also include signs, of course. I do also feel we have to add "complex" or similar, because otherwise it includes single-morpheme lexical expressions like "sing".

Cheers,
Kilian

On Fri, 10 Feb 2023 at 23:32, Ada Wan <[email protected]> wrote:

> Hi Archna
>
> "Idioms"/"Idiomatic expressions" are established terms in the study of language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed phrases", is mentioned in, inter alia, [3], which was the earliest cite from the earliest work on MWEs in the ACL Anthology [4]. If I understand correctly, "MWEs" was a term so coined in order to establish a practice based on "words" (if anyone should view this differently, please do correct me here).
>
> You're right, the task I suggested can be seen as orthogonal to distinguishing between lexical expressions and non-lexical expressions. I think it's important to document also the contexts surrounding expressions, instead of just picking expressions out and studying them in an isolated manner. It was just a suggestion for those who might be interested in building a multilingual parallel lexical database as well as those who might want to get a more holistic understanding of language while weaning oneself off "words" --- now that it's become even more obvious how superfluous the term/concept is.
>
> [1] See e.g. https://en.wikipedia.org/wiki/Phraseme
> [2] "Idiomatic expression" is just another formulation of "idiom" (see https://www.thefreedictionary.com/idiomatic+expression). According to Collins English Dictionary (accessed via https://www.thefreedictionary.com/idiom), "idiom" stems from the 16th century Latin idiōma, denoting "peculiarity of language".
> [3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language, 70:491–538. https://doi.org/10.2307/416483 (Many older references on "idioms" by linguists can be found therein.)
> [4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond, Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword expressions: linguistic precision and reusability.
> In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources Association (ELRA).
>
> ------------------------------
>
> Hi Kilian
>
> Sorry about my oversight on "item". I do think "item" could be better than "term" in this case, but it does carry a sense of "a single element", a more discrete "singleton". It's ok to combine it with "complex" to mitigate the sense of "singleton", but then "complex" as you suggested is dependent on morphology, which can be problematic.
>
> Re "lexical": sure. (I think there have been so many different views/traditions/conventions among linguists and computational linguists in the past, we don't necessarily have to agree on how we or our definitions/methods might differ or might have differed, as long as we have the same goal now?)
>
> One argument for "expressions" would be that they could include a sign (e.g. hand sign in motion).
>
> So how about updating "MWEs" to:
> i. "lexical expressions", or
> ii. "lexical expressions (of one character or more when written)*", or
> iii. [i] or [ii] without "lexical", or
> iv. others?
>
> * I'm trying to incorporate how expressions with emojis would/should be treated too.
>
> ------------------------------
>
> What do you all think?
>
> Thanks and best
> Ada
>
> On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora <[email protected]> wrote:
>
>> Forwarded message from Archna below
>>
>> ---------- Forwarded message ---------
>> From: Archna Bhatia <[email protected]>
>> Date: Thu, 9 Feb 2023 at 19:58
>> Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on Multiword Expressions (MWE 2023)
>> To: Ada Wan <[email protected]>, kilian Evang <[email protected]>
>> Cc: Mike Scott <[email protected]>, [email protected] <[email protected]>, [email protected] <[email protected]>
>>
>> Thanks, Ada.
>> I think using the terms “fixed” and “idiomatic” makes the category appear more restrictive, and would need qualifications such as that “fixed” is a relative term here, etc. With “multiwords/multiword expressions” also, there are stipulations (the notion of wordhood may not be applicable to every single language and in the same way), but since the term has been used for a long while, there is a bit of a shared understanding of this term, including about these stipulations. I am open to better terminology. Using just “expressions”, however, seems too vague and loses some generalizations about the idiosyncrasies that “multiword expressions” demonstrate. Not every expression is the same; “multiword expressions” show characteristics different from other expressions. I understand there is some fluidity also there when trying to distinguish between multiword and non-multiword expressions.
>>
>> There are so many angles that one could look at language from. I don’t see anything wrong with the view that studies expressions covering all aspects as you suggest without distinguishing between expressions based on notions of wordhood. The task you suggest will help in developing understanding about language and how languages are similar or different and how they are used. I don’t think it disqualifies efforts that distinguish between “multiword expressions” and non-multiword expressions though, and the idiosyncrasies are not limited to morphology/syntax; idiosyncrasies are found in other linguistic aspects too when characterizing “multiword expressions”.
>>
>> ~ Archna
>>
>> On Feb 9, 2023, at 11:17 AM, Ada Wan <[email protected]> wrote:
>>
>> Hi Archna, hi Kilian, hi all
>>
>> Thanks for your replies.
>>
>> TLDR on my part: I'd be fine going with "expressions" (instead of "fixed/idiomatic expressions").
>> Neither "word" nor "morphology/syntax" (apart from the ordering of elements and/or sequential patterns) is necessary in the analyses of such.
>>
>> -----
>>
>> More specifically:
>>
>> [@Archna] Re "fixed/idiomatic expressions": I don't think it matters much whether they are "fixed" or "idiomatic". A "fixed expression" is one that is usually more impervious to (lexical) change. One can measure this quality in a longitudinal study, e.g. in relation to other aspects of language change etc. Re how "fixed" is "fixed": it's relative, much like many other aspects of language studies. By "idiomatic", one could mean that there is an element of idiosyncrasy (as "idiom"/"idioma").
>>
>> The message that I am trying to get across is that "word" is a superfluous category in the study of language. Would you mind please justifying why you need "words"?
>>
>> The same goes for morphology, actually. In essence, morphological analyses involve selective decomposition, not decomposition of all decomposable units. Hence if one is only accounting for variations within an expression as a ((sub-)character) sequence involving "morphemes" (assuming these can be defined rigorously) and discounting the changes in other parts of the sequence, that would be an incomplete analysis of the expression. Instead, one can just refer to expressions as "expressions", as e.g. sequences/strings of various lengths/vocabs in (sub-)characters --- such an account is also more flexible and accommodating to diverse languages/registers/modalities.
>>
>> A study of "expressions" can cover all other aspects --- not just lexical but also functional ones. One doesn't need to incorporate/impose any ad hoc notions of "wordhood" in these studies.
>>
>> Suggestion: I believe there are many more interesting tasks in this area, instead of trying to find/define "words" within expressions, or to "parse" them according to some structuralist assumptions (i.e.
>> morphologically/syntactically). For example, the community could start (some multi-year project) building an international multilingual parallel (note: not everything would be parallelizable) database of all expressions and terminologies that ever existed, with contextual (historical/cultural/social) information, and start verifying their sources and status of current use. (Just take care, though, that one is not reinforcing values that shouldn't be further emphasized / transferred to posterity --- as an ethical consideration. So if something is in the grey area now, document clearly what the current attitudes towards a certain value are, so posterity can look back and evaluate with respect to their point of view.)
>>
>> Counter questions to Archna:
>> What are the motivations behind your suggestion to access/interpret language using "words"? How do you define "words" and justify the sufficiency/necessity of morphology/syntax in relation to the study of these expressions, esp. when the morphological decomposition of these expressions is arbitrary and helps little (or not at all) with explanation or prediction?
>>
>> Re "complex lexical terms", @Kilian: I'm just wondering what kind of terms would be considered "terms" but wouldn't be considered lexical (I was tempted to add "lexical" to "expressions" as well, but thought that might be a bit redundant)? It depends on how one defines "terms", of course. And how "complex" are expressions really? They are just more calcified units after all, aren't they? (Why do we/some always seem to want to add the term "complex" to everything? Things that aren't "complex" are also worthy of study!)
>>
>> Curious what you think...
>>
>> Thanks and best
>> Ada
>>
>> Why I'm advocating #noWords:
>> Fairness in Representation for Multilingual NLP: Insights from Controlled Experiments on Conditional Language Modeling
>> https://openreview.net/forum?id=-llS6TiOew
>> https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view
>> (It took me a while for everything to sink in.)
>>
>> On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora <[email protected]> wrote:
>>
>>> I must say I'm perfectly happy with "multi-word expression", or "multi-word unit".
>>>
>>> I feel sympathy with Archna's post (and incidentally wish Archna didn't have to go through a friend!)
>>> Cheers -- Mike
>>>
>>> --
>>> Mike Scott
>>> lexically.net
>>> Lexical Analysis Software and Aston University
>>>
>>> _______________________________________________
>>> Corpora mailing list -- [email protected]
>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>>> To unsubscribe send an email to [email protected]
>>
>> --
>> You received this message because you are subscribed to the Google Groups "MWE Workshop 2023 Organizers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/mweworkshop2023/CAB7Mis_GSyFjZOVw_XWp431VMJJBo0BnPqjFsqqTP_sEE58Ezw%40mail.gmail.com
>> For more options, visit https://groups.google.com/d/optout
>>
>> --
>> Archna Bhatia, Ph.D.
>> Research Scientist, Institute for Human & Machine Cognition
>> 15 SE Osceola Ave, Ocala, FL 34471
>> (352) 387-3061
