[Corpora-List] Re: Fwd: Deadline extension: 19th Workshop on Multiword Expressions (MWE 2023)

Kilian Evang via Corpora Wed, 15 Feb 2023 04:24:36 -0800

Hi Ada,

Of course what counts as a morpheme or as a lexical expression, and what
inventory of compositional rules one assumes, is subject to one's theory,
to language change, and also to ad-hoc playful reinterpretation (that's
what I would see your "s or p?" example as). But these are notions that
need not be 100% precise in order to delineate a research area such as MWE.
There can be a gray area as to what counts as an MWE and what doesn't. For
example, many MWE researchers would probably not count a prefix verb like
*beguile* as an MWE, simply because it fulfills all the criteria of
wordhood in traditional Western NLP. But if we assume a wide definition
such as "lexical expression consisting of more than one morpheme", it too would
fall under the MWE label. In fact, it exhibits the same competition between
a lexical/idiomatic reading and a compositional reading that is typical of
more complex MWEs: "kick the bucket" could mean to kick the bucket or to
die, "beguile" could mean to affect with guile or to deceive.


I would support a name change from MWE to CLE or similar, because I agree
that "word" is not a very useful notion cross-linguistically. (Then again,
the notion of MWE might still work okay if we assume Martin Haspelmath's
retro-definition <https://dlc.hypotheses.org/2621> of "word".)

Cheers,
Kilian

On Sat, 11 Feb 2023 at 18:29, Ada Wan <[email protected]> wrote:

> Hi Archna
>
> Thanks for your reply.
>
> Your justification for the continued use of "MWEs"/"words" is based on
> history and shared understanding (from 09Feb2023: "since the term has been
> used for a long while, there is a bit of a shared understanding of this
> term, including about these stipulations"). Both of these criteria are
> achievable with alternative formulations.
>
> Re "the category of items, of which idioms is a subset, has been referred
> to as multiwords for a long time": "MWE" does not have that long of a
> history --- what is the earliest use of "MWEs" that you have in your
> records? And even if terms have been used for a long while, it doesn't mean
> that we cannot change them for the better, esp. when they have been
> inappropriately adopted or found outdated. What objections do you have to
> "lexical expressions", for example?
>
> The issue/problem with "word" is that, aside from it not being necessary
> or sufficient in the study of language or in computing, there is also an
> implicit, shared understanding that it is arbitrary, redundant, and
> indeterminate. (This applies also to the notion of wordhood within one
> language.) The indeterminacy part is evident in your not having provided me
> with a definition of "words" thus far as well. Furthermore, as you
> confirmed earlier: "the notion of wordhood may not be applicable to every
> single language and in the same way", then how should "words" be robust
> enough for computational processing?
>
> Re emojis: here are some examples of emoji combinations that show a sense
> of idiosyncrasy when they (co-)occur:
> 🤩 for "star-struck" (from
> https://unicode.org/emoji/charts/full-emoji-list.html)
> Or from
> https://www.elitedaily.com/lifestyle/funny-emoji-combinations-tiktok:
> 👉 👈 (feeling shy/simping)
> 🚪🏃‍♀️💨 (time to leave)
> 🍿🤏😯 (when drama is happening/when something is going down)
> 👁👄👁 (blank stare)
> 🕳👨‍🦯 (I didn't see anything)
> 👩🤏👩‍🦲 (wig snatched)
> 🐂💩 (bullsh*t)
>
> My concern is with "wordhood" in the "language space"
> (science/engineering/technology) in general, not just on lexical
> expressions. I do think, however, that SIGLEX could help play an important
> role in effecting some positive changes in this regard.
>
> ----------
>
> Hi Kilian
>
> Let's suppose that what we have thus far known as "grammar" (the one that
> has been based on or related to "words" or "sentences", i.e.
> morphology/syntax (and some phonology)) can be decomposed into (sequential)
> ordering and linguistic attitudes/normativity [1]. I do think
> judgments/attitudes play a role in language as it exists in the social
> world and can affect, or even determine, how registers/styles etc. are
> defined, but I also think that there is more rigorous science of (the
> remaining aspects of) language possible if we were to separate such
> attitudes/prescriptivism from a more descriptive stance (e.g. in the
> direction of information sciences).
>
> Once we remove the attitudes/normativity part from the science of
> language, lexical and contextual information as well as function/use
> remain.
>
> The reason I hesitated to refer to MWEs as "complex" is that
> (lexical) "complexity" can be broken down into vocabulary and length, with
> use/frequency accounting for the pragmatic/functional aspect. Hence every
> expression (or any character string) is lexical.
> The element of idiosyncrasy/idiomaticity is really a pragmatic one (e.g.
> in the rarity/archaic-ness/uniqueness of the use of the
> expressions/segment/span or character n-grams).
> So "sing" can be seen as a lexical expression, just like "bing" or "ping".
> Let's not forget that (even according to traditional grammatical analyses)
> various linguistic effects can happen to expressions when they undergo
> frequent use over an extended period of time. E.g. "ping me" may be seen
> thus far as relatively more idiomatic than "sing me a song", but that's due
> to the former expression being more specialized, less general, or rarer in
> use. Also, e.g. in a conversation, if one said "sing me" and the other
> didn't quite catch the first bit of the phrase, they might ask "[s] or
> [p]?" or "'s' or 'p'?". And one can well imagine that if this comes into
> more frequent use, "s" and "p" can be regarded as what we'd now interpret
> as "idiomatic". Hence "sing" does not have to be seen as a "single
> morpheme".
>
> [1] I have tweeted this before on 28Jan2023:
> https://twitter.com/adawan919/status/1619401653962297344?cxt=HHwWgMDS0a3OovksAAAA
> In a way, I am reinterpreting "(non)-compositionality" as
> normalization/frequency effects via the decomposed view of "grammar" above.
>
>
> -----------------------------------------------------------------------------------------
>
>
>
>
> *Hence, my proposal (not just for MWE workshop folks but perhaps for all
> who might be interested) would be:
> https://docs.google.com/document/d/1n4QRn0CxbVMj6kbLWo-byT3S26ODJOjicU-ZYw7jRG0/edit?usp=sharing*
>
> *Comments welcome. *
> Thanks and best
> Ada
>
> On Sat, Feb 11, 2023 at 2:01 PM Kilian Evang <[email protected]>
> wrote:
>
>> Hi Ada,
>>
>> The problem I have with the term "expression" without further
>> qualification is that to my mind it includes any kind of linguistic sign,
>> including ones like "to pay a visit to my dear aunt Ruth" which can clearly
>> be interpreted compositionally. So I think we do have to specify "lexical"
>> to delineate what we are studying in the MWE community. "Lexical item" or,
>> sure, "lexical expression". Either would also include signs, of course. I
>> do also feel we have to add "complex" or similar, because otherwise it
>> includes single-morpheme lexical expressions like "sing".
>>
>> Cheers,
>> Kilian
>>
>> On Fri, 10 Feb 2023 at 23:32, Ada Wan <[email protected]> wrote:
>>
>>> Hi Archna
>>>
>>> "Idioms"/"Idiomatic expressions" are established terms in the study of
>>> language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed
>>> phrases", is mentioned in, inter alia, [3], which was the earliest cite
>>> from the earliest work on MWEs in the ACL Anthology [4]. If I understand
>>> correctly, "MWEs" was a term so coined in order to establish a practice
>>> based on "words" (if anyone should view this differently, please do correct
>>> me here).
>>>
>>> You're right, the task I suggested can be seen as orthogonal to
>>> distinguishing between lexical expressions or non-lexical expressions. I
>>> think it's important to document also the contexts surrounding expressions,
>>> instead of just picking expressions out and studying them in an isolated
>>> manner. It was just a suggestion for those who might be interested in
>>> building a multilingual parallel lexical database as well as those who
>>> might want to get a more holistic understanding of language while weaning
>>> themselves off "words" --- now that it's become even more obvious how
>>> superfluous the term/concept is.
>>>
>>> [1] See e.g. https://en.wikipedia.org/wiki/Phraseme
>>> [2] "Idiomatic expression" is just another formulation of "idiom" (see
>>> https://www.thefreedictionary.com/idiomatic+expression).
>>> According to Collins English Dictionary (accessed via
>>> https://www.thefreedictionary.com/idiom), "idiom" stems from the 16th
>>> century Latin idiōma, denoting "peculiarity of language".
>>> [3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms.
>>> Language, 70:491–538. https://doi.org/10.2307/416483
>>> (Many older references on "idioms" by linguists can be found therein.)
>>> [4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond,
>>> Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword
>>> expressions: linguistic precision and reusability. In Proceedings of the
>>> Third International Conference on Language Resources and Evaluation
>>> (LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources
>>> Association (ELRA).
>>>
>>> ------------------------------
>>>
>>> Hi Kilian
>>>
>>> Sorry about my oversight on "item". I do think "item" could be better
>>> than "term" in this case, but it does carry a sense of "a single element",
>>> a more discrete "singleton". It's ok to combine it with "complex" to
>>> mitigate the sense of "singleton", but then "complex" as you suggested is
>>> dependent on morphology, which can be problematic.
>>>
>>> Re "lexical": sure. (I think there have been so many different
>>> views/traditions/conventions among linguists and computational linguists in
>>> the past, we don't necessarily have to agree on how we or our
>>> definitions/methods might differ or might have differed, as long as we have
>>> the same goal now?)
>>>
>>> One argument for "expressions" would be that they could include a sign
>>> (e.g. hand sign in motion).
>>>
>>> So how about updating "MWEs" to:
>>> i. "lexical expressions", or
>>> ii. "lexical expressions (of one character or more when written)*", or
>>> iii. [i] or [ii] without "lexical", or
>>> iv. others?
>>>
>>> * I'm trying to incorporate how expressions with emojis would/should be
>>> treated too.
>>>
>>> ------------------------------
>>>
>>> What do you all think?
>>>
>>> Thanks and best
>>> Ada
>>>
>>> On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora <
>>> [email protected]> wrote:
>>>
>>>> Forwarded message from Archna below
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: Archna Bhatia <[email protected]>
>>>> Date: Thu, 9 Feb 2023 at 19:58
>>>> Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on
>>>> Multiword Expressions (MWE 2023)
>>>> To: Ada Wan <[email protected]>, Kilian Evang <[email protected]>
>>>> Cc: Mike Scott <[email protected]>, [email protected] <[email protected]>,
>>>> [email protected] <[email protected]>
>>>>
>>>>
>>>> Thanks, Ada. I think using the terms “fixed” and “idiomatic” makes the
>>>> category appear more restrictive, and would need qualifications such as
>>>> “fixed” is a relative term here, etc. With “multiwords/multiword
>>>> expressions” also, there are stipulations (the notion of wordhood may not
>>>> be applicable to every single language and in the same way) but since the
>>>> term has been used for a long while, there is a bit of a shared
>>>> understanding of this term, including about these stipulations. I am open
>>>> to better terminology. Using just “expressions”, however, seems too vague
>>>> and loses some generalizations about the idiosyncrasies that “multiword
>>>> expressions” demonstrate. Not every expression is the same; “multiword
>>>> expressions” show characteristics different from other expressions. I
>>>> understand there is also some fluidity when trying to distinguish
>>>> between multiword and non-multiword expressions.
>>>>
>>>> There are so many angles that one could look at language from. I don’t
>>>> see anything wrong with the view that studies expressions covering all
>>>> aspects as you suggest without distinguishing between expressions based on
>>>> notions of wordhood. The task you suggest will help in developing
>>>> understanding about language and how languages are similar or different and
>>>> how they are used. I don’t think it disqualifies efforts that distinguish
>>>> between “multiword expressions” and non-multiword expressions though, and
>>>> the idiosyncrasies are not limited to morphology/syntax; idiosyncrasies are
>>>> found in other linguistic aspects too when characterizing “multiword
>>>> expressions”.
>>>>
>>>> ~ Archna
>>>>
>>>> On Feb 9, 2023, at 11:17 AM, Ada Wan <[email protected]> wrote:
>>>>
>>>> Hi Archna, hi Kilian, hi all
>>>>
>>>> Thanks for your replies.
>>>>
>>>> TLDR on my part: I'd be fine going with "expressions" (instead of
>>>> "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax"
>>>> (apart from the ordering of elements and/or sequential patterns) is
>>>> necessary in the analyses of such.
>>>>
>>>> -----
>>>>
>>>> More specifically:
>>>>
>>>> [@Archna] Re "fixed/idiomatic expressions": I don't think it matters
>>>> much whether they are "fixed" or "idiomatic". A "fixed expression" is one
>>>> that is usually more impervious to (lexical) change. One can measure this
>>>> quality in a longitudinal study, e.g. in relation to other aspects of
>>>> language change etc. Re how "fixed" is "fixed": it's relative, much like
>>>> many other aspects of language studies. By "idiomatic", one could mean that
>>>> there is an element of idiosyncrasy (as "idiom"/"idioma").
>>>>
>>>> The message that I am trying to get across is that "word" is a
>>>> superfluous category in the study of language. Would you mind please
>>>> justifying why you need "words"?
>>>>
>>>> The same goes for morphology, actually. In essence, morphological
>>>> analyses involve selective decomposition, not decomposition of all
>>>> decomposable units. Hence if one is only accounting for variations within
>>>> an expression as a ((sub-)character) sequence involving "morphemes"
>>>> (assuming these can be defined rigorously) and discounting the changes in other parts
>>>> of the sequence, that would be an incomplete analysis of the expression.
>>>> Instead, one can just refer to expressions as "expressions", as e.g.
>>>> sequences/strings of various lengths/vocabs in (sub-)characters --- such an
>>>> account is also more flexible and accommodating to diverse
>>>> languages/registers/modalities.
>>>>
>>>> A study of "expressions" can cover all other aspects --- not just
>>>> lexical but also functional ones. One doesn't need to incorporate/impose
>>>> any ad hoc notions of "wordhood" in these studies.
>>>>
>>>> Suggestion: I believe there are many more interesting tasks in this
>>>> area, instead of trying to find/define "words" within expressions, or to
>>>> "parse" them according to some structuralist assumptions (i.e.
>>>> morphologically/syntactically). For example, the community could start
>>>> (some multi-year project) building an international multilingual parallel
>>>> (note: not everything would be parallelizable) database of all expressions
>>>> and terminologies that have ever existed, with contextual (historical/cultural/social)
>>>> information and start verifying their sources and status of current use.
>>>> (Just be aware, though, that one is not reinforcing values that shouldn't
>>>> be further emphasized / transferred to posterity --- as an ethical
>>>> consideration. So if something is in the grey area now, document clearly
>>>> what the current attitudes towards a certain value are, so posterity can
>>>> look back and evaluate with respect to their point of view.)
>>>>
>>>> Counter questions to Archna:
>>>> What are the motivations behind your suggestion to access/interpret
>>>> language using "words"? How do you define "words" and justify the
>>>> sufficiency/necessity of morphology/syntax in relation to the study of
>>>> these expressions, esp. when the morphological decomposition of these
>>>> expressions is arbitrary and helps little (or not at all) with explanation
>>>> or prediction?
>>>>
>>>> Re "complex lexical terms", @Kilian: I'm just wondering what kind of
>>>> terms would be considered "terms" but wouldn't be considered lexical
>>>> (I was tempted to add "lexical" to "expressions" as well, but thought that
>>>> might be a bit redundant)? It depends on how one defines "terms", of
>>>> course. And how "complex" are expressions really? They are just more
>>>> calcified units after all, aren't they? (Why do we/some always seem to want
>>>> to add the term "complex" to everything? Things that aren't "complex" are
>>>> also worthy of study!)
>>>>
>>>> Curious what you think...
>>>>
>>>> Thanks and best
>>>> Ada
>>>>
>>>> Why I'm advocating #noWords:
>>>> Fairness in Representation for Multilingual NLP: Insights from
>>>> Controlled Experiments on Conditional Language Modeling
>>>> https://openreview.net/forum?id=-llS6TiOew
>>>> https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view
>>>> (It took me a while for everything to sink in.)
>>>>
>>>>
>>>> On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora <
>>>> [email protected]> wrote:
>>>>
>>>>> I must say I'm perfectly happy with "multi-word expression", or
>>>>> "multi-word unit".
>>>>>
>>>>> I feel sympathy with Archna's post (and incidentally wish Archna
>>>>> didn't have to go through a friend!)
>>>>> Cheers -- Mike
>>>>>
>>>>> --
>>>>>
>>>>> Mike Scott
>>>>> lexically.net
>>>>> Lexical Analysis Software and Aston University
>>>>>
>>>>> _______________________________________________
>>>>> Corpora mailing list -- [email protected]
>>>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>>>>> To unsubscribe send an email to [email protected]
>>>>>
>>>>
>>>> --
>>>> Archna Bhatia, Ph.D.
>>>> Research Scientist, Institute for Human & Machine Cognition
>>>> 15 SE Osceola Ave, Ocala, FL 34471
>>>> (352) 387-3061
>>>>
>>>>
>>>
