Forwarded message from Archna below

---------- Forwarded message ---------
From: Archna Bhatia <[email protected]>
Date: Sat, Feb 11, 2023 at 01:57
Subject: Re: [Corpora-List] Fwd: Deadline extension: 19th Workshop on
Multiword Expressions (MWE 2023)
To: Ada Wan <[email protected]>
Cc: Kilian Evang <[email protected]>, [email protected] <
[email protected]>


Thanks, Ada. My point was not that the term “multiword expressions”
predates the term “idioms/idiomatic expressions” but that the category of
items, of which idioms is a subset, has been referred to as multiwords for
a long time. It may not be the perfect terminology but there’s some shared
understanding of what kinds of expressions or constructions constitute this
category labeled as multiwords/multiword expressions/multiword units. I
would like to see strong evidence of better suitability of a new term to
refer to this category before I adopt it.

BTW, I’m curious: do you have examples in mind which belong to this category
but show that wordhood might be a problematic notion? How frequent is this
phenomenon?

Also regarding emojis etc, I’m curious: are there combinations of emojis
that show some sort of idiosyncrasy when they cooccur? Or even combinations
of emojis and textual words or other utterances which show such
idiosyncratic behavior as we generally associate with “MWEs”? (It’s
possible but I had not thought of it until now and it would be interesting
to see that.)

BTW, are you planning on attending the MWE 2023 workshop? There would be a
lot of opportunity to discuss this with researchers currently working on
multiword expressions.

Thanks,
Archna

On Feb 11, 2023, at 4:02 AM, Ada Wan <[email protected]> wrote:


Hi Archna

"Idioms"/"Idiomatic expressions" are established terms in the study of
language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed
phrases", is mentioned in, inter alia, [3], which was the earliest cite
from the earliest work on MWEs in the ACL Anthology [4]. If I understand
correctly, "MWEs" was a term so coined in order to establish a practice
based on "words" (if anyone should view this differently, please do correct
me here).

You're right, the task I suggested can be seen as orthogonal to
distinguishing between lexical expressions or non-lexical expressions. I
think it's important to document also the contexts surrounding expressions,
instead of just picking expressions out and studying them in an isolated
manner. It was just a suggestion for those who might be interested in
building a multilingual parallel lexical database as well as those who
might want to get a more holistic understanding of language while weaning
themselves off "words" --- now that it's become even more obvious how
superfluous the term/concept is.

[1] See e.g. https://en.wikipedia.org/wiki/Phraseme
[2] "Idiomatic expression" is just another formulation of "idiom" (see
https://www.thefreedictionary.com/idiomatic+expression).
According to Collins English Dictionary (accessed via
https://www.thefreedictionary.com/idiom),
"idiom" stems from the 16th century Latin idiōma, denoting "peculiarity of
language".
[3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms.
Language, 70:491–538. https://doi.org/10.2307/416483
(Many older references on "idioms" by linguists can be found therein.)
[4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond,
Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword
expressions: linguistic precision and reusability. In Proceedings of the
Third International Conference on Language Resources and Evaluation
(LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources
Association (ELRA).

------------------------------

Hi Kilian

Sorry about my oversight on "item". I do think "item" could be better than
"term" in this case, but it does carry a sense of "a single element", a
more discrete "singleton". It's ok to combine it with "complex" to mitigate
the sense of "singleton", but then "complex" as you suggested is dependent
on morphology, which can be problematic.

Re "lexical": sure. (I think there have been so many different
views/traditions/conventions among linguists and computational linguists in
the past, we don't necessarily have to agree on how we or our
definitions/methods might differ or might have differed, as long as we have
the same goal now?)

One argument for "expressions" would be that they could include a sign
(e.g. hand sign in motion).

So how about updating "MWEs" to:
i. "lexical expressions", or
ii. "lexical expressions (of one character or more when written)*", or
iii. [i] or [ii] without "lexical", or
iv. others?

* I'm trying to incorporate how expressions with emojis would/should be
treated too.

------------------------------

What do you all think?

Thanks and best
Ada

On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora <
[email protected]> wrote:

> Forwarded message from Archna below
>
> ---------- Forwarded message ---------
> From: Archna Bhatia <[email protected]>
> Date: Thu, Feb 9, 2023 at 19:58
> Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on Multiword
> Expressions (MWE 2023)
> To: Ada Wan <[email protected]>, Kilian Evang <[email protected]>
> Cc: Mike Scott <[email protected]>, [email protected] <
> [email protected]>, [email protected] <
> [email protected]>
>
>
> Thanks, Ada. I think using the terms “fixed” and “idiomatic” makes the
> category appear more restrictive, and would need qualifications such as
> “fixed” is a relative term here, etc. With “multiwords/multiword
> expressions” also, there are stipulations (the notion of wordhood may not
> be applicable to every single language and in the same way) but since the
> term has been used for a long while, there is a bit of a shared
> understanding of this term, including about these stipulations. I am open
> to better terminology. Using just “expressions”, however, seems too vague
> and loses some generalizations about the idiosyncrasies that “multiword
> expressions” demonstrate. Not every expression is the same; “multiword
> expressions” show characteristics different from other expressions. I
> understand there is some fluidity also there when trying to distinguish
> between multiword and non-multiword expressions.
>
> There are so many angles that one could look at language from. I don’t see
> anything wrong with the view that studies expressions covering all aspects
> as you suggest without distinguishing between expressions based on notions
> of wordhood. The task you suggest will help in developing understanding
> about language and how languages are similar or different and how they are
> used. I don’t think it disqualifies efforts that distinguish between
> “multiword expressions” and non-multiword expressions though, and the
> idiosyncrasies are not limited to morphology/syntax; idiosyncrasies are
> found in other linguistic aspects too when characterizing “multiword
> expressions”.
>
> ~ Archna
>
> On Feb 9, 2023, at 11:17 AM, Ada Wan <[email protected]> wrote:
>
> Hi Archna, hi Kilian, hi all
>
> Thanks for your replies.
>
> TLDR on my part: I'd be fine going with "expressions" (instead of
> "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax"
> (apart from the ordering of elements and/or sequential patterns) is
> necessary in the analyses of such.
>
> -----
>
> More specifically:
>
> [@Archna] Re "fixed/idiomatic expressions": I don't think it matters much
> whether they are "fixed" or "idiomatic". A "fixed expression" is one that
> is usually more impervious to (lexical) change. One can measure this
> quality in a longitudinal study, e.g. in relation to other aspects of
> language change etc. Re how "fixed" is "fixed": it's relative, much like
> many other aspects of language studies. By "idiomatic", one could mean that
> there is an element of idiosyncrasy (as "idiom"/"idioma").
>
> The message that I am trying to get across is that "word" is a superfluous
> category in the study of language. Would you mind please justifying why you
> need "words"?
>
> The same goes for morphology, actually. In essence, morphological analyses
> involve selective decomposition, not decomposition of all decomposable
> units. Hence if one is only accounting for variations within an expression
> as a ((sub-)character) sequence involving "morphemes" (assuming these can
> be defined rigorously) and discounting the changes in other parts of the
> sequence, that would be an incomplete analysis of the expression. Instead,
> one can just refer to expressions as "expressions", as e.g. sequences/strings of
> various lengths/vocabs in (sub-)characters --- such an account is also more
> flexible and accommodating to diverse languages/registers/modalities.
>
> A study of "expressions" can cover all other aspects --- not just lexical
> but also functional ones. One doesn't need to incorporate/impose any ad hoc
> notions of "wordhood" in these studies.
>
> Suggestion: I believe there are many more interesting tasks in this area,
> instead of trying to find/define "words" within expressions, or to "parse"
> them according to some structuralist assumptions (i.e.
> morphologically/syntactically). For example, the community could start
> (some multi-year project) building an international multilingual parallel
> (note: not everything would be parallelizable) database of all expressions
> and terminologies that have ever existed, with contextual
> (historical/cultural/social) information, and start verifying their sources
> and status of current use.
> (Just take care, though, that one is not reinforcing values that shouldn't
> be further emphasized / transferred to posterity --- as an ethical
> consideration. So if something is in the grey area now, document clearly
> what the current attitudes towards a certain value are, so posterity can
> look back and evaluate with respect to their point of view.)
>
> Counter questions to Archna:
> What are the motivations behind your suggestion to access/interpret
> language using "words"? How do you define "words" and justify the
> sufficiency/necessity of morphology/syntax in relation to the study of
> these expressions, esp. when the morphological decomposition of these
> expressions is arbitrary and helps little (or not at all) with explanation
> or prediction?
>
> Re "complex lexical terms", @Kilian: I'm just wondering what kind of terms
> would be considered "terms" but wouldn't be considered lexical (I was
> tempted to add "lexical" to "expressions" as well, but thought that might
> be a bit redundant)? It depends on how one defines "terms", of course. And
> how "complex" are expressions really? They are just more calcified units
> after all, aren't they? (Why do we/some always seem to want to add the term
> "complex" to everything? Things that aren't "complex" are also worthy of
> studying!)
>
> Curious what you think...
>
> Thanks and best
> Ada
>
> Why I'm advocating #noWords:
> Fairness in Representation for Multilingual NLP: Insights from Controlled
> Experiments on Conditional Language Modeling
> https://openreview.net/forum?id=-llS6TiOew
> https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view
> (It took me a while for everything to sink in.)
>
>
> On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora <
> [email protected]> wrote:
>
>> I must say I'm perfectly happy with "multi-word expression", or
>> "multi-word unit".
>>
>> I feel sympathy with Archna's post (and incidentally wish Archna didn't
>> have to go through a friend!)
>> Cheers -- Mike
>>
>> --
>>
>> Mike Scott
>> lexically.net
>> Lexical Analysis Software and Aston University
>>
>> _______________________________________________
>> Corpora mailing list -- [email protected]
>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>> To unsubscribe send an email to [email protected]
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "MWE Workshop 2023 Organizers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/mweworkshop2023/CAB7Mis_GSyFjZOVw_XWp431VMJJBo0BnPqjFsqqTP_sEE58Ezw%40mail.gmail.com
> .
> For more options, visit https://groups.google.com/d/optout
> .
>
> --
> Archna Bhatia, Ph.D.
> Research Scientist, Institute for Human & Machine Cognition
> 15 SE Osceola Ave, Ocala, FL 34471
> (352) 387-3061
>