Dear all

Thank you for all your feedback.

George wrote in his reply: "Please, let us always remember that there is a
*person with feelings* on the other side of a communication. We need to
gently and respectfully handle cases where we have objections." I cannot
agree more.

There is a certain degree of empathy that one needs to exercise in reading,
writing, and in research (even in technical research), esp. if one has only
been educated in one discipline. If one does not understand why other
disciplines might have different assumptions and developmental histories or
(perceived) narratives, it is best to check/verify that first before
"attacking" others or their arguments. Interdisciplinary/Transdisciplinary
work is also difficult for that reason (e.g. in
translating/addressing/aligning assumptions/expectations).
As Jonas noted, "(And everyone on the list -- I am sending this to the
whole list in case I am wrong about some things, so others can add their
thoughts)." --- I agree! This practice can also lead to better
transdisciplinary understanding and exchange.
It can also be hard to believe our research/tech space has come to this,
but if you'd allow me to explain ---

First of all, I think most of you might know me from my public rebuttals
for my ICLR2021 & 2022 submissions. For the latter, in which I decomposed
"words" more explicitly, I had to really "fight" hard to convince the
reviewers. That also has to do with the fact that the concept of "word"
(and also "morphology") and the decades-long assumption and adoption of
these in CL/NLP/Linguistics might have been too casual/imprecise/negligent
of a choice and practice. As my work has shown, a mistake therein was /
might have been made. Some students have been miseducated --- myself partly
included. Since CL/NLP/Linguistics were not the only subjects that I have
studied, it might have been easier for me to abandon these assumptions; for
many others, this may not have been the case. If we
continue with these practices in the research space through conferences or
research activities, such malpractice would be exacerbated.

Textual data can be *processed* without word tokenization or sentence
segmentation [1]. One can process data in full --- in character/byte
representations (depending on the task and computational resources, e.g.
for pattern matching for strings, one would work with characters, for other
tasks, bytes). Depending on the nature of the tasks and methods, our
*evaluation and interpretation* strategies may differ. Computational neural network
models are statistical models and need to be evaluated and interpreted
statistically --- this is the perspective of many computer scientists and
statisticians and it is correct. In the tradition of CL/NLP/Linguistics (or
even many in data analytics or in digital humanities), there has been an
erroneous assumption and practice that one could evaluate statistical
models based on textual output only.
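
To illustrate the first point, here is a minimal sketch in Python of processing
a text in full, as characters or as bytes, with no word tokenization or
sentence segmentation step. Counting overlapping n-grams is only a stand-in
task chosen for illustration, and the function names `char_ngrams` and
`byte_ngrams` are hypothetical; none of this is taken from any cited work.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams over the whole text.

    Whitespace, punctuation, and line breaks are treated like any other
    character, so no word or sentence boundaries are assumed.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def byte_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping byte n-grams over the UTF-8 encoding of the text."""
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


if __name__ == "__main__":
    # Hypothetical sample input; any string would do.
    sample = "Textual data can be processed in full.\nNo segmentation needed."
    print(char_ngrams(sample, n=2).most_common(5))
    print(byte_ngrams(sample, n=2).most_common(5))
```

The two views differ once the text leaves ASCII: a single character such as
"é" is one unit in the character view and two units in the UTF-8 byte view.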

As for areas related to "language" outside the context of computing, e.g.
Linguistics (without the use of computational tools), there are certain
structural assumptions (from the past decades) that need to be refined. I
have been trying to advocate the broadening of one's
perspectives/interpretations of "language" to ones that are without
"words", "sentences", "linguistic structure(s)", "grammar", and
"p-language(s)". These concepts denote nothing universal (or determinate
--- not without circularity) and the amplification of these through
technology/computing can lead to unethical/unhealthy consequences. I have
the impression that our understanding of this (be one a linguist, CL/NLPer,
computing professional, or AI practitioner) may not be aligned.

As many disciplines/sectors are now leveraging similar/same methods, I feel
that there is a responsibility to clarify this.

Last but not least, please note/notice that I have only been *responsive*
to announcements with potential concerns (e.g. those scientific or ethical
in nature). I did/do not proactively advertise my own work or have the
intent to do so on this mailing list just for fun or to offend others.

As always, I remain open to your feedback.

Thank you for your attention.

Best regards
Ada

[1] @Jonas:
re "sentences":
i. "sentence" is not a universal concept crosslinguistically or
cross-stylistically (e.g. across genres) or across modalities
(speech/signing does not occur in form of "sentences", esp. natural
speech/signing);
ii. even if "sentence" were defined "x-centrically" (if definable at all),
where x denotes a certain style, for example, stylistic hegemony would
occur, not to mention that overfitting to any one style is likely to lead
to bad generalizations;
iii. re "I don't think conference organizers usually make hard
prescriptions on what constitutes a sentence" --- that is a problem, isn't
it? There is no standardization possible either. "Sentences" are also
indeterminate, esp. in the context of computing. We wouldn't want to
encourage "sentence"-hacking, would we?
iv. in many NLP toolkits, "sentence" often refers to "line" (as delimited
by line breaks);
v. for those who have worked on data collection and curation before, esp.
for parallel data, content is often aligned by line (and that can already
be difficult).
Thanks for your content-rich comments, btw!


On Fri, Aug 25, 2023 at 12:59 PM George Giannakopoulos <
[email protected]> wrote:

> Dear All,
>
> I would like to warmly suggest/remind the following to all of us (as a
> friendly suggestion, on which I will not follow up):
>
>    - One can find online good examples for the *"netiquette"* of mailing
>    lists to reduce problems (see here
>    <https://www.snort.org/faq/what-is-the-mailing-list-etiquette>, here
>    <https://en.opensuse.org/openSUSE:Mailing_list_netiquette> and here
>    <https://sites.ualberta.ca/~pletendr/list-net.html> for examples,
>    which can be useful for all of us).
>    - Please, let us always remember that there is a *person with feelings*
>    on the other side of a communication. We need to gently and
>    respectfully handle cases where we have objections.
>    - If you feel that a conversation grows too big or is somehow
>    problematic, address a *personal e-mail to a main contributor
>    suggesting nicely an alternative* you consider more appropriate. If
>    this fails systematically, then scale it up through a list moderator (or
>    the list itself) politely.
>    - Specific *suggestions for appropriate digital spaces* that can hold
>    e.g. long discussions may allow all such discussions to find their own nest
>    after a given point, so that we all have a common additional resource
>    connected to the list, for topics that do need the added interaction.
>    - If you feel that a topic you contribute to really ignites
>    interesting conversation or if you simply receive an e-mail suggesting that
>    you move a long conversation elsewhere due to its size, *consider an
>    alternative* (or even ask the list for one), to facilitate the use of
>    the mailing list itself.
>    - Let us remember that what is *uninteresting to us may be interesting
>    to others*.
>
> As a final comment, before best practices come *common understanding*
> and *good will*. Let us primarily build on these, as we have done in this
> list for many years.
>
> Having said the above, I would like to thank Ada (and all the others) for
> the contributions (past, current and future) and discussions that keep this
> list alive.
>
> Best regards,
> George G.
>
> P.S. I would also like to thank Gully for trying to keep the list humane.
>
> On 23/8/23 00:53, Gully Burns via Corpora wrote:
>
> Dear all,
>
> I was shocked to see a vitriolic ad-hominem attack on a colleague posted
> to this mailing list. It is entirely inappropriate to post this type of
> diatribe against an individual even though someone might disagree with
> either the tone or the content of an individual's messages or arguments.
> The fact that other members of the community chimed in to reinforce the
> attack is also appalling and entirely inappropriate.
>
> Sincerely,
>
> Gully Burns
>
> On Tue, Aug 22, 2023 at 1:23 PM Ada Wan via Corpora <
> [email protected]> wrote:
>
>> Dear all on the Corpora-List
>>
>> I understand it is possible that some of you may harbor some negative
>> sentiments towards me and/or my recent replies on the list.
>> That having been expressed, I would like to remind everyone on this list
>> it is important to understand that many subjects such as computational [x,
>> where x can be e.g. linguistics, biology, physics, modeling...], digital
>> humanities, data analytics, data science, and many of their dependencies
>> have been / are in the public domain, much of which is academic and scientific
>> in nature. Science is in the public domain.
>>
>> What we are experiencing here is sort of a computational and statistical
>> turn in the computational sciences and studies --- anything that involves
>> data (computational and otherwise). Previously (or even currently in many
>> disciplines/practices), one has modeled / has been modeling many symbolic
>> concepts and values computationally, directly inheriting these from
>> "traditional sciences" (i.e. sciences from a time when all was done without
>> any computational machinery), assuming that these values and the
>> relationship between such would not only hold but also hold as the only
>> ground truth. But as e.g. my results have shown, many of these scientific
>> concepts, values, and relationships deserve to be re-evaluated and
>> re-interpreted.
>>
>> What I have been trying to do is to communicate this, as without any
>> updates and/or self-correction, we could be experiencing many discrepancies
>> in our experimental results. Good scientific practice (including good
>> assumptions therefor) is fundamental to everyone. This includes but is not
>> limited to having good assumptions, leveraging appropriate methods, being
>> responsible in evaluation as well as addressing ethical concerns, e.g. in
>> the case of my findings: a combination of false assumptions and
>> miseducation. (Sorry to re-iterate this but it is just such an important
>> lesson for many on this list... it may be painful for some too.)
>>
>> Corpora-list might have changed more or less like how the field of CL/NLP
>> has in the past decades. While these areas might have become more
>> generalized and thus the audience more "diverse" in terms of background and
>> areas of familiarity, there are certainly some on this list who are
>> concerned about some of the "bad" science/values that could get propagated
>> through the use of data/corpora. That is one of the reasons behind my many
>> replies of late.
>>
>>
>> *If you should find my comments/replies an issue of concern, please let
>> me know what specifically you disagree with. I'd be happy to modify my
>> formulations or discuss further. If you think I have been wrong somewhere,
>> please do let me know. I'd be happy to update.  *
>>
>> Thanks and best
>> Ada
>>
>> On Mon, Aug 21, 2023 at 5:39 PM Ada Wan <[email protected]> wrote:
>>
>>> Amendment:
>>> In short, there are no symbolic concepts relevant in computing /
>>> computational processing except for those which also align with statistics.
>>> (There are various levels of assumptions/abstractions that could be
>>> relevant depending on the goals/tasks. But much of what one might have been
>>> doing in "symbolic computing" surely deserves a critical re-examination.)
>>>
>>> On Mon, Aug 21, 2023 at 4:48 PM Ada Wan <[email protected]> wrote:
>>>
>>>> Dear Ben, Rodolfo, and Toms
>>>>
>>>> Please accept that there is a responsibility to science, technology,
>>>> engineering, and education (or anything that we undertake).
>>>>
>>>> If you could point out the specific arguments as to which of what I
>>>> wrote may be problematic to you, perhaps we can have a constructive
>>>> exchange. The way in which you three expressed your sentiments on this
>>>> thread can be interpreted as mobbing.
>>>>
>>>> Please note the intent behind my statement and give me the benefit of the
>>>> doubt as to why I would have invested my time and energy to write the reply
>>>> that I did to the list:
>>>> "As language sciences (e.g. Linguistics) and NLP are still taught at
>>>> some universities, i.e. part of publicly accessible education, there is a
>>>> general responsibility that one should bear when promoting/hosting events
>>>> that would be explicitly/implicitly supporting biases and/or in violation
>>>> of scientific integrity."
>>>> This applies to the whole area of computing, including digital
>>>> humanities and the computational social sciences. *In short, there are
>>>> no symbolic concepts relevant in computing / computational processing.*
>>>> I am sorry if that has not been clear.
>>>>
>>>> I understand that there are members in the CL/NLP community/communities
>>>> who might be interested in (or used/addicted to) "word" hacking. But it is
>>>> now high time to stop.
>>>>
>>>> @Ben: Please note that I am not doing this "for fun". I am not trying
>>>> to ridicule anyone. My remarks are not ad personam. For each of the
>>>> research directions/practices that I commented on, there are opportunities
>>>> for all practitioners to do a better job, to refine our analyses.
>>>>
>>>> Thanks and best
>>>> Ada
>>>>
>>>>
>>>> On Mon, Aug 21, 2023 at 9:45 AM Toms Bergmanis via Corpora <
>>>> [email protected]> wrote:
>>>>
>>>>> Can’t agree more.
>>>>>
>>>>> Toms
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From:* Rodolfo Delmonte via Corpora <[email protected]>
>>>>> *Sent:* Monday, August 21, 2023 10:06 AM
>>>>> *To:* Ben Sir <[email protected]>
>>>>> *Cc:* corpora <[email protected]>
>>>>> *Subject:* [Corpora-List] Re: RANLP 2023 Call for Participation
>>>>>
>>>>>
>>>>>
>>>>> Fully agree with you Ben.
>>>>>
>>>>> Rodolfo
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 21, 2023, 01:00 Ben Sir via Corpora <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi Ada,
>>>>>
>>>>> It's understandable that enthusiasm can sometimes lead to excessive
>>>>> engagement, but your disruptive posting on the mailing list has reached an
>>>>> intolerable level. Please keep your conversations private instead of
>>>>> spamming everyone and curb your enthusiasm. Your obnoxious behavior
>>>>> reflects poorly on you.
>>>>>
>>>>> Thanks.
>>>>> _______________________________________________
>>>>> Corpora mailing list -- [email protected]
>>>>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>>>>> To unsubscribe send an email to [email protected]
>>>>>
>>>>>
>>
>
>
> --
> ------------------------------
>
> *George Giannakopoulos, PhD*
>
> *Researcher*
> Home page <http://www.iit.demokritos.gr/~ggianna>
> SKEL Lab - NCSR Demokritos <http://www.iit.demokritos.gr>
> and
>
> *Scientific Officer*
> ahedd DIH - NCSR "Demokritos" <https://ahedd.demokritos.gr>
> and
>
> *Co-founder, Chief Executive Officer*
> SciFY Not-for-Profit Company <http://www.scify.org>
>
