Dear RANLP organizers

I looked through your program and have a few questions:

i. I noticed there is a parallel session on "Sentence-level Representation
and Analysis". Would you (or anyone else on this list) please let me know
why "sentence(s)" would be relevant and necessary in computing? Does the
term "sentence" refer to a line (as delimited by line breaks)? Or are you
segmenting via some heuristics, e.g. with some punctuation indicators, for
each dataset (a small sketch of what I mean appears below)? If so, would
there not be concerns on grounds of fairness and diversity, as well as on
robustness, sufficiency, and applicability?
[I also understand that in NLP there has been (an undue) grammarian
influence, leading to the false assumption that processing/evaluation based
on "sentences" would be necessary (when one can do so based on a wider span
of text instead).
As neither machines nor humans need the concept of "sentence" to
produce/understand language, I wonder: wouldn't the restriction to
"sentence"-level analyses lead to overfitting?]

ii. Re "multilinguality": would that be a session in which work would focus
on computationally relevant topics such as character encoding (see
https://openreview.net/forum?id=-llS6TiOew)? Or would it further perpetuate
the adverse effects of grammatical teachings and concepts?

iii. Re Isabelle's keynote:
I hope her social scientific studies would be ones leveraging comprehensive
data (i.e. not ones with selection bias) and rigorous statistical testing.
There are many ethical aspects of experimentation in the social sciences,
including but not limited to hypothesis formulation, that
one should be very careful about. Otherwise, the study could be interpreted
as sentiment/identity manipulation (e.g. your formulation with "origin"
[1]).
There has been some work in the CL/NLP space that touches on identity
politics in ways that may be unnecessary or inappropriate, hence my remark
here.

iv. Re Ed's keynote on "neuro-symbolic approaches":
I previously replied to Alexander Koller's call regarding such approaches
for his DFG project, on 16 Jul 2023, on this mailing list, as follows:
"As we know, neural models are statistical models in nature. Symbolic
representations could create/reinforce unnecessary circularity. The
symbolic representations could obfuscate the precision needed.
The findings of Mielke et al. (2019) <https://arxiv.org/abs/1906.04726> and Wan
(2022) <https://openreview.net/forum?id=-llS6TiOew> were a painful/bitter
lesson to many. I'd hate to see another generation of students being
misled."

(To Ed: I suspect that you are already familiar with these works. So I
wonder what "symbolic approaches" refer to in your case, and whether they
are being applied as a post-processing (e.g. post-ML) strategy. If so, and
if these are based on "grammar" etc., please be careful, as grammar is not
necessary for processing. One can post-edit ML-generated texts according to
some stylistic preferences as part of post-processing heuristics, but I have
concerns as to how much the employment of such heuristics could be abused.
As you may know, many CL/NLPers might already be too "hooked" on grammar
and textual representations. There are many dependencies on grammar
teaching, so ethical concerns in the pedagogy of such also need to be
considered.)

v. Re Sandra's keynote:
Please see literature mentioned in [iv] above.
Re "[u]sing transformers for hate speech detection tends to give good
results ...": how do these results and domain effects reconcile with data
statistics, even if/when one does not segment text into "words" or
"sentences" (or any categories that grammarians like and many CL/NLPers
used to be "addicted" to)?
It might be a higher bar, but there is work to do to see where such
correspondences (between language phenomena and data statistics, for
example) exist and if they do! (And if not, negative results are also
results! And with any data processing/interpretation, it's information, not
"meaning", that matters.)

vi. Re Efstathios' keynote:
Please see notes above.

As language sciences (e.g. Linguistics) and NLP are still taught at some
universities, i.e. as part of publicly accessible education, there is a
general responsibility that one should bear when promoting/hosting events
that would explicitly or implicitly support biases and/or violate
scientific integrity.

Thank you for reading and for tolerating my rant here. There has been some
"bad research" (and some miseducation) in the area of CL/NLP; hence, I
thought I would send a reminder (and a call for action/correction).

Thanks and best
Ada

[1] "origin": (mother's womb? [Jest... but yes and no.]) How current is
this analysis from a globalized perspective? Do people categorically use
language in one way or another based on some "types" related to "origin"
(whatever that refers to), or more based on contexts and/or habits
(personal and/or group-based; if the latter, what "group identity" is
assumed in the data and in the experiment)?


On Thu, Aug 17, 2023 at 11:17 AM amalhaddad--- via Corpora <
[email protected]> wrote:

> RANLP 2023 Call for Participation
>
> We are pleased to share the programme of the international conference
> ‘Recent Advances in Natural Language Processing’ (RANLP’2023). To view the
> programme, please click here
> https://ranlp.org/ranlp2023/index.php/main-conference-programme/
>
> To register, please visit
> https://ranlp.org/ranlp2023/index.php/fees-registration/
>
> We very much hope to welcoming you at RANLP’2023 in Varna!
> _______________________________________________
> Corpora mailing list -- [email protected]
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to [email protected]
>