Dear Gabriella,

Thanks for your clarification.

One does not necessarily have to be "pro" any approach/method, e.g. "pro-ML" or "pro-statistics". One doesn't have to be "anti" either. What is important, and imperative, is to keep a scientific mindset (don't just "believe"), to be neutral and fair, and to work conscientiously (be honest and transparent in reporting findings, and willing to self-correct when a particular direction/method seems wrong, whether for scientific or ethical reasons).

There are opportunities in re-evaluating much of what has been practiced in the area of language and computing (NLP, digital humanities, or in fact any applied ML area) in the past decades, including but not limited to the interaction of ML systems and data statistics. This may apply to your project/initiative as well. One needs to be especially careful with "textual computing". (But sure, annotators' perspectives can be tested/examined more explicitly. The issues are how that is being tested, including whether it is ethical to carry out the human testing and what kind of testing would be in question, i.e. the experiment operationalization; how the data is being annotated; and what is being measured and claimed.)

There is a need for more clarity in computational tasks. Please report statistics transparently and explicitly.

Thanks and best
Ada

On Tue, Aug 15, 2023 at 3:11 PM Gabriella Lapesa <[email protected]> wrote:

> Dear Ada,
>
> Thanks a lot for giving me the opportunity to clarify these points, which
> are very important, and to do so on the public list!
>
> On 15. Aug 2023, at 13:44, Ada Wan <[email protected]> wrote:
>
> Dear Gabriella
>
> I have 2 concerns about your post/project:
>
> i. I noticed your formulation here in your call "as machine learning
> approaches which rely on gold standards which average annotators’
> perspectives are particularly unsuitable for the highly subjective
> phenomena tackled in CSS research (e.g., persuasion in online discussions;
> harmful communication online; polarization)".
> I find that a bit
> unnecessarily antagonistic towards machine learning (ML). As we know, the
> driver of textual processing is data statistics. Statistics (may it refer
> to data statistics or statistical methods) is also the science that
> underlies much of our computational research in the sciences, including the
> social sciences. Are you trying to work on smaller data problems ---
> nothing wrong with that, btw, but it could be clearer in the announcement
> if that's what you are trying to do? What are you using as gold standard(s)
> (may it involve ML or not)? Will you be using ML/statistical
> approaches/methods? If not, what methods will you be using for data
> science?
>
> [Why I am replying to all on this list here:]
> *I understand that there is sometimes an "anti-ML" and "anti-statistics"
> sentiment in the tradition of Computational Linguistics --- it probably
> started when grammar/grammarian values were found to not be
> portrayable/essential in language data. I just wanted to make sure that
> this project does not and would not steer students and practitioners into
> an erroneous path of thinking/practice.*
>
>
> I absolutely don’t have an anti-ML and anti-statistics sentiment, on the
> contrary! I have been and will continue using ML/statistical approaches and
> methods. The line of research I have in mind goes in the perspectivist
> direction (refer to https://pdai.info/ for an overview), which aims at
> 1. Developing better data collection/distribution strategies which give
> credit to annotator perspectives and 2. Developing ML strategies that can
> help us make better generalizations out of these data. So I hope this
> makes it clear that we are all pro-ML and pro-statistics.
>
>
> ii. Perhaps I misunderstood, but how should "persuasion in online
> discussions; harmful communication online; polarization" be treated as
> "highly subjective phenomena" in the context of statistical computing?
> Where / in which direction are you trying to go with "high subjectivity"?
> And what are the ethical consequences of naming and modeling "highly
> subjective phenomena"?
>
>
> I think that acknowledging the high subjectivity of these phenomena (and
> therefore of the annotations we would use to tackle them in a data-driven
> approach) gives full credit to the multiple perspectives involved in
> dealing with them. I think acknowledging this challenge is a very important
> step in the direction of avoiding ethical consequences.
>
> Again, thanks for pointing this out and giving me/us the occasion to think
> about these points!
>
> Best
> Gabriella
>
>
> Thanks in advance for your clarification.
>
> Best
> Ada
>
>
> On Tue, Aug 15, 2023 at 9:17 AM Gabriella Lapesa via Corpora <[email protected]> wrote:
>
>> Postdoc and PhD position in NLP/CL/CSS at GESIS (Cologne)
>>
>> The newly established Data Science Methods team led by Gabriella Lapesa
>> [2,3] (Leibniz Institute for Social Sciences GESIS, Cologne [1],
>> Computational Social Science department [4]) has two positions available
>> from November 2023:
>> - one postdoctoral researcher (100%, 4 years, with possibility of tenure)
>> - one doctoral researcher (75%, 4 years). The PhD project will be
>>   pursued at the Heinrich Heine University of Düsseldorf (where Gabriella
>>   Lapesa is a junior professor in Responsible Data Science and Machine
>>   Learning).
>>
>> ** The team **
>>
>> The Data Science Methods team will contribute to building and maintaining
>> the GESIS infrastructure for Computational Social Science (CSS) research by
>> developing novel methods and making them available, documented, and
>> accessible through the GESIS services.
>> The team will focus on fostering
>> the interaction between Natural Language Processing and Social Science by
>> developing solutions that allow for the integration of multiple
>> information sources (e.g., different textual sources for the same debate;
>> socio-demographic features of speakers and audiences; integration of
>> textual and multimodal data) and address recent challenges in NLP (modeling
>> subjective phenomena; low-resource scenarios; identifying and mitigating
>> bias).
>>
>> The team will tackle research questions at the interface between
>> computational argumentation and CSS, and target political communication
>> from a very broad perspective involving different types of actors
>> (citizens, politicians, parties) and discourse contexts (e.g., online
>> discussions vs. newspapers). From a methodological perspective, at the core
>> of the team's research agenda will be the “learning from disagreements”
>> challenge, as machine learning approaches which rely on gold standards
>> which average annotators’ perspectives are particularly unsuitable for the
>> highly subjective phenomena tackled in CSS research (e.g., persuasion in
>> online discussions; harmful communication online; polarization).
>>
>> ** How to apply **
>>
>> The official job announcement with more details about the
>> requirements/tasks and the application procedure can be found at the
>> following links:
>> Postdoctoral researcher (deadline: September 5th):
>> https://www.hidden-professionals.de//HPv3.Jobs/Gesis//stellenangebot/33073/1
>> Doctoral researcher (deadline: September 6th):
>> https://www.hidden-professionals.de//HPv3.Jobs/Gesis//stellenangebot/33084/1
>>
>>
>> [1] https://www.gesis.org/en/home
>> [2] https://www.gesis.org/institut/mitarbeitendenverzeichnis/person/Gabriella.Lapesa
>> [3] https://www.ims.uni-stuttgart.de/institut/team/Lapesa/
>> [4] https://www.gesis.org/en/institute/departments/computational-social-science
>> _______________________________________________
>> Corpora mailing list -- [email protected]
>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>> To unsubscribe send an email to [email protected]
