Call for Internship applications in Natural Language Processing 




Title : Study on the accuracy of citations in scientific papers 

Starting date : February 2023 

Application deadline : December 5th, 2022 

Location: LIG laboratory, Grenoble Alps University, France 

Keywords: Natural language processing, Scientific literature, citation accuracy 






Context : 

The objective of the NanoBubbles ERC Synergy project
(https://nanobubbles.hypotheses.org) is to understand how, when and why
science fails to correct itself. The project focuses on claims made within the 
field of nanobiology. Project members combine approaches from the natural 
sciences, computer science, and the social sciences and humanities (Science and 
Technology Studies) to understand how error correction in science works and 
what obstacles it faces. For this purpose, we aim to trace claims and 
corrections through various channels of scientific communication (journals, 
social media, advertisements, conference programs, etc.) via both qualitative 
and digital methods. 




Internship objectives : 

In scientific papers, citations acknowledge sources and help the reader find
more information about the citation context. Citations are also an important
indicator used to identify significant publications in a specific scientific
field (Aragón 2013). They serve different purposes, e.g. referring to the state
of the art or to a specific method or result. They also reflect how authors
frame their work, and this diversity affects future academic adoption (Jurgens
2018).
Recently, there has been considerable research in Natural Language Processing
on citation analysis in scientific literature. Studies of citation behavior aim
at understanding how researchers cite a paper in their work. Existing work on
citation analysis deals with determining citation sentiment (Liu 2017, Athar
2011), identifying citation function (Yu 2020, Pride 2019, Bakhti 2018) and
identifying critical citation contexts (Te 2022). Nevertheless, studies that
evaluate the accuracy of citations are scarce.
Studies on the accuracy of citations in various scientific disciplines 
demonstrate an error rate of 25%-54% (Jergas 2015, Siebers 2000, Kristof 1997, 
Key 1977). These errors alter the original content and meaning of the cited 
paper. They can range from minor to major errors in citation accuracy. Several 
studies describe various issues that may arise when citing original research 
done by others. 
For example, in the following sentence “it has been shown that bubblegum is
much more pink than flamingo while running very fast [Einstein A., 1916]”:

    * “[Einstein A., 1916]” is the "citation".
    * The citation refers to the following scientific paper: “Einstein, A.
(1916 (translation 1920)), Relativity: The Special and General Theory”.
    * Einstein’s paper is the cited paper.
    * The cited paper is unrelated to the meaning of the sentence, i.e.
there is no relation between colors and the notion of relativity.



The aim of this internship is to assess the content of both cited and citing
papers in scientific literature, i.e. to study the correlation between a
citation and its context in the citing paper in order to identify
miscitations.
The intern's tasks would then be to (1) test and compare unsupervised NLP
methods and pre-trained embedding models (SciBERT, BioBERT, etc.) in order to
measure the accuracy of citations using available datasets, and to (2) provide
project members with a set of reliable tools.
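
As a rough illustration of task (1), an unsupervised baseline could score how
well a citation context matches the text of the cited paper. The sketch below
is only an assumption about what such a baseline might look like: it uses a
plain bag-of-words cosine similarity as a stand-in for the pre-trained
embedding models named above, and the function names, the stop-word list and
the second example sentence are hypothetical, not project tools or data.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {
    "a", "an", "and", "been", "has", "in", "is", "it", "more", "much",
    "of", "than", "that", "the", "to", "very", "while",
}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, keep alphabetic tokens, drop stop words, count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def citation_score(citation_context: str, cited_text: str) -> float:
    """Score in [0, 1]; low values hint at a possible miscitation."""
    return cosine_similarity(bag_of_words(citation_context), bag_of_words(cited_text))


# Citation context from the example in this call, and the cited title.
context = ("it has been shown that bubblegum is much more pink than flamingo "
           "while running very fast")
cited_title = "Relativity: The Special and General Theory"
# Hypothetical context that does match the cited work, for comparison.
related_context = "the general theory of relativity describes gravitation"

unrelated = citation_score(context, cited_title)
related = citation_score(related_context, cited_title)
print(f"unrelated context: {unrelated:.2f}")
print(f"related context:   {related:.2f}")
```

Replacing bag_of_words with sentence embeddings from a pre-trained model would
leave the scoring step unchanged; any threshold separating accurate citations
from miscitations would have to be calibrated on an annotated dataset.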




Skills : 

    * Being enrolled in a Master's program in Natural Language Processing,
computer science or data science.
    * Good programming skills in Python, experience with natural language
processing tools and frameworks, knowledge of machine learning methods and deep
learning techniques.
    * Ability to communicate and write in English is a plus.



Scientific environment : 

The work will be conducted within the Sigma team of the LIG laboratory
(http://sigma.imag.fr). The recruited person will be welcomed into the team,
which offers a stimulating, multinational and pleasant working environment.




Instructions for applying : 

Applications must contain a CV + letter/message of motivation + master grades
+ letter(s) of recommendation (or names for potential letters), and be
addressed to Cyril Labbé ([email protected]) and Amira Barhoumi
([email protected]). Applications will be considered as they arrive.
It is therefore advisable to apply as soon as possible.




References : 

    * (Aragón 2013) Aragón M. A measure for the impact of research.
Scientific Reports. 2013;3(1):1–5.
    * (Jurgens 2018) Jurgens D, Kumar S, Hoover R, McFarland D, Jurafsky D.
Measuring the Evolution of a Scientific Field through Citation Frames.
Transactions of the Association for Computational Linguistics. 2018;6:391–406.
    * (Jergas 2015) Jergas H, Baethge C. Quotation accuracy in medical journal 
articles-a systematic review and meta-analysis. PeerJ. 2015;3:e1364. 
    * (Kristof 1997) Kristof C. Accuracy of reference citations in five
entomology journals. Am Entomol. 1997;43(4):246-251.
    * (Key 1977) Key JD, Roland CG. Reference accuracy in articles accepted for 
publication in the Archives of Physical Medicine and Rehabilitation. Arch Phys 
Med Rehabil. 1977;58(3):136-137. 
    * (Siebers 2000) Siebers R, Holt S. Accuracy of references in five leading 
medical journals. Lancet. 2000;356(9239):1445. 
    * (Te 2022) Te S, Barhoumi A, Lentschat M, Bordignon F, Labbé C, Portet
F. Citation Context Classification: Critical vs Non-critical. In Proceedings of
the Third Workshop on Scholarly Document Processing. 2022:49-53.
    * (Liu 2017) Liu H. Sentiment analysis of citations using word2vec.
CoRR. 2017;abs/1704.00177.
    * (Athar 2011) Athar A. Sentiment analysis of citations using sentence 
structure-based features. In Proceedings of the ACL 2011 Student Session. 
2011:81–87. 
    * (Bakhti 2018) Bakhti K, Niu Z, Yousif A, Nyamawe A. Citation Function 
Classification Based on Ontologies and Convolutional Neural Networks. 
2018:105–115. 
    * (Pride 2019) Pride D, Knoth P, Harag J. ACT: An annotation platform
for citation typing at scale. In 2019 ACM/IEEE Joint Conference on Digital
Libraries (JCDL). 2019:329–330.
    * (Yu 2020) Yu W, Yu M, Zhao T, Jiang M. Identifying referential intention 
with heterogeneous contexts. 2020:962–972. 


