[in plain text]

School of Computing and Communications, Lancaster University 
Salary:   £29,619 to £34,308 
Closing Date:   Friday 30 September 2022
Interview Date:   Wednesday 19 October 2022 
Reference:  0978-22
Post URL: https://hr-jobs.lancs.ac.uk/Vacancy.aspx?ref=0978-22

Research Associate – Natural Language Processing

Salary: Grade 6 £29,619 to £34,308

The School of Computing and Communications (SCC), within Lancaster University’s 
Faculty of Science and Technology, is seeking to appoint a Research Associate 
(RA) to work on a Natural Language Processing (NLP) research project, the 
Canadian Annual Report Extraction (CARE) project. CARE is funded by the 
Canadian organisation Mitacs (https://www.mitacs.ca/) and HEC Montreal 
(https://www.hec.ca/en/). 

 Working together with project partners at HEC Montreal (led by Dr Kim 
Trottier) and representatives from Chartered Professional Accountants (CPA) of 
Canada, the RA will develop a novel NLP Python tool with techniques to 
automatically detect the structure of PDF annual reports, extract their text, 
and assess readability and sentiment. The post is based in Lancaster, UK.

 With CARE, users can upload the PDF of an annual report, run the program, and 
receive an output consisting of a set of disaggregated text files, one for each 
section of the annual report. These text files can be useful to investors, 
regulators, and researchers for performing text analysis. For example, an 
investor may want to examine a specific section that is relevant to their 
analysis, such as the chairman’s message or the auditor’s report. A regulator 
may be interested in assessing climate risk disclosure from the MD&A of all Oil 
and Gas companies. A researcher exploring goodwill impairment could extract and 
analyse only the notes on impairments for the firms in their sample. The gains 
from using CARE are not only in terms of transforming PDF files to text format, 
but also through an ability to process a large sample of annual reports all at 
once.

 While users of CARE can develop and apply their own Natural Language 
Processing (NLP) algorithms to the disaggregated text files, CARE provides some 
of the more common metrics for quick analysis. The following information is 
produced for each text file: readability scores, tone measures, causal language 
metrics, and word-frequency counts, where the latter (frequency counts) can be 
tailored to include key words that are relevant to the user.
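
 As a rough illustration of the tailored word-frequency counts described 
above, the following minimal Python sketch counts user-supplied key words in 
one text file's contents. It is not CARE's implementation: the function name 
keyword_counts, the simple tokenisation rule, and the sample text are all 
assumptions made for this example.

```python
import re
from collections import Counter


def keyword_counts(text, keywords):
    """Count occurrences of each user-supplied keyword in text.

    Hypothetical helper for illustration only; matching is
    case-insensitive and works on single-word keywords.
    """
    # Lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for keywords that never occur.
    return {kw: counts[kw.lower()] for kw in keywords}


# Made-up sample text standing in for one extracted report section.
sample = "Goodwill impairment was recognised. No further impairment is expected."
result = keyword_counts(sample, ["impairment", "goodwill", "climate"])
print(result)
```

 A user interested in a different topic would only need to change the list of 
key words passed in.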

 In addition to being efficient, capital markets should strive to provide a 
level playing field. Institutional investors are able to create their own 
proprietary, in-house programs that extract and analyse the narrative portion 
of annual reports. CARE brings this functionality to other market 
participants through a simple, open-source tool. By supporting the development 
of CARE, Canadian regulators are positioning themselves to be early movers in 
making annual report narratives accessible to a growing set of digital users 
among their constituents.

 The RA will be part of an internationally recognised centre of expertise for 
corpus-based natural language processing (UCREL), and will work directly with 
Dr Mo El-Haj in SCC at Lancaster University and Dr Kim Trottier in the 
Department of Accounting at HEC Montreal. For more details, please see the 
associated job description and person specification for this position. 
Potential candidates can also make informal enquiries to Dr Mo El-Haj 
([email protected]) and Dr Kim Trottier ([email protected]). 

 This is a 50% part-time position expected to start in November 2022. The RA 
will join on an indefinite contract; however, the role remains contingent on 
external funding, which for this position ends on 30th October 2023.

Lancaster University are committed to family-friendly and flexible working 
policies on an individual basis. The School is also an Athena Swan Bronze Award 
holder, driving good employment practice and initiatives to address gender 
inequalities in Computing higher education and research.  

We welcome applications from people in all diversity groups.



Best wishes,

Mo


————————————————————-
Dr Mo El-Haj
 
NLP Lecturer,
School of Computing and Communications,
Lancaster University, UK
https://www.lancaster.ac.uk/staff/elhaj  
@DocElhaj