The Text and Data Mining Unit of the European Commission’s Joint Research
Centre (JRC) is looking to fill *one traineeship position* in the field of:

*Multilingual Named Entity Recognition and Classification.*

If you are interested, please follow the instructions provided at
http://recruitment.jrc.ec.europa.eu/?type=TR&site=IPR (*Code:
2017-IPR-I-000-8664*).

Deadline: *20/06/2017*



Job description:         http://recruitment.jrc.ec.europa.eu/showprj.php?type=T&id=5087

Traineeship rules:       https://ec.europa.eu/jrc/sites/jrcsh/files/jrc_trainee_rules_en.pdf

Conditions/eligibility:  https://ec.europa.eu/jrc/en/working-with-us/jobs/temporary-positions/jrc-trainees

Starting date:           1 September or soon thereafter

Duration:                5 months

Allowance:               Approximately 1000 Euro per month

JRC Text and Data Mining Unit:  https://ec.europa.eu/jrc/en/text-mining-and-analysis

JRC-EMM products:        http://emm.newsbrief.eu/overview.html

JRC-EMM publications:    http://optima.jrc.it/Resources/JRC-EMM_Publications.pdf

*DESCRIPTION OF THE FORESEEN ACTIVITY:*



The JRC’s *Europe Media Monitor* (EMM) team carries out research and
development in the field of highly multilingual text mining (Language
Technology; Computational Linguistics) for the purposes of media
monitoring. EMM gathers an average of 300,000 online news articles per day
in over 70 languages and analyses them to help its large international user
community understand and use this enormous amount of media information.
EMM is publicly accessible and widely used. The EMM team has produced over
200 international peer-reviewed publications
<http://optima.jrc.it/Resources/JRC-EMM_Publications.pdf>. The team has
also produced and distributes a number of highly multilingual Language
Technology resources <https://ec.europa.eu/jrc/en/language-technologies>.



The *Text and Data Mining Unit* (I3) of the European Commission’s *Joint
Research Centre* (JRC) in Ispra, Italy, is looking for a trainee to support
the JRC’s *Europe Media Monitor* (EMM) team in its effort to improve its
Named Entity Recognition and Classification (NERC) tools, especially for
multi-word entities such as organisation and event names. EMM gathers and
analyses reports from traditional and social media in dozens of languages
by clustering related news items; categorising them; extracting information
such as entities (persons, organisations, locations), events (who did what
to whom, where and when) and quotations by and about people; identifying
sentiment; and linking related news clusters over time and across
languages. Methods used are mostly hybrid: machine learning tools are used
to gather evidence and to learn vocabulary and rules, but the results are
usually controlled and optimised through human intervention. EMM is used by
European Institutions, by national authorities in EU Member States, by
international organisations and by the public. The public EMM applications
NewsBrief
<https://ec.europa.eu/jrc/en/scientific-tool/europe-media-monitor-newsbrief>,
NewsExplorer
<https://ec.europa.eu/jrc/en/scientific-tool/europe-media-monitor-newsexplorer>
and MedISys
<https://ec.europa.eu/jrc/en/scientific-tool/medical-information-system>
can be accessed freely by the general public. EMM is part of the JRC’s
Competence Centre on Text Mining and Analysis.



The EMM team has so far accumulated several very large independent sets
of multi-word entities and their monolingual and multilingual name
variants. Some of the entities are classified according to an entity type
hierarchy, while others are not. *The successful trainee will help to
improve the current tools to recognise multi-word entities, classify
entities, merge the various lists of entities and their variants into a
single repository, and integrate the NERC tools with the EMM processing
chain.* The trainee is also expected to contribute to writing a scientific
publication on the work carried out.





*REQUIRED QUALIFICATIONS:*



*Essential:*

·         a degree (or an almost completed degree) in computational
linguistics, computer science or related areas (applications from students
currently preparing a thesis for a university degree are eligible; the
thesis should match the subject of the project call);

·         Java programming skills;

·         good working knowledge of English (B2 level).

*Advantage:*

·         knowledge of further foreign languages;

·         proven advanced programming skills, especially in Java;

·         good knowledge of Language Technology related tools and methods;

·         proven ability to work independently and as part of a team.









-- 
*Guillaume Jacquet*

*Ralf Steinberger*

Text and Data Mining Unit
Joint Research Centre
European Commission