Dear all,

We are pleased to announce a new release of the JRC-Names multilingual name 
resource, which contains more information and is now available as Linked Data.


JRC-Names is a highly multilingual named entity resource for person and 
organisation names (called 'entities') developed by the European Commission’s 
Joint Research Centre (JRC). JRC-Names consists of large lists of names and 
their many spelling variants (up to hundreds for a single person), including 
across scripts (Latin, Greek, Arabic, Cyrillic, Japanese, Chinese, etc.). For 
example, the spellings Jean-Claude Juncker, Jean Cloud Junker, Jean-Claude 
Juencker, Жан-Клод Юнкер, جان كلود جونكر, Ζαν Κλοντ Γιούνκερ, 让-克洛德•容克, and 
many others have all been identified as referring to the 12th President of the 
European Commission.

The resource is a by-product of the Europe Media Monitor (EMM) family of 
applications, which has been analysing up to 300,000 news reports per day 
since 2004. EMM recognises names mentioned in the news in over twenty languages 
and decides automatically for each newly found name whether it belongs to a new 
entity or whether it is a spelling variant of a previously known entity. This 
resource allows EMM users to display news about people or organisations even if 
their names are spelt differently or if the news articles are written in 
different languages and scripts.
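
As a rough illustration of the grouping described above, the following Python 
sketch maps spelling variants to a shared entity identifier. The entity ID, the 
sample list and the normalise() helper are assumptions made for this example 
only; EMM's actual matching method is not described in this announcement.

```python
import unicodedata

# Hypothetical sample data: (entity_id, spelling variant) pairs.
# The ID 1 is made up for this example; the spellings come from the text above.
VARIANTS = [
    (1, "Jean-Claude Juncker"),
    (1, "Jean Cloud Junker"),
    (1, "Jean-Claude Juencker"),
    (1, "Жан-Клод Юнкер"),
    (1, "Ζαν Κλοντ Γιούνκερ"),
]


def normalise(name):
    """Apply Unicode NFKC and case folding so trivially different spellings match."""
    return unicodedata.normalize("NFKC", name).casefold().strip()


def build_index(pairs):
    """Map each normalised variant to the ID of the entity it belongs to."""
    return {normalise(variant): entity_id for entity_id, variant in pairs}


def lookup(index, name):
    """Return the entity ID for a known variant, or None for an unknown name."""
    return index.get(normalise(name))


index = build_index(VARIANTS)
print(lookup(index, "жан-клод юнкер"))  # a known variant in a different case
print(lookup(index, "Angela Merkel"))   # not in the sample list
```

A name that is not found in the index would be a candidate for a new entity or 
for a new variant of an existing one; that decision step is left out here.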

JRC-Names has been available for download since September 2011 as name variant 
lists with accompanying software (JRC-Names text version 
<https://ec.europa.eu/jrc/en/language-technologies/jrc-names>). 

The new Linked Data resource 
<https://data.europa.eu/euodp/en/data/dataset/jrc-names>, accessible through 
the European Union’s Open Data Portal <http://data.europa.eu/euodp/en/data>, 
offers more information than the previously released resource and tool, 
including: 

*       titles and function names that have been historically found next to the 
person mentions; 
*       information about the time period during which name variants and their 
titles were found; 
*       various frequency counts; 
*       links to other linked datasets such as DBpedia, New York Times Open 
Data and Talk of Europe. 

 

The JRC-Names RDF representation is based on lemon (Lexicon Model for 
Ontologies 
<https://www.w3.org/community/ontolex/wiki/Final_Model_Specification>), a 
model developed by the W3C Ontology-Lexica Community Group that allows the 
expression of lexical information relative to ontologies. A detailed 
description of the JRC-Names Linked Data representation is given in the 
reference paper mentioned below.

Examples of usage of the resource include, among others:

*       entity linking, e.g. to deal with entity surface form variations;
*       cross-lingual linked data-set query and mapping;
*       search query expansion;
*       machine translation;
*       learning of transliteration rules;
*       named entity recognition and disambiguation;
*       cross-lingual document clustering.


This new Linked Data edition is available through a SPARQL endpoint 
<https://data.europa.eu/euodp/en/data/dataset/jrc-names/resource/da30b11d-a07e-45dd-bdb6-5f2ba5835d27>
and as an RDF dump 
<http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/EMM/JRC-Names/LATEST/jrcnames_uri.zip>.
It is registered on the datahub.io portal as JRC-Names 
<https://datahub.io/dataset/jrc-names-ec>. Additional information is available 
on the dataset page <http://data.europa.eu/euodp/en/data/dataset/jrc-names> of 
the EU Open Data Portal.

Examples of queries against the data-set include:

*       Given a person's name, retrieve all of its name variants;
*       Given a person's name, retrieve all of its name variants in a certain 
language;
*       Given a person's name, retrieve all of its titles/function names in a 
certain language;
*       Given a variant and a language, retrieve the corresponding entity;
*       Given a title and a language, retrieve all of the persons with this 
same title.
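
To make the first two query types more concrete, here is a minimal Python 
sketch that builds a SPARQL query string. It assumes that name variants are 
stored as written representations of forms in the W3C OntoLex-lemon vocabulary 
(ontolex:canonicalForm, ontolex:otherForm, ontolex:writtenRep); the actual 
JRC-Names graph layout may differ and is documented in the reference paper. 
The function name build_variant_query is made up for this example.

```python
import re

# Namespace of the W3C OntoLex-lemon core vocabulary.
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"

# Conservative pattern for language tags such as "fr" or "zh-Hans".
_LANG_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_variant_query(name, lang=None):
    """Build a query for all variants of the entity that has `name` as a variant.

    ASSUMPTION: variants hang off one entry via canonicalForm/otherForm and
    writtenRep. Check the real data model before running this on the endpoint.
    """
    form_path = "(ontolex:canonicalForm|ontolex:otherForm)/ontolex:writtenRep"
    lines = [
        f"PREFIX ontolex: <{ONTOLEX}>",
        "SELECT DISTINCT ?variant WHERE {",
        f"  ?entry {form_path} ?known .",
        f"  ?entry {form_path} ?variant .",
        f'  FILTER (str(?known) = "{escape_literal(name)}")',
    ]
    if lang is not None:
        # Reject anything that is not a plain language tag to avoid query injection.
        if not _LANG_TAG.fullmatch(lang):
            raise ValueError(f"not a valid language tag: {lang!r}")
        lines.append(f'  FILTER (langMatches(lang(?variant), "{lang}"))')
    lines.append("}")
    return "\n".join(lines)


print(build_variant_query("Jean-Claude Juncker", lang="el"))
```

The resulting string can be pasted into the SPARQL endpoint's query form or 
sent with any SPARQL client; the other query types follow the same pattern 
with different graph patterns and filters.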


Reference paper:

Maud Ehrmann, Guillaume Jacquet and Ralf Steinberger (to appear). JRC-Names: 
Multilingual Entity Name variants and titles as Linked Data 
<http://www.semantic-web-journal.net/system/files/swj1307.pdf> , Semantic Web 
Journal (available online since 04/20/2016)


Guillaume Jacquet, Maud Ehrmann, Ralf Steinberger
European Commission
Joint Research Centre
Text and Data Mining Unit
https://ec.europa.eu/jrc/en/language-technologies




 
