Dear Guillaume,

Nice work. We have recently been trying to enrich DBpedia with more language resources. I tried to understand the JRC License [1], but I am not a lawyer, so I still have these questions:


Q1: Prospectively, we would fuse some of the information in your data with DBpedia identifiers and redistribute it in our releases. Is this OK?

Q2: Could you post your links here: https://github.com/dbpedia/links, so that we can add them as backlinks? This will increase the visibility of your data, and we are working on providing contributors with a download of the links pointing to their data.

All the best,

Sebastian


[1] http://optima.jrc.it/Resources/LICENCE-EULA_JRC-Names_2011.pdf

On 08.09.2016 15:20, Guillaume Jacquet wrote:
Dear all,

we are pleased to announce a new release of the *JRC-Names* multilingual name resource, containing *more information* and now available as *Linked Data*.


JRC-Names is a *highly multilingual named entity resource* for person and organisation names (called 'entities') developed by the European Commission’s Joint Research Centre (JRC). JRC-Names consists of large lists of names and their many spelling variants (up to hundreds for a single person), including across scripts (Latin, Greek, Arabic, Cyrillic, Japanese, Chinese, etc.). For example, the spellings Jean-Claude Juncker, Jean Cloud Junker, Jean-Claude Juencker, Жан-Клод Юнкер, جان كلود جونكر, Ζαν Κλοντ Γιούνκερ, 让-克洛德•容克, and many others have all been identified as referring to the 12th President of the European Commission.

The resource is the by-product of the Europe Media Monitor (EMM) family of applications, which has been analysing up to 300,000 news reports per day since 2004. EMM recognises names mentioned in the news in over twenty languages and decides automatically, for each newly found name, whether it belongs to a new entity or is a spelling variant of a previously known entity. This resource allows EMM users to display news about people or organisations even if their names are spelt differently or if the news articles are written in different languages and scripts.
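To make the "new entity or known variant?" decision concrete, here is a toy sketch in Python. It is NOT EMM's actual algorithm, which this message does not describe; it simply assumes that a newly found name counts as a variant when a normalised string-similarity score against a known name reaches a threshold. The normalisation steps, the use of difflib's SequenceMatcher ratio and the 0.8 threshold are all illustrative assumptions.

```python
import difflib
import unicodedata
from typing import Optional

# Hypothetical threshold, chosen for illustration only (not an EMM parameter).
THRESHOLD = 0.8

def normalise(name: str) -> str:
    """Assumed normalisation: strip diacritics, hyphens -> spaces, lower-case."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_marks.replace("-", " ").lower().split())

def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalised names."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def assign(new_name: str, known: dict[str, list[str]]) -> Optional[str]:
    """Return the id of the best-matching known entity, or None for a new entity."""
    best_id, best_score = None, 0.0
    for entity_id, variants in known.items():
        for variant in variants:
            score = similarity(new_name, variant)
            if score > best_score:
                best_id, best_score = entity_id, score
    return best_id if best_score >= THRESHOLD else None
```

With known = {"juncker": ["Jean-Claude Juncker"]}, both "Jean-Claude Juencker" and "Jean Cloud Junker" are assigned to "juncker", whereas "Жан-Клод Юнкер" is treated as new: plain string similarity cannot bridge scripts, which is exactly the kind of cross-script grouping the real resource provides.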

JRC-Names has been available for download since September 2011 in the form of name variant lists and accompanying software (JRC-Names text version <https://ec.europa.eu/jrc/en/language-technologies/jrc-names>).

The new Linked Data resource <https://data.europa.eu/euodp/en/data/dataset/jrc-names>, accessible through the European Union’s Open Data Portal <http://data.europa.eu/euodp/en/data>, offers more information than the previously released resource and tool, including:

  * titles and function names that have been historically found next
    to the person mentions;
  * information about the time period during which name variants and
    their titles were found;
  * various frequency counts;
  * links to other linked datasets such as DBpedia, New York Times
    Open Data and Talk of Europe.


The JRC-Names RDF representation is based on /lemon/ (Lexicon Model for Ontologies <https://www.w3.org/community/ontolex/wiki/Final_Model_Specification>), a model developed by the W3C Ontology-Lexica Community Group which allows the expression of lexical information relative to ontologies. A detailed description of the JRC-Names Linked Data representation is given in the reference paper mentioned below.
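As a rough picture of what a lemon-based representation can look like, the Python sketch below prints Turtle for one entity and its spelling variants using the OntoLex-Lemon core vocabulary (ontolex:LexicalEntry, ontolex:canonicalForm, ontolex:writtenRep, ontolex:denotes). This is an illustration under assumptions, not the actual JRC-Names schema: the ex: namespace, the URI pattern and the one-entry-per-variant layout are hypothetical, and the real modelling (which also covers titles, time periods and frequencies) is documented in the reference paper.

```python
# Sketch: emit Turtle for one entity and its name variants with OntoLex-Lemon.
# The ex: namespace and URI scheme are hypothetical, NOT real JRC-Names URIs.
PREFIXES = """@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix ex: <http://example.org/jrc-names-sketch/> .
"""

def turtle_literal(text: str, lang: str) -> str:
    """Quote a string as a Turtle literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'

def entity_to_turtle(entity_id: str, variants: list[tuple[str, str]]) -> str:
    """variants: (written form, language tag) pairs for one entity."""
    lines = [PREFIXES]
    for i, (form, lang) in enumerate(variants):
        entry = f"ex:{entity_id}_entry{i}"  # hypothetical URI pattern
        lines.append(f"{entry} a ontolex:LexicalEntry ;")
        lines.append(
            f"    ontolex:canonicalForm [ ontolex:writtenRep {turtle_literal(form, lang)} ] ;"
        )
        lines.append(f"    ontolex:denotes ex:{entity_id} .")
    return "\n".join(lines)
```

For example, entity_to_turtle("juncker", [("Jean-Claude Juncker", "en"), ("Ζαν Κλοντ Γιούνκερ", "el")]) yields two lexical entries that both denote ex:juncker; the language tags here are chosen for illustration.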

Examples of usage of the resource include, among others:

  * entity linking, e.g. to deal with entity surface form variations;
  * cross-lingual linked data-set query and mapping;
  * search query expansion;
  * machine translation;
  * learning of transliteration rules;
  * named entity recognition and disambiguation;
  * cross-lingual document clustering.
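For the search query expansion use case, one simple pattern is to look up which entity a query name belongs to and OR together all of its known variants. The sketch below hard-codes some of the Juncker spellings quoted earlier in place of data actually retrieved from JRC-Names; the entity id "juncker", the dictionary layout and the quoted-OR query syntax are assumptions for illustration.

```python
# Hypothetical in-memory stand-in for variant lists obtained from JRC-Names.
VARIANTS: dict[str, list[str]] = {
    "juncker": [
        "Jean-Claude Juncker",
        "Jean Cloud Junker",
        "Jean-Claude Juencker",
        "Жан-Клод Юнкер",
        "Ζαν Κλοντ Γιούνκερ",
    ],
}

# Reverse index: case-folded variant -> entity id.
INDEX = {v.casefold(): eid for eid, vs in VARIANTS.items() for v in vs}

def expand_query(name: str) -> str:
    """OR together all variants of the entity; fall back to the name itself."""
    entity_id = INDEX.get(name.casefold())
    if entity_id is None:
        return f'"{name}"'
    return " OR ".join(f'"{v}"' for v in VARIANTS[entity_id])
```

A search for "Jean Cloud Junker" is thus expanded to also match the Latin, Cyrillic and Greek spellings; an exact-match lookup like this only works for variants already in the list, which is why a large variant resource is useful here.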


This new Linked Data edition is available through a SPARQL endpoint <https://data.europa.eu/euodp/en/data/dataset/jrc-names/resource/da30b11d-a07e-45dd-bdb6-5f2ba5835d27> and via an RDF dump <http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/EMM/JRC-Names/LATEST/jrcnames_uri.zip>. It is registered on the datahub.io portal as JRC-Names <https://datahub.io/dataset/jrc-names-ec>. Additional information is available on this page <http://data.europa.eu/euodp/en/data/dataset/jrc-names> of the EU Open Data Portal.
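For programmatic access, a SPARQL 1.1 endpoint can generally be queried with a plain HTTP GET that carries the query in a "query" parameter. The standard-library Python sketch below assumes exactly that; the ENDPOINT value is a placeholder to be replaced with the actual endpoint address given on the Open Data Portal page (the linked portal page is not necessarily the endpoint itself), and whether this endpoint accepts GET and returns JSON results is also an assumption.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: substitute the real JRC-Names SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict]:
    """Execute the query and return the list of result bindings."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]
```

Usage would look like rows = run_query(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5"), where each row maps variable names to {"type": ..., "value": ...} dictionaries as defined by the SPARQL JSON results format.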

Examples of queries against the data-set include:

  * Given a person's name, retrieve all of its name variants;
  * Given a person's name, retrieve all of its name variants in a
    certain language;
  * Given a person's name, retrieve all of its titles/function names
    in a certain language;
  * Given a variant and a language, retrieve the corresponding entity;
  * Given a title and a language, retrieve all of the persons with
    that title.
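The first two example queries might be written roughly as below. This is a sketch under an assumed model: it supposes that each name variant is an ontolex:LexicalEntry whose canonical form's ontolex:writtenRep holds a language-tagged string, and that entries point to their entity via ontolex:denotes. The actual JRC-Names property paths may differ (the reference paper documents them), so the pattern should be checked against the data before use.

```python
from typing import Optional

PREFIXES = "PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>\n"

def sparql_string(text: str) -> str:
    """Escape a Python string as a double-quoted SPARQL literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def variants_query(name: str, lang: Optional[str] = None) -> str:
    """All variants of the entity denoted by an entry written exactly as `name`.

    Assumed pattern: entry -ontolex:canonicalForm/ontolex:writtenRep-> literal,
    entry -ontolex:denotes-> entity. If `lang` is given, keep only variants
    whose language tag matches it.
    """
    lang_filter = (
        f"  FILTER(langMatches(lang(?variant), {sparql_string(lang)}))\n" if lang else ""
    )
    return (
        PREFIXES
        + "SELECT DISTINCT ?variant WHERE {\n"
        + "  ?e1 ontolex:canonicalForm/ontolex:writtenRep ?rep ;\n"
        + "      ontolex:denotes ?entity .\n"
        + f"  FILTER(str(?rep) = {sparql_string(name)})\n"
        + "  ?e2 ontolex:denotes ?entity ;\n"
        + "      ontolex:canonicalForm/ontolex:writtenRep ?variant .\n"
        + lang_filter
        + "}"
    )
```

Comparing str(?rep) rather than the tagged literal lets the lookup match a name regardless of the language tag it was stored with; the optional langMatches filter then restricts only the returned variants.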


Reference paper:

Maud Ehrmann, Guillaume Jacquet and Ralf Steinberger (to appear). JRC-Names: Multilingual Entity Name variants and titles as Linked Data <http://www.semantic-web-journal.net/system/files/swj1307.pdf>, Semantic Web Journal (available online since 04/20/2016)


Guillaume Jacquet, Maud Ehrmann, Ralf Steinberger
European Commission
Joint Research Centre
Text and Data Mining Unit
https://ec.europa.eu/jrc/en/language-technologies




--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
_______________________________________________
Corpora mailing list
Corpora@uib.no
http://mailman.uib.no/listinfo/corpora