Rupert Westenthaler created STANBOL-1046:
--------------------------------------------
Summary: Create pageId-based DBpedia Freebase linker for the
Entityhub Freebase Indexing Tool
Key: STANBOL-1046
URL: https://issues.apache.org/jira/browse/STANBOL-1046
Project: Stanbol
Issue Type: Bug
Components: Entityhub
Reporter: Rupert Westenthaler
While the Freebase Indexing Tool already supports basic linking between
Freebase topics and DBpedia entities, those links are constructed from the
local names of the Wikipedia pages, which is error-prone due to encoding issues.
In STANBOL-1034, [~ninniuz] pointed out that linking via the Wikipedia PageId
is more reliable and that such linking functionality already exists for
DBpedia [1].
However, using this option would require users to import the
http://downloads.dbpedia.org/3.8/{language}/page_ids_{language}.nt.bz2
files into the Indexing Source (the Jena TDB holding the Freebase data) or into
any other data store that can hold those mappings (an in-memory representation
would also be feasible).
Because of that, a mapping based on PageId will be implemented in a custom
EntityProcessor. This issue covers the implementation of such a processor.
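
As a rough illustration of the in-memory variant mentioned above, the following
sketch loads lines of a page_ids_{language}.nt file into a map and resolves a
DBpedia URI from a (language, PageId) pair. It is not the Stanbol
implementation: the class PageIdLinker and its methods are hypothetical, the
assumed triple layout (subject, http://dbpedia.org/ontology/wikiPageID, integer
literal) should be checked against the actual dump, and the wiring into an
EntityProcessor is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Stanbol): (language, Wikipedia PageId) -> DBpedia URI. */
final class PageIdLinker {

    // Assumed layout of one line in page_ids_{language}.nt:
    // <dbpedia-resource> <http://dbpedia.org/ontology/wikiPageID> "123"^^<...integer> .
    private static final Pattern TRIPLE = Pattern.compile(
        "^<([^>]+)>\\s+<http://dbpedia\\.org/ontology/wikiPageID>\\s+\"(\\d+)\".*\\.\\s*$");

    // One PageId -> URI map per language, because page IDs are only unique per wiki.
    private final Map<String, Map<Long, String>> byLanguage = new HashMap<>();

    /** Parses one N-Triples line; returns false for comments and unrelated lines. */
    boolean addLine(String language, String line) {
        Matcher m = TRIPLE.matcher(line);
        if (!m.matches()) {
            return false;
        }
        byLanguage.computeIfAbsent(language, k -> new HashMap<>())
                  .put(Long.parseLong(m.group(2)), m.group(1));
        return true;
    }

    /** Returns the DBpedia URI for the given page ID, if one was loaded. */
    Optional<String> lookup(String language, long pageId) {
        Map<Long, String> ids = byLanguage.get(language);
        return ids == null ? Optional.empty() : Optional.ofNullable(ids.get(pageId));
    }

    public static void main(String[] args) {
        PageIdLinker linker = new PageIdLinker();
        // Illustrative sample line in the assumed format.
        linker.addLine("en", "<http://dbpedia.org/resource/Anarchism> "
            + "<http://dbpedia.org/ontology/wikiPageID> "
            + "\"12\"^^<http://www.w3.org/2001/XMLSchema#integer> .");
        System.out.println(linker.lookup("en", 12L));
    }
}
```

Because the key is a number rather than a page name, the lookup does not depend
on how the local name was percent-encoded, which is the weakness of the
name-based linking described above.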
[1] https://github.com/dbpedia/extraction-framework/pull/27