Hi Rupert,

Ok, I'll resend this mail in this thread. Again, out of habit, I sent it in the gigantic "Named entities coreference" thread instead.
So, I managed to create a dbpedia index with the yago class information. However, looking into the yago_types.nt file, which assigns yago classes to dbpedia entities, I realized that there are no yago class labels in it. I only have the class URI, like: <http://dbpedia/..something../President1829302/. I also need the class labels so that I can compare them to the noun token's string from the text.

I can get the labels from one of the yago downloads here: http://www.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoMultilingualClassLabels.txt. I'll also need another yago download file to map the yago wordnet classes to dbpedia URIs. That could maybe be done via a script.

Once I have the dbpedia_yago_class_uri -> label file, is it possible to integrate this data into the dbpedia index and later query the labels from the 'dbpedia' Site? How would that work in the dbpedia indexing process? What should I change in the mappings.txt file? At first glance it seems that the indexing is driven by the entity scores in incoming_links.txt, and in my case I don't want to include triples involving the entity itself but triples involving a property of the entity (its yago class).

Other than that, I saw that someone will be working on integrating YAGO as part of GSoC 2014. So maybe waiting for that is an option too, but I don't know what the extent of the integration will be.

Thanks,
Cristi

2014-04-30 12:04 GMT+03:00 Rupert Westenthaler <rupert.westentha...@gmail.com>:
> On Wed, Apr 30, 2014 at 10:37 AM, Cristian Petroaca
> <cristian.petro...@gmail.com> wrote:
> > Hi All,
> >
> > I'm currently working on
> > https://issues.apache.org/jira/browse/STANBOL-1279.
> >
> > I am using the SiteManager to get a Site with referenceId = "dbpedia" and
> > am querying data related to some NERs (querying by NER label and type).
> > This works and I do get results from the dbpedia index.
> >
> > What I want to do is this:
> >
> > 1. I want to be able to store and get yago class types in the dbpedia data.
> > This data is stored in the yago-types.nt file from the dbpedia 3.9
> > downloads. Is it possible to create a new dbpedia index with the 3.9 files
> > using this script
> > https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/dbpedia-3.8/fetch_data_en_int.sh
> > ?
>
> yep. Just make sure you change
>
> DBPEDIA=http://downloads.dbpedia.org/3.8
>
> to dbpedia 3.9
>
> BTW: you can also remove
>
> #corrects encoding and recompress using gz
> bzcat ${filename}.bz2 \
> | sed 's/\\\\/\\u005c\\u005c/g;s/\\\([^u"]\)/\\u005c\1/g' \
> | gzip -c > ${filename}.gz
> rm -f ${filename}.bz2
>
> as this is no longer necessary.
>
> >
> > 2. I want to access some specific dbpedia properties such as
> > dbpedia-owl:locationCity and others. These are already present in the
> > mappingbased_properties_en.nt
> > file which is in the fetch_data_en_int.sh script but are not in the
> > https://svn.apache.org/repos/asf/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/mappings.txt
> > file.
> > Should I include them there and do a dbpedia index rebuild?
>
> Exactly. If the size of the created SolrIndex is an issue, I also recommend
> that you remove properties you do not need.
>
> >
> > I've already described this in the "Named entity coref resolution based on
> > dbpedia" mail thread but I thought of creating a new mail for visibility
> > and for not clogging the other thread.
>
> The old thread is anyway already much too long. Please make sure that
> important points and decisions of that thread are also reflected in
> the description of STANBOL-1279
>
> best
> Rupert
>
> >
> > Thanks,
> > Cristian
>
> --
> | Rupert Westenthaler rupert.westentha...@gmail.com
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO
> | http://redlink.co/
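The dbpedia_yago_class_uri -> label step discussed in this thread could be scripted roughly as below. This is a minimal sketch under stated assumptions, not part of the Stanbol tooling: it assumes a hypothetical tab-separated labels file with one `yago class URI<TAB>label` pair per line, already mapped to the same class URIs that appear as objects in yago_types.nt (building that file from the YAGO downloads is the separate mapping step mentioned above). The function name `yago_class_labels` is hypothetical. It writes one rdfs:label triple per class actually used in yago_types.nt, as an extra N-Triples file. Whether the dbpedia indexing (with its incoming_links.txt scoring) would index such class resources is the open question above, so this only covers producing the file.

```shell
# yago_class_labels LABELS_TSV TYPES_NT  (hypothetical helper)
# Prints one rdfs:label triple for every class URI that occurs as the object
# of a triple in TYPES_NT and has a label in LABELS_TSV.
# Assumptions: LABELS_TSV is non-empty (the FNR == NR idiom relies on it),
# and labels containing double quotes or backslashes are skipped rather
# than escaped, so the output stays valid N-Triples.
yago_class_labels() {
  awk '
    BEGIN { FS = "\t" }
    # Pass 1 (LABELS_TSV): remember classUri -> label pairs
    FNR == NR { label[$1] = $2; next }
    # Pass 2 (TYPES_NT): lines look like <s> <p> <o> .
    {
      n = split($0, f, " ")
      if (n < 3 || substr(f[3], 1, 1) != "<") next
      cls = substr(f[3], 2, length(f[3]) - 2)   # strip the angle brackets
      if (!(cls in label) || (cls in seen)) next
      seen[cls] = 1                             # emit each class only once
      lbl = label[cls]
      if (index(lbl, "\"") || index(lbl, "\\")) next
      printf "<%s> <http://www.w3.org/2000/01/rdf-schema#label> \"%s\"@en .\n", cls, lbl
    }
  ' "$1" "$2"
}
```

Usage would be something like `yago_class_labels class_labels.tsv yago_types.nt > yago_class_labels.nt` (file names are placeholders). Two passes with awk keep the label table in memory while streaming the much larger yago_types.nt only once.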