Hi Harish

On Thu, May 16, 2013 at 8:06 PM, harish suvarna <[email protected]> wrote:
> Rupert,
> 1. dbpedia ontology has dbpedia-owl properties also. In order to index
> these, we need to uncomment #dbp-prop* line in mappings.txt. Is my
> understanding Right?

No, you need to specify 'dbp-ont:*'. Enabling 'dbp-prop*' would index all
'http://dbpedia.org/property/' properties.

> Looks like dbpedia-owl:birthDate IS dbp-ont:birthDate in mappings.txt.

Stanbol uses the dbp-ont prefix for the dbpedia ontology namespace. But
'dbo' should also work, as it is the one defined by prefix.cc.

> dbpedia-owl:birthDate <http://dbpedia.org/ontology/birthDate>
>
>   - 1961-08-04 (xsd:date)
>
> dbpedia-owl:birthPlace <http://dbpedia.org/ontology/birthPlace>
>
>   - dbpedia:Honolulu <http://dbpedia.org/resource/Honolulu>
>   - dbpedia:United_States <http://dbpedia.org/resource/United_States>
>   - dbpedia:Hawaii <http://dbpedia.org/resource/Hawaii>
>
> 2. mappings.txt already has
> dbp-ont* line uncommented. It then indexes birthDate etc. It is also
> specified that birthdate is a dateTime object and populationTotal is a long
> integer. Are these must to specify the data types? if we dont specify, what
> happens?

True. Looks like the default config has this enabled. But I have never used
it to build an actual index. So if you use the default config you should get
the dbpedia-owl information (if you also imported the corresponding RDF dump
files).

> # --- dbpedia specific
> # the "dbp-ont" defines knowledge mapped to the DBPedia ontology
> dbp-ont:*
> dbp-ont:birthDate | d=xsd:dateTime
> dbp-ont:populationTotal | d=xsd:long
>
> 3. what is the difference between dbp-ont* and dbp-prop* in mappings.txt?

As stated above, these are two different namespaces. "dbp-ont" is used for
Infobox properties that are semantically mapped. "dbp-prop" is used for
extracted key/value pairs that are not aligned with the ontology. You should
find more information on the DBpedia webpage.

> 4. How do I tell to index some dbpedia-owl properties and not some? For ex,
> the dbpedia-owl:abstract has lot of text. I may want to not index it.

You can exclude properties by adding a '!' at the first position
(e.g. !dbp-ont:abstract).
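
As a rough illustration only, the points above could sit together in
'indexing/config/mappings.txt' roughly like the fragment below. The three
dbp-ont lines are the ones quoted from the default config and the '!' line
is the exclusion just described. The exact spelling of the commented-out
dbp-prop line is an assumption (the thread only refers to it as
'#dbp-prop*'), so compare it with your own copy of the file before relying
on it.

```
# --- dbpedia specific
# index every property of the DBpedia ontology namespace
dbp-ont:*
# data types as set in the default config quoted above
dbp-ont:birthDate | d=xsd:dateTime
dbp-ont:populationTotal | d=xsd:long
# leave the long abstract text out of the index
!dbp-ont:abstract
# raw infobox properties stay disabled (line spelling is an assumption)
#dbp-prop:*
```

Keeping dbp-prop commented out means only the properties mapped to the
ontology are indexed, which is the split between the two namespaces
described in the answer to question 3.
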
The documentation of the mapping language can be found at [1] (I should
definitely move this over to the Stanbol Webpage).

best
Rupert

[1] http://wiki.iks-project.eu/index.php/RepresentationMapping

>
> On Wed, May 15, 2013 at 5:53 AM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> On Wed, May 15, 2013 at 12:20 PM, Manish Aggarwal <[email protected]>
>> wrote:
>> > Hi Rupert,
>> >
>> > Thanks for your response. I will look into this.
>> >
>> > Meanwhile I was trying the FieldQuery option available with stanbol ...
>> >
>> > I have created the following FieldQuery
>> >
>> > {
>> >     "selected": [
>> >         "http:\/\/dbpedia.org\/ontology\/foundingYear",
>> >         "http:\/\/dbpedia.org\/ontology\/foundingDate",
>> >         "http:\/\/dbpedia.org\/ontology\/keyPerson",
>> >         "http:\/\/dbpedia.org\/ontology\/industry"],
>> >     "offset": "0",
>> >     "limit": "10",
>> >     "constraints": [{
>> >         "type": "reference",
>> >         "field": "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type",
>> >         "value": "http:\/\/dbpedia.org\/ontology\/Organisation"
>> >     }]
>> > }
>> >
>> > and running the command:
>> >
>> > curl -X POST -H "Content-Type:application/json" --data "@fieldQuery.json"
>> > http://localhost:8080/entityhub/site/dbpedia/query
>> >
>> > I am using the dbpedia.solrindex (version 3.8) (from link
>> > http://dev.iks-project.eu/downloads/stanbol-indices/dbpedia-3.8/)
>> >
>> >
>> > But I am not getting any of the attributes like foundingYear, industry
>> > etc in the result. Is this because the dbpedia.solrindex doesn't
>> > contain these attributes?
>>
>> The reason is that those attributes are not included in the Index.
>> Adding all dbpedia-owl namespace properties to the index would
>> probably increase the index size by more than 20 GByte (the last time
>> I did this was for dbpedia 3.6).
>> If you need them in your index
>> you will need to build your own DBpedia index with an adapted
>> configuration (you will need to add those properties or the whole
>> namespace to the 'indexing/config/mappings.txt' file).
>>
>> > I think this is also going to follow the same workflow that you had told
>> > in the last mail through sample code?
>>
>> Yes, but my code assumed that you already know the ID of the Entities.
>> The above would search for Entities that are of type
>> dbpedia-owl:Organisation. The corresponding Java code would look as
>> follows
>>
>>     Site site; //the site (as obtained by the SiteManager)
>>     FieldQuery query = site.getQueryFactory().createFieldQuery();
>>     query.setConstraint(RDF_TYPE,
>>         new ReferenceConstraint(DBPEDIA_ORGANIZATION));
>>     query.addSelectedField(RDF_TYPE);
>>     query.addSelectedField(DBPEDIA_INDUSTRY);
>>     query.addSelectedField(DBPEDIA_KEY_PERSON);
>>     //and all the others
>>     query.setLimit(100);
>>     //run the query against the site obtained above
>>     QueryResultList<Representation> results = site.find(query);
>>     for(Representation result : results){
>>         //process the results
>>     }
>>
>> best
>> Rupert
>>
>> > Regards,
>> > Manish
>> >
>> >
>> > On Wed, May 15, 2013 at 12:46 PM, Rupert Westenthaler <
>> > [email protected]> wrote:
>> >
>> >> Hi Manish,
>> >>
>> >> Yes, typically this is done by using the Stanbol Entityhub in
>> >> combination with a referenced site for dbpedia.
>> >>
>> >> (1) Configuring the dbpedia ReferencedSite
>> >> ------
>> >>
>> >> But NOTE that all pre-built dbpedia indexes for the Entityhub do not
>> >> include the dbpedia-owl:industry property. This means that you will
>> >> need to create your own DBpedia index that does include this property
>> >> by using the Entityhub Indexing Tool for dbpedia.
>> >>
>> >> As an alternative you could also configure a Referenced Site for
>> >> dbpedia that directly accesses the dbpedia SPARQL endpoint and stores
>> >> retrieved entities in a local cache.
>> >> For that you can install the
>> >>
>> >>     <groupId>org.apache.stanbol</groupId>
>> >>     <artifactId>org.apache.stanbol.data.sites.dbpedia.cached</artifactId>
>> >>     <version>1.2.0-SNAPSHOT</version>
>> >>
>> >> NOTE: with revision http://svn.apache.org/r1482702 I changed the name
>> >> of the ReferencedSite configured by this bundle from 'dbpedia' to
>> >> 'dbpedia-cached' so that it does not conflict with the default
>> >> 'dbpedia' ReferencedSite that does use a full local index.
>> >>
>> >> (2) Access the information of the dbpedia ReferencedSite
>> >> ---------
>> >>
>> >> To get the required information you will need to use the Entityhub
>> >> API. See the code samples below.
>> >>
>> >>     import org.apache.stanbol.entityhub.servicesapi.site.SiteManager;
>> >>
>> >>     //inject a reference to the Entityhub SiteManager
>> >>     @Reference
>> >>     SiteManager siteManager;
>> >>
>> >>     //siteName is the name of the Referenced Site (most likely
>> >>     //'dbpedia' or 'dbpedia-cached')
>> >>     private void someMethod(String siteName, String entityId){
>> >>
>> >>         Site site = siteManager.getSite(siteName);
>> >>         //check for not null (site with that name is not active)
>> >>         Entity entity = site.getEntity(entityId);
>> >>         Representation data = entity.getRepresentation();
>> >>         //get the RDF type values of the Entity
>> >>         Iterator<Reference> types = data.getReferences(RDF_TYPE);
>> >>         //iterate over the types and check for dbp-ont:Organisation
>> >>         //(the full URI)
>> >>
>> >>         Iterator<Reference> industryValues =
>> >>             data.getReferences(DBPEDIA_INDUSTRY);
>> >>         //iterate over the values for the industry values
>> >>
>> >>     }
>> >>
>> >> If you prefer to use the Clerezza RDF API instead of the API of
>> >> Representation you can also convert the Representation to RDF
>> >>
>> >>     import org.apache.stanbol.entityhub.model.clerezza.RdfValueFactory;
>> >>     import org.apache.clerezza.rdf.utils.GraphNode;
>> >>
>> >>     private static RdfValueFactory vf = RdfValueFactory.getInstance();
>> >>
>> >>     private GraphNode convertRepresentationToRdf(Representation r){
>> >>         return new GraphNode(new UriRef(r.getId()),
>> >>             vf.toRdfRepresentation(r).getRdfGraph());
>> >>     }
>> >>
>> >> The above code would create a new MGraph for each Representation. If
>> >> you want to add multiple Representations to the same Graph you need
>> >> to use
>> >>
>> >>     import org.apache.stanbol.commons.indexedgraph.IndexedMGraph;
>> >>
>> >>     private void someMethod(...) {
>> >>
>> >>         //a fast in-memory graph implementation
>> >>         MGraph graph = new IndexedMGraph();
>> >>         RdfValueFactory vf = new RdfValueFactory(graph);
>> >>
>> >>         //now use this RdfValueFactory to convert all Representations
>> >>         GraphNode entityRdfData =
>> >>             convertRepresentationToRdf(vf, representation);
>> >>
>> >>     }
>> >>
>> >>     private GraphNode convertRepresentationToRdf(RdfValueFactory vf,
>> >>             Representation r){
>> >>         return new GraphNode(new UriRef(r.getId()),
>> >>             vf.toRdfRepresentation(r).getRdfGraph());
>> >>     }
>> >>
>> >> hope this helps
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> On Tue, May 14, 2013 at 2:29 PM, Manish Aggarwal <[email protected]>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > Is it possible to query dbpedia database from a custom enhancement
>> >> > engine and find out more about a keyword. For example, if the keyword
>> >> > classify under organization (dbp-ont:Organisation), I will be
>> >> > interested to know what is the industry (dbpedia-owl:industry) this
>> >> > keyword belongs to.
>> >> > In a custom enhancement engine how can I get the required information?
>> >> >
>> >> > Regards,
>> >> > Manish
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler [email protected]
>> >> | Bodenlehenstraße 11 ++43-699-11108907
>> >> | A-5500 Bischofshofen
>>
>>
>> --
>> | Rupert Westenthaler [email protected]
>> | Bodenlehenstraße 11 ++43-699-11108907
>> | A-5500 Bischofshofen
>
> --
> Thanks
> Harish

--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
