Hi Sawhney,

If you want to index ALL RDF data with the Entityhub Indexing Tool, you can use a "mappings.txt" file containing the single line "*".
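As a rough illustration (not part of the Indexing Tool itself), the two configuration variants discussed in this thread could be written out with a small Python helper. The helper names and the location of the config directory are assumptions made for this sketch. The prefix entries and the "prefix:*" mapping lines are the ones reported further down in this thread.

```python
from pathlib import Path

# Prefix -> namespace pairs reported in this thread.
DBPEDIA_PREFIXES = {
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def ensure_lines(path, lines):
    """Append the given lines to a text file unless they are already present.

    Hypothetical helper: existing entries in the file are left untouched.
    Returns the list of lines that were actually added.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = set(text.splitlines())
    missing = [line for line in lines if line not in existing]
    if missing:
        path.parent.mkdir(parents=True, exist_ok=True)
        if text and not text.endswith("\n"):
            text += "\n"
        path.write_text(text + "\n".join(missing) + "\n", encoding="utf-8")
    return missing


def configure_dbpedia_fields(config_dir, index_all=False):
    """Prepare namespaceprefix.mappings and mappings.txt in config_dir.

    Assumption: config_dir is the indexing tool's config directory
    (the thread mentions indexing/config/namespaceprefix.mappings).
    """
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)

    # One "prefix<TAB>namespace" entry per line.
    ensure_lines(
        config_dir / "namespaceprefix.mappings",
        [f"{prefix}\t{ns}" for prefix, ns in DBPEDIA_PREFIXES.items()],
    )

    if index_all:
        # Variant 1: a mappings.txt with the single line "*".
        (config_dir / "mappings.txt").write_text("*\n", encoding="utf-8")
    else:
        # Variant 2: only add the two DBpedia namespaces.
        ensure_lines(
            config_dir / "mappings.txt",
            [f"{prefix}:*" for prefix in DBPEDIA_PREFIXES],
        )
```

Calling configure_dbpedia_fields("indexing/config") before running the indexer would add the two namespace mappings. Passing index_all=True overwrites mappings.txt with the single "*" line instead, so any existing field mappings in that file are lost.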
Given your use case (managing an Entityhub Site with some selected DBpedia entities), you should consider configuring a ManagedSite [1] and using the RESTful CRUD interface to create/update Entities downloaded from DBpedia. The Entityhub Indexing Tool is intended for datasets that do not change often (e.g. a domain taxonomy that is released every few months, or a dataset like DBpedia that is released once a year).

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html

On Fri, Jul 19, 2013 at 10:07 AM, Sawhney, Tarandeep Singh <tsawh...@innodata.com> wrote:
> Hi Rafa
>
> Finally we have been able to get it working.
>
> We modified the file "namespaceprefix.mappings" with the two entries below:
>
> dbpedia-owl\thttp://dbpedia.org/ontology/\n
> dbpprop\thttp://dbpedia.org/property/\n
>
> We also modified the file "mappings.txt" with the two entries below:
>
> dbpedia-owl:*
> dbpprop:*
>
> We then performed the indexing and were able to get indexes created
> for the additional data we wanted from the DBpedia RDF.
>
> We then ran into an issue similar to the already reported JIRA issue
> https://issues.apache.org/jira/browse/STANBOL-519
> We followed the steps to resolve this issue and now everything is working.
>
> Thanks a lot for your prompt responses and for giving us your valuable pointers
> to isolate the problem area.
>
> best regards,
> tarandeep
>
> On Thu, Jul 18, 2013 at 10:36 PM, Sawhney, Tarandeep Singh <tsawh...@innodata.com> wrote:
>
>> Thanks Rafa
>>
>> I am querying my reference site through
>> http://localhost:9090/entityhub/site/<MyReferencesiteName>/find
>>
>> and have now entered the LDPath below, including the dbp-ont prefix:
>>
>> @prefix dbp-ont : <http://dbpedia.org/ontology/>;
>> name = rdfs:label[@en] :: xsd:string;
>> comment = rdfs:comment[@en] :: xsd:string;
>> categories = dc:subject :: xsd:anyURI;
>> homepage = foaf:homepage :: xsd:anyURI;
>> location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
>> foafname = foaf:name :: xsd:string;
>> abstract = dbp-ont:abstract[@en] :: xsd:string;
>> type = rdf:type :: xsd:anyURI;
>> foundedBy = dbp-ont:foundedBy :: xsd:anyURI;
>>
>> But it still doesn't get me the data I have added for the "foundedBy" and
>> "abstract" fields.
>>
>> Is this what you wanted me to try, or did you mean something else?
>>
>> I am still not sure where the problem lies :-( whether the indexes are
>> created or accessing them has an issue.
>>
>> best regards,
>> tarandeep
>>
>> On Thu, Jul 18, 2013 at 8:26 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>
>>> Hi Tarandeep,
>>>
>>> Have you included dbp-ont as prefix for http://dbpedia.org/ontology/ in
>>> your LDPath program? According to the LDPath documentation
>>> (https://code.google.com/p/ldpath/wiki/PathLanguage),
>>> dbp-ont is not included as a default prefix, so you might need to start your
>>> program with:
>>>
>>> @prefix dbp-ont : <http://dbpedia.org/ontology/>;
>>>
>>> Hope that helps.
>>>
>>> Cheers,
>>>
>>> Rafa
>>>
>>> On 18/07/13 15:24, Sawhney, Tarandeep Singh wrote:
>>>
>>>> Hi Rafa,
>>>>
>>>> Thanks for giving the pointer to resolve the issue. We tried the step
>>>> mentioned below, but now the issue seems to be in how we are indexing.
>>>>
>>>> As you suggested, we verified whether we have the required indexes for the
>>>> additional fields in our reference site. We found out that the indexer had
>>>> created all indexes except for fields in the
>>>> "dbpedia-owl:http://dbpedia.org/ontology/" namespace.
>>>>
>>>> For example, please look at the RDF ---> http://dbpedia.org/page/Adidas
>>>>
>>>> When we query our reference site for the entity "Adidas", we are able to see
>>>> all indexes except for fields in the dbpedia-owl namespace.
>>>>
>>>> Below are the LDPaths we used while querying our reference site.
>>>>
>>>> name = rdfs:label[@en] :: xsd:string;
>>>> comment = rdfs:comment[@en] :: xsd:string;
>>>> categories = dc:subject :: xsd:anyURI;
>>>> homepage = foaf:homepage :: xsd:anyURI;
>>>> location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
>>>> foafname = foaf:name :: xsd:string;
>>>> abstract = dbp-ont:abstract[@en] :: xsd:string;
>>>> type = rdf:type :: xsd:anyURI;
>>>> foundedBy = dbp-ont:foundedBy :: xsd:anyURI;
>>>>
>>>> We can see data for everything except "abstract" and "foundedBy", since
>>>> they are in the namespace dbpedia-owl <http://dbpedia.org/ontology/foundedBy>.
>>>>
>>>> So it means the indexes for the above two fields were not created by the
>>>> indexer; therefore we don't have them in the reference site and hence can't
>>>> see them when the entity is linked.
>>>>
>>>> We have not changed any default settings while running the indexer.
>>>>
>>>> Can you please provide further help to find out what we are missing?
>>>>
>>>> best regards
>>>> tarandeep
>>>>
>>>> On Thu, Jul 18, 2013 at 4:22 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>>
>>>>> Hi Tarandeep,
>>>>>
>>>>> Thanks, it's much clearer now :-). Have you checked whether the information
>>>>> you need (for example dbp-ont:capital) is actually in the index? You can
>>>>> check it, for example, by looking up the "India" entity directly in your
>>>>> custom DBpedia site in the EntityHub.
>>>>>
>>>>> Regards
>>>>>
>>>>> On 18/07/13 12:30, Sawhney, Tarandeep Singh wrote:
>>>>>
>>>>>> No problem Rafa, maybe I didn't explain with enough detail/clarity.
>>>>>>
>>>>>> We are using a custom ontology to extract custom entities from text, and
>>>>>> then we want to link them with DBpedia entities (in the local DBpedia
>>>>>> reference site).
>>>>>>
>>>>>> We found that the DBpedia reference site doesn't have enough of the data we
>>>>>> need, so we decided to download additional data for selected entities
>>>>>> (related to fashion brands, fashion designers, company names) directly
>>>>>> from dbpedia.org.
>>>>>>
>>>>>> We then indexed these individual RDF files and created indexes with a new
>>>>>> reference site.
>>>>>>
>>>>>> We then did not use the DBpedia reference site; instead we used our new
>>>>>> reference site, which has the DBpedia data that we need, with our new
>>>>>> Entityhub linking engine.
>>>>>>
>>>>>> But after we followed the steps I mentioned in my earlier email, during
>>>>>> enhancement, custom entities are getting dereferenced from my new
>>>>>> reference site, but I don't see the additional data that I needed, which
>>>>>> exists in the local cache.
>>>>>>
>>>>>> Hope this explains what we are trying to do; please let me know if
>>>>>> more information is required.
>>>>>>
>>>>>> Best regards
>>>>>> tarandeep
>>>>>>
>>>>>> On Thu, Jul 18, 2013 at 3:21 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>>>>
>>>>>>> Hi Tarandeep,
>>>>>>>
>>>>>>> On 18/07/13 11:18, Sawhney, Tarandeep Singh wrote:
>>>>>>>
>>>>>>>> Hi Rafa
>>>>>>>>
>>>>>>>> Thanks for your response.
>>>>>>>>
>>>>>>>> Yes, we have also tried the whole URI of the property
>>>>>>>> (http://dbpedia.org/ontology/capital),
>>>>>>>> but it didn't help.
>>>>>>>>
>>>>>>>> Yes, we are using the EntityHub cache to locally store all the additional
>>>>>>>> information we pulled from dbpedia.org.
>>>>>>>>
>>>>>>>> In the documentation provided at
>>>>>>>> http://stanbol.apache.org/docs/trunk/customvocabulary.html
>>>>>>>> it is mentioned --->
>>>>>>>>
>>>>>>>> "Optionally, if your data do use namespaces that are not present in
>>>>>>>> prefix.cc (or the server used for indexing does not have internet
>>>>>>>> connectivity) you can manually define required prefixes by creating/using
>>>>>>>> the a indexing/config/namespaceprefix.mappings file"
>>>>>>>>
>>>>>>>> Can we get some input on whether changes to this file are required
>>>>>>>> while using DBpedia data?
>>>>>>>
>>>>>>> This file can be used at 'indexing time' when you use the indexing tool
>>>>>>> for creating the index for the DBpedia site. I have just seen that dbp-ont
>>>>>>> is already included as a prefix. What is not clear to me right now is whether
>>>>>>> you are generating your own DBpedia index including all the DBpedia ontology
>>>>>>> properties (that would be an enormous index), or generating an
>>>>>>> index each time you need a new entity, or even trying to retrieve
>>>>>>> the entities from DBpedia in a 'live' way :-). Sorry, I'm confused about
>>>>>>> your workflow.
>>>>>>>
>>>>>>>> Also, it looks like we are missing some configuration in the overall
>>>>>>>> process, so if the dev community can please provide help, it will be much
>>>>>>>> appreciated.
>>>>>>>>
>>>>>>>> best regards
>>>>>>>> tarandeep
>>>>>>>>
>>>>>>>> On Thu, Jul 18, 2013 at 1:38 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Tarandeep,
>>>>>>>>>
>>>>>>>>> Have you tried using the whole URI of the property
>>>>>>>>> (http://dbpedia.org/ontology/capital)?
>>>>>>>>>
>>>>>>>>> Anyway, maybe it is a better idea to change your workflow, because I
>>>>>>>>> suppose that your example about the "India" entity is something that could
>>>>>>>>> happen to you with more entities, since the default DBpedia site in
>>>>>>>>> Stanbol doesn't contain information about dbp-ont properties. I would
>>>>>>>>> suggest using the EntityHub cache to locally store entities with all the
>>>>>>>>> information you need directly from DBpedia. So, maybe you can try to
>>>>>>>>> retrieve the entities directly from any DBpedia endpoint and store them in
>>>>>>>>> the EntityHub cache, to ensure that you can use them later at your
>>>>>>>>> convenience. Maybe the workflow could be the following:
>>>>>>>>>
>>>>>>>>> 1. Enhance a document using the Stanbol DBpedia site for linking.
>>>>>>>>> 2. For each extracted entity:
>>>>>>>>>    2.1. If the entity is already stored in the EntityHub, get it using
>>>>>>>>>         LDPath for dereferencing.
>>>>>>>>>    2.2. If not, retrieve the entity from a DBpedia endpoint as RDF data
>>>>>>>>>         and store it in the EntityHub. Then retrieve it.
>>>>>>>>>
>>>>>>>>> I would say that this is currently possible in Stanbol, but maybe someone
>>>>>>>>> else on the list can shed more light on the issue.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>> On 18/07/13 09:48, Sawhney, Tarandeep Singh wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> In the Stanbol local cache we have limited triples in the DBpedia
>>>>>>>>>> reference site.
>>>>>>>>>>
>>>>>>>>>> We need to get more triples for entities which are present in the DBpedia
>>>>>>>>>> reference site. For example, the entity "India" has limited triples, so
>>>>>>>>>> when we enhance text which mentions India, it gets us only the information
>>>>>>>>>> which is there in the DBpedia reference site.
>>>>>>>>>>
>>>>>>>>>> We have followed the steps below to add more RDF data for the entity
>>>>>>>>>> "India" by creating our own reference site.
>>>>>>>>>>
>>>>>>>>>> 1 - Downloaded RDF data for 'India' from [1].
>>>>>>>>>>
>>>>>>>>>> 2 - Generated indexes for this RDF data as suggested in article [2], with
>>>>>>>>>> Demo as the reference site name.
>>>>>>>>>>
>>>>>>>>>> 3 - Initialized the indexes within the Stanbol instance as per [2].
>>>>>>>>>>
>>>>>>>>>> 4 - Configured a new EntityLinking engine, 'demoLinkingEngine', with Demo
>>>>>>>>>> as the referenced site as per [3].
>>>>>>>>>> I have added dbp-ont:capital in the "Fields used for dereferencing" option.
>>>>>>>>>>
>>>>>>>>>> 5 - Configured a new weighted chain (demoChain).
>>>>>>>>>>
>>>>>>>>>> 6 - Now I am trying to enhance "India is a country." I am getting India
>>>>>>>>>> as a dereferenced entity but am unable to get any new information related
>>>>>>>>>> to dbp-ont:capital, which exists in my new reference site Demo and which
>>>>>>>>>> in this case should give us the URI value of "New Delhi".
>>>>>>>>>>
>>>>>>>>>> [1] http://dbpedia.org/page/India
>>>>>>>>>> [2] http://stanbol.apache.org/docs/trunk/customvocabulary.html
>>>>>>>>>> [3] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
>>>>>>>>>>
>>>>>>>>>> Can you please let me know if I am doing something wrong here or missing
>>>>>>>>>> some configuration?
>>>>>>>>>> Please let me know in case you need more information on how we are
>>>>>>>>>> trying to do it.
>>>>>>>>>>
>>>>>>>>>> best regards
>>>>>>>>>> tarandeep
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ------------------------------
>>>>>>>>> This message should be regarded as confidential. If you have received this
>>>>>>>>> email in error please notify the sender and destroy it immediately.
>>>>>>>>> Statements of intent shall only become binding when confirmed in hard copy
>>>>>>>>> by an authorised signatory.
>>>>>>>>>
>>>>>>>>> Zaizi Ltd is registered in England and Wales with the registration number
>>>>>>>>> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
>>>>>>>>> London W6 7AN.
>
> --
>
> "This e-mail and any attachments transmitted with it are for the sole use
> of the intended recipient(s) and may contain confidential, proprietary or
> privileged information. If you are not the intended recipient, please
> contact the sender by reply e-mail and destroy all copies of the original
> message. Any unauthorized review, use, disclosure, dissemination,
> forwarding, printing or copying of this e-mail or any action taken in
> reliance on this e-mail is strictly prohibited and may be unlawful."

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
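
The lookup-or-fetch workflow suggested earlier in this thread (steps 2.1 and 2.2), combined with a locally managed site as the store, might look roughly like the Python sketch below. The three callables are hypothetical placeholders. In a real setup they would wrap the ManagedSite RESTful CRUD interface and a DBpedia endpoint, neither of which is called here.

```python
def dereference(entity_uri, lookup_local, fetch_remote, store_local):
    """Return the data for one entity, filling the local site on a miss.

    All three callables are assumptions made for this sketch:
      lookup_local(uri)      -> entity data, or None if not stored locally
      fetch_remote(uri)      -> RDF data for the entity from DBpedia
      store_local(uri, rdf)  -> create/update the entity in the local site
    """
    entity = lookup_local(entity_uri)
    if entity is None:
        # Step 2.2: not stored yet, so fetch it and store it locally.
        rdf = fetch_remote(entity_uri)
        store_local(entity_uri, rdf)
        # Then retrieve it from the local site, as in step 2.1.
        entity = lookup_local(entity_uri)
    return entity


def dereference_all(entity_uris, lookup_local, fetch_remote, store_local):
    """Apply dereference() to every entity URI extracted from a document."""
    return {
        uri: dereference(uri, lookup_local, fetch_remote, store_local)
        for uri in entity_uris
    }


if __name__ == "__main__":
    # In-memory stand-ins for the local site and the remote endpoint.
    local_site = {}
    remote = {
        "http://dbpedia.org/resource/India": {
            "http://dbpedia.org/ontology/capital":
                "http://dbpedia.org/resource/New_Delhi",
        }
    }
    result = dereference_all(
        ["http://dbpedia.org/resource/India"],
        lookup_local=local_site.get,
        fetch_remote=remote.__getitem__,
        store_local=local_site.__setitem__,
    )
    print(result)
```

Passing the three operations in as callables keeps the sketch independent of any particular Stanbol endpoint or HTTP client. Only the control flow of the suggested workflow is shown.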