Hi Sawhney

If you want to index ALL RDF data with the Entityhub Indexing Tool,
you can use a "mappings.txt" file containing the single line "*".
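As a rough illustration of what the "*" line does, here is a simplified Python model of how mappings.txt entries select properties for indexing. It is a sketch under stated assumptions, not the indexing tool's implementation: only "*", "prefix:*" and "prefix:localName" entries are modelled, lines starting with "#" are treated as comments, and prefixes are resolved through a dict standing in for namespaceprefix.mappings. The function name field_is_indexed is hypothetical.

```python
# Simplified model of how mappings.txt entries select properties.
# ASSUMPTION: only "*", "prefix:*" and "prefix:localName" entries are
# modelled; filters, exclusions and other syntax of the real file are ignored.

def field_is_indexed(property_uri, mapping_lines, prefixes):
    """Return True if property_uri is selected by one of mapping_lines.

    prefixes maps a prefix (e.g. "dbpedia-owl") to its namespace URI,
    playing the role of namespaceprefix.mappings.
    """
    for line in mapping_lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # blank line or comment
        if entry == "*":
            return True  # wildcard: index every property
        prefix, sep, local = entry.partition(":")
        if not sep or prefix not in prefixes:
            continue  # no prefix or unknown prefix: cannot match here
        namespace = prefixes[prefix]
        if local == "*":
            if property_uri.startswith(namespace):
                return True  # "prefix:*" selects the whole namespace
        elif property_uri == namespace + local:
            return True  # exact property match
    return False


PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}

founded_by = "http://dbpedia.org/ontology/foundedBy"
print(field_is_indexed(founded_by, ["rdfs:label"], PREFIXES))     # False
print(field_is_indexed(founded_by, ["dbpedia-owl:*"], PREFIXES))  # True
print(field_is_indexed(founded_by, ["*"], PREFIXES))              # True
```

With only "rdfs:label" in the list, a dbpedia-owl property such as foundedBy is not selected, which matches the symptom discussed further down in this thread.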

Given your use case (managing an Entityhub site with some selected
DBpedia entities), you should consider configuring a ManagedSite [1]
and using its RESTful CRUD interface to create/update entities
downloaded from DBpedia. The Entityhub Indexing Tool is intended for
datasets that do not change often (e.g. a domain taxonomy that is
released every few months, or a dataset like DBpedia that is released
once a year).

best
Rupert


[1] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html
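A minimal sketch of pushing one downloaded DBpedia entity into a ManagedSite over HTTP, using only the Python standard library. The endpoint path (PUT on /entityhub/site/{site}/entity with an id parameter), the host/port and the site name "fashion" are all assumptions for illustration; please verify the actual RESTful interface against the ManagedSite documentation [1] before relying on it. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# ASSUMPTIONS (hypothetical, check the ManagedSite docs [1]):
STANBOL = "http://localhost:9090"  # host/port of the Stanbol instance
SITE = "fashion"                   # name of the configured ManagedSite


def build_update_request(entity_uri, rdf_xml):
    """Build, but do not send, a request uploading one entity as RDF/XML."""
    query = urllib.parse.urlencode({"id": entity_uri})
    # ASSUMPTION: entities are created/updated via PUT on .../entity?id=...
    url = f"{STANBOL}/entityhub/site/{SITE}/entity?{query}"
    return urllib.request.Request(
        url,
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="PUT",
    )


req = build_update_request("http://dbpedia.org/resource/Adidas", "<rdf:RDF/>")
print(req.get_method(), req.full_url)
```

To actually send it, the request could be passed to urllib.request.urlopen(req) once the endpoint has been confirmed.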

On Fri, Jul 19, 2013 at 10:07 AM, Sawhney, Tarandeep Singh
<tsawh...@innodata.com> wrote:
> Hi Rafa
>
> We have finally been able to get it working.
>
> We added the two entries below to the file "namespaceprefix.mappings"
> (prefix and namespace URI separated by a tab):
>
> dbpedia-owl	http://dbpedia.org/ontology/
> dbpprop	http://dbpedia.org/property/
>
> We also added the two entries below to the file "mappings.txt":
>
> dbpedia-owl:*
> dbpprop:*
>
> We then ran the indexing and were able to get indexes created for the
> additional data we wanted from the DBpedia RDF.
>
> We then ran into an issue similar to the already reported JIRA issue
> https://issues.apache.org/jira/browse/STANBOL-519
> We followed the steps to resolve that issue, and now everything is working.
>
> Thanks a lot for your prompt responses and for your valuable pointers,
> which helped us isolate the problem area.
>
> best regards,
> tarandeep
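The two configuration changes described in the message above can be generated with a small Python helper. This sketch only reproduces the line formats shown there ("{prefix}\t{namespace}" per line in namespaceprefix.mappings, "prefix:*" per line in mappings.txt); the helper name is hypothetical, and the output is meant to be appended to the existing files of your indexing configuration rather than replacing them.

```python
# Sketch: produce the entries added to the two indexing config files.
# namespaceprefix.mappings: "{prefix}\t{namespace}" per line
# mappings.txt: "{prefix}:*" per line (index every property of the namespace)

def build_indexing_config(prefixes):
    """Return (namespaceprefix.mappings text, mappings.txt text)."""
    ns_text = "".join(f"{p}\t{ns}\n" for p, ns in prefixes.items())
    mappings_text = "".join(f"{p}:*\n" for p in prefixes)
    return ns_text, mappings_text


DBPEDIA_PREFIXES = {
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}

ns_text, mappings_text = build_indexing_config(DBPEDIA_PREFIXES)
print(ns_text, end="")
print(mappings_text, end="")
```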
>
>
> On Thu, Jul 18, 2013 at 10:36 PM, Sawhney, Tarandeep Singh <
> tsawh...@innodata.com> wrote:
>
>> Thanks Rafa
>>
>> I am querying my reference site through
>> http://localhost:9090/entityhub/site/<MyReferencesiteName>/find
>>
>> and have now entered the LDPath program below, including the dbp-ont prefix:
>>
>> @prefix dbp-ont : <http://dbpedia.org/ontology/>;
>> name = rdfs:label[@en] :: xsd:string;
>> comment = rdfs:comment[@en] :: xsd:string;
>> categories = dc:subject :: xsd:anyURI;
>> homepage = foaf:homepage :: xsd:anyURI;
>> location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
>> foafname = foaf:name :: xsd:string;
>> abstract = dbp-ont:abstract[@en] :: xsd:string;
>> type = rdf:type :: xsd:anyURI;
>> foundedBy = dbp-ont:foundedBy :: xsd:anyURI;
>>
>> But it still doesn't return the data I have added for the "foundedBy" and
>> "abstract" fields.
>>
>> Is this what you wanted me to try, or did you mean something else?
>>
>> I am still not sure where the problem lies :-( : whether the indexes are
>> not created or whether accessing them is the issue.
>>
>> best regards,
>> tarandeep
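For reference, a sketch of sending an LDPath program to a site's find endpoint from Python. It assumes (please check against the Entityhub REST documentation of your Stanbol version) that the endpoint accepts form-encoded POST parameters named name, ldpath and limit; the site name placeholder and port 9090 are taken from the message above, and the helper name is hypothetical. Note the @prefix line at the top of the program: without it, dbp-ont is not a known prefix.

```python
import urllib.parse
import urllib.request

LDPATH = """@prefix dbp-ont : <http://dbpedia.org/ontology/>;
name = rdfs:label[@en] :: xsd:string;
abstract = dbp-ont:abstract[@en] :: xsd:string;
foundedBy = dbp-ont:foundedBy :: xsd:anyURI;"""


def build_find_request(site, name, ldpath, base="http://localhost:9090"):
    """Build, but do not send, a form-encoded POST to the site's /find endpoint.

    ASSUMPTION: the endpoint accepts the form parameters name, ldpath and limit.
    """
    body = urllib.parse.urlencode({"name": name, "ldpath": ldpath, "limit": 10})
    return urllib.request.Request(
        f"{base}/entityhub/site/{site}/find",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# "MyReferencesiteName" is the placeholder used in the message above
req = build_find_request("MyReferencesiteName", "Adidas", LDPATH)
print(req.get_method(), req.full_url)
```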
>>
>>
>> On Thu, Jul 18, 2013 at 8:26 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>
>>> Hi Tarandeep,
>>>
>>> Have you included dbp-ont as a prefix for http://dbpedia.org/ontology/ in
>>> your LDPath program? According to the LDPath documentation
>>> (https://code.google.com/p/ldpath/wiki/PathLanguage), dbp-ont is not
>>> included as a default prefix, so you might need to start your program
>>> with:
>>>
>>> @prefix dbp-ont : <http://dbpedia.org/ontology/>;
>>>
>>> Hope that helps.
>>>
>>> Cheers,
>>>
>>> Rafa
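A small helper can catch the mistake Rafa describes before a query is sent: it lists the prefixes an LDPath program uses but neither declares with @prefix nor finds in a given set of defaults. This is a regex-based sketch with simplifying assumptions (full URIs in <...> and string literals are ignored, every "name:local" token counts as a prefix use); the default set passed in is a hypothetical example, not LDPath's actual default prefix list.

```python
import re

# "@prefix name : <uri> ;" declarations at the top of an LDPath program
DECLARATION = re.compile(r"@prefix\s+([A-Za-z][\w-]*)\s*:\s*<[^>]*>\s*;")
# "prefix:local" tokens, e.g. rdfs:label, dbp-ont:abstract, xsd:string
PREFIXED_NAME = re.compile(r"(?<![\w-])([A-Za-z][\w-]*):[A-Za-z_]")


def undeclared_prefixes(program, default_prefixes):
    """Return prefixes used in program that are neither declared nor default."""
    declared = set(DECLARATION.findall(program))
    body = DECLARATION.sub("", program)
    body = re.sub(r"<[^>]*>", "", body)   # ignore full URIs
    body = re.sub(r'"[^"]*"', "", body)   # ignore string literals
    used = set(PREFIXED_NAME.findall(body))
    return used - declared - set(default_prefixes)


# HYPOTHETICAL default set, for illustration only
DEFAULTS = {"rdf", "rdfs", "xsd", "dc", "foaf", "geo", "fn"}

program = """name = rdfs:label[@en] :: xsd:string;
abstract = dbp-ont:abstract[@en] :: xsd:string;
foundedBy = dbp-ont:foundedBy :: xsd:anyURI;"""

print(undeclared_prefixes(program, DEFAULTS))  # {'dbp-ont'}
fixed = "@prefix dbp-ont : <http://dbpedia.org/ontology/>;\n" + program
print(undeclared_prefixes(fixed, DEFAULTS))    # set()
```

For the program in this thread, the check reports dbp-ont until the @prefix line is added.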
>>>
>>> On 18/07/13 15:24, Sawhney, Tarandeep Singh wrote:
>>>
>>>> Hi Rafa,
>>>>
>>>> Thanks for the pointer to resolve the issue. We tried the step you
>>>> mentioned, but now the issue seems to be in how we are indexing.
>>>>
>>>> As you suggested, we verified whether we have the required indexes for
>>>> the additional fields in our reference site. We found that the indexer
>>>> had created all indexes except for fields in the "dbpedia-owl"
>>>> (http://dbpedia.org/ontology/) namespace.
>>>>
>>>> For example, please look at the RDF at http://dbpedia.org/page/Adidas
>>>>
>>>> When we query our reference site for the entity "Adidas", we are able to
>>>> see all indexes except for fields in the dbpedia-owl namespace.
>>>>
>>>> Below is the LDPath program we used while querying our reference site.
>>>>
>>>> name = rdfs:label[@en] :: xsd:string;
>>>> comment = rdfs:comment[@en] :: xsd:string;
>>>> categories = dc:subject :: xsd:anyURI;
>>>> homepage = foaf:homepage :: xsd:anyURI;
>>>> location = fn:concat("[",geo:lat,",",geo:long,"]") :: xsd:string;
>>>> foafname = foaf:name :: xsd:string;
>>>> abstract = dbp-ont:abstract[@en] :: xsd:string;
>>>> type = rdf:type :: xsd:anyURI;
>>>> foundedBy = dbp-ont:foundedBy :: xsd:anyURI;
>>>>
>>>> We can see data for everything except "abstract" and "foundedBy", which
>>>> are in the dbpedia-owl (http://dbpedia.org/ontology/) namespace.
>>>>
>>>>
>>>> So this means the indexes for these two fields were not created by the
>>>> indexer; therefore we don't have them in the reference site and can't
>>>> see them when the entity is linked.
>>>>
>>>> We have not changed any default settings while running the indexer.
>>>>
>>>> Can you please provide further help to find out what we are missing?
>>>>
>>>> best regards
>>>> tarandeep
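The diagnosis in the message above (all fields indexed except those of one namespace) can be made systematic by comparing the property URIs in the source RDF with the fields returned for the same entity, and grouping the missing ones by namespace. A minimal Python sketch; how the two lists are obtained (for example from http://dbpedia.org/page/Adidas and from the reference site) is left open, and the function names are hypothetical.

```python
def split_uri(uri):
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def missing_by_namespace(source_properties, indexed_fields):
    """Group properties found in the source RDF but not in the index."""
    missing = {}
    for uri in sorted(set(source_properties) - set(indexed_fields)):
        namespace, local = split_uri(uri)
        missing.setdefault(namespace, []).append(local)
    return missing


# Illustrative input, modelled on the Adidas example in this thread
source = [
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://dbpedia.org/ontology/foundedBy",
    "http://dbpedia.org/ontology/abstract",
]
indexed = ["http://www.w3.org/2000/01/rdf-schema#label"]
print(missing_by_namespace(source, indexed))
# {'http://dbpedia.org/ontology/': ['abstract', 'foundedBy']}
```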
>>>>
>>>>
>>>> On Thu, Jul 18, 2013 at 4:22 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>>
>>>>> Hi Tarandeep,
>>>>>
>>>>> Thanks, it's much clearer now :-). Have you checked whether the
>>>>> information you need (for example dbp-ont:capital) is actually in the
>>>>> index? You can check this, for example, by looking up the "India" entity
>>>>> directly in your custom DBpedia site in the Entityhub.
>>>>>
>>>>> Regards
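A sketch of the direct check Rafa suggests: building the URL for fetching a single entity from a referenced site. The /entityhub/site/{site}/entity?id= pattern, localhost:9090 and the site name "Demo" (from the first message in this thread) are assumptions for illustration, and the helper name is hypothetical. Opening the resulting URL should show the fields stored for the entity, so you can see whether http://dbpedia.org/ontology/capital is among them.

```python
import urllib.parse


def entity_lookup_url(site, entity_uri, base="http://localhost:9090"):
    """URL to fetch one entity directly from a referenced site.

    ASSUMPTION: the site serves entities at /entityhub/site/{site}/entity?id=...
    """
    query = urllib.parse.urlencode({"id": entity_uri})
    return f"{base}/entityhub/site/{site}/entity?{query}"


print(entity_lookup_url("Demo", "http://dbpedia.org/resource/India"))
```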
>>>>>
>>>>> On 18/07/13 12:30, Sawhney, Tarandeep Singh wrote:
>>>>>
>>>>>> No problem Rafa, maybe I didn't explain it in enough detail or clearly
>>>>>> enough.
>>>>>>
>>>>>> We are using a custom ontology to extract custom entities from text, and
>>>>>> then we want to link them with DBpedia entities (in a local DBpedia
>>>>>> reference site).
>>>>>>
>>>>>> We found that the DBpedia reference site doesn't have all the data we
>>>>>> need, so we decided to download additional data for selected entities
>>>>>> (related to fashion brands, fashion designers and company names)
>>>>>> directly from dbpedia.org.
>>>>>>
>>>>>> We then indexed these individual RDF files and created indexes for a new
>>>>>> reference site.
>>>>>>
>>>>>> Instead of the DBpedia reference site, we then used our new reference
>>>>>> site, which has the DBpedia data we need, with our new Entityhub linking
>>>>>> engine.
>>>>>>
>>>>>> But after following the steps I mentioned in my earlier email, custom
>>>>>> entities are dereferenced from my new reference site during enhancement,
>>>>>> but I don't see the additional data I need, even though it exists in
>>>>>> the local cache.
>>>>>>
>>>>>> I hope this explains what we are trying to do; please let me know if
>>>>>> more information is required.
>>>>>>
>>>>>> Best regards
>>>>>> tarandeep
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 18, 2013 at 3:21 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>>>>
>>>>>>> Hi Tarandeep,
>>>>>>>
>>>>>>> On 18/07/13 11:18, Sawhney, Tarandeep Singh wrote:
>>>>>>>
>>>>>>>> Hi Rafa
>>>>>>>>
>>>>>>>> Thanks for your response
>>>>>>>>
>>>>>>>> Yes, we have also tried the whole URI of the property
>>>>>>>> (http://dbpedia.org/ontology/capital), but it didn't help.
>>>>>>>>
>>>>>>>> Yes, we are using the Entityhub cache to store locally all the
>>>>>>>> additional information we pulled from dbpedia.org.
>>>>>>>>
>>>>>>>> The documentation at
>>>>>>>> http://stanbol.apache.org/docs/trunk/customvocabulary.html says:
>>>>>>>>
>>>>>>>> "Optionally, if your data do use namespaces that are not present in
>>>>>>>> prefix.cc (or the server used for indexing does not have internet
>>>>>>>> connectivity) you can manually define required prefixes by
>>>>>>>> creating/using the indexing/config/namespaceprefix.mappings file."
>>>>>>>>
>>>>>>>> Could you give us some input on whether any changes to this file are
>>>>>>>> required when using DBpedia data?
>>>>>>>>
>>>>>>> This file is used at 'indexing time', when you use the indexing tool
>>>>>>> to create the index for the DBpedia site. I have just seen that dbp-ont
>>>>>>> is already included as a prefix. What isn't clear to me right now is
>>>>>>> whether you are generating your own DBpedia index including all the
>>>>>>> DBpedia ontology properties (that would be an enormous index), whether
>>>>>>> you are generating an index each time you need a new entity, or even
>>>>>>> whether you are trying to retrieve the entities from DBpedia in a 'live'
>>>>>>> way :-). Sorry, I'm confused about your workflow.
>>>>>>>
>>>>>>>
>>>>>>>> Also, it looks like we are missing some configuration in the overall
>>>>>>>> process, so any help from the dev community would be much
>>>>>>>> appreciated.
>>>>>>>>
>>>>>>>> best regards
>>>>>>>> tarandeep
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 18, 2013 at 1:38 PM, Rafa Haro <rh...@zaizi.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Tarandeep,
>>>>>>>>>
>>>>>>>>> Have you tried using the whole URI of the property
>>>>>>>>> (http://dbpedia.org/ontology/capital)?
>>>>>>>>>
>>>>>>>>> Anyway, it may be a better idea to change your workflow. I suppose that
>>>>>>>>> what happens with your "India" example could happen with more entities,
>>>>>>>>> because the default DBpedia site in Stanbol doesn't contain information
>>>>>>>>> about dbp-ont properties. I would suggest using the Entityhub cache to
>>>>>>>>> store entities locally, with all the information you need, directly
>>>>>>>>> from DBpedia. So you could try to retrieve the entities directly from
>>>>>>>>> any DBpedia endpoint and store them in the Entityhub cache, so that you
>>>>>>>>> can use them later at your convenience. The workflow could be the
>>>>>>>>> following:
>>>>>>>>>
>>>>>>>>> 1. Enhance a document using the Stanbol DBpedia site for linking.
>>>>>>>>> 2. For each extracted entity:
>>>>>>>>>    2.1. If the entity is already stored in the Entityhub, get it using
>>>>>>>>>         LDPath for dereferencing.
>>>>>>>>>    2.2. If not, retrieve the entity from a DBpedia endpoint as RDF data
>>>>>>>>>         and store it in the Entityhub. Then retrieve it.
>>>>>>>>>
>>>>>>>>> I would say that this is currently possible in Stanbol, but maybe
>>>>>>>>> someone else on the list can shed more light on the issue.
>>>>>>>>>
>>>>>>>>> Regards
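Step 2 of the workflow Rafa outlines (use the local copy if present, otherwise fetch from DBpedia, store, then retrieve) can be sketched as a small cache-aside function. The three callables are hypothetical stand-ins: in a real setup they might wrap a lookup on the Entityhub, a download from a DBpedia endpoint and an update of a managed site, but none of those calls are shown here.

```python
from typing import Callable, Dict, List, Optional

Entity = Dict[str, List[str]]  # property URI -> values (simplified)


def dereference(
    entity_uri: str,
    lookup_local: Callable[[str], Optional[Entity]],
    fetch_remote: Callable[[str], Entity],
    store_local: Callable[[str, Entity], None],
) -> Optional[Entity]:
    """Step 2: use the stored entity, or fetch it, store it and re-read it."""
    entity = lookup_local(entity_uri)  # 2.1 already in the Entityhub?
    if entity is None:
        # 2.2 not there: fetch from DBpedia and store it locally ...
        store_local(entity_uri, fetch_remote(entity_uri))
        entity = lookup_local(entity_uri)  # ... then retrieve it
    return entity


# Usage with in-memory stand-ins (hypothetical, no network access)
cache: Dict[str, Entity] = {}
fetched: List[str] = []


def fake_fetch(uri: str) -> Entity:
    fetched.append(uri)
    return {"http://dbpedia.org/ontology/capital":
            ["http://dbpedia.org/resource/New_Delhi"]}


india = "http://dbpedia.org/resource/India"
first = dereference(india, cache.get, fake_fetch, cache.__setitem__)
second = dereference(india, cache.get, fake_fetch, cache.__setitem__)
print(first == second, fetched)
```

Passing a plain dict's get and __setitem__ is enough to exercise the logic: the second call for the same URI is served locally and does not fetch again.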
>>>>>>>>>
>>>>>>>>> On 18/07/13 09:48, Sawhney, Tarandeep Singh wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> In the Stanbol local cache, the DBpedia reference site has only a
>>>>>>>>>> limited number of triples.
>>>>>>>>>>
>>>>>>>>>> We need more triples for entities that are present in the DBpedia
>>>>>>>>>> reference site. For example, the entity "India" has only a few triples,
>>>>>>>>>> so when we enhance text that mentions India, we only get the
>>>>>>>>>> information that is in the DBpedia reference site.
>>>>>>>>>>
>>>>>>>>>> We followed the steps below to add more RDF data for the entity
>>>>>>>>>> "India" by creating our own reference site.
>>>>>>>>>>
>>>>>>>>>> 1 - Downloaded the RDF data for 'India' from [1].
>>>>>>>>>>
>>>>>>>>>> 2 - Generated indexes for this RDF data as suggested in article [2],
>>>>>>>>>>     with *Demo* as the reference site name.
>>>>>>>>>>
>>>>>>>>>> 3 - Initialized the indexes within the Stanbol instance as per [2].
>>>>>>>>>>
>>>>>>>>>> 4 - Configured a new EntityLinking engine, '*demoLinkingEngine*',
>>>>>>>>>>     with *Demo* as the referenced site, as per [3]. I have added
>>>>>>>>>>     *dbp-ont:capital* to the "Fields used for dereferencing" option.
>>>>>>>>>>
>>>>>>>>>> 5 - Configured a new weighted chain (*demoChain*).
>>>>>>>>>>
>>>>>>>>>> 6 - Now I am trying to enhance *"India is a country."* I get India
>>>>>>>>>>     as a dereferenced entity, but I am unable to get any new
>>>>>>>>>>     information related to *dbp-ont:capital*, which exists in my new
>>>>>>>>>>     reference site *Demo* and which in this case should give us the
>>>>>>>>>>     URI value of "New Delhi".
>>>>>>>>>>
>>>>>>>>>> [1] http://dbpedia.org/page/India
>>>>>>>>>> [2] http://stanbol.apache.org/docs/trunk/customvocabulary.html
>>>>>>>>>> [3] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
>>>>>>>>>>
>>>>>>>>>> Can you please let me know if I am doing something wrong here or
>>>>>>>>>> missing some configuration? Please let me know in case you need more
>>>>>>>>>> information on how we are trying to do it.
>>>>>>>>>>
>>>>>>>>>> best regards
>>>>>>>>>> tarandeep
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    --
>>>>>>>>>>
>>>>>>>>>>  ------------------------------
>>>>>>>>> This message should be regarded as confidential. If you have
>>>>>>>>> received
>>>>>>>>> this
>>>>>>>>> email in error please notify the sender and destroy it immediately.
>>>>>>>>> Statements of intent shall only become binding when confirmed in
>>>>>>>>> hard
>>>>>>>>> copy
>>>>>>>>> by an authorised signatory.
>>>>>>>>>
>>>>>>>>> Zaizi Ltd is registered in England and Wales with the registration
>>>>>>>>> number
>>>>>>>>> 6440931. The Registered Office is Brook House, 229 Shepherds Bush
>>>>>>>>> Road,
>>>>>>>>> London W6 7AN.
>>>>>>>>>
>>>
>>>
>>
>>
>
> --
>
> "This e-mail and any attachments transmitted with it are for the sole use
> of the intended recipient(s) and may contain confidential , proprietary or
> privileged information. If you are not the intended recipient, please
> contact the sender by reply e-mail and destroy all copies of the original
> message. Any unauthorized review, use, disclosure, dissemination,
> forwarding, printing or copying of this e-mail or any action taken in
> reliance on this e-mail is strictly prohibited and may be unlawful."



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
