The link to the dataset project I'm looking at:
http://km.aifb.kit.edu/projects/btc-2012/


On Wed, Jun 19, 2013 at 4:21 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:

> Hi Rupert et al,
> On Wed, Jun 19, 2013 at 2:27 PM, Rupert Westenthaler <
> rupert.westentha...@gmail.com> wrote:
>
>> Hi
>>
>>
>> On Wed, Jun 19, 2013 at 9:20 AM, Dileepa Jayakody
>> <dileepajayak...@gmail.com> wrote:
>> > Hi All,
>> >
>> > I'm trying out the entityhub indexing tool to configure a site for a
>> > sample foaf dataset. My dataset (sampleNquads.nx) is in N-Quads format.
>> > Actually it is a set of links to foaf files from various sources in
>> > N-Quads format.
>> >
>> > eg:
>> > <http://www.agfa.com/> <http://www.agfa.com/global/en/main/index.jsp> .
>> > *<http://sebastian.tramp.name/> <http://sebastian.tramp.name/index.rdf> .*
>> > <http://gitorious.com/~tobyink> <http://gitorious.org/~tobyink> .
>> >
>>
>> I am not completely sure what you mean by that.
>>
>
> I had misunderstood the N-Quads format and thought it was just a set of
> links to external RDF files.
> The sample dataset I used was just a small, incomplete part of the actual
> datahub dataset (>1 GB). That might be the reason the indexer was not
> able to index it. :)
>
>>
>> Generally: Links to RDF files are not supported by the Indexing Tool.
>> You will need to download the RDF files to the
>> "indexing/resources/rdfdata" directory.
>>
>> Quad formats are in principle supported by the Indexing Tool. However,
>> note that only SPO (subject, predicate, object) are used and the Context
>> is dropped during the import.
>>
>>
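To illustrate what "only SPO are used and the Context is dropped" means for an N-Quads file, here is a minimal sketch in Python. It is not the Indexing Tool's own importer; the `TERM` pattern and the `drop_context` helper are hypothetical names, and the term matching is deliberately simplified (no full escape or blank-node label handling).

```python
import re

# Simplified term matcher (an assumption, not the full N-Quads grammar):
# IRIs, blank nodes, and literals with an optional language tag or datatype.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:[A-Za-z0-9_-]+'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
)

def drop_context(line):
    """Return the subject/predicate/object part of one statement line.

    Returns None for blank lines, comments, and lines with fewer than
    three terms (which are not complete statements).
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    terms = TERM.findall(stripped)
    if len(terms) < 3:
        return None
    subject, predicate, obj = terms[:3]  # a fourth term (the context) is ignored
    return f"{subject} {predicate} {obj} ."
```

Under this sketch a line needs at least three terms to count as a statement, so a line that carries only two IRIs yields nothing.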
>> For debugging the indexing process:
>>
>>   * the Indexing Tool logs the number of indexed Entities. You should
>> check this value
>>   * the IDs of all indexed entities are also stored in
>> "indexing/destination/indexed-entities-ids.zip". After installing the
>> index to Stanbol you can use those IDs to retrieve the available data
>> by using requests like:
>>   curl -H "Accept: text/turtle" "http://localhost:8080/entityhub/site/{site-name}/entity?id={entity-id}"
>>
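The entity lookup request above can also be scripted. The following is a minimal sketch, assuming a Stanbol instance on localhost:8080 as in the thread; `entity_url` and `fetch_entity` are hypothetical helper names, and the entity ID in the usage note is only illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def entity_url(site, entity_id, base="http://localhost:8080"):
    """Build the entity lookup URL; the ID is percent-encoded as a query value."""
    return f"{base}/entityhub/site/{site}/entity?{urlencode({'id': entity_id})}"

def fetch_entity(site, entity_id, base="http://localhost:8080"):
    """Request one entity as Turtle, mirroring the curl call with its Accept header."""
    request = Request(entity_url(site, entity_id, base),
                      headers={"Accept": "text/turtle"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_entity("datahub", "http://sebastian.tramp.name/")` would ask the "datahub" site for that ID, provided it is one of the IDs listed in indexed-entities-ids.zip.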
>> > I followed the instructions here [1] and in the ReadMe.md and
>> > indexing.properties files of the tool, and created a site {datahub} for
>> > my data, accessible at: http://localhost:8080/entityhub/site/datahub/
>> >
>> > However, when I try out sample requests to find entities in the site I
>> > get no results.
>> > I'm trying to find the entity with name=Sebastian*, which is actually in
>> > the sample dataset used above, but I get an empty result set. Can anyone
>> > please help me understand what I've done wrong here? Basically I have
>> > followed the init and index steps of the tool.
>> >
>> > Is it because my dataset is only a set of external links to foaf files?
>> > Do I need to manually download the foaf files to the
>> > indexing/resources/rdfdata directory?
>> >
>> > eg :
>> >
>> > request: curl -X POST -d "name=Sebastian*"
>> > http://localhost:8080/entityhub/site/datahub/find
>> >
>> > result :
>> > {
>> >     "query": {
>> >         "selected": [
>> >             "http:\/\/stanbol.apache.org\/ontology\/entityhub\/query#score",
>> >             "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
>> >         ],
>> >         "constraints": [{
>> >             "type": "text",
>> >             "patternType": "wildcard",
>> >             "text": "SSebastian Tramp",
>> >             "field": "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
>> >         }],
>> >         "limit": 5,
>> >         "offset": 0
>> >     },
>> >     "results": []
>> > }
>> >
>>
>> For queries like that you need to make sure that your entities do have
>> values for "rdfs:label". AFAIK the default
>> "indexing/config/mapping.txt" configuration does copy the foaf:name
>> value to rdfs:label, but if you work specifically with FOAF data
>> you should preferably query for "foaf:name".
>>
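A find request that targets foaf:name instead of the default label field might look like the sketch below. It assumes the `/find` endpoint accepts `field` and `limit` form parameters next to `name`, and it uses the full FOAF property URI rather than the prefixed form; `find_body` and `find_entities` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

def find_body(name, field=FOAF_NAME, limit=5):
    """Encode the form body for a /find request on the given field."""
    return urlencode({"name": name, "field": field, "limit": limit}).encode("ascii")

def find_entities(site, name, base="http://localhost:8080"):
    """POST the query to the site's /find endpoint and parse the JSON response."""
    request = Request(f"{base}/entityhub/site/{site}/find",
                      data=find_body(name),  # supplying data makes this a POST
                      headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `find_entities("datahub", "Sebastian*")` would then correspond to the curl request from the original question, but matching on foaf:name.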
> Thanks for these useful pointers. I will follow them.
>
> In general, for my GSoC project on FOAF co-reference based disambiguation,
> do you think this datahub dataset is useful?
> This is the best dataset I have found so far, other than the DBpedia
> dataset that is already indexed in Stanbol.
>
> Thanks,
> Dileepa
>
>
>> best
>> Rupert
>>
>> >
>> > Your help is much appreciated here.
>> > Thanks,
>> > Dileepa
>> >
>> >
>> > [1] http://stanbol.apache.org/docs/trunk/customvocabulary.html
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
