Hi Rupert et al,
On Wed, Jun 19, 2013 at 2:27 PM, Rupert Westenthaler <
rupert.westentha...@gmail.com> wrote:

> Hi
>
>
> On Wed, Jun 19, 2013 at 9:20 AM, Dileepa Jayakody
> <dileepajayak...@gmail.com> wrote:
> > Hi All,
> >
> > I'm trying out the Entityhub indexing tool to configure a site for a sample
> > FOAF dataset. My dataset (sampleNquads.nx) is in N-Quads format. Actually,
> > it is a set of links to FOAF files from various sources in N-Quads format.
> >
> > eg:
> > <http://www.agfa.com/> <http://www.agfa.com/global/en/main/index.jsp> .
> > *<http://sebastian.tramp.name/> <http://sebastian.tramp.name/index.rdf> .*
> > <http://gitorious.com/~tobyink> <http://gitorious.org/~tobyink> .
> >
>
> I am not completely sure what you mean by that.
>

I had misunderstood the N-Quads format and thought it was just a set of
links to external RDF files.
The sample dataset I used was just a small part of the actual datahub
dataset (>1 GB) and was incomplete. That might be why the indexer was not
able to index the dataset. :)
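
To make the difference concrete for anyone reading the archive: as far as I
understand it now, each N-Quads line is a complete statement (subject,
predicate, object, plus an optional context), not a pointer to a file. Below
is a minimal Python sketch of what "keep SPO, drop the context" means. The
function name drop_context and the example.org IRIs are made up for
illustration, the regular expression only covers simple terms, and this is
not how the Indexing Tool itself parses the data.

```python
import re

# One RDF term: an IRI, a blank node, or a quoted literal with an
# optional language tag or datatype. Simplified on purpose.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S+'
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
)


def drop_context(line):
    """Return (subject, predicate, object) for one N-Triples/N-Quads line.

    Returns None for blank lines, comments, and lines that do not
    contain exactly three or four terms followed by a dot.
    """
    line = line.strip()
    if not line or line.startswith('#') or not line.endswith('.'):
        return None
    terms = TERM.findall(line[:-1])
    if len(terms) not in (3, 4):
        return None
    # The fourth term, if present, is the context and is discarded.
    return tuple(terms[:3])


# Hypothetical quad, not taken from the datahub dump.
quad = ('<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> '
        '"Alice" <http://example.org/graph> .')
print(drop_context(quad))
```

With this sketch, the two-term lines from my excerpt come back as None,
which matches the sample being incomplete.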

>
> Generally: Links to RDF files are not supported by the Indexing Tool.
> You will need to download the RDF files to the
> "indexing/resources/rdfdata" directory.
>
> Quad formats are in principle supported by the Indexing Tool. However,
> note that only SPO are used and the Context is dropped during the
> import.
>
>
> For debugging the indexing process:
>
>   * the Indexing Tool logs the number of indexed Entities. You should
> check this value
>   * the IDs of all indexed entities are also stored in
> "indexing/destination/indexed-entities-ids.zip". After installing the
> index to Stanbol you can use those IDs to retrieve the available data
> by using requests like:
>     curl -H "Accept: text/turtle" "http://localhost:8080/entityhub/site/{site-name}/entity?id={entity-id}"
>
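
For checking several IDs in one go, the curl request above could be
scripted. This is only a sketch: the helper name entity_request is
hypothetical, the base URL assumes a local Stanbol on port 8080, and I have
not looked at the internal layout of indexed-entities-ids.zip, so reading the
IDs out of that file is left out.

```python
from urllib.parse import quote
from urllib.request import Request


def entity_request(site, entity_id, base="http://localhost:8080/entityhub"):
    """Build a GET request for one entity of a site, asking for Turtle."""
    # The entity ID is itself a URI, so it must be percent-encoded
    # before it is placed in the query string.
    url = "{}/site/{}/entity?id={}".format(
        base, quote(site, safe=""), quote(entity_id, safe="")
    )
    return Request(url, headers={"Accept": "text/turtle"})


req = entity_request("datahub", "http://sebastian.tramp.name/")
print(req.full_url)
```

Passing the request to urllib.request.urlopen(req) would send it once the
site is installed and Stanbol is running.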
> > I followed the instructions here [1] and in the ReadMe.md,
> > indexing.properties files of the tool and created a site {datahub} for my
> > data accessible at : http://localhost:8080/entityhub/site/datahub/
> >
> > However when I try out sample requests to find entities in the site I get
> > no results.
> > I'm trying to find the entity with *name=Sebastian**, which is actually in
> > the sample dataset used above, but I get an empty result set. Can anyone
> > please help me understand what I've done wrong here? I have followed
> > the init and index steps of the tool.
> >
> > Is it because my dataset is only a set of external links to foaf files?
> > Do I need to manually download the foaf files to the
> > indexing/resources/rdfdata directory?
> >
> > eg :
> >
> > request: curl -X POST -d "name=Sebastian*"
> > http://localhost:8080/entityhub/site/datahub/find
> >
> > result :
> > {
> >     "query": {
> >         "selected": [
> >             "http:\/\/stanbol.apache.org\/ontology\/entityhub\/query#score",
> >             "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
> >         ],
> >         "constraints": [{
> >             "type": "text",
> >             "patternType": "wildcard",
> >             "text": "Sebastian*",
> >             "field": "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
> >         }],
> >         "limit": 5,
> >         "offset": 0
> >     },
> >     "results": []
> > }
> >
>
> For queries like that you need to make sure that your entities have
> values for "rdfs:label". AFAIK the default
> "indexing/config/mapping.txt" configuration does copy the foaf:name
> value to rdfs:label, but if you specifically work with FOAF data
> you should preferably query for "foaf:name".
>

Thanks for these useful pointers. I will follow them.
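
If I understand the suggestion correctly, querying on foaf:name instead of
the default rdfs:label would look roughly like the sketch below. The "field"
and "limit" form parameters are what I recall from the Entityhub /find
documentation and should be checked there; as far as I know the field has to
be given as a full URI. The helper name find_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def find_request(site, name, field=FOAF_NAME, limit=5,
                 base="http://localhost:8080/entityhub"):
    """Build a POST request for the /find endpoint of a site."""
    # Same form-encoded body that `curl -d` would send, with the
    # search field set explicitly (assumed parameter names).
    body = urlencode({"name": name, "field": field, "limit": limit})
    return Request(
        "{}/site/{}/find".format(base, site),
        data=body.encode("ascii"),
        method="POST",
    )


req = find_request("datahub", "Sebastian*")
print(req.full_url)
print(req.data.decode("ascii"))
```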

In general, for my GSoC project on FOAF co-reference based disambiguation,
do you think this datahub dataset is useful?
This is the best dataset I have found so far, other than the DBpedia
dataset already indexed in Stanbol.

Thanks,
Dileepa


> best
> Rupert
>
> >
> > Your help is much appreciated here.
> > Thanks,
> > Dileepa
> >
> >
> > [1] http://stanbol.apache.org/docs/trunk/customvocabulary.html
>
>
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
