Hi Rupert et al,

On Wed, Jun 19, 2013 at 2:27 PM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
> Hi
>
> On Wed, Jun 19, 2013 at 9:20 AM, Dileepa Jayakody
> <dileepajayak...@gmail.com> wrote:
> > Hi All,
> >
> > I'm trying out the entityhub indexing tool to configure a site for a sample
> > foaf dataset. My data set (sampleNquads.nx) is in n-quad format. Actually
> > it is a set of links to foaf files from various sources in nquad format.
> >
> > eg:
> > <http://www.agfa.com/> <http://www.agfa.com/global/en/main/index.jsp> .
> > *<http://sebastian.tramp.name/> <http://sebastian.tramp.name/index.rdf> .*
> > <http://gitorious.com/~tobyink> <http://gitorious.org/~tobyink> .
> >
>
> I am not completely sure what you mean by that.

I had misunderstood the N-Quad format and thought it was just a set of links to external RDF files. The sample dataset I used was only a small, incomplete part of the actual datahub dataset (>1 GB). That might be the reason the indexer was not able to index the dataset. :)

> Generally: Links to RDF files are not supported by the Indexing Tool.
> You will need to download the RDF files to the
> "indexing/resources/rdfdata" directory.
>
> Quad Formats are in principle supported by the Indexing Tool. However
> note that only SPO are used and the Context is dropped during the
> import.
>
> For debugging the indexing process:
>
> * the Indexing Tool logs the number of indexed Entities. You should
>   check this value
> * the IDs of all indexed entities are also stored in
>   "indexing/destination/indexed-entities-ids.zip".
>   After installing the index to Stanbol you can use those IDs to retrieve
>   the available data by using requests like
>   curl -H "Accept: text/turtle" "http://localhost:8080/entityhub/site/{site-name}/entity?id={entity-id}"
>
> > I followed the instructions here [1] and in the ReadMe.md,
> > indexing.properties files of the tool and created a site {datahub} for my
> > data accessible at : http://localhost:8080/entityhub/site/datahub/
> >
> > However when I try out sample requests to find entities in the site I get
> > no results.
> > I'm trying to find the entity with *name=Sebastian** which is actually in
> > the sample dataset used above but I get an empty results set. Can anyone
> > please help me understand what I've done wrong here? Basically I have
> > followed the steps in init, index executions of the tool.
> >
> > Is it because my dataset is only a set of external links to foaf files?
> > Do I need to manually download the foaf files to indexing/resources/rdfdata
> > directory?
> >
> > eg :
> >
> > request: curl -X POST -d "name=Sebastian*"
> > http://localhost:8080/entityhub/site/datahub/find
> >
> > result :
> > {
> >     "query": {
> >         "selected": [
> >             "http:\/\/stanbol.apache.org\/ontology\/entityhub\/query#score",
> >             "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
> >         ],
> >         "constraints": [{
> >             "type": "text",
> >             "patternType": "wildcard",
> >             "text": "SSebastian Tramp",
> >             "field": "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
> >         }],
> >         "limit": 5,
> >         "offset": 0
> >     },
> >     "results": []
> > }
> >
>
> For queries like that you need to make sure that your entities do have
> values for "rdfs:label". AFAIK the default
> "indexing/config/mapping.txt" configuration does copy the foaf:name
> value to rdfs:label, but if you do specifically work with FOAF data
> you should preferably query for "foaf:name".

Thanks for these useful pointers. I will follow them.

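
For reference, the two requests discussed above (searching on foaf:name instead of rdfs:label, and looking up a single entity by ID) could be sketched roughly as follows. This is an untested sketch, not the documented client: it assumes the find endpoint accepts a `field` form parameter holding the full property URI, and the helper names (`build_find_request`, `build_entity_url`, `find`) are made up for illustration. The base URL is the "datahub" site from this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: local Stanbol instance with the "datahub" site from this thread.
BASE_URL = "http://localhost:8080/entityhub/site/datahub"
# Full URI of foaf:name (the query field must be given as a full URI here).
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def build_find_request(name_pattern, field=FOAF_NAME, limit=5, offset=0):
    """Return (url, form_body) for a POST to the site's find endpoint.

    Assumption: the endpoint reads a "field" form parameter; check the
    Entityhub REST documentation of your Stanbol version.
    """
    body = urlencode(
        {"name": name_pattern, "field": field, "limit": limit, "offset": offset}
    )
    return BASE_URL + "/find", body.encode("ascii")


def build_entity_url(entity_id):
    """Return the URL that retrieves one entity by its (URL-encoded) ID."""
    return BASE_URL + "/entity?" + urlencode({"id": entity_id})


def find(name_pattern, **kwargs):
    """Send the find request and return the response text.

    Needs a running Stanbol server; passing data makes this a POST.
    """
    url, body = build_find_request(name_pattern, **kwargs)
    request = Request(url, data=body, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `find("Sebastian*")` would send the wildcard query against foaf:name, and `build_entity_url(...)` can be fed any ID taken from "indexing/destination/indexed-entities-ids.zip" to check what was actually indexed.
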
In general, for my GSOC project on FOAF co-reference based disambiguation, do you think this datahub dataset is useful? This is the best dataset I have found so far, other than the DBpedia dataset already indexed in Stanbol.

Thanks,
Dileepa

> best
> Rupert
>
> > Your help is much appreciated here.
> > Thanks,
> > Dileepa
> >
> > [1] http://stanbol.apache.org/docs/trunk/customvocabulary.html
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11             ++43-699-11108907
> | A-5500 Bischofshofen