The link to the dataset project I'm looking at: http://km.aifb.kit.edu/projects/btc-2012/
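For reference: an N-Quads statement is a subject, a predicate, an object and an optional context (graph) term, followed by "." on a single line. That is why the two-term link lines in the sample file quoted below are not quads at all. Rupert notes below that the Indexing Tool keeps only subject, predicate and object and drops the context. The Python sketch below illustrates that reduction line by line. It is a hypothetical helper with a simplified regex tokenizer (no full N-Quads grammar, no Unicode escape handling), not the Indexing Tool's own code.

```python
import re

# One RDF term in N-Quads syntax (simplified): an IRI, a blank node, or a
# literal with an optional language tag or datatype IRI.
TERM = re.compile(
    r'\s*('
    r'<[^>]*>'                                             # IRI
    r'|_:[A-Za-z0-9_\-]+'                                  # blank node label
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9\-]+|\^\^<[^>]*>)?'  # literal
    r')'
)


def split_terms(statement):
    """Split one stripped statement into its RDF terms, without the final '.'."""
    if not statement.endswith("."):
        raise ValueError(f"statement must end with '.': {statement!r}")
    body = statement[:-1]
    terms, pos = [], 0
    while body[pos:].strip():
        m = TERM.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse term at offset {pos}: {body[pos:]!r}")
        terms.append(m.group(1))
        pos = m.end()
    return terms


def quad_to_triple(line):
    """Keep subject, predicate and object of an N-Quads line; drop the context.

    Returns None for blank lines and comments."""
    statement = line.strip()
    if not statement or statement.startswith("#"):
        return None
    terms = split_terms(statement)
    if len(terms) not in (3, 4):  # N-Quads: S P O, optionally followed by a graph
        raise ValueError(f"expected 3 or 4 terms, got {len(terms)}: {statement!r}")
    subject, predicate, obj = terms[:3]
    return f"{subject} {predicate} {obj} ."
```

Applied to a real quad file, this would yield N-Triples. Applied to the two-term link lines of sampleNquads.nx, it raises ValueError, because those lines are not RDF statements.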
On Wed, Jun 19, 2013 at 4:21 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> Hi Rupert et al,
>
> On Wed, Jun 19, 2013 at 2:27 PM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
>
>> Hi
>>
>> On Wed, Jun 19, 2013 at 9:20 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
>> > Hi All,
>> >
>> > I'm trying out the Entityhub Indexing Tool to configure a site for a sample
>> > FOAF dataset. My dataset (sampleNquads.nx) is in N-Quads format. Actually,
>> > it is a set of links to FOAF files from various sources, in N-Quads format.
>> >
>> > e.g.:
>> > <http://www.agfa.com/> <http://www.agfa.com/global/en/main/index.jsp> .
>> > <http://sebastian.tramp.name/> <http://sebastian.tramp.name/index.rdf> .
>> > <http://gitorious.com/~tobyink> <http://gitorious.org/~tobyink> .
>>
>> I am not completely sure what you mean by that.
>
> I had misunderstood the N-Quads format and thought it was just a set of
> links to external RDF files.
> The sample dataset I used was only a small, incomplete part of the actual
> datahub dataset (>1 GB). That might be why the indexer was not able to
> index it. :)
>
>> Generally: links to RDF files are not supported by the Indexing Tool.
>> You will need to download the RDF files to the
>> "indexing/resources/rdfdata" directory.
>>
>> Quad formats are in principle supported by the Indexing Tool. However,
>> note that only subject, predicate and object (SPO) are used; the context
>> is dropped during the import.
>>
>> For debugging the indexing process:
>>
>> * the Indexing Tool logs the number of indexed entities. You should
>> check this value.
>> * the IDs of all indexed entities are also stored in
>> "indexing/destination/indexed-entities-ids.zip".
>> After installing the index in Stanbol, you can use those IDs to retrieve
>> the available data with requests like:
>> curl -H "Accept: text/turtle" "http://localhost:8080/entityhub/site/{site-name}/entity?id={entity-id}"
>>
>> > I followed the instructions here [1] and in the ReadMe.md and
>> > indexing.properties files of the tool, and created a site "datahub" for my
>> > data, accessible at: http://localhost:8080/entityhub/site/datahub/
>> >
>> > However, when I try sample requests to find entities in the site, I get
>> > no results.
>> > I'm trying to find the entity with name=Sebastian*, which is actually in
>> > the sample dataset above, but I get an empty result set. Can anyone
>> > please help me understand what I've done wrong? Basically, I have
>> > followed the init and index steps of the tool.
>> >
>> > Is it because my dataset is only a set of external links to FOAF files?
>> > Do I need to manually download the FOAF files to the
>> > indexing/resources/rdfdata directory?
>> >
>> > e.g.:
>> >
>> > request: curl -X POST -d "name=Sebastian*" http://localhost:8080/entityhub/site/datahub/find
>> >
>> > result:
>> > {
>> >   "query": {
>> >     "selected": [
>> >       "http:\/\/stanbol.apache.org\/ontology\/entityhub\/query#score",
>> >       "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
>> >     ],
>> >     "constraints": [{
>> >       "type": "text",
>> >       "patternType": "wildcard",
>> >       "text": "SSebastian Tramp",
>> >       "field": "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
>> >     }],
>> >     "limit": 5,
>> >     "offset": 0
>> >   },
>> >   "results": []
>> > }
>>
>> For queries like that, you need to make sure that your entities have
>> values for "rdfs:label". AFAIK the default
>> "indexing/config/mapping.txt" configuration copies the foaf:name
>> value to rdfs:label, but if you are working specifically with FOAF data
>> you should preferably query for "foaf:name".
>
> Thanks for these useful pointers. I will follow them.
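Rupert's two debugging suggestions, looking entities up by the IDs stored in indexed-entities-ids.zip and querying foaf:name instead of rdfs:label, could be scripted roughly as in the Python sketch below. Several parts are assumptions: the internal layout of the ZIP (one ID per line in each member) is a guess; the host, port and site name are taken from this thread; and the `field` form parameter of /find is assumed, from the Entityhub REST documentation, to select the queried property (the echoed query above shows rdfs:label as the default). Check the REST documentation of your Stanbol version before relying on it.

```python
import json
import zipfile
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

STANBOL = "http://localhost:8080"  # local Stanbol instance, as in this thread
SITE = "datahub"                   # site name used in this thread
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def read_entity_ids(zip_source):
    """Collect entity IDs from indexed-entities-ids.zip.

    ASSUMPTION (unverified): every member of the archive is UTF-8 text
    with one entity ID per line."""
    ids = []
    with zipfile.ZipFile(zip_source) as zf:
        for member in zf.namelist():
            for line in zf.read(member).decode("utf-8").splitlines():
                if line.strip():
                    ids.append(line.strip())
    return ids


def entity_url(entity_id, site=SITE, base=STANBOL):
    """URL of the entity lookup; the ID is itself a URI, so percent-encode it."""
    return f"{base}/entityhub/site/{site}/entity?id={quote(entity_id, safe='')}"


def find_body(name_pattern, field=FOAF_NAME):
    """Form body for POST .../find; 'field' (assumed parameter) picks the property."""
    return urlencode({"name": name_pattern, "field": field})


def post_find(name_pattern, site=SITE, base=STANBOL):
    """Send the find request and decode the JSON reply (needs a running Stanbol)."""
    req = Request(
        f"{base}/entityhub/site/{site}/find",
        data=find_body(name_pattern).encode("ascii"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `post_find("Sebastian*")` would send the same wildcard query as the curl request above, but against foaf:name, while `entity_url(...)` produces the lookup URL for each ID read from the ZIP.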
>
> In general, for my GSoC project on FOAF co-reference based disambiguation,
> do you think this datahub dataset is useful?
> This is the best dataset I have found so far, other than the DBpedia
> dataset that is already indexed in Stanbol.
>
> Thanks,
> Dileepa
>
>> best
>> Rupert
>>
>> >
>> > Your help is much appreciated here.
>> > Thanks,
>> > Dileepa
>> >
>> > [1] http://stanbol.apache.org/docs/trunk/customvocabulary.html
>>
>> --
>> | Rupert Westenthaler rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11 ++43-699-11108907
>> | A-5500 Bischofshofen
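The Indexing Tool does not follow links to RDF files, so the FOAF documents referenced by a link list like sampleNquads.nx have to be fetched into indexing/resources/rdfdata before indexing. The following is a minimal Python sketch of that download step, assuming the `<page> <rdf-document> .` line layout of the sample file. The Accept header, the Content-Type-to-extension table and the hashed file names are hypothetical choices. They rest on the unverified assumption that the Indexing Tool chooses an RDF parser from the file extension. Responses that are not a recognised RDF media type, such as an HTML page like index.jsp, are skipped.

```python
import hashlib
import os
import re
from urllib.request import Request, urlopen

# ASSUMPTION: each line of the link list looks like the samples in this thread:
#   <page-IRI> <rdf-document-IRI> .
LINK_LINE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

# Hypothetical Content-Type -> file extension table (the Indexing Tool is
# assumed to choose an RDF parser from the file extension).
EXTENSIONS = {
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
    "application/n-triples": ".nt",
}


def rdf_urls(lines):
    """Yield the second IRI (the RDF document) of every well-formed link line."""
    for line in lines:
        m = LINK_LINE.match(line)
        if m:
            yield m.group(2)


def local_name(url, extension):
    """Stable, filesystem-safe file name derived from the URL (hypothetical scheme)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:12] + extension


def download_all(lines, target_dir="indexing/resources/rdfdata"):
    """Fetch each linked RDF document into target_dir; return the saved paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for url in rdf_urls(lines):
        req = Request(url, headers={"Accept": "application/rdf+xml, text/turtle;q=0.9"})
        try:
            with urlopen(req, timeout=30) as resp:
                ext = EXTENSIONS.get(resp.headers.get_content_type())
                if ext is None:
                    continue  # not a recognised RDF media type (e.g. HTML): skip it
                path = os.path.join(target_dir, local_name(url, ext))
                with open(path, "wb") as out:
                    out.write(resp.read())
                saved.append(path)
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            print(f"skipping {url}: {err}")
    return saved
```

Called with the lines of sampleNquads.nx, `download_all` would leave one file per successfully fetched RDF document in the rdfdata directory, ready for the tool's index step.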