btc-2012 is definitely a good idea; you should start with it.

If you have time, you might also want to extract foaf:Person and
schema:Person URIs from the more recent Web Data Commons (WDC) dataset
from August 2012 [1], and use them as seed sets for crawling more FOAF
data (you might have to align the schema.org vocabulary to FOAF; I think
Stanbol allows such functionality out of the box).
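
As a rough illustration of the seed-extraction step, here is a minimal
Python sketch. It assumes the dump is available as N-Quads text with one
statement per line and that persons are declared with an explicit rdf:type
statement; the function name and the example.org URIs are made up for the
example and are not part of BTC, WDC, or Stanbol.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PERSON_CLASSES = {
    "http://xmlns.com/foaf/0.1/Person",
    "http://schema.org/Person",
}

# Matches the first three terms of a statement when all three are IRIs.
# Statements with blank-node subjects or literal objects do not match.
STATEMENT_PREFIX = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>')


def extract_person_uris(lines):
    """Collect subject IRIs typed as foaf:Person or schema:Person."""
    seeds = set()
    for line in lines:
        match = STATEMENT_PREFIX.match(line)
        if match is None:
            continue
        subject, predicate, obj = match.groups()
        if predicate == RDF_TYPE and obj in PERSON_CLASSES:
            seeds.add(subject)
    return seeds


# Hypothetical sample statements, for illustration only.
sample = [
    '<http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://xmlns.com/foaf/0.1/Person> <http://example.org/doc> .',
    '<http://example.org/bob> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://schema.org/Person> <http://example.org/doc> .',
    '_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://xmlns.com/foaf/0.1/Person> <http://example.org/doc> .',
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" '
    '<http://example.org/doc> .',
]

seed_uris = extract_person_uris(sample)
```

The resulting set could then be written out one URI per line and handed to
a crawler as its seed list. Blank-node subjects are skipped on purpose,
since they cannot be dereferenced in a crawl.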

Steph.

[1]

On Jun 26, 2013 2:24 AM, "Dileepa Jayakody" <dileepajayak...@gmail.com>
wrote:
>
> Hi All,
>
> Below is the reply I got from Andreas Harth of the webdatacommons project.
> He suggests that the btc-2012 dataset I mentioned in my previous mail has a
> sufficient FOAF dataset.
> Shall I go ahead with that dataset for my project?
>
> "the BTC 2012 has FOAF data [1].  You'd get a more comprehensive FOAF
> dataset if you first get all instances of foaf:Persons (simple grep)
> and then start a crawl from those, e.g., via LDSpider [2].  I assume
> that a hop-1 crawl would already get you a sizable dataset.
>
> All the best with your project, I look forward to seeing the results!
>
> Best regards,
> Andreas.
>
> [1] http://km.aifb.kit.edu/projects/btc-2012/
> [2] http://code.google.com/p/ldspider/
> "
> Thanks,
> Dileepa
>
>
> On Tue, Jun 25, 2013 at 5:45 PM, Dileepa Jayakody
> <dileepajayak...@gmail.com> wrote:
>
> > Hi All,
> >
> > For my project, FOAF co-reference based disambiguation, the first
> > milestone is to develop an EntityHub ReferencedSite for a FOAF dataset.
> > With help from Rupert and others I was able to index a sample FOAF
> > dataset using the genericrdf indexing tool and set up a referenced site.
> > FOAF data can be filtered by using propertyfilter.config to import
> > foaf:*. This will import all entities which define FOAF properties. The
> > next step will be to develop an EntityProcessor to further filter and
> > clean the FOAF data by defining the required FOAF properties that are
> > going to be used for disambiguation.
> >
> > To continue my project I would like to finalize the FOAF dataset I need
> > to use, and would highly appreciate your input on this.
> > The FOAF wiki site [1] lists many data source projects, but many of
> > them are out of date.
> >
> > Following are my findings for a dataset for my project:
> >
> > 1. The Billion Triple Challenge 2012 project [2], a web-crawled dataset
> > including data from dbpedia, freebase, datahub, timbl, rest data
> > sources. Quantity-wise I think this has a sufficient amount of data
> > (1436545545 quads) and it's fairly up to date.
> > 2. The WebDataCommons project [3], which has a dataset (1079175202
> > quads) created in August 2012. But the sources of the data are not
> > specified in the project. I have posted on their group asking if they
> > have FOAF data in their dataset, and am waiting for their suggestions.
> >
> > 3. DBpedia also has resources with FOAF properties. In particular,
> > entities of type 'dbpedia-ont:Person' contain FOAF properties. I think
> > we can map dbpedia-ont:Person to a FOAF profile here. WDYT?
> >
> > 4. There are several websites, like http://iwlearn.net/ and
> > opera-community, exposing their contact lists as FOAF, but they don't
> > contain data on public figures or celebrities AFAIK.
> >
> > Can I please have your opinions on finalizing a dataset for my project?
> > Appreciate your help.
> >
> > Thanks,
> > Dileepa
> >
> > [1] http://www.w3.org/wiki/FoafSites
> > [2] http://km.aifb.kit.edu/projects/btc-2012/
> > [3] http://webdatacommons.org/
> >
