Starting with btc-2012 is definitely a good idea. If you have time, you might also want to extract foaf:Person and schema:Person URIs from the more recent Web Data Commons (WDC) dataset from August 2012 [1] and use them as seed sets for crawling more FOAF data. You might have to align the schema.org vocabulary to FOAF; I think Stanbol supports this out of the box.
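To make the seed-extraction step concrete, here is a rough sketch in Python of the "simple grep" idea. It assumes the dump is in N-Quads with one statement per line; the function name and the example.org URIs are made up for illustration, and blank-node subjects are skipped because they cannot be dereferenced by a crawler.

```python
import re
from typing import Iterable, Set

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PERSON_CLASSES = {
    "http://xmlns.com/foaf/0.1/Person",
    "http://schema.org/Person",
}

# Matches the first three terms of a statement only when all three are IRIs,
# so blank-node subjects and literal objects never match.
IRI_TRIPLE = re.compile(r"^\s*<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>")


def extract_person_seeds(lines: Iterable[str]) -> Set[str]:
    """Collect subject IRIs typed as foaf:Person or schema:Person."""
    seeds = set()
    for line in lines:
        match = IRI_TRIPLE.match(line)
        if match is None:
            continue  # blank node, literal object, comment, or malformed line
        subject, predicate, obj = match.groups()
        if predicate == RDF_TYPE and obj in PERSON_CLASSES:
            seeds.add(subject)
    return seeds


# Hypothetical sample statements, not taken from either dataset.
sample = [
    f"<http://example.org/alice#me> <{RDF_TYPE}> <http://xmlns.com/foaf/0.1/Person> <http://example.org/alice> .",
    f"<http://example.org/bob> <{RDF_TYPE}> <http://schema.org/Person> <http://example.org/page> .",
    f"_:b0 <{RDF_TYPE}> <http://xmlns.com/foaf/0.1/Person> <http://example.org/page> .",
    '<http://example.org/alice#me> <http://xmlns.com/foaf/0.1/name> "Alice" <http://example.org/alice> .',
    f"<http://example.org/doc> <{RDF_TYPE}> <http://xmlns.com/foaf/0.1/Document> <http://example.org/doc> .",
]
seeds = extract_person_seeds(sample)
```

For a real dump you would pass an open file instead of the list, for example `gzip.open(path, "rt", encoding="utf-8")` for a gzipped file, and write the resulting set out one URI per line as the crawler's seed list.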
Steph.

[1]

On Jun 26, 2013 2:24 AM, "Dileepa Jayakody" <dileepajayak...@gmail.com> wrote:
>
> Hi All,
>
> Below is the reply I got from Andreas Harth from the webdatacommons project. He
> suggests that the btc-2012 dataset I mentioned in my previous mail has a
> sufficient FOAF dataset.
> Shall I go ahead with that dataset for my project?
>
> "the BTC 2012 has FOAF data [1]. You'd get a more comprehensive FOAF
> dataset if you first get all instances of foaf:Persons (simple grep)
> and then start a crawl from those, e.g., via LDSpider [2]. I assume
> that a hop-1 crawl would already get you a sizable dataset.
>
> All the best with your project, I look forward to seeing the results!
>
> Best regards,
> Andreas.
>
> [1] http://km.aifb.kit.edu/projects/btc-2012/
> [2] http://code.google.com/p/ldspider/
> "
> Thanks,
> Dileepa
>
>
> On Tue, Jun 25, 2013 at 5:45 PM, Dileepa Jayakody <dileepajayak...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > For my project, FOAF co-reference based disambiguation, as the first
> > milestone I'm developing an EntityHub ReferencedSite for a FOAF dataset.
> > With help from Rupert and others I was able to index a sample FOAF dataset
> > using the genericrdf indexing tool and set up a referenced site. FOAF data
> > can be filtered by using propertyfilter.config to import foaf:*. This will
> > import all entities which define FOAF properties. The next step will be
> > to develop an EntityProcessor to further filter and clean the FOAF data by
> > defining the required FOAF properties that are going to be used for
> > disambiguation.
> >
> > To continue my project I would like to finalize the FOAF dataset I need to
> > use, and would highly appreciate your input on this.
> > In the foaf-wiki site [1] there are many datasource projects, but many of
> > them are out of date.
> >
> > Following are my findings for a dataset for my project:
> >
> > 1. The Billion Triple Challenge 2012 project [2], a web-crawled dataset
> > including data from dbpedia, freebase, datahub, timbl, rest datasources.
> > Quantity-wise I think this has a sufficient amount (1436545545 quads) of
> > data and it's fairly up to date.
> >
> > 2. WebDataCommons project [3], which has a dataset (1079175202 quads)
> > created in August 2012. But the sources of the data are not specified in the
> > project. I have posted on their group asking if they have FOAF data in
> > their dataset, and am waiting for their suggestions on it.
> >
> > 3. DBpedia also has resources having FOAF properties. Especially
> > 'dbpedia-ont:Person' type entities contain FOAF properties. I think we can
> > map dbpedia-ont:Person to a FOAF profile here. WDYT?
> >
> > 4. There are several websites like http://iwlearn.net/, opera-community
> > exposing their contact list as FOAF, but they don't contain data on public
> > figures, celebrities AFAIK.
> >
> > Can I please have your opinions on finalizing a dataset for my project?
> > Appreciate your help.
> >
> > Thanks,
> > Dileepa
> >
> > [1] http://www.w3.org/wiki/FoafSites
> > [2] http://km.aifb.kit.edu/projects/btc-2012/
> > [3] http://webdatacommons.org/
> >