CrunchBase [1] (I am sure you all know it) is an interesting website (with APIs). It is a great source of information about companies, people, financial organizations, service providers, funding rounds and acquisitions. The website is already very useful as it is if you want to search for information about a company or browse around. However, it is not possible, for example, to search for trends in funding or for movements of people between companies.
They have an API, and all data is available in JSON format, so it's quite easy to crawl and extract what you want. A conversion of this data to RDF would be quite useful to people wanting to do some CrunchBase data mining/analysis.

I started writing a crunchbase2rdf crawler/conversion tool using Apache Jena (of course!) and JSoup. The main code for crawling and converting the data is there; however, it is incomplete and just an initial hack. Help on data modeling, suggestions on RDF vocabularies to use (other than FOAF, DC, ...) and more RDFExtractors are welcome, and this is the reason why I am posting this message on the jena-users ml.

An RDFExtractor is very easy to write; here is one:

    public class TwitterRdfExtractor extends AbstractRdfExtractor {

        public TwitterRdfExtractor() {
            // the JSON field this extractor handles
            super("twitter_username");
        }

        @Override
        public Model extract ( Resource subject, JSON json ) {
            Model model = ModelFactory.createDefaultModel();
            // look up this extractor's field ("twitter_username") in the JSON object
            Object object = json.object().get(name());
            if ( object != null ) {
                String username = object.toString().trim();
                // skip empty usernames; otherwise add one triple with a plain literal
                if ( username.length() > 0 ) {
                    model.add(subject, ResourceFactory.createProperty(Run.CRUNCHBASE_NS, name()), username);
                }
            }
            return model;
        }
    }

The crawler will automatically trigger the execution of this extractor if the JSON document has a field named "twitter_username". Maybe this is overcomplicated and something easier/simpler is better. Do you have generic JSON to RDF conversion code in Java?

Of course, in an ideal world CrunchBase would publish a data dump or a public SQL/SPARQL (or any other query language they choose) endpoint, so that interested people could explore their data as they wish.

Last but not least, see also:
 - http://bnode.org/blog/2008/07/29/semantic-web-by-example-semantic-crunchbase
 - http://cb.semsol.org/ (yep... not there, unfortunately)

Paolo

PS: Benji, you should really resurrect Semantic CrunchBase and find the time to work on it. ;-)

[1] http://www.crunchbase.com/
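On the question above about generic JSON to RDF conversion, here is a rough sketch of what such code could look like. It assumes the JSON document has already been parsed into plain java.util values (Map, List, String, Number, Boolean, null), which most JSON parsers can produce, and it writes N-Triples lines: every key becomes a property under a namespace, nested objects become blank nodes, and arrays become repeated properties. The class name GenericJsonToNTriples and the namespace http://example.org/crunchbase/ are hypothetical placeholders, not part of crunchbase2rdf. It uses only the JDK (Java 9+ for List.of), not Jena, and the record in main is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: naive, schema-less JSON -> N-Triples conversion.
public class GenericJsonToNTriples {

    private static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    private final String ns;                          // namespace prepended to every JSON key
    private final List<String> out = new ArrayList<>();
    private int bnodeCounter = 0;

    public GenericJsonToNTriples(String ns) { this.ns = ns; }

    /** Converts one parsed JSON object describing subjectIri into N-Triples lines. */
    public List<String> convert(String subjectIri, Map<String, Object> json) {
        out.clear();
        bnodeCounter = 0;                             // blank node labels restart for each call
        writeObject("<" + subjectIri + ">", json);
        return new ArrayList<>(out);
    }

    private void writeObject(String subject, Map<String, Object> obj) {
        for (Map.Entry<String, Object> e : obj.entrySet()) {
            // characters not safe in an IRI (e.g. spaces) are replaced with '_'
            String predicate = "<" + ns + e.getKey().replaceAll("[^A-Za-z0-9_-]", "_") + ">";
            writeValue(subject, predicate, e.getValue());
        }
    }

    @SuppressWarnings("unchecked")
    private void writeValue(String subject, String predicate, Object value) {
        if (value == null) {
            return;                                   // JSON null: no triple
        } else if (value instanceof Map) {
            // nested object: link to a fresh blank node, then describe that node
            String bnode = "_:b" + (bnodeCounter++);
            out.add(subject + " " + predicate + " " + bnode + " .");
            writeObject(bnode, (Map<String, Object>) value);
        } else if (value instanceof List) {
            // array: one triple per element with the same property (order is lost)
            for (Object item : (List<Object>) value) {
                writeValue(subject, predicate, item);
            }
        } else {
            out.add(subject + " " + predicate + " " + literal(value) + " .");
        }
    }

    private static String literal(Object v) {
        if (v instanceof Boolean) return "\"" + v + "\"^^<" + XSD + "boolean>";
        if (v instanceof Integer || v instanceof Long) return "\"" + v + "\"^^<" + XSD + "integer>";
        if (v instanceof Number) return "\"" + v + "\"^^<" + XSD + "double>";
        // plain string literal, escaping the characters N-Triples forbids inside quotes
        String s = v.toString()
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
        return "\"" + s + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> office = new LinkedHashMap<>();
        office.put("city", "San Francisco");
        Map<String, Object> company = new LinkedHashMap<>();   // made-up example record
        company.put("name", "Example Inc");
        company.put("number_of_employees", 42);
        company.put("twitter_username", "example");
        company.put("offices", List.of(office));

        GenericJsonToNTriples conv = new GenericJsonToNTriples("http://example.org/crunchbase/");
        for (String line : conv.convert("http://example.org/company/example-inc", company)) {
            System.out.println(line);
        }
    }
}
```

Compared with one RDFExtractor per field, this needs no per-field code, but the result is a naive vocabulary (one property per JSON key, no mapping to FOAF, DC, ...) and array order is lost. Blank node labels are only unique within a single convert() call, so a real tool would probably be better off adding statements to a Jena Model and letting it create the blank nodes.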