CrunchBase data in RDF?

Paolo Castagna Fri, 27 Apr 2012 03:52:37 -0700

CrunchBase [1] (as I am sure you all know) is an interesting website (with
APIs). It is a great source of information about companies, people, financial
organizations, service providers, funding rounds and acquisitions. The website
is already very useful as it is if you want to look up a company or browse
around. However, it is not possible, for example, to search for trends in
funding, movements of people between companies, etc.


They have an API and all data is available in JSON format. It's quite easy to
crawl and extract what you want. A conversion of this data to RDF would be quite
useful to people wanting to do some CrunchBase data mining/analysis.

I started writing a crunchbase2rdf crawler/conversion tool using Apache Jena (of
course!) and JSoup. The main code for crawling and converting the data is there;
however, it is incomplete and just an initial hack.

Help with data modeling, suggestions on RDF vocabularies to use (other than
FOAF, DC, ...) and help writing more RDFExtractors are welcome. This is why I am
posting this message to the jena-users mailing list.

An RDFExtractor is very easy to write; here is one:

public class TwitterRdfExtractor extends AbstractRdfExtractor {
    // The name passed to super() is the JSON field this extractor handles.
    public TwitterRdfExtractor() { super("twitter_username"); }

    @Override
    public Model extract ( Resource subject, JSON json ) {
        Model model = ModelFactory.createDefaultModel();
        Object object = json.object().get(name());
        if ( object != null ) {
            String username = object.toString().trim();
            // Skip empty values so no blank literals end up in the model.
            if ( username.length() > 0 ) {
                model.add(subject,
                          ResourceFactory.createProperty(Run.CRUNCHBASE_NS, name()),
                          username);
            }
        }
        return model;
    }
}

The crawler will automatically run this extractor if the JSON document has a
field named "twitter_username". Maybe this is overcomplicated and something
simpler would be better.
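
To make the dispatch idea concrete, here is a minimal, self-contained sketch of
it. It is not the actual crunchbase2rdf code: it uses no Jena, it treats the
JSON document as an already-parsed Map, and it emits N-Triples lines as plain
strings. FieldExtractor, ExtractorRegistry and the example.org namespace are
names made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an RDFExtractor: turns one JSON field value
// into zero or more N-Triples lines about the given subject.
interface FieldExtractor {
    List<String> extract(String subjectUri, Object value);
}

class ExtractorRegistry {
    // Assumed namespace, for illustration only (the real tool uses Run.CRUNCHBASE_NS).
    static final String NS = "http://example.org/crunchbase/";

    // Extractors keyed by the JSON field name that triggers them.
    private final Map<String, FieldExtractor> extractors = new LinkedHashMap<>();

    void register(String fieldName, FieldExtractor extractor) {
        extractors.put(fieldName, extractor);
    }

    // Runs every registered extractor whose field is present in the document.
    List<String> extractAll(String subjectUri, Map<String, Object> json) {
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, FieldExtractor> entry : extractors.entrySet()) {
            Object value = json.get(entry.getKey());
            if (value != null) {
                triples.addAll(entry.getValue().extract(subjectUri, value));
            }
        }
        return triples;
    }

    // Extractor that emits a plain literal for a non-empty string field.
    static FieldExtractor literal(String fieldName) {
        return (subject, value) -> {
            List<String> out = new ArrayList<>();
            String text = value.toString().trim();
            if (!text.isEmpty()) {
                out.add("<" + subject + "> <" + NS + fieldName + "> \""
                        + escape(text) + "\" .");
            }
            return out;
        };
    }

    // Escapes backslashes and double quotes for an N-Triples literal.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Registering ExtractorRegistry.literal("twitter_username") and calling
extractAll on a document gives one triple when that field is present and
non-empty, and nothing otherwise. Fields without a registered extractor are
simply ignored.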

Does anyone have generic JSON-to-RDF conversion code in Java?
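
For concreteness, the kind of generic conversion I have in mind could look
roughly like the sketch below. It is only an illustration under several
assumptions: the JSON is already parsed into Maps and Lists, every field name
becomes a property in a made-up example.org namespace, nested objects become
blank nodes, and output is N-Triples as plain strings rather than a Jena Model.
Field names are assumed to be IRI-safe (e.g. snake_case).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class GenericJsonToRdf {
    // Assumed namespace, for illustration only.
    static final String NS = "http://example.org/crunchbase/";
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    private final List<String> triples = new ArrayList<>();
    private int blankNodes = 0;

    // Converts a parsed JSON object into N-Triples lines about subjectUri.
    static List<String> convert(String subjectUri, Map<String, Object> json) {
        GenericJsonToRdf converter = new GenericJsonToRdf();
        converter.walk("<" + subjectUri + ">", json);
        return converter.triples;
    }

    // Each field of the object becomes one property of the subject.
    private void walk(String subject, Map<String, Object> object) {
        for (Map.Entry<String, Object> entry : object.entrySet()) {
            emit(subject, "<" + NS + entry.getKey() + ">", entry.getValue());
        }
    }

    @SuppressWarnings("unchecked")
    private void emit(String subject, String predicate, Object value) {
        if (value == null) {
            return; // JSON null: nothing to say
        }
        if (value instanceof Map) {
            // Nested object: link to a fresh blank node and recurse into it.
            String node = "_:b" + (blankNodes++);
            triples.add(subject + " " + predicate + " " + node + " .");
            walk(node, (Map<String, Object>) value);
        } else if (value instanceof List) {
            // Array: repeat the property once per element (order is lost).
            for (Object item : (List<?>) value) {
                emit(subject, predicate, item);
            }
        } else {
            triples.add(subject + " " + predicate + " " + literal(value) + " .");
        }
    }

    // Maps JSON scalars to plain or XSD-typed N-Triples literals.
    private static String literal(Object value) {
        if (value instanceof Boolean) {
            return "\"" + value + "\"^^<" + XSD + "boolean>";
        }
        if (value instanceof Integer || value instanceof Long) {
            return "\"" + value + "\"^^<" + XSD + "integer>";
        }
        if (value instanceof Number) {
            return "\"" + value + "\"^^<" + XSD + "double>";
        }
        String s = value.toString().replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
        return "\"" + s + "\"";
    }
}
```

This loses array ordering and knows nothing about which fields are really
links to other resources, so hand-written extractors would still be needed for
anything beyond a rough first pass.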

Of course, in an ideal world CrunchBase would publish a data dump or a public
SQL/SPARQL (or any other query language they choose) endpoint, so that
interested people could explore their data as they wish.

Last but not least, see also:

 - http://bnode.org/blog/2008/07/29/semantic-web-by-example-semantic-crunchbase
 - http://cb.semsol.org/ (yep... not there, unfortunately)

Paolo

PS:
Benji, you should really resurrect Semantic CrunchBase and have time to work on
it. ;-)

 [1] http://www.crunchbase.com/
