I'm looking to build up data for an RDF data store using Hadoop. I could just generate lots of RDF/XML files (or one big one) and feed them into Apache Jena. However, it seems to me that it would be best to use a Hadoop-aware distributed triplestore so that my data stays on the data nodes.
I see that there is the Heart project ( http://rdf-proj.blogspot.com/ ), but it doesn't seem very active. Does anyone have any recommendations for a usable RDF data store that I can use with Hadoop? Or should I consider this outside of the Hadoop world and just put it on one machine? Is Heart "nearly there" and just in need of a helping hand? I am tempted to bypass the RDF triplestore and roll my own using HBase, but I don't want to reinvent the wheel.

Cheers,
Alex
