I recently came across this OSS project called BigData [1], with the following description from their web site:

    Bigdata(R) comes packaged with a very high-performance RDF store
    supporting RDF(S) and OWL Lite inference. The Bigdata RDF Store is
    currently the only RDF database capable of operating distributed on a
    cluster. The Bigdata RDF Store was designed specifically to meet
    requirements for very large scale semantic alignment and federation.
    RDF is a Semantic Web technology particularly well-suited to modeling
    graph-shaped data and metadata, such as an associative entity-link
    model, whereby actors are linked to one another in an ad-hoc fashion
    within the context of an evolving ontology of concepts for entity
    types and link types related to a particular problem domain. The
    Bigdata RDF Store is used operationally in data harvesting systems to
    create mash-ups of structured, semi-structured, and unstructured data
    from myriad sources in a schema-flexible manner.

Having had a very brief look at the API, it seems that BigData interfaces with Sesame for its RDF-related processing, and the project appears to have demonstrated impressive scalability.

Does anybody have any experience with BigData?

Best wishes,
Yuan-Fang

1. http://www.systap.com/bigdata.htm
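For context on the Sesame reference above: Sesame 2 exposes storage back ends through its SAIL (Storage And Inference Layer) interface, so a store that implements SAIL can sit behind the standard Repository API. The sketch below is plain Sesame against an in-memory store; whether and how BigData's own classes would slot in for MemoryStore is an assumption on my part, and all URIs are made-up examples.

    import org.openrdf.model.URI;
    import org.openrdf.model.ValueFactory;
    import org.openrdf.query.QueryLanguage;
    import org.openrdf.query.TupleQueryResult;
    import org.openrdf.repository.Repository;
    import org.openrdf.repository.RepositoryConnection;
    import org.openrdf.repository.sail.SailRepository;
    import org.openrdf.sail.memory.MemoryStore;

    public class SesameSketch {
        public static void main(String[] args) throws Exception {
            // Plain in-memory SAIL; a cluster-backed SAIL implementation
            // would be substituted here.
            Repository repo = new SailRepository(new MemoryStore());
            repo.initialize();

            ValueFactory vf = repo.getValueFactory();
            URI alice = vf.createURI("http://example.org/alice");
            URI knows = vf.createURI("http://xmlns.com/foaf/0.1/knows");
            URI bob   = vf.createURI("http://example.org/bob");

            RepositoryConnection con = repo.getConnection();
            try {
                con.add(alice, knows, bob);  // assert one triple

                // Ask for everything known about alice.
                TupleQueryResult result = con.prepareTupleQuery(
                        QueryLanguage.SPARQL,
                        "SELECT ?o WHERE { <http://example.org/alice> ?p ?o }")
                        .evaluate();
                while (result.hasNext()) {
                    System.out.println(result.next().getValue("o"));
                }
                result.close();
            } finally {
                con.close();
            }
        }
    }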
On Fri, Jul 3, 2009 at 4:29 AM, Amandeep Khurana <[email protected]> wrote:

> I can share the data model right here with you. Beyond the data model, the
> MR jobs etc. are specific to the data sources I am pulling into HBase to
> connect with each other.
>
> I've attached an image that represents the basics of the table structure.
>
> Essentially, the column family and column qualifier used in combination
> represent the predicate, the row id is the subject, and the cell value is
> the object.
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
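To make that mapping concrete, here is a minimal sketch against the classic (pre-1.0) HBase Java client. The table name "triples", the family/qualifier split of the predicate, and the URIs are all invented for illustration; the actual schema is whatever Amandeep's attached image shows.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TripleStoreSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "triples");

            // Store the triple <ex:alice> foaf:knows <ex:bob> as:
            //   row key            = subject
            //   family + qualifier = predicate
            //   cell value         = object
            Put put = new Put(Bytes.toBytes("http://example.org/alice"));
            put.add(Bytes.toBytes("foaf"),   // family: predicate vocabulary
                    Bytes.toBytes("knows"),  // qualifier: predicate local name
                    Bytes.toBytes("http://example.org/bob"));
            table.put(put);
            table.close();
        }
    }

One nice property of this layout is that fetching everything known about a subject is a single-row Get. One open question is multiple objects for the same subject and predicate, which would need cell versions or a qualifier tweak; that is the kind of decision a real design has to make.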
> On Thu, Jul 2, 2009 at 11:20 AM, Brian MacKay <[email protected]> wrote:
>
>> I understand. If you would consider open sourcing it as a rough
>> prototype, let us know.
>>
>> -----Original Message-----
>> From: Amandeep Khurana [mailto:[email protected]]
>> Sent: Thursday, July 02, 2009 2:15 PM
>> To: [email protected]
>> Subject: Re: RDF Data Store on Hadoop
>>
>> It's a prototype right now, in a nascent stage, and I haven't made it
>> open. Essentially, it's storing triples in HBase, so it's just the data
>> model that's solving some problems I was working on. I don't have a
>> SPARQL engine built over it yet.
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>>
>> On Thu, Jul 2, 2009 at 11:13 AM, Brian MacKay <[email protected]> wrote:
>>
>>> Hi Amandeep,
>>>
>>> Is your custom RDF store over HBase open source? If so, where is it
>>> hosted?
>>>
>>> Thanks,
>>> Brian
>>>
>>> -----Original Message-----
>>> From: Amandeep Khurana [mailto:[email protected]]
>>> Sent: Thursday, July 02, 2009 2:08 PM
>>> To: [email protected]
>>> Subject: Re: RDF Data Store on Hadoop
>>>
>>> Hi,
>>>
>>> I have been working on this as well. There are a couple more threads
>>> about it here and on the HBase mailing list; one fairly recent thread
>>> is about graph algorithms using MapReduce.
>>>
>>> I built a custom RDF store over HBase. It's not really a full RDF store
>>> by any means, but the data model is something that can be extended to
>>> support all of the RDF specifications.
>>>
>>> The Heart project hasn't been active for some time now. If there are
>>> people interested, we could take it on and work on creating an RDF
>>> store over HBase along with graph algorithms using MR.
>>>
>>> Amandeep
>>>
>>> Amandeep Khurana
>>> Computer Science Graduate Student
>>> University of California, Santa Cruz
>>>
>>> On Thu, Jul 2, 2009 at 5:32 AM, Alex McLintock <[email protected]> wrote:
>>>
>>>> I'm looking to build up data for an RDF data store using Hadoop. I
>>>> could just generate lots of RDF/XML files, or one big one, and feed
>>>> them into Apache Jena. However, it seems to me that it would be best
>>>> to use a Hadoop-aware distributed triplestore so that my data stayed
>>>> on the data nodes.
>>>>
>>>> I see that there is the Heart project (http://rdf-proj.blogspot.com/),
>>>> but it doesn't seem very active.
>>>>
>>>> Does anyone have any recommendations for a usable RDF data store which
>>>> I can use with Hadoop? Or should I consider this outside of the Hadoop
>>>> world and just put it on one machine? Is Heart "nearly there" and just
>>>> in need of a helping hand?
>>>>
>>>> I am tempted to bypass the RDF triplestore and roll my own using
>>>> HBase, but I don't want to reinvent the wheel.
>>>>
>>>> Cheers,
>>>>
>>>> Alex
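As an aside on Alex's first option (generating RDF/XML and feeding it into Jena): with the Jena 2 API that step is short. The file name below is invented.

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    public class JenaLoadSketch {
        public static void main(String[] args) {
            // createDefaultModel() gives an in-memory model;
            // read() parses RDF/XML by default.
            Model model = ModelFactory.createDefaultModel();
            model.read("file:generated-triples.rdf");
            System.out.println("Loaded " + model.size() + " statements");
        }
    }

The catch, as Alex notes, is that this pulls all of the data off the data nodes onto one machine, which is exactly what a Hadoop-aware triplestore would avoid.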
