Hi,

This is an interesting topic, indeed. I haven't done any research on scalable RDF stores, but I accidentally ran into http://www.systap.com/bigdata.htm, which claims to be something like that.
My two cents ...

Milorad

>________________________________
> From: Tobias Neef <tob...@gmail.com>
>To: jena-users@incubator.apache.org
>Sent: Wednesday, April 25, 2012 10:33 AM
>Subject: Re: How to implement a custom JENA Backend
>
>Hi Andy,
>
>thanks a lot for your insight. I agree with you that there is no free
>cake. Most of the highly scalable stores have a very specific usage
>scenario which is their sweet spot. Also the qualities of such a
>service would depend on the mapping strategy you choose.
>
>> I have built a TDB that used Project Voldemort as a block store for the TDB
>> B+Trees. It worked quite well but as highly scalable base, it's limited as
>> too much work ends up on the query engine and not enough of the indexing
>> work access work is done in the cluster.
>
>Not sure if I quite get your point there. What do you mean with "and
>not enough of the indexing work access work is done in the cluster"? Do
>you mean that this architecture would be most suited for a read /
>query intensive scenario rather than a frequent update one?
>
>Your approach seems to be similar to a recently published approach,
>which is the only research paper I have found in this area:
>http://www.edbt.org/Proceedings/2012-Berlin/papers/workshops/danac2012/a4-bugiotti.pdf.
>
>Is there a chance that I can take a look at your project code?
>
>I know http://www.dydra.com/ but they haven't published anything yet
>on how they manage their store. And the testing you can do seems to
>be rather limited due to the beta constraints they currently have.
>
>
>On Tue, Apr 24, 2012 at 8:56 PM, Andy Seaborne <a...@apache.org> wrote:
>> On 24/04/12 14:57, Paolo Castagna wrote:
>>>
>>> Tobias Neef wrote:
>>>>
>>>> Hi Paolo,
>>>>
>>>> thanks for the quick response! The reason for doing this is that I
>>>> think it would be useful to have a RDF-Database with SPARQL-Interface
>>>> which can be used as a PAAS offering like Amazon RDS or Amazon Dynamo
>>>> DB: For the developer this would mean no hassle about replication, or
>>>> scaling etc. To some extent you can achieve that when using Jena SDB
>>>> on top of something like Amazon RDS or MS SQL Azure. I want to try how
>>>> far I can get when I use Jena as API and map it to something like
>>>> Dynamo DB or MS Azure Tables which have quite unique
>>>> Scalability/Availability characteristics. There is for example
>>>> http://datomic.com/ which also goes along those lines. They
>>>> implemented it on top of Dynamo DB but with a custom query language.
>>>>
>>>> Does that make sense from your perspective?
>>
>>
>> Hi Tobias,
>>
>> Interesting space and it would be great to have such a service.
>>
>> There are quite a few design choices to make and they can greatly influence
>> the design. For example: a service that offered replication etc and had
>> many datasets can be built using one dataset per machine as the unit. It
>> scales in total data but not in data-per-dataset or graph.
>>
>> A service that specialised in massive data (more about data management than
>> raw query performance; maybe like a column store if aggregation queries
>> matter) is different from one giving as-near-real-time response for UIs
>> (basically, in-memory or the working set is in-memory).
>>
>> In terms of where to start,
>>
>> SDB if you are building on top of an SQL service
>>
>> TDB, or the shell of TDB, if you are building on what amounts to an index
>> service. TDB is built on top of indexes - you can plug in your own.
>>
>> I have built a TDB that used Project Voldemort as a block store for the TDB
>> B+Trees. It worked quite well but as highly scalable base, it's limited as
>> too much work ends up on the query engine and not enough of the indexing
>> work access work is done in the cluster.
>>
>> As for examples: see http://www.dydra.com/ which is SPARQL.
>>
>> Andy
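
To make the "mapping strategy" question a bit more concrete, here is a minimal sketch of one way triples could be laid out over a sorted key-value service, loosely in the spirit of the remark above that TDB is built on top of indexes. Everything in it is an assumption for illustration: SortedKV is a hypothetical in-memory stand-in for a remote store, TripleStore and its three index orders (SPO, POS, OSP) are made-up names, and none of this is Jena's or TDB's actual API.

```python
from bisect import bisect_left


class SortedKV:
    """Hypothetical stand-in for a remote sorted key-value store."""

    def __init__(self):
        self._keys = []  # sorted list of tuple keys

    def put(self, key):
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan(self, prefix):
        """Yield every key that starts with the given tuple prefix."""
        i = bisect_left(self._keys, prefix)
        n = len(prefix)
        while i < len(self._keys) and self._keys[i][:n] == prefix:
            yield self._keys[i]
            i += 1


class TripleStore:
    # index name -> order in which the (s, p, o) positions are stored
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self):
        self.indexes = {name: SortedKV() for name in self.ORDERS}

    def add(self, s, p, o):
        triple = (s, p, o)
        # every triple is written once per index, in that index's order
        for name, order in self.ORDERS.items():
            self.indexes[name].put(tuple(triple[i] for i in order))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None means 'any term'."""
        pattern = (s, p, o)
        # pick the index with the longest run of bound leading positions
        best_name, best_len = None, -1
        for name, order in self.ORDERS.items():
            bound = 0
            for i in order:
                if pattern[i] is None:
                    break
                bound += 1
            if bound > best_len:
                best_name, best_len = name, bound
        order = self.ORDERS[best_name]
        prefix = tuple(pattern[i] for i in order[:best_len])
        for key in self.indexes[best_name].scan(prefix):
            triple = [None, None, None]
            for pos, i in enumerate(order):
                triple[i] = key[pos]
            # residual check, done client-side, for any uncovered terms
            if all(b is None or b == v for b, v in zip(pattern, triple)):
                yield tuple(triple)


store = TripleStore()
store.add("ex:a", "ex:knows", "ex:b")
store.add("ex:a", "ex:name", "Alice")
store.add("ex:b", "ex:knows", "ex:c")
for t in store.match(p="ex:knows"):
    print(t)
```

In this sketch the store only answers prefix scans; choosing an index, rebuilding triples and (for a real query) joining patterns all happen on the client, which may be one reading of the point about too much work ending up on the query engine.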