Re: How to implement a custom JENA Backend

Milorad Tosic Wed, 25 Apr 2012 02:58:03 -0700

Hi,

This is an interesting topic, indeed. I haven't done any research on scalable 
RDF stores, but I happened to run across http://www.systap.com/bigdata.htm, 
which claims to offer something along those lines.


My two cents ...

Milorad




>________________________________
> From: Tobias Neef <tob...@gmail.com>
>To: jena-users@incubator.apache.org 
>Sent: Wednesday, April 25, 2012 10:33 AM
>Subject: Re: How to implement a custom JENA Backend
> 
>Hi Andy,
>
>thanks a lot for your insight. I agree with you that there is no free
>lunch. Most of the highly scalable stores have a very specific usage
>scenario which is their sweet spot. Also, the qualities of such a
>service would depend on the mapping strategy you choose.
>
>> I have built a TDB that used Project Voldemort as a block store for the TDB
>> B+Trees.  It worked quite well but as a highly scalable base, it's limited as
>> too much work ends up on the query engine and not enough of the indexing
>> work access work is done in the cluster.
>
>Not sure if I quite get your point there. What do you mean by "and
>not enough of the indexing work access work is done in the cluster"? Do
>you mean that this architecture would be best suited to a read /
>query intensive scenario rather than a frequent update one?
>
>Your approach seems to be similar to a recently published approach,
>which is the only research paper I have found in this area:
>http://www.edbt.org/Proceedings/2012-Berlin/papers/workshops/danac2012/a4-bugiotti.pdf.
>
>Is there a chance that I can take a look at your project code?
>
>I know http://www.dydra.com/ but they haven't published anything yet
>on how they manage their store. And the testing you can do seems to
>be rather limited due to the beta constraints they currently have.
>
>
>On Tue, Apr 24, 2012 at 8:56 PM, Andy Seaborne <a...@apache.org> wrote:
>> On 24/04/12 14:57, Paolo Castagna wrote:
>>>
>>> Tobias Neef wrote:
>>>>
>>>> Hi Paolo,
>>>>
>>>> thanks for the quick response! The reason for doing this is that I
>>>> think it would be useful to have an RDF database with a SPARQL interface
>>>> which can be used as a PaaS offering like Amazon RDS or Amazon Dynamo
>>>> DB: for the developer this would mean no hassle about replication,
>>>> scaling etc. To some extent you can achieve that when using Jena SDB
>>>> on top of something like Amazon RDS or MS SQL Azure. I want to see how
>>>> far I can get when I use Jena as API and map it to something like
>>>> Dynamo DB or MS Azure Tables which have quite unique
>>>> Scalability/Availability characteristics. There is for example
>>>> http://datomic.com/ which also goes along those lines. They
>>>> implemented it on top of Dynamo DB but with a custom query language.
>>>>
>>>> Does that make sense from your perspective?
>>
>>
>> Hi Tobias,
>>
>> Interesting space and it would be great to have such a service.
>>
>> There are quite a few design choices to make and they can greatly influence
>> the design.  For example: a service that offered replication etc and had
>> many datasets can be built using one dataset per machine as the unit.  It
>> scales in total data but not in data-per-dataset or graph.
>>
>> A service that specialised in massive data (more about data management than
>> raw query performance; maybe like a column store if aggregation queries
>> matter) is different from one giving as-near-real-time response for UIs
>> (basically, in-memory or the working set is in-memory).
>>
>> In terms of where to start,
>>
>> SDB if you are building on top of an SQL service
>>
>> TDB, or the shell of TDB, if you are building on what amounts to an index
>> service.  TDB is built on top of indexes - you can plug in your own.
>>
>> I have built a TDB that used Project Voldemort as a block store for the TDB
>> B+Trees.  It worked quite well but as a highly scalable base, it's limited as
>> too much work ends up on the query engine and not enough of the indexing
>> work access work is done in the cluster.
>>
>> As for examples: see http://www.dydra.com/ which is SPARQL.
>>
>>        Andy
>>
>
>
>
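
To make the "built on top of indexes" remark in the quoted thread a bit more concrete, here is a minimal sketch of how triples could be laid out in a sorted key-value store. It is not Jena or TDB code: the class TripleIndexSketch, its methods and the tab separator are all hypothetical, an in-memory TreeSet stands in for whatever remote sorted store (Dynamo DB, Azure Tables, ...) would be used, and terms are plain strings rather than node ids. Each triple is written under three key orderings (SPO, POS, OSP), so any pattern with bound terms becomes a prefix scan on one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: three sorted key sets stand in for a remote
// sorted key-value store. None of this is Jena/TDB API.
public class TripleIndexSketch {
    // Assumption: terms never contain a tab character.
    private static final String SEP = "\t";

    private final NavigableSet<String> spo = new TreeSet<>();
    private final NavigableSet<String> pos = new TreeSet<>();
    private final NavigableSet<String> osp = new TreeSet<>();

    private static String key(String... terms) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            sb.append(t).append(SEP); // every term is terminated by SEP
        }
        return sb.toString();
    }

    // Store the triple once per ordering.
    public void add(String s, String p, String o) {
        spo.add(key(s, p, o));
        pos.add(key(p, o, s));
        osp.add(key(o, s, p));
    }

    // Pattern lookup; null means "any". Results are {s, p, o} arrays.
    public List<String[]> find(String s, String p, String o) {
        if (s != null) {
            if (p != null) {
                String prefix = (o != null) ? key(s, p, o) : key(s, p);
                return scan(spo, prefix, 0, 1, 2);
            }
            if (o != null) {
                return scan(osp, key(o, s), 1, 2, 0);
            }
            return scan(spo, key(s), 0, 1, 2);
        }
        if (p != null) {
            String prefix = (o != null) ? key(p, o) : key(p);
            return scan(pos, prefix, 2, 0, 1);
        }
        if (o != null) {
            return scan(osp, key(o), 1, 2, 0);
        }
        return scan(spo, "", 0, 1, 2); // no bound term: full scan
    }

    // Range scan over all keys starting with prefix; si/pi/oi give the
    // position of subject, predicate and object inside the stored key.
    private static List<String[]> scan(NavigableSet<String> index,
                                       String prefix, int si, int pi, int oi) {
        List<String[]> out = new ArrayList<>();
        for (String k : index.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) {
                break; // left the contiguous prefix range
            }
            String[] parts = k.split(SEP);
            out.add(new String[] { parts[si], parts[pi], parts[oi] });
        }
        return out;
    }
}
```

With triples added via add("alice", "knows", "bob") and so on, a call such as find(null, "knows", "carol") is answered from the POS ordering alone. In this layout the store only serves range scans; joins and filters would still run in the query engine, which seems to be the limitation discussed above.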
