Hi Andy,

thanks a lot for your insight. I agree with you that there is no free
lunch. Most of the highly scalable stores have a very specific usage
scenario which is their sweet spot. Also, the qualities of such a
service would depend on the mapping strategy you choose.
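
To make that concrete, the simplest mapping I have in mind is the
usual three-permutation-index layout, roughly like the sketch below
(plain Java; KeyValueStore is a made-up stand-in for Dynamo DB /
Azure Tables, not a real client API):

  // Made-up stand-in for the underlying table/key-value service.
  interface KeyValueStore {
    void put(String key, String value);
    Iterable<String> scanByPrefix(String keyPrefix);
  }

  class PermutationIndexMapping {
    private final KeyValueStore store;

    PermutationIndexMapping(KeyValueStore store) { this.store = store; }

    // Write each triple three times, once per key ordering, so that any
    // triple pattern with a bound prefix can be answered by a prefix scan.
    void add(String s, String p, String o) {
      store.put("SPO|" + s + "|" + p + "|" + o, "");
      store.put("POS|" + p + "|" + o + "|" + s, "");
      store.put("OSP|" + o + "|" + s + "|" + p, "");
    }

    // For example, all triples with a given subject come from the SPO index.
    Iterable<String> findBySubject(String s) {
      return store.scanByPrefix("SPO|" + s + "|");
    }
  }

The obvious trade-off is that every triple is written three times, so
you buy cheap pattern lookups with write amplification - exactly the
kind of quality that depends on the chosen mapping.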

> I have built a TDB that used Project Voldemort as a block store for the TDB
> B+Trees.  It worked quite well but, as a highly scalable base, it's limited:
> too much work ends up on the query engine and not enough of the index
> access work is done in the cluster.

I am not sure I quite get your point there. What do you mean by "not
enough of the index access work is done in the cluster"? Do you mean
that this architecture would be better suited for a read/query-intensive
scenario than for a frequently updated one?
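
Just so you can correct my mental model: the way I currently picture
your Voldemort-backed TDB is that the cluster only hands back raw data
(blocks or matching triples) and all the join work stays on the
client, roughly like the sketch below (RemoteIndex is a made-up
interface, purely my guess, not your actual code):

  import java.util.ArrayList;
  import java.util.List;

  // Made-up stand-in for whatever the cluster exposes.
  interface RemoteIndex {
    // All triples matching a pattern; null acts as a wildcard.
    List<String[]> find(String s, String p, String o);
  }

  class ClientSideJoin {
    // Answer "?x :knows ?y . ?y :name ?n" with one remote lookup per
    // binding and a local nested-loop join - every intermediate binding
    // costs a round trip, and the cluster itself never joins anything.
    static List<String[]> namesOfFriends(RemoteIndex index, String person) {
      List<String[]> results = new ArrayList<String[]>();
      for (String[] knows : index.find(person, ":knows", null)) {
        String friend = knows[2];
        for (String[] name : index.find(friend, ":name", null)) {
          results.add(new String[] { friend, name[2] });
        }
      }
      return results;
    }
  }

Is that roughly what you mean, i.e. every traversal and join step
turns into many small remote calls instead of work done by the
cluster nodes themselves?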

Your approach seems to be similar to a recently published one, which
is the only research paper I have found in this area:
http://www.edbt.org/Proceedings/2012-Berlin/papers/workshops/danac2012/a4-bugiotti.pdf.

Is there a chance that I can take a look at your project code?
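
In the meantime, my naive mental model of the block-store layer is
roughly the interface below (pure guesswork on my side, not TDB's
actual SPI):

  import java.nio.ByteBuffer;

  // Guessed shape of a block store that a B+Tree could sit on top of;
  // an implementation would map block ids to keys in Voldemort or a
  // similar key-value store. Not TDB's real interface.
  interface BlockStore {
    ByteBuffer read(long blockId);             // fetch a fixed-size page
    void write(long blockId, ByteBuffer page); // persist a page
    long allocate();                           // reserve a new block id
  }

If it is anywhere near that, I can see how a single lookup already
costs several block fetches over the network.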

I know http://www.dydra.com/ but they haven't published anything yet
on how they manage their store. And the testing you can do seems to
be rather limited due to the beta constraints they currently have.


On Tue, Apr 24, 2012 at 8:56 PM, Andy Seaborne <a...@apache.org> wrote:
> On 24/04/12 14:57, Paolo Castagna wrote:
>>
>> Tobias Neef wrote:
>>>
>>> Hi Paolo,
>>>
>>> thanks for the quick response! The reason for doing this is that I
>>> think it would be useful to have an RDF database with a SPARQL
>>> interface which can be used as a PaaS offering like Amazon RDS or
>>> Amazon Dynamo DB. For the developer this would mean no hassle with
>>> replication, scaling, etc. To some extent you can achieve that by
>>> using Jena SDB on top of something like Amazon RDS or MS SQL Azure.
>>> I want to see how far I can get when I use Jena as the API and map it
>>> to something like Dynamo DB or MS Azure Tables, which have quite
>>> unique scalability/availability characteristics. There is for example
>>> http://datomic.com/ which also goes along those lines. They
>>> implemented it on top of Dynamo DB but with a custom query language.
>>>
>>> Does that make sense from your perspective?
>
>
> Hi Tobias,
>
> Interesting space and it would be great to have such a service.
>
> There are quite a few design choices to make and they can greatly influence
> the design.  For example: a service that offered replication etc. and had
> many datasets can be built using one dataset per machine as the unit.  It
> scales in total data but not in data per dataset or per graph.
>
> A service that specialises in massive data (more about data management than
> raw query performance; maybe like a column store if aggregation queries
> matter) is different from one giving near-real-time responses for UIs
> (basically, in-memory, or at least the working set is in memory).
>
> In terms of where to start:
>
> SDB if you are building on top of an SQL service
>
> TDB, or the shell of TDB, if you are building on what amounts to an index
> service.  TDB is built on top of indexes - you can plug in your own.
>
> I have built a TDB that used Project Voldemort as a block store for the TDB
> B+Trees.  It worked quite well but, as a highly scalable base, it's limited:
> too much work ends up on the query engine and not enough of the index
> access work is done in the cluster.
>
> As for examples: see http://www.dydra.com/ which is SPARQL.
>
>        Andy
>
