Hi Plamen,

Apologies for the delay in responding to you.

Accumulo, Rya's default underlying database, is known to scale to very large 
instances: clusters of hundreds or thousands of nodes are common, storing tens 
or even hundreds of petabytes. Far more than 100 billion triples would fit. 
Accumulo scales well as a data warehouse.

Rya doesn't really do anything fancy that would limit the underlying 
scalability of Accumulo, as far as I can tell, though that depends on how 
heavily you use RDFS and OWL ontology features, for instance. The limiting 
factors are probably these two:

  1.  MapReduce and/or Fluo are used to pre-compute results for various 
indexes and optimization strategies. These frameworks are distributed across a 
network and read from disk, so they will never be as fast as alternatives that 
reside in memory on huge hardware nodes (a machine with a few TB of RAM, for 
instance).
  2.  Queries are served through a single Tomcat instance, so some query 
results (for example, complex joins) may need to fit into memory on that box. 
Simple queries stream, though.
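To make point 2 concrete, a simple streaming query just hits that single web 
endpoint over HTTP. Here's a minimal sketch in Python that builds such a 
request; the host, port, and "web.rya/queryrdf" path are assumptions based on 
a default web.rya deployment, so adjust them to match your actual setup:

```python
from urllib.parse import urlencode

# Assumed endpoint for a default web.rya deployment on localhost;
# your Tomcat host/port/context path may differ.
BASE = "http://localhost:8080/web.rya/queryrdf"

# A trivially streamable query: no joins, so the Tomcat box never
# needs to hold a large intermediate result in memory.
sparql = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

url = BASE + "?" + urlencode({"query": sparql})
print(url)

# Against a running instance you could then stream the response, e.g.:
#   from urllib.request import urlopen
#   with urlopen(url) as resp:
#       for line in resp:
#           print(line.decode())
```

The point is just that all results funnel through that one servlet, so a 
complex join materialized on that box is bounded by its heap, while a simple 
SELECT like the one above can be written out row by row.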

Hope this helps.

Brad


________________________________
From: Plamen Tarkalanov <p_tarkala...@abv.bg>
Sent: Wednesday, August 5, 2020 11:23 PM
To: dev@rya.apache.org <dev@rya.apache.org>
Subject: Benchmark question

Hello,

I am trying to find benchmarks for the database.
I was able to find only this paper from 2013: 
https://www.usna.edu/Users/cs/adina/research/Rya_ISjournal2013.pdf
Do you have anything fresher?

Your site says the DB uses “query processing techniques that scale to billions 
of triples”.
Is the database supposed to handle more than 100 billion triples?
According to https://www.w3.org/wiki/LargeTripleStores there are single-node 
implementations which scale into the billions.

Thanks!
Plamen