Hi Sebastian, all,
Rerunning the tests with batching enabled gives results in line with yours: times are further reduced by more than 50%. I think that, at the current state of the implementation, this is the best improvement that can be made without a major impact on the codebase. To get even better results in terms of performance and storage capabilities, other paths would need to be explored (e.g. Jena, NoSQL storage, etc.), but those require more invasive changes.
Cheers,
Raffaele.


On 06/18/2013 11:44 AM, Sebastian Schaffert wrote:
Dear all,

I worked on some ideas on improving the import/write performance in the
triple store. With the new implementation that is now in GIT (devel
branch), I was able to reduce the import time for larger datasets by
more than 50%. For some databases (PostgreSQL and MySQL), the new
implementation uses batched insertion into the database. This means
that when storing nodes and triples, they are not immediately written
over the JDBC connection but rather cached in memory. When either the
batch-size is reached (currently 1000 triples or nodes) or the
connection commits, the triples and nodes are written to the database
using JDBC/SQL batch operations.
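In rough terms, the pattern looks like the following minimal sketch. The
table and class names here are hypothetical and just for illustration;
the real schema and code in the repository differ:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BatchedTripleWriter {
    // flush threshold, as described above (currently 1000)
    private static final int BATCH_SIZE = 1000;

    private final Connection connection;
    // in-memory cache of pending triples (subject, predicate, object)
    private final List<String[]> pending = new ArrayList<>();

    public BatchedTripleWriter(Connection connection) {
        this.connection = connection;
    }

    // cache the triple in memory; flush once the batch size is reached
    public void storeTriple(String s, String p, String o) throws SQLException {
        pending.add(new String[] { s, p, o });
        if (pending.size() >= BATCH_SIZE) {
            flush();
        }
    }

    // write all cached triples to the database in one JDBC batch operation
    private void flush() throws SQLException {
        if (pending.isEmpty()) return;
        try (PreparedStatement stmt = connection.prepareStatement(
                "INSERT INTO triples (subject, predicate, object) VALUES (?, ?, ?)")) {
            for (String[] t : pending) {
                stmt.setString(1, t[0]);
                stmt.setString(2, t[1]);
                stmt.setString(3, t[2]);
                stmt.addBatch();
            }
            stmt.executeBatch();
        }
        pending.clear();
    }

    // flush any remaining cached triples before the transaction commits
    public void commit() throws SQLException {
        flush();
        connection.commit();
    }
}

The point is that each batch costs a single database round-trip instead
of one round-trip per triple, which is where the savings come from.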

Here are the figures (PostgreSQL, my machine, about 30k triples):

batch disabled: 83033ms
batch enabled (triples only): 70966ms
batch enabled (triples & nodes): 45308ms
batch enabled (triples & nodes, shared connection): 39495ms

Of course, the implementation with batching becomes more complex, so I
would like to ask you to try it out in different scenarios.
Particularly, I'd like to ask Raffaele if it improves the performance
benchmarks he has been doing. :-)

Greetings,

Sebastian
