Re: Bulk load of triples on DB

Raffaele Palmieri Mon, 27 May 2013 08:34:56 -0700

On Friday, 24 May 2013, Sergio Fernández wrote:

> Hi,
>
> IMHO just switching to Jena SDB is not the right idea, but porting some
> ideas to the KiWi triple store would be nice.



Yes, I agree with you, also in light of the issues already created
(MARMOTTA-85/89); Jena SDB would be just one possible option, not the only
backend.


>
> In parallel, when we switched to a pure Sesame-backend, the idea for the
> mid-long term was to be able to run Marmotta on top of other triple stores.
> I particularly would like to be able to use Jena TDB. But there are still
> many parts of Marmotta that should be refactored to allow this.
>
>
So, how should we approach this refactoring? Can we identify the parts that
need refactoring, or is that premature, so that we should focus on other
issues first?



> Cheers,
>

Regards,
Raffaele.


>
>
> On 23/05/13 15:03, Sebastian Schaffert wrote:
>
> Hi Raffaele,
>
> the idea was anyway to allow different backends besides KiWi, because each
> has its advantages and disadvantages (KiWi's advantages are the versioning
> and the reasoner). The issue is documented under
>
> https://issues.apache.org/jira/browse/MARMOTTA-85
>
> and the individual backends have subsequent numbers. See e.g.
>
> https://issues.apache.org/jira/browse/MARMOTTA-89
>
> for the SDB backend implementation.
>
> Changing backends is currently not possible, but it is foreseen in the
> architecture and it would take me about one day of work to change the
> platform in a way that other backends can be used. The main change will be
> in the SesameServiceImpl which sets up the underlying triple store. The
> initialisation method for this service stacks together different sails
> depending on the configuration and is already very modular. The only thing
> that is currently hardcoded there is the initialisation of a new KiWiStore,
> but in principle it could be any Sesame Sail.
>
> But there are some consequences and dependencies, e.g. the
> marmotta-versioning and marmotta-reasoner modules cannot be used if the
> backend is not KiWi, and I need to find a clean way to model these
> dependencies (Maven is unfortunately probably not enough, because several
> backends could be on the classpath and only one backend selected - on the
> other hand we could simply create different backend configurations in Maven
> that only include one backend to be used - we will see).
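
The backend selection and the KiWi-only dependencies described above could be modelled roughly as follows. This is only a language-neutral sketch in Python, not Marmotta or Sesame code: `Layer`, `BACKENDS`, `KIWI_ONLY` and `build_stack` are hypothetical names, and the real change would live in the Java SesameServiceImpl.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Layer:
    """A store or a wrapper around one (hypothetical stand-in for a Sail)."""
    name: str
    base: Optional["Layer"] = None

    def chain(self) -> List[str]:
        # Names from the outermost wrapper down to the base store.
        return [self.name] + (self.base.chain() if self.base else [])


# Hypothetical registry: configured backend name -> factory for the base store.
BACKENDS: Dict[str, Callable[[], Layer]] = {
    "kiwi": lambda: Layer("kiwi-store"),
    "tdb": lambda: Layer("jena-tdb-adapter"),
}

# Modules that, according to the thread, only work on top of KiWi.
KIWI_ONLY = {"versioning", "reasoner"}


def build_stack(backend: str, wrappers: List[str]) -> Layer:
    """Build the base store from configuration, then stack wrappers on it."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    if backend != "kiwi":
        unsupported = sorted(KIWI_ONLY.intersection(wrappers))
        if unsupported:
            raise ValueError(f"{unsupported} require the KiWi backend")
    store = BACKENDS[backend]()
    for wrapper in wrappers:
        store = Layer(wrapper, base=store)
    return store


print(build_stack("kiwi", ["versioning"]).chain())
```

Checking the dependency at start-up, rather than only in the build, covers the case where several backends are on the classpath and only one is selected.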
>
> If you want to try with SDB and TDB, the first step would be to implement a
> clean wrapper that allows accessing Jena through the Sesame SAIL API. Peter
> Ansell has already worked on such adapters:
>
> https://github.com/ansell/JenaSesame
>
> Maybe this would be a good starting point. I will in parallel try to work
> on modularizing the backends. Not sure when I will be able to finish this,
> because other things are currently on my priority list...
>
> Greetings,
>
> Sebastian
>
>
> 2013/5/23 Raffaele Palmieri <raffaele.palmi...@gmail.com>
>
>  Hi Sebastian, below are some considerations that lead me to think that
> Jena SDB (or TDB) could be a better solution, but I understand that it
> would have a big impact on the codebase, so I would proceed cautiously.
>
> On 23 May 2013 12:20, Sebastian Schaffert <sebastian.schaff...@gmail.com> wrote:
>
>
>  Hi Raffaele,
>
>
> 2013/5/22 Raffaele Palmieri <raffaele.palmi...@gmail.com>
>
>  On 22 May 2013 15:04, Andy Seaborne <a...@apache.org> wrote:
>
>  What is the current loading rate?
>
>
> I tried a test with a graph of 661 nodes and 957 triples: it took about
> 18 sec. So, looking at the triples, the average rate is 18.8 ms per triple;
> tested on Tomcat with a maximum size of 1.5 GB.
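
As a quick check of the rate quoted above, using only the two figures from the message (957 triples, about 18 seconds), the arithmetic can be sketched as:

```python
# Figures quoted in the message above: 957 triples loaded in about 18 seconds.
triples = 957
elapsed_s = 18.0

ms_per_triple = elapsed_s / triples * 1000   # time spent per triple
triples_per_s = triples / elapsed_s          # equivalent throughput

print(f"{ms_per_triple:.1f} ms per triple")       # 18.8 ms per triple
print(f"{triples_per_s:.1f} triples per second")  # 53.2 triples per second
```

Since the 18 seconds is itself approximate, both values are rough estimates.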
>
>
>  This is a bit too small for a real test, because you will have a high
> influence of side effects (like cache initialisation). I have done some
> performance comparisons with importing about 10% of GeoNames (about 15
> million triples, 1.5 million resources). The test uses a specialised
> parallel importer that was configured to run 8 importing threads in
> parallel. Here are some figures on different hardware:
> - VMware, 4 CPUs, 6 GB RAM, HDD: 4:20h (avg per 100 resources: 10-13 seconds,
> 8 in parallel). In case of VMware, the CPU is waiting most of the time for
>
> --
> Sergio Fernández
>
