Re: documentation sprint

Sebastian Schaffert Fri, 29 Mar 2013 09:03:38 -0700

Hi Raffaele,

thanks for summarizing. :-)

I already started working on the architecture diagram. I'll include it once
it is ready (but I cannot promise it will be today).

The GeoNames import is currently still a module provided by the LMF,
because we thought it would not be so useful for most people. You are
however right that we should work more on how to import existing datasets
easily. The whole GeoNames import (with 140 million triples) on a decent
server with PostgreSQL (2x Quadcore = 8 cores, SSD disk, 24GB memory) took
3:40 hours (versioning turned on, which slows down the import by about 30%).
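
For a rough sense of scale, the figures above can be turned into a throughput
number. This is only back-of-the-envelope arithmetic: it assumes the run time
quoted above is 3 hours 40 minutes, and that "slows down by about 30%" means
versioning adds roughly 30% to the run time. The variable names are made up for
illustration.

```python
# Back-of-the-envelope numbers derived from the figures quoted above.
triples = 140_000_000                  # size of the GeoNames import
elapsed_s = 3 * 3600 + 40 * 60         # 3 h 40 min expressed in seconds

# Average import rate with versioning turned on.
throughput = triples / elapsed_s

# Assumption: versioning adds about 30% to the run time, so the same
# import without versioning would take roughly elapsed / 1.30.
est_without_versioning_s = elapsed_s / 1.30

print(f"{throughput:.0f} triples/s with versioning")
print(f"{est_without_versioning_s / 60:.0f} min estimated without versioning")
```

So the import averaged on the order of ten thousand triples per second on
that machine.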

I already have an issue open on further improving this in a similar way to
Jena (i.e. by setting the triplestore into a maintenance mode and then
doing a dedicated batch import). I'll work on this when I have time. I
expect that under certain conditions the import time can be reduced by a
factor of about 10, because many of the things that currently slow down the
import are related to transactions and concurrency (I need to make sure
data is always consistent even under concurrent access, so there are many
checks and I cannot really batch SQL executions).
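
To illustrate the general difference between per-statement and batched SQL
execution, here is a minimal sketch. It is not the Marmotta or Jena
implementation: it uses Python's built-in sqlite3 module with a hypothetical
three-column "triples" table, and all function names are invented for the
example. The batched variant is only safe when nothing else accesses the store
during the import, which is the point of a maintenance mode.

```python
import sqlite3

INSERT_SQL = "INSERT INTO triples (s, p, o) VALUES (?, ?, ?)"

def make_store():
    """Create an in-memory database with a hypothetical triples table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    return conn

def import_per_triple(conn, triples):
    """One statement and one commit per triple (transaction-heavy path)."""
    for triple in triples:
        conn.execute(INSERT_SQL, triple)
        conn.commit()

def import_batched(conn, triples, batch_size=1000):
    """Collect triples and send them in batches, one commit per batch."""
    batch = []
    for triple in triples:
        batch.append(triple)
        if len(batch) >= batch_size:
            conn.executemany(INSERT_SQL, batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the final, partially filled batch
        conn.executemany(INSERT_SQL, batch)
        conn.commit()

def count(conn):
    """Return the number of stored triples."""
    return conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]

data = [(f"s{i}", "p", f"o{i}") for i in range(2500)]
slow, fast = make_store(), make_store()
import_per_triple(slow, data)
import_batched(fast, data)
print(count(slow), count(fast))
```

Both paths store the same data; the batched one simply makes far fewer
round trips and commits.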

OTOH, I think most datasets are not really that big anyway, so this is not a
high priority. It would just be nice to be able to offer a better and more
reliable DBpedia through Marmotta ... ;-)

Greetings,

Sebastian

2013/3/29 Raffaele Palmieri <raffaele.palmi...@gmail.com>

> Hi Sergio,
> I had a look at the site's documentation; the following links are incomplete:
>
>    - Apache Marmotta->Download Marmotta
>    - Apache Marmotta->Development->Development practices
>    - Apache Marmotta->Acknowledgements
>    - Platform->Introduction
>    - Platform->Core module
>    - Platform->LDCache module
>    - Platform->LDPath module
>    - Platform->Reasoner module
>    - Platform->SPARQL module
>    - Platform->User module
>    - Platform->Client library (broken link)
>    - Platform->Sesame tools (broken link)
>    - LDCache->Wrappers
>    - LDPath->Backends
>    - LDPath->Functions
>    - Wiki->Dependencies protocol and various modules and libraries
>
> The documentation still contains scattered references to LMF.
> I think that a picture giving an architectural overview would be useful,
> also showing some possible applications of the platform, as was done in
> the past for LMF, where possible with some screencasts.
> For example, a couple of use cases could cover importing content from LOD
> using the new linked data client modules (YouTube, Vimeo, Facebook, etc.)
> and retrieving content, maybe using the lmf-search integration.
> Regarding performance considerations, it could be useful to show how to
> import data into Marmotta in parallel, for example how to perform the
> GeoNames import.
> Cheers,
> and Happy Easter to all of you!
> Raffaele.
>
>
> On 28 March 2013 12:17, Sergio Fernández <wik...@apache.org> wrote:
>
> > Hi all,
> >
> > while we wait for the vote on 3.0.0-incubating to pass, we should take a
> > look at the public documentation we are providing through our web site,
> > staged at http://marmotta.staging.apache.org
> >
> > That's why we have talked about doing a documentation sprint early next
> > week, to have it ready for the actual publication of the release,
> > whenever that happens...
> >
> > Usually, for those of us who are so deep into the code, it is not always
> > easy to see the deficiencies in the documentation. So I'd like to kindly
> > ask all of you (the farther from the code, the better) what you are
> > missing in the documentation: what is not so clear, what is missing,
> > what you think should be possible but can't find out how to do, what
> > would need another kind of documentation (screencast or whatever), and
> > so on.
> >
> > Thanks!
> >
> > Cheers,
> >
> > --
> > Sergio Fernández
> >
>
