Sent from my iPad
On 05.04.2012 at 12:59, Reto Bachmann-Gmür <[email protected]> wrote:

> Hi Rupert,
>
> I like your proposal but would suggest:
> - SingleDatasetTdbTcProvider should not need a directory configured

No problem with that. One can use the configuration policy OPTIONAL; then OSGi will create a default instance with the default directory, while it would still be possible for users to create additional instances with manually configured directories. WDYT?

> - SingleDatasetTdbTcProvider should have the higher weight and thus be the
> one used by default

That's fine with me.

> I think there might be use cases where you want a graph to be isolated from
> the rest, but I think the default behaviour should be the more performant
> and less memory-expensive SingleDatasetTdbTcProvider.
>
> We could add a tool to Clerezza that allows creating an MGraph in a
> tc-provider other than the one with the highest weight.
>

+1

best
Rupert

> Cheers,
> Reto
>
> On Fri, Mar 16, 2012 at 2:10 PM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> Hi David, Stanbol & Clerezza community
>>
>> Short summary of the situation:
>>
>> The Ontonet component generates a lot of MGraphs using the Jena TDB
>> provider. This causes the disk consumption and the number of open files to
>> explode. See the quoted emails for details.
>>
>>
>> @Stanbol: we are already discussing how to avoid the creation of so many
>> graphs.
>>
>>
>> @Clerezza: the observed behavior of the TDB provider is also very dangerous
>> (at least for typical use cases in Apache Stanbol).
>>
>> Although it has a different focus, CLEREZZA-467 [1] may provide a possible
>> solution for that, as it suggests using named graphs instead of isolated
>> TDB instances for creating MGraphs.
>>
>> To be honest this would be the optimal solution for our usages of Clerezza
>> in Stanbol. However I assume that for a semantic CMS it is safer to use
>> different TDB datasets.
>>
>> Because of that I would like to make the following proposal that
>> hopefully covers both the needs of Apache Stanbol and Apache Clerezza.
>>
>> 1. AbstractTdbTcProvider: provides most of the functionality needed to
>> store Clerezza MGraphs in Jena TDB.
>>
>> 2. TdbTcProvider: the same as now, but extending the abstract one. It
>> follows the currently used methodology of mapping Clerezza graphs to
>> separate TDB datasets.
>>
>> 3. SingleDatasetTdbTcProvider: TDB provider variant that stores all
>> MGraphs in a single TDB dataset. This provider should also support
>> "configurationFactory=true" (multiple instances). Each instance would use a
>> different TDB dataset to store its MGraphs.
>>
>> By default the SingleDatasetTdbTcProvider would be inactive, because it
>> requires a configuration of the directory for the TDB dataset as well as a
>> name (that can be used in filters). This ensures full backward
>> compatibility.
>>
>> In environments such as Stanbol, where you want to store multiple graphs
>> in the same TDB dataset, you would need to provide a configuration for the
>> SingleDatasetTdbTcProvider. Here you have two possible usage scenarios:
>>
>> * If you just need a single TDB dataset that stores all MGraphs, then you
>> can assign a high enough service.ranking to the SingleDatasetTdbTcProvider
>> and use the TcManager as usual to create your graphs.
>> * If you want to use single TDB datasets or a mix of TdbTcProvider and
>> SingleDatasetTdbTcProvider instances, you will need to use the
>> corresponding filters.
>>
>>
>> WDYT?
>> Rupert
>>
>>
>> [1] https://issues.apache.org/jira/browse/CLEREZZA-467
>>
>> On 16.03.2012, at 10:44, Rupert Westenthaler wrote:
>>
>>> Hi David, all
>>>
>>> this could be the explanation for the failed build on the Jenkins server
>>> when the SEO configuration for the Refactor engine was used in the default
>>> configuration of the Full launcher.
>>>
>>> See http://markmail.org/message/sprwklaobdjankig for details.
>>>
>>> To me it looks as if the RefactorEngine creates multiple Jena TDB
>>> instances for the various MGraphs it creates. One needs to know that even
>>> for an empty graph Jena TDB creates ~200 MByte of index files. So it is
>>> important to map multiple MGraphs to different named graphs of the same
>>> Jena TDB store.
>>>
>>> I have no idea how Clerezza manages this or how Ontonet creates MGraphs,
>>> but I hope this can help in tracing this down.
>>>
>>> best
>>> Rupert
>>>
>>> On 16.03.2012, at 10:30, David Riccitelli wrote:
>>>
>>>> Dears,
>>>>
>>>> As I ran into disk issues, I found that this folder:
>>>> sling/felix/bundleXXX/data/tdb-data/mgraph
>>>>
>>>> where XXX is the bundle of:
>>>> Clerezza - SCB Jena TDB Storage Provider
>>>> org.apache.clerezza.rdf.jena.tdb.storage
>>>>
>>>> took almost 70 GB of disk space (at which point the disk space was
>>>> exhausted).
>>>>
>>>> These are some of the files I found inside:
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology889
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology1041
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology395
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology363
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology661
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology786
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology608
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology213
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology188
>>>> 193M ./ontonet%3A%3Ainputstream%3Aontology602
>>>>
>>>>
>>>> Any clues?
>>>>
>>>> Thanks,
>>>> David Riccitelli
>>>>
>>>> ********************************************************************************
>>>> InsideOut10 s.r.l.
>>>> P.IVA: IT-11381771002
>>>> Fax: +39 0110708239
>>>> ---
>>>> LinkedIn: http://it.linkedin.com/in/riccitelli
>>>> Twitter: ziodave
>>>> ---
>>>> Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>>>> ********************************************************************************
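
To make the disk-usage argument in this thread concrete, here is a rough sketch comparing the two storage layouts under discussion. The 193 MB per-dataset figure is taken from David's listing. Everything else is an assumption for illustration only: the function names are hypothetical and not part of the Clerezza or Jena API, the graph names are guessed by percent-decoding the directory names in the listing, and the single-dataset estimate assumes the fixed index overhead is paid once per TDB dataset, as Rupert's remark about empty graphs suggests.

```python
from urllib.parse import quote

# Per-dataset footprint seen in David's listing (193M per directory).
DATASET_OVERHEAD_MB = 193


def dataset_dir_name(graph_name: str) -> str:
    """Percent-encode a graph name; this reproduces the directory names in the listing."""
    return quote(graph_name, safe="")


def footprint_per_graph_mb(graph_names) -> int:
    """One TDB dataset per MGraph: the fixed overhead is paid once per distinct graph."""
    return DATASET_OVERHEAD_MB * len(set(graph_names))


def footprint_single_dataset_mb(graph_names) -> int:
    """All MGraphs as named graphs in one dataset: overhead paid once (assumption)."""
    return DATASET_OVERHEAD_MB if graph_names else 0


# Hypothetical graph names, inferred from the directory names in the listing.
names = [f"ontonet::inputstream:ontology{i}" for i in (889, 1041, 395)]

print(dataset_dir_name(names[0]))
print(footprint_per_graph_mb(names), footprint_single_dataset_mb(names))
```

For the three example graphs this prints the directory name `ontonet%3A%3Ainputstream%3Aontology889`, followed by `579 193`. By the same arithmetic, somewhere around 360 to 370 directories of 193 MB each would account for the almost 70 GB reported above. The sketch ignores the size of the actual triples, which would add to both layouts.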
