owlim-discussion  

Re: [Owlim-discussion] BigOWLIM configuration

Peter Kostelnik, PhD.
Thu, 27 May 2010 03:16:03 -0700

hi there, barry,

that's ok .. thanks for your support ..

btw, meanwhile, some new stuff has come up :) .. there are some timing
problems when BigOWLIM deletes the triples from a specified context, e.g.
connection.clear(contextURI)

we are doing quite heavy post-processing on approx. 80-100 million
statements .. in our case, we are aggregating the snippets of data and
putting them together into specific contexts (to be able to retrieve them
fast) .. and in most cases, we have to replace the data in several contexts
.. the logs of the process show that removing the context data using
connection.clear(ctx) is extremely slow .. in some cases it takes more than
an hour, or even several hours .. please, do you have any idea how to deal
with this?
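
to make the workload concrete: the replace step we run for each context
looks roughly like the toy sketch below .. it is a plain in-memory model,
not the Sesame or BigOWLIM API; the class ContextReplaceSketch, its methods
and the urn:ctx:* names are made up for illustration, and only clear(ctx)
mirrors the role of connection.clear(ctx) ..

```java
import java.util.*;

/** Toy in-memory model of the "replace the data in a context" step (hypothetical, not the Sesame API). */
public class ContextReplaceSketch {

    // Each context URI maps to its set of statements; a statement is modelled as a plain string.
    private final Map<String, Set<String>> statementsByContext = new HashMap<>();

    /** Plays the role of connection.clear(ctx): drops every statement in one context, returns how many. */
    public int clear(String context) {
        Set<String> removed = statementsByContext.remove(context);
        return removed == null ? 0 : removed.size();
    }

    /** Adds one statement to the given context, creating the context on first use. */
    public void add(String context, String statement) {
        statementsByContext.computeIfAbsent(context, k -> new LinkedHashSet<>()).add(statement);
    }

    /** Replace = clear the context, then add the freshly aggregated statements. */
    public void replace(String context, Collection<String> aggregated) {
        clear(context);
        for (String statement : aggregated) {
            add(context, statement);
        }
    }

    /** Number of statements currently stored in the given context. */
    public int size(String context) {
        Set<String> statements = statementsByContext.get(context);
        return statements == null ? 0 : statements.size();
    }

    public static void main(String[] args) {
        ContextReplaceSketch store = new ContextReplaceSketch();
        store.add("urn:ctx:a", "s1 p o1");
        store.add("urn:ctx:a", "s2 p o2");
        store.add("urn:ctx:b", "s1 p o1"); // same statement, different context

        // Replace the contents of one context; the other context is untouched.
        store.replace("urn:ctx:a", Arrays.asList("s3 p o3"));
        System.out.println(store.size("urn:ctx:a") + " " + store.size("urn:ctx:b"));
    }
}
```

in the real process the same clear-then-add sequence is repeated for every
affected context, so a slow clear step is paid once per replaced context ..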

thanks in advance, cheers,
                               Peter K.

> Hello Peter,
>
> Sorry for the delay in answering your email. We have had a number of
> urgent tasks and holidays in the last few days.
>
> Nevertheless, we will discuss your tuning requirements today and come up
> with some suggestions.
>
> Thanks for your patience,
> barry
>
> On 21/05/2010 16:03, Peter Kostelnik, PhD. wrote:
>> hi there ..
>>
>> I'd like to ask for hints on how to configure BigOWLIM for the best
>> possible performance at run-time ..
>>
>> there are a couple of parameters which (I believe) can improve the
>> triple-store performance, so my question is how to set them to achieve
>> the best results ..
>>
>> our setup is as follows:
>> ---
>> approx. number of statements: for sure more than 100 million (pls, how
>> does copying the same graphs of statements into several different
>> contexts affect this? are they copied as different quads, or is there
>> just a kind of association of the same statements to the different
>> context URIs?)
>>
>> we are planning to use RDFS reasoning, but in most cases (at the
>> moment, in all cases) we don't use reasoning at all yet ..
>>
>> right now we don't use any rules
>>
>> we are firing SPARQL queries, and we are also using
>> RepositoryConnection instances to retrieve the data ..
>>
>>
>> we are running two different scenarios:
>> ---
>> scenario 1: off-line load of all data
>> this is done only once - the very first load of the data into the
>> triple-store ..
>> we are using two instances of Repository, the first for loading the
>> data, the second for post-processing ..
>> the first instance is initialized, and all data are loaded into the
>> triple-store in a single transaction committed after the load is
>> finished .. when loading, the snippets of data are parsed and stored
>> into several separate contexts ..
>> after the loading is finished, the huge post-processing follows .. so
>> the first repository is shut down, and the post-processing repository
>> is initialized .. it does the work and is then shut down - the data are
>> in the triple-store ..
>> the loaded data are moved to the production system (running as BigOWLIM
>> under a sesame server) ..
>>
>> scenario 2: run-time
>> ---
>> BigOWLIM is accessible remotely using HTTPRepository, handling lots of
>> queries and searches ..
>>
>> pls, do you have any hints on how to squeeze the best out of BigOWLIM?
>> regarding RAM, can you, pls - if possible - outline possible
>> alternatives for several setups (<- just curious what effect this
>> parameter has (sure, I know - the more the better :) ))?
>>
>> thanks a lot in advance, cheers,
>>                                  Peter K.


_______________________________________________
OWLIM-discussion mailing list
OWLIM-discussion@ontotext.com
http://ontotext.com/mailman/listinfo/owlim-discussion