Re: slowdown in Dataset.getDefaultModel (was Re: 3.1.0 vs 3.1.1-SNAPSHOT: slowdown in Model.listStatements(s, p, null, lang))

François-Paul Servant Mon, 24 Oct 2016 17:06:06 -0700

and when I remove the prefixes from my dataset, my app seems to run ~ as fast 
with 3.1.1-SNAPSHOT as with 3.1.0


happy fps

> Le 25 oct. 2016 à 01:47, François-Paul Servant 
> <[email protected]> a écrit :
> 
> Hi Andy,
> 
> I think that you got the cause of the problem;
> 
>> It looks like Graph::getPrefixes, which is called in the Model constructor, 
>> is being a bit expensive.
> 
> if I remove the prefixes from my dataset (the one I have very slow times 
> with), I then get the same times as with the datasets I created for the test, 
> which do not have prefixes defined (and, basically, the same times as the 
> ones returned by your test in execGraph(ds1, 1000 * 1000, false, true))
> 
> fps
> 
> 
> 
>> Le 24 oct. 2016 à 22:42, Andy Seaborne <[email protected]> a écrit :
>> 
>> Hi,
>> 
>> 
>> There is a relevant change - the model wrappers in DatasetImpl aren't cached 
>> any more (it broke with multiple threads and transactions).  A model and the 
>> graph get created each time.
>> 
>> So far ...
>> 
>> It looks like Graph::getPrefixes, which is called in the Model constructor, 
>> is being a bit expensive.
>> 
>>      Andy
>> 
>> package dev;
>> 
>> import org.apache.jena.graph.Graph;
>> import org.apache.jena.query.Dataset;
>> import org.apache.jena.query.ReadWrite;
>> import org.apache.jena.shared.PrefixMapping;
>> import org.apache.jena.sparql.core.DatasetGraph;
>> import org.apache.jena.tdb.TDBFactory;
>> 
>> public class ReportPerf {
>> 
>>   public static void main(String ...x) {
>>       Dataset ds1 = TDBFactory.createDataset();
>>       execGraph(ds1, 1000 * 1000, false, false);
>>       execGraph(ds1, 1000 * 1000, false, false);
>>       execGraph(ds1, 1000 * 1000, false, true);
>>       execGraph(ds1, 1000 * 1000, false, true);
>>       execGraph(ds1, 1000 * 1000, true, true);
>>       execGraph(ds1, 1000 * 1000, true, true);
>>   }
>> 
>>   private static void execGraph(Dataset ds, int n,
>>                                 boolean txn, boolean prefixes) {
>>       System.out.printf
>>   ("** Graph(n=%d,txn=%s,prefixes=%s)\n",n,txn,prefixes) ;
>>       DatasetGraph dsg = ds.asDatasetGraph() ;
>>       if ( txn )
>>           dsg.begin(ReadWrite.READ);
>> 
>>       long start = System.currentTimeMillis();
>>       for (int i = 0 ; i < n; i++) {
>>           Graph g = dsg.getDefaultGraph() ;
>>           if ( prefixes ) {
>>               PrefixMapping pmap = g.getPrefixMapping() ;
>>           }
>>       }
>>       long time = System.currentTimeMillis() - start;
>>       double usPerCall = (1000.0*time)/n ;
>>       System.out.printf
>>   ("%.3fus/call : %dms for %d calls\n", usPerCall, time, n) ;
>>       if ( txn )
>>           ds.end() ;
>>   }
>> }
>> 
>> 
>> 
>> 
>> On 24/10/16 16:25, François-Paul Servant wrote:
>>> Hi,
>>> 
>>> first, apologies for an inaccurate report. I was trying to understand why I 
>>> have some slowdowns in my app with 3.1.1-SNAPSHOT vs 3.1.0, and I had a bug 
>>> in my recording of durations: Model.listStatements seems to be out of 
>>> concerns.
>>> 
>>> However, it seems that there is something strange with 
>>> Dataset.getDefaultModel().
>>> 
>>> Find below a junit test that simply calls Dataset.getDefaultModel on a 
>>> given dataset. I’ve run it on my machine, from within eclipse, on 3 
>>> different TDB datasets (the one I’m using in my app, and 2 that I created 
>>> for the purpose of the test)
>>> 
>>> With 3.1.0, durations for Dataset.getDefaultModel doesn’t depend on the 
>>> tested dataset, and they are fast: 5ms to call getDefaultModel 1.000.000 
>>> times in a loop.
>>> 
>>> With 3.1.1 : in any case, times are not so good. With the 2 datasets 
>>> constructed for the purpose of the test times are similar:
>>>     between 1500 and 2000 ms to call getDefaultModel 1.000.000 times
>>> (both datasets have 5 named graphs. In one of them, there is 1 triple in 
>>> each named graph, and also 1 in the default graph. In the other dataset, 
>>> there are 10.000.000 triples in the default graph, and 1.000.000 in each of 
>>> the named graphs).
>>> 
>>> That’s significantly slower than with 3.1.0 (between 300 and 400 times).
>>> 
>>> What I find more annoying is the time that I get with my real dataset (30 
>>> millions triples in the default graph, 3 named graphs with a few millions 
>>> triples each):
>>> ~180.000 ms for 1.000.000 calls
>>> Now, this is rather slow…
>>> 
>>> I hope that I don’t make any mistake this time. You can check what you get 
>>> with you own datasets, or with the one that can be created for the purpose 
>>> of the test (just change the tdb dir names in the code)
>>> 
>>> Best,
>>> 
>>> fps
>>> 
>>> Here the junit test code, and the outputs that I get
>>> 
>>> —————————
>>> 
>>> package testperfs;
>>> 
>>> import java.io.File;
>>> import java.util.Iterator;
>>> 
>>> import org.apache.jena.query.Dataset;
>>> import org.apache.jena.rdf.model.Model;
>>> import org.apache.jena.rdf.model.ModelFactory;
>>> import org.apache.jena.rdf.model.Property;
>>> import org.apache.jena.rdf.model.Resource;
>>> import org.apache.jena.tdb.TDBFactory;
>>> import org.junit.Test;
>>> 
>>> public class GetDefaultModelPerfTest {
>>> 
>>> @Test
>>> public final void testGetDefaultModel() {
>>>     File dir = new File("/Users/fps/Desktop/jenatest");
>>>     doIt(dir);
>>> }
>>> 
>>> @Test
>>> public final void testGetDefaultModel_MyTDB() {
>>>     File dir = new File("/Users/fps/z-no-svg/Rasse-Data/bin/tdb");
>>>     doIt(dir);
>>> }
>>> 
>>> // @Test
>>> /** create and write to disk a new TDB store */
>>> public final void createDS() {
>>>     createDS(new File("/Users/fps/Desktop/jenatest"), 1, 5, 1);
>>>     // createDS(new File("/Users/fps/Desktop/jenatest"), 10*1000*1000, 5, 
>>> 1000*1000);
>>> }
>>> 
>>> private void doIt(File dir) {
>>>     Dataset ds = TDBFactory.createDataset(dir.getAbsolutePath());
>>>     // print infos about ds
>>>     dsInfos(ds);
>>>     test(ds, 1000*1000);
>>>     ds.close();
>>> }
>>> 
>>> /**
>>> * @param ds
>>> * @param n nb of calls
>>> */
>>> private void test(Dataset ds, int n) {
>>>     // warmup
>>>     System.out.println("warmup...");
>>>     for (int i = 0 ; i < 1000; i++) {
>>>             Model m = ds.getDefaultModel();
>>>     }
>>>     // doit
>>>     System.out.println("please wait...");
>>>     long start = System.currentTimeMillis();
>>>     for (int i = 0 ; i < n; i++) {
>>>             Model m = ds.getDefaultModel();
>>>     }
>>>     long time = System.currentTimeMillis() - start;
>>>     System.out.println(time + " ms for " + n + " calls");
>>> }
>>> 
>>> private void dsInfos(Dataset ds) {
>>>     Model m = ds.getDefaultModel();
>>>     System.out.println("Default model size: " + m.size());
>>>     Iterator<String> gns = ds.listNames();
>>>     for (;gns.hasNext();) {
>>>             String gn = gns.next();
>>>             System.out.println("Named graph " + gn + " size: " + 
>>> ds.getNamedModel(gn).size());
>>>     }
>>> }
>>> 
>>> //
>>> // to create a TDB store
>>> //
>>> 
>>> // create a TDB store with n triples in default model, kNamedGraphs named 
>>> graphs, and n2 triples in each named graph
>>> private void createDS(File dir, int n, int kNamedGraphs, int n2) {
>>>     if (dir.exists()) {
>>>             throw new RuntimeException("choose a non-existent dir");
>>>     }
>>>     boolean ok = dir.mkdirs();
>>>     if (!ok) throw new RuntimeException();
>>>     Dataset ds = TDBFactory.createDataset(dir.getAbsolutePath());
>>>     Model m = ds.getDefaultModel();
>>>     populateModel(m, n);
>>>     for (int i = 0 ; i < kNamedGraphs ; i++) {
>>>             m = ModelFactory.createDefaultModel();
>>>             populateModel(m, n2);
>>>             ds.addNamedModel("http://ex.com/g"; + Integer.toString(i), m);
>>>     }
>>>     ds.close();
>>> }
>>> 
>>> private void populateModel(Model m, int n) {
>>>     Property p = m.createProperty("http://ex.com/p";);
>>>     for (int i = 0; i < n; i++) {
>>>             Resource s = m.createResource("http://ex.com/s"; + 
>>> Integer.toString(i));
>>>             Resource o = m.createResource("http://ex.com/o"; + 
>>> Integer.toString(i));
>>>             m.add(s,p,o);
>>>     }
>>> }
>>> }
>>> 
>>> 
>>> —————————
>>> #
>>> # 311-SNAPSHOT
>>> #
>>> 
>>> # ds created with 3.1.1
>>> 
>>> Default model size: 1
>>> Named graph http://ex.com/g0 size: 1
>>> Named graph http://ex.com/g1 size: 1
>>> Named graph http://ex.com/g2 size: 1
>>> Named graph http://ex.com/g3 size: 1
>>> Named graph http://ex.com/g4 size: 1
>>> warmup...
>>> please wait...
>>> 1926 ms for 1000000 calls
>>> 
>>> # ds created with 3.1.1
>>> 
>>> Default model size: 10000000
>>> Named graph http://ex.com/g0 size: 1000000
>>> Named graph http://ex.com/g1 size: 1000000
>>> Named graph http://ex.com/g2 size: 1000000
>>> Named graph http://ex.com/g3 size: 1000000
>>> Named graph http://ex.com/g4 size: 1000000
>>> warmup...
>>> please wait...
>>> 1511 ms for 1000000 calls
>>> 
>>> # my TDB
>>> 
>>> Default model size: 31769838
>>> Named graph http://www.renault.com/rasse/docArboLeafindex size: 6843619
>>> Named graph http://www.renault.com/rasse/elementfamilyindex size: 2956557
>>> Named graph http://www.renault.com/rasse/graph/xappl size: 4271172
>>> warmup...
>>> please wait...
>>> 189920 ms for 1000000 calls
>>> 
>>> #
>>> # 3.1.0
>>> #
>>> 
>>> # ds created with 3.1.0
>>> 
>>> Default model size: 1
>>> Named graph http://ex.com/g0 size: 1
>>> Named graph http://ex.com/g1 size: 1
>>> Named graph http://ex.com/g2 size: 1
>>> Named graph http://ex.com/g3 size: 1
>>> Named graph http://ex.com/g4 size: 1
>>> warmup...
>>> please wait...
>>> 5 ms for 1000000 calls
>>> 
>>> # ds created with 3.1.1
>>> 
>>> Default model size: 10000000
>>> Named graph http://ex.com/g0 size: 1000000
>>> Named graph http://ex.com/g1 size: 1000000
>>> Named graph http://ex.com/g2 size: 1000000
>>> Named graph http://ex.com/g3 size: 1000000
>>> Named graph http://ex.com/g4 size: 1000000
>>> warmup...
>>> please wait...
>>> 5 ms for 1000000 calls
>>> 
>>> # my TDB
>>> 
>>> Default model size: 31769838
>>> Named graph http://www.renault.com/rasse/docArboLeafindex size: 6843619
>>> Named graph http://www.renault.com/rasse/elementfamilyindex size: 2956557
>>> Named graph http://www.renault.com/rasse/graph/xappl size: 4271172
>>> warmup...
>>> please wait...
>>> 5 ms for 1000000 calls
>>> 
>>> 
>>> 
>

Re: slowdown in Dataset.getDefaultModel (was Re: 3.1.0 vs 3.1.1-SNAPSHOT: slowdown in Model.listStatements(s, p, null, lang))

Reply via email to