Hi Andy,

I think that you got the cause of the problem;

> It looks like Graph::getPrefixes, which is called in the Model constructor, 
> is being a bit expensive.

if I remove the prefixes from my dataset (the one I have very slow times with), 
I then get the same times as with the datasets I created for the test, which do 
not have prefixes defined (and, basically, the same times as the ones returned 
by your test in execGraph(ds1, 1000 * 1000, false, true))

fps



> Le 24 oct. 2016 à 22:42, Andy Seaborne <[email protected]> a écrit :
> 
> Hi,
> 
> 
> There is a relevant change - the model wrappers in DatasetImpl aren't cached 
> any more (it broke with multiple threads and transactions).  A model and the 
> graph get created each time.
> 
> So far ...
> 
> It looks like Graph::getPrefixes, which is called in the Model constructor, 
> is being a bit expensive.
> 
>       Andy
> 
> package dev;
> 
> import org.apache.jena.graph.Graph;
> import org.apache.jena.query.Dataset;
> import org.apache.jena.query.ReadWrite;
> import org.apache.jena.shared.PrefixMapping;
> import org.apache.jena.sparql.core.DatasetGraph;
> import org.apache.jena.tdb.TDBFactory;
> 
> public class ReportPerf {
> 
>    public static void main(String ...x) {
>        Dataset ds1 = TDBFactory.createDataset();
>        execGraph(ds1, 1000 * 1000, false, false);
>        execGraph(ds1, 1000 * 1000, false, false);
>        execGraph(ds1, 1000 * 1000, false, true);
>        execGraph(ds1, 1000 * 1000, false, true);
>        execGraph(ds1, 1000 * 1000, true, true);
>        execGraph(ds1, 1000 * 1000, true, true);
>    }
> 
>    private static void execGraph(Dataset ds, int n,
>                                  boolean txn, boolean prefixes) {
>        System.out.printf
>    ("** Graph(n=%d,txn=%s,prefixes=%s)\n",n,txn,prefixes) ;
>        DatasetGraph dsg = ds.asDatasetGraph() ;
>        if ( txn )
>            dsg.begin(ReadWrite.READ);
> 
>        long start = System.currentTimeMillis();
>        for (int i = 0 ; i < n; i++) {
>            Graph g = dsg.getDefaultGraph() ;
>            if ( prefixes ) {
>                PrefixMapping pmap = g.getPrefixMapping() ;
>            }
>        }
>        long time = System.currentTimeMillis() - start;
>        double usPerCall = (1000.0*time)/n ;
>        System.out.printf
>    ("%.3fus/call : %dms for %d calls\n", usPerCall, time, n) ;
>        if ( txn )
>            ds.end() ;
>    }
> }
> 
> 
> 
> 
> On 24/10/16 16:25, François-Paul Servant wrote:
>> Hi,
>> 
>> first, apologies for an inaccurate report. I was trying to understand why I 
>> have some slowdowns in my app with 3.1.1-SNAPSHOT vs 3.1.0, and I had a bug 
>> in my recording of durations: Model.listStatements seems to be out of 
>> concerns.
>> 
>> However, it seems that there is something strange with 
>> Dataset.getDefaultModel().
>> 
>> Find below a junit test that simply calls Dataset.getDefaultModel on a given 
>> dataset. I’ve run it on my machine, from within eclipse, on 3 different TDB 
>> datasets (the one I’m using in my app, and 2 that I created for the purpose 
>> of the test)
>> 
>> With 3.1.0, durations for Dataset.getDefaultModel doesn’t depend on the 
>> tested dataset, and they are fast: 5ms to call getDefaultModel 1.000.000 
>> times in a loop.
>> 
>> With 3.1.1 : in any case, times are not so good. With the 2 datasets 
>> constructed for the purpose of the test times are similar:
>>      between 1500 and 2000 ms to call getDefaultModel 1.000.000 times
>> (both datasets have 5 named graphs. In one of them, there is 1 triple in 
>> each named graph, and also 1 in the default graph. In the other dataset, 
>> there are 10.000.000 triples in the default graph, and 1.000.000 in each of 
>> the named graphs).
>> 
>> That’s significantly slower than with 3.1.0 (between 300 and 400 times).
>> 
>> What I find more annoying is the time that I get with my real dataset (30 
>> millions triples in the default graph, 3 named graphs with a few millions 
>> triples each):
>> ~180.000 ms for 1.000.000 calls
>> Now, this is rather slow…
>> 
>> I hope that I don’t make any mistake this time. You can check what you get 
>> with you own datasets, or with the one that can be created for the purpose 
>> of the test (just change the tdb dir names in the code)
>> 
>> Best,
>> 
>> fps
>> 
>> Here the junit test code, and the outputs that I get
>> 
>> —————————
>> 
>> package testperfs;
>> 
>> import java.io.File;
>> import java.util.Iterator;
>> 
>> import org.apache.jena.query.Dataset;
>> import org.apache.jena.rdf.model.Model;
>> import org.apache.jena.rdf.model.ModelFactory;
>> import org.apache.jena.rdf.model.Property;
>> import org.apache.jena.rdf.model.Resource;
>> import org.apache.jena.tdb.TDBFactory;
>> import org.junit.Test;
>> 
>> public class GetDefaultModelPerfTest {
>> 
>> @Test
>> public final void testGetDefaultModel() {
>>      File dir = new File("/Users/fps/Desktop/jenatest");
>>      doIt(dir);
>> }
>> 
>> @Test
>> public final void testGetDefaultModel_MyTDB() {
>>      File dir = new File("/Users/fps/z-no-svg/Rasse-Data/bin/tdb");
>>      doIt(dir);
>> }
>> 
>> // @Test
>> /** create and write to disk a new TDB store */
>> public final void createDS() {
>>      createDS(new File("/Users/fps/Desktop/jenatest"), 1, 5, 1);
>>      // createDS(new File("/Users/fps/Desktop/jenatest"), 10*1000*1000, 5, 
>> 1000*1000);
>> }
>> 
>> private void doIt(File dir) {
>>      Dataset ds = TDBFactory.createDataset(dir.getAbsolutePath());
>>      // print infos about ds
>>      dsInfos(ds);
>>      test(ds, 1000*1000);
>>      ds.close();
>> }
>> 
>> /**
>> * @param ds
>> * @param n nb of calls
>> */
>> private void test(Dataset ds, int n) {
>>      // warmup
>>      System.out.println("warmup...");
>>      for (int i = 0 ; i < 1000; i++) {
>>              Model m = ds.getDefaultModel();
>>      }
>>      // doit
>>      System.out.println("please wait...");
>>      long start = System.currentTimeMillis();
>>      for (int i = 0 ; i < n; i++) {
>>              Model m = ds.getDefaultModel();
>>      }
>>      long time = System.currentTimeMillis() - start;
>>      System.out.println(time + " ms for " + n + " calls");
>> }
>> 
>> private void dsInfos(Dataset ds) {
>>      Model m = ds.getDefaultModel();
>>      System.out.println("Default model size: " + m.size());
>>      Iterator<String> gns = ds.listNames();
>>      for (;gns.hasNext();) {
>>              String gn = gns.next();
>>              System.out.println("Named graph " + gn + " size: " + 
>> ds.getNamedModel(gn).size());
>>      }
>> }
>> 
>> //
>> // to create a TDB store
>> //
>> 
>> // create a TDB store with n triples in default model, kNamedGraphs named 
>> graphs, and n2 triples in each named graph
>> private void createDS(File dir, int n, int kNamedGraphs, int n2) {
>>      if (dir.exists()) {
>>              throw new RuntimeException("choose a non-existent dir");
>>      }
>>      boolean ok = dir.mkdirs();
>>      if (!ok) throw new RuntimeException();
>>      Dataset ds = TDBFactory.createDataset(dir.getAbsolutePath());
>>      Model m = ds.getDefaultModel();
>>      populateModel(m, n);
>>      for (int i = 0 ; i < kNamedGraphs ; i++) {
>>              m = ModelFactory.createDefaultModel();
>>              populateModel(m, n2);
>>              ds.addNamedModel("http://ex.com/g"; + Integer.toString(i), m);
>>      }
>>      ds.close();
>> }
>> 
>> private void populateModel(Model m, int n) {
>>      Property p = m.createProperty("http://ex.com/p";);
>>      for (int i = 0; i < n; i++) {
>>              Resource s = m.createResource("http://ex.com/s"; + 
>> Integer.toString(i));
>>              Resource o = m.createResource("http://ex.com/o"; + 
>> Integer.toString(i));
>>              m.add(s,p,o);
>>      }
>> }
>> }
>> 
>> 
>> —————————
>> #
>> # 311-SNAPSHOT
>> #
>> 
>> # ds created with 3.1.1
>> 
>> Default model size: 1
>> Named graph http://ex.com/g0 size: 1
>> Named graph http://ex.com/g1 size: 1
>> Named graph http://ex.com/g2 size: 1
>> Named graph http://ex.com/g3 size: 1
>> Named graph http://ex.com/g4 size: 1
>> warmup...
>> please wait...
>> 1926 ms for 1000000 calls
>> 
>> # ds created with 3.1.1
>> 
>> Default model size: 10000000
>> Named graph http://ex.com/g0 size: 1000000
>> Named graph http://ex.com/g1 size: 1000000
>> Named graph http://ex.com/g2 size: 1000000
>> Named graph http://ex.com/g3 size: 1000000
>> Named graph http://ex.com/g4 size: 1000000
>> warmup...
>> please wait...
>> 1511 ms for 1000000 calls
>> 
>> # my TDB
>> 
>> Default model size: 31769838
>> Named graph http://www.renault.com/rasse/docArboLeafindex size: 6843619
>> Named graph http://www.renault.com/rasse/elementfamilyindex size: 2956557
>> Named graph http://www.renault.com/rasse/graph/xappl size: 4271172
>> warmup...
>> please wait...
>> 189920 ms for 1000000 calls
>> 
>> #
>> # 3.1.0
>> #
>> 
>> # ds created with 3.1.0
>> 
>> Default model size: 1
>> Named graph http://ex.com/g0 size: 1
>> Named graph http://ex.com/g1 size: 1
>> Named graph http://ex.com/g2 size: 1
>> Named graph http://ex.com/g3 size: 1
>> Named graph http://ex.com/g4 size: 1
>> warmup...
>> please wait...
>> 5 ms for 1000000 calls
>> 
>> # ds created with 3.1.1
>> 
>> Default model size: 10000000
>> Named graph http://ex.com/g0 size: 1000000
>> Named graph http://ex.com/g1 size: 1000000
>> Named graph http://ex.com/g2 size: 1000000
>> Named graph http://ex.com/g3 size: 1000000
>> Named graph http://ex.com/g4 size: 1000000
>> warmup...
>> please wait...
>> 5 ms for 1000000 calls
>> 
>> # my TDB
>> 
>> Default model size: 31769838
>> Named graph http://www.renault.com/rasse/docArboLeafindex size: 6843619
>> Named graph http://www.renault.com/rasse/elementfamilyindex size: 2956557
>> Named graph http://www.renault.com/rasse/graph/xappl size: 4271172
>> warmup...
>> please wait...
>> 5 ms for 1000000 calls
>> 
>> 
>> 

Reply via email to