and when I remove the prefixes from my dataset, my app seems to run ~ as fast with 3.1.1-SNAPSHOT as with 3.1.0
happy fps > Le 25 oct. 2016 à 01:47, François-Paul Servant > <[email protected]> a écrit : > > Hi Andy, > > I think that you got the cause of the problem; > >> It looks like Graph::getPrefixes, which is called in the Model constructor, >> is being a bit expensive. > > if I remove the prefixes from my dataset (the one I have very slow times > with), I then get the same times as with the datasets I created for the test, > which do not have prefixes defined (and, basically, the same times as the > ones returned by your test in execGraph(ds1, 1000 * 1000, false, true)) > > fps > > > >> Le 24 oct. 2016 à 22:42, Andy Seaborne <[email protected]> a écrit : >> >> Hi, >> >> >> There is a relevant change - the model wrappers in DatasetImpl aren't cached >> any more (it broke with multiple threads and transactions). A model and the >> graph get created each time. >> >> So far ... >> >> It looks like Graph::getPrefixes, which is called in the Model constructor, >> is being a bit expensive. >> >> Andy >> >> package dev; >> >> import org.apache.jena.graph.Graph; >> import org.apache.jena.query.Dataset; >> import org.apache.jena.query.ReadWrite; >> import org.apache.jena.shared.PrefixMapping; >> import org.apache.jena.sparql.core.DatasetGraph; >> import org.apache.jena.tdb.TDBFactory; >> >> public class ReportPerf { >> >> public static void main(String ...x) { >> Dataset ds1 = TDBFactory.createDataset(); >> execGraph(ds1, 1000 * 1000, false, false); >> execGraph(ds1, 1000 * 1000, false, false); >> execGraph(ds1, 1000 * 1000, false, true); >> execGraph(ds1, 1000 * 1000, false, true); >> execGraph(ds1, 1000 * 1000, true, true); >> execGraph(ds1, 1000 * 1000, true, true); >> } >> >> private static void execGraph(Dataset ds, int n, >> boolean txn, boolean prefixes) { >> System.out.printf >> ("** Graph(n=%d,txn=%s,prefixes=%s)\n",n,txn,prefixes) ; >> DatasetGraph dsg = ds.asDatasetGraph() ; >> if ( txn ) >> dsg.begin(ReadWrite.READ); >> >> long start = System.currentTimeMillis(); >> for (int i = 0 ; i < n; i++) { >> Graph g = dsg.getDefaultGraph() ; >> if ( prefixes ) { >> PrefixMapping pmap = g.getPrefixMapping() ; >> } >> } >> long time = System.currentTimeMillis() - start; >> double usPerCall = (1000.0*time)/n ; >> System.out.printf >> ("%.3fus/call : %dms for %d calls\n", usPerCall, time, n) ; >> if ( txn ) >> ds.end() ; >> } >> } >> >> >> >> >> On 24/10/16 16:25, François-Paul Servant wrote: >>> Hi, >>> >>> first, apologies for an inaccurate report. I was trying to understand why I >>> have some slowdowns in my app with 3.1.1-SNAPSHOT vs 3.1.0, and I had a bug >>> in my recording of durations: Model.listStatements seems to be out of >>> concerns. >>> >>> However, it seems that there is something strange with >>> Dataset.getDefaultModel(). >>> >>> Find below a junit test that simply calls Dataset.getDefaultModel on a >>> given dataset. I’ve run it on my machine, from within eclipse, on 3 >>> different TDB datasets (the one I’m using in my app, and 2 that I created >>> for the purpose of the test) >>> >>> With 3.1.0, durations for Dataset.getDefaultModel doesn’t depend on the >>> tested dataset, and they are fast: 5ms to call getDefaultModel 1.000.000 >>> times in a loop. >>> >>> With 3.1.1 : in any case, times are not so good. With the 2 datasets >>> constructed for the purpose of the test times are similar: >>> between 1500 and 2000 ms to call getDefaultModel 1.000.000 times >>> (both datasets have 5 named graphs. In one of them, there is 1 triple in >>> each named graph, and also 1 in the default graph. In the other dataset, >>> there are 10.000.000 triples in the default graph, and 1.000.000 in each of >>> the named graphs). >>> >>> That’s significantly slower than with 3.1.0 (between 300 and 400 times). >>> >>> What I find more annoying is the time that I get with my real dataset (30 >>> millions triples in the default graph, 3 named graphs with a few millions >>> triples each): >>> ~180.000 ms for 1.000.000 calls >>> Now, this is rather slow… >>> >>> I hope that I don’t make any mistake this time. You can check what you get >>> with you own datasets, or with the one that can be created for the purpose >>> of the test (just change the tdb dir names in the code) >>> >>> Best, >>> >>> fps >>> >>> Here the junit test code, and the outputs that I get >>> >>> ————————— >>> >>> package testperfs; >>> >>> import java.io.File; >>> import java.util.Iterator; >>> >>> import org.apache.jena.query.Dataset; >>> import org.apache.jena.rdf.model.Model; >>> import org.apache.jena.rdf.model.ModelFactory; >>> import org.apache.jena.rdf.model.Property; >>> import org.apache.jena.rdf.model.Resource; >>> import org.apache.jena.tdb.TDBFactory; >>> import org.junit.Test; >>> >>> public class GetDefaultModelPerfTest { >>> >>> @Test >>> public final void testGetDefaultModel() { >>> File dir = new File("/Users/fps/Desktop/jenatest"); >>> doIt(dir); >>> } >>> >>> @Test >>> public final void testGetDefaultModel_MyTDB() { >>> File dir = new File("/Users/fps/z-no-svg/Rasse-Data/bin/tdb"); >>> doIt(dir); >>> } >>> >>> // @Test >>> /** create and write to disk a new TDB store */ >>> public final void createDS() { >>> createDS(new File("/Users/fps/Desktop/jenatest"), 1, 5, 1); >>> // createDS(new File("/Users/fps/Desktop/jenatest"), 10*1000*1000, 5, >>> 1000*1000); >>> } >>> >>> private void doIt(File dir) { >>> Dataset ds = TDBFactory.createDataset(dir.getAbsolutePath()); >>> // print infos about ds >>> dsInfos(ds); >>> test(ds, 1000*1000); >>> ds.close(); >>> } >>> >>> /** >>> * @param ds >>> * @param n nb of calls >>> */ >>> private void test(Dataset ds, int n) { >>> // warmup >>> System.out.println("warmup..."); >>> for (int i = 0 ; i < 1000; i++) { >>> Model m = ds.getDefaultModel(); >>> } >>> // doit >>> System.out.println("please wait..."); >>> long start = System.currentTimeMillis(); >>> for (int i = 0 ; i < n; i++) { >>> Model m = ds.getDefaultModel(); >>> } >>> long time = System.currentTimeMillis() - start; >>> System.out.println(time + " ms for " + n + " calls"); >>> } >>> >>> private void dsInfos(Dataset ds) { >>> Model m = ds.getDefaultModel(); >>> System.out.println("Default model size: " + m.size()); >>> Iterator<String> gns = ds.listNames(); >>> for (;gns.hasNext();) { >>> String gn = gns.next(); >>> System.out.println("Named graph " + gn + " size: " + >>> ds.getNamedModel(gn).size()); >>> } >>> } >>> >>> // >>> // to create a TDB store >>> // >>> >>> // create a TDB store with n triples in default model, kNamedGraphs named >>> graphs, and n2 triples in each named graph >>> private void createDS(File dir, int n, int kNamedGraphs, int n2) { >>> if (dir.exists()) { >>> throw new RuntimeException("choose a non-existent dir"); >>> } >>> boolean ok = dir.mkdirs(); >>> if (!ok) throw new RuntimeException(); >>> Dataset ds = TDBFactory.createDataset(dir.getAbsolutePath()); >>> Model m = ds.getDefaultModel(); >>> populateModel(m, n); >>> for (int i = 0 ; i < kNamedGraphs ; i++) { >>> m = ModelFactory.createDefaultModel(); >>> populateModel(m, n2); >>> ds.addNamedModel("http://ex.com/g" + Integer.toString(i), m); >>> } >>> ds.close(); >>> } >>> >>> private void populateModel(Model m, int n) { >>> Property p = m.createProperty("http://ex.com/p"); >>> for (int i = 0; i < n; i++) { >>> Resource s = m.createResource("http://ex.com/s" + >>> Integer.toString(i)); >>> Resource o = m.createResource("http://ex.com/o" + >>> Integer.toString(i)); >>> m.add(s,p,o); >>> } >>> } >>> } >>> >>> >>> ————————— >>> # >>> # 311-SNAPSHOT >>> # >>> >>> # ds created with 3.1.1 >>> >>> Default model size: 1 >>> Named graph http://ex.com/g0 size: 1 >>> Named graph http://ex.com/g1 size: 1 >>> Named graph http://ex.com/g2 size: 1 >>> Named graph http://ex.com/g3 size: 1 >>> Named graph http://ex.com/g4 size: 1 >>> warmup... >>> please wait... >>> 1926 ms for 1000000 calls >>> >>> # ds created with 3.1.1 >>> >>> Default model size: 10000000 >>> Named graph http://ex.com/g0 size: 1000000 >>> Named graph http://ex.com/g1 size: 1000000 >>> Named graph http://ex.com/g2 size: 1000000 >>> Named graph http://ex.com/g3 size: 1000000 >>> Named graph http://ex.com/g4 size: 1000000 >>> warmup... >>> please wait... >>> 1511 ms for 1000000 calls >>> >>> # my TDB >>> >>> Default model size: 31769838 >>> Named graph http://www.renault.com/rasse/docArboLeafindex size: 6843619 >>> Named graph http://www.renault.com/rasse/elementfamilyindex size: 2956557 >>> Named graph http://www.renault.com/rasse/graph/xappl size: 4271172 >>> warmup... >>> please wait... >>> 189920 ms for 1000000 calls >>> >>> # >>> # 3.1.0 >>> # >>> >>> # ds created with 3.1.0 >>> >>> Default model size: 1 >>> Named graph http://ex.com/g0 size: 1 >>> Named graph http://ex.com/g1 size: 1 >>> Named graph http://ex.com/g2 size: 1 >>> Named graph http://ex.com/g3 size: 1 >>> Named graph http://ex.com/g4 size: 1 >>> warmup... >>> please wait... >>> 5 ms for 1000000 calls >>> >>> # ds created with 3.1.1 >>> >>> Default model size: 10000000 >>> Named graph http://ex.com/g0 size: 1000000 >>> Named graph http://ex.com/g1 size: 1000000 >>> Named graph http://ex.com/g2 size: 1000000 >>> Named graph http://ex.com/g3 size: 1000000 >>> Named graph http://ex.com/g4 size: 1000000 >>> warmup... >>> please wait... >>> 5 ms for 1000000 calls >>> >>> # my TDB >>> >>> Default model size: 31769838 >>> Named graph http://www.renault.com/rasse/docArboLeafindex size: 6843619 >>> Named graph http://www.renault.com/rasse/elementfamilyindex size: 2956557 >>> Named graph http://www.renault.com/rasse/graph/xappl size: 4271172 >>> warmup... >>> please wait... >>> 5 ms for 1000000 calls >>> >>> >>> >
