Re: Timing tests for jena-624: doing better

A. Soroka Tue, 06 Oct 2015 08:30:29 -0700

Okay, that would seem to me to be mostly about documentation/messaging, since 
Graph could be implemented by any fellow off the street.


After a few simple tests on freshly-generated BSBM data (different flavors of 
find()), the results are pretty much as one would expect. When queries tilt 
towards the iterative end of things (more wildcards in non-graph-name 
positions), the stock implementation wins out, usually by a factor of two or 
three. The iterative machinery in the new implementation is heavier (using the 
Streams API), so that’s not surprising. When queries tilt to direct retrieval 
(fewer wildcards), letting the new implementation really make use of its “INDEX 
ALL THE THINGS” maps, the new implementation wins, sometimes by a little, 
sometimes by a factor of several dozen. I’m eager to see what real-world use 
looks like!
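
To make the trade-off above concrete, here is a toy sketch of the two find() styles. It is not the jena-624 code and uses no Jena classes; the class name FindSketch, its methods, and the single graph/subject/predicate index are all invented for illustration. A null argument stands in for a wildcard. scanFind() filters every stored quad through a Stream (the iterative path), while objects() walks nested maps straight to the answer (the direct-retrieval path).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical toy store, for illustration only; not the actual implementation.
public class FindSketch {
    // Every quad as {graph, subject, predicate, object}.
    private final List<String[]> quads = new ArrayList<>();
    // Nested index: graph -> subject -> predicate -> set of objects.
    private final Map<String, Map<String, Map<String, Set<String>>>> index = new HashMap<>();

    public void add(String g, String s, String p, String o) {
        boolean isNew = index.computeIfAbsent(g, k -> new HashMap<>())
                             .computeIfAbsent(s, k -> new HashMap<>())
                             .computeIfAbsent(p, k -> new HashSet<>())
                             .add(o);
        if (isNew) {
            quads.add(new String[] {g, s, p, o}); // keep the scan list free of duplicates
        }
    }

    // A null pattern slot is a wildcard.
    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // Iterative path: test every quad against the pattern.
    public List<String[]> scanFind(String g, String s, String p, String o) {
        return quads.stream()
                .filter(q -> matches(g, q[0]) && matches(s, q[1])
                          && matches(p, q[2]) && matches(o, q[3]))
                .collect(Collectors.toList());
    }

    // Direct path: graph, subject and predicate are all concrete, so three
    // map lookups reach the objects without touching unrelated quads.
    public Set<String> objects(String g, String s, String p) {
        return index.getOrDefault(g, Collections.emptyMap())
                    .getOrDefault(s, Collections.emptyMap())
                    .getOrDefault(p, Collections.emptySet());
    }

    public static void main(String[] args) {
        FindSketch store = new FindSketch();
        store.add("g1", "s1", "p1", "o1");
        store.add("g1", "s1", "p1", "o2");
        store.add("g1", "s2", "p1", "o1");
        System.out.println(store.scanFind(null, null, "p1", null).size()); // wildcard-heavy scan
        System.out.println(store.objects("g1", "s1", "p1").size());        // indexed lookup
    }
}
```

The cost of scanFind() grows with the total number of quads whatever the pattern, whereas objects() costs a fixed number of hash lookups, which is the general shape of the difference described above.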

---
A. Soroka
The University of Virginia Library

> On Oct 6, 2015, at 6:05 AM, Andy Seaborne <[email protected]> wrote:
> 
> On 05/10/15 20:57, A. Soroka wrote:
>>> I think the problem areas are around adding inference graphs to general 
>>> datasets, not the details of this new dataset implementation.
>> 
>> Just to be sure that I understand the issue here, is the problem that one 
>> could make a graph using an inferring implementation, add the graph to this 
>> kind of dataset, and expect the inference to function inside the dataset 
>> (which it won’t, because of the copy-on-add-graph semantic)?
> 
> Yes - and also the fact that it'll materialize the triples, which, if it's 
> some complicated backward-chained inference setup, might lead to a lot of 
> work/space.  We just need to manage the integration/migration.
> 
>       Andy
> 
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
>>> On Oct 4, 2015, at 5:37 AM, Andy Seaborne <[email protected]> wrote:
>>> 
>>> On 29/09/15 15:00, A. Soroka wrote:
>>>> On Sep 27, 2015, at 5:41 AM, Andy Seaborne <[email protected]> wrote:
>>>>> I can't try out your new stuff for a few days due to not being near
>>>>> a suitable computer.
>>>> 
>>>> No problem. On my machine using Dexx, that port of the Scala types,
>>>> the branch shows improvement to within half of the stock performance.
>>> 
>>> Excellent. That's looking very good.  It does something, so it's going to 
>>> cost something.
>>> 
>>> My figures below on same hardware as before - the txn/non-txn is making a 
>>> difference now.
>>> 
>>> Licensing-wise, Dexx is MIT (with maybe some BSD-isms from Scala) which is 
>>> no problem.
>>> 
>>>> I have tried now with some variations using the Clojure types (shown
>>>> after my sig) and didn’t see much difference, so I’ll leave that
>>>> question alone for the moment. I wasn’t able to use Clojure’s
>>>> transient (mutate-in-place-within-a-thread/transaction)
>>>> functionality, because Clojure transients do not afford iteration,
>>>> which is needed to support find(). It seems feasible to me that a
>>>> custom implementation with the ability to use mutate-in-place within
>>>> transactions might offer more improvement, but that’s a whole ‘nuther
>>>> kettle of fish.
>>>> 
>>>> I’ll spend some time soon moving on with the Dexx branch and trying
>>>> out some simple tests of the kind you’ve outlined below (and I’ll
>>>> include something that exercises property paths, which actually
>>>> happen to be very interesting for a few use cases in which I am
>>>> interested). I’m not sure how to engage real world use very
>>>> effectively. I can certainly spin up examples, but it seems like we
>>>> would want a broader set of users than just me to try it out, no?
>>>> {grin}
>>> 
>>> That would be ideal but it's not always easy to do.  Email to users@ 
>>> possibly with a quite large notice saying people are affected.
>>> 
>>> I think the problem areas are around adding inference graphs to general 
>>> datasets, not the details of this new dataset implementation.
>>> 
>>> Discussion/proposal:
>>> 
>>> * Add this as DatasetFactory.createTxnMem(),
>>> * Add DatasetFactory.createGeneral()
>>> * ?? Deprecate DatasetFactory.createMem(),
>>>     referring to createTxnMem() and createGeneral()
>>> (other clearing up of DatasetFactory ...)
>>> * Release.
>>> 
>>> 
>>>     Andy
>>> 
>>>> 
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>> 
>>> 2015-01-03:
>>> jena-624-dexx branch:
>>> 
>>> ==== Data: /home/afs/Datasets/BSBM/bsbm-1m.nt.gz ====
>>>     Size: 1,000,312 (3.253s, 307,504 tps)
>>> ==== DSG/mix/auto (warm N=3)
>>> ==== DSG/mix/txn  (warm N=3)
>>> ==== DSG/mem/auto (warm N=3)
>>> ==== DSG/mem/txn  (warm N=3)
>>> ==== DSG/mix/auto (N=20)
>>> ==== DSG/mix/auto (N=20) Time: 81.064s (246,795 tps)
>>> ==== DSG/mix/txn  (N=20)
>>> ==== DSG/mix/txn  (N=20) Time: 80.412s (248,796 tps)
>>> ==== DSG/mem/auto (N=20)
>>> ==== DSG/mem/auto (N=20) Time: 230.129s (86,934 tps)
>>> ==== DSG/mem/txn  (N=20)
>>> ==== DSG/mem/txn  (N=20) Time: 129.259s (154,776 tps)
>> 
>
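
A note on reading the quoted figures: the tps values appear consistent with (data size x N) / elapsed seconds, truncated to a whole number. That relationship is inferred from the numbers themselves, not stated anywhere in the thread, and the class and method names below are made up for this check. Times are given in milliseconds so the arithmetic stays in integers.

```java
// Hypothetical helper for checking the quoted throughput figures.
public class Throughput {
    // Triples per second, truncated: (triples * passes) / (millis / 1000).
    // The formula is an assumption inferred from the quoted output.
    public static long tps(long triples, int passes, long millis) {
        return triples * passes * 1000L / millis;
    }

    public static void main(String[] args) {
        long size = 1_000_312L; // "Size: 1,000,312" in the quoted run
        System.out.println(tps(size, 1, 3_253L));    // load:     307504
        System.out.println(tps(size, 20, 81_064L));  // mix/auto: 246795
        System.out.println(tps(size, 20, 80_412L));  // mix/txn:  248796
        System.out.println(tps(size, 20, 230_129L)); // mem/auto: 86934
        System.out.println(tps(size, 20, 129_259L)); // mem/txn:  154776
    }
}
```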
