Andy, yes, I'll look into it as soon as I have cycles again. And no, I have not yet tried with the non-transactional API in 2.7.0. I actually want to do that at some point to have a cleaner baseline.

In the meantime, here is a summary of the results I found:

1) When I run with 1 client, query and store execution times are comparable to each other. I have detailed numbers, but they don't help much.

2) Things become interesting when I start scaling up the number of clients (one of the principal motivations to move to TDB Tx). The data below is for the following scenario:

* 50 clients
* the operations of each client are a mixture of queries and write operations, where I execute a write operation for every 7th query
* the queries are taken deterministically from a pool of about 35 queries of varying complexity. When run with 1 client, they take anywhere from a few ms to almost 2 seconds for the most intense query
* between each operation, I wait 2s
* there is plenty of memory/heap available. I use a 64-bit machine with 8 GB of memory, of which 4 GB is used for the Java heap.

Note that in TDB we use an exclusive write lock for write operations and shared read locks for read operations. In TDBTx, I just use transactions (i.e. we don't lock ourselves).

A) Here are the numbers for TDB (0.8.7 etc.):

- total write time = 1345594ms, so about 1346s

                 cnt |    avg |     max | min |    dev |        tot
===================================================================
DESCRIBE (ms)    402 |    466 |   4,859 |   0 |    609 |    187,609
SELECT (ms)    4,618 |  4,809 |  93,453 |   0 |  9,621 | 22,211,907
-------------------------------------------------------------------
PARALLELISM    5,020 |     14 |      41 |   0 |      8 |     79,066

Quick note about parallelism: this indicates effectively how much parallel activity was going on. For instance, on average there were 14 queries running at the same time, with a maximum of 41. The total indicates how heavily query activity was running in parallel.
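
As an illustration only of how a PARALLELISM row like the one above could be gathered (the message does not say how the test suite actually measures it): the sketch below assumes one sample is taken at each query start, counting the other queries already in flight, and then summarizes the samples with the same cnt/avg/max/min/dev/tot columns. The class name ParallelismTracker and its methods are hypothetical, not part of TDB or of the test suite.

```python
import statistics
import threading


class ParallelismTracker:
    """Hypothetical helper: samples the number of in-flight queries.

    Assumption: one sample per query, taken when the query starts and
    counting the *other* queries already running (so the minimum is 0).
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0   # queries currently executing
        self._samples = []    # one entry per started query

    def __enter__(self):
        with self._lock:
            self._samples.append(self._in_flight)
            self._in_flight += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._in_flight -= 1
        return False  # do not swallow exceptions raised by the query

    def summary(self):
        """Return the columns used in the tables: cnt, avg, max, min, dev, tot."""
        with self._lock:
            samples = list(self._samples)
        if not samples:
            return dict.fromkeys(("cnt", "avg", "max", "min", "dev", "tot"), 0)
        return {
            "cnt": len(samples),
            "avg": statistics.mean(samples),
            "max": max(samples),
            "min": min(samples),
            "dev": statistics.pstdev(samples),
            "tot": sum(samples),
        }
```

Each client thread would wrap a query in "with tracker:" on a shared tracker instance; the write operations are left out here because the table only covers DESCRIBE and SELECT.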

B) Here are the numbers for TDBTx:

- total write time = 166047ms, so about 166s

                 cnt |    avg |     max | min |    dev |        tot
===================================================================
DESCRIBE (ms)    168 |  2,557 |   9,219 |  31 |  1,769 |    429,645
SELECT (ms)    1,853 | 38,866 | 392,282 |   0 | 74,008 | 72,020,224
-------------------------------------------------------------------
PARALLELISM    2,021 |     35 |      49 |   0 |     10 |     71,791

Note that although the test suites run in the same way, the long query times in TDBTx caused several timeouts, which explains the substantially smaller number of completed queries. Even so, the total query time was still more than 3 times higher (about 72,450s vs. about 22,400s).

So, it seems that in this multi-client scenario, TDBTx is much better at avoiding lock contention around write operations, but it performs significantly worse for queries. One interesting thing is that TDBTx has a higher average number of queries running in parallel and a higher max, so perhaps this is an important cause of the slowdown.

Hopefully these are useful. Has any of you done any performance measurements with transactional TDB?

Simon

From: Andy Seaborne <a...@apache.org>
To: jena-dev@incubator.apache.org
Date: 01/10/2012 02:04 PM
Subject: Re: TDB: release process

On 10/01/12 13:45, Andy Seaborne wrote:
> On 09/01/12 15:07, Simon Helsen wrote:
>> Andy, others,
>>
>> I have been testing TxTDB on my end and functionally, things are looking
>> good. I am not able to see any immediate problems anymore. Of course,
>> there may still be more exotic things left, but those can probably be
>> managed in a minor release. However, now that it is getting good on the
>> functional end, I am starting to check the non-functional characteristics,
>> especially speed and scalability (in terms of multiple clients).
>> For this I use a test suite with about 35 different queries and I compare the
>> performance against Jena 2.6.3/ARQ 2.8.5 and TDB 0.8.7 because that is the
>> version we currently use in the release of our product. I am comparing
>> these numbers then with Jena/ARQ 2.7.0 and TDB 0.9.0 (20111229) and the
>> transaction API. I realize this is partially comparing apples to pears but
>> from our perspective, we need to see how the bottom line changes in terms
>> of query speed when we increase the number of concurrent clients.
>>
>> I have detailed numbers, but before I start sharing these, I want to know
>> if there is anything I could/should do to tune ARQ/TxTDB in terms of
>> performance. For instance, I wonder if there are still a whole range of
>> checks active which I can/should turn off now that we are functionally
>> more sound. For completeness, I should add that we don't use any
>> optimization (i.e. we run with none.opt)
>>
>> thanks
>>
>> Simon
>
> Simon,
>
> Figures would be good. If you use TDB without touching the transaction
> system then it should be the same as before (with the obvious chances of
> unintended changes). Have you run this way?
>
> Just creating a transaction, especially one that allows write, is a cost
> and if the granularity is small then it's going to make a big
> difference. (This is one reason there isn't an "autocommit" mode - it
> only seems to end in trouble one way or another). Read transactions are
> cheaper but not free.
>
> In terms of tuning, TDB 0.9 needs more heap as the transaction
> intermediate state is in-RAM, with no proper spill-to-disk yet.
>
> There shouldn't be the internal consistency checking enabled. Hmm -
> better check yet again!
>
> Andy
>

Simon,

Could you profile the tests and pass on the results? Any testing code left should show as hotspots.

Andy