Hello Bart,
We are expecting to have a Jackrabbit cluster on a really heavy load of both
read and write operations, so we are really concerned about performance.
Currently we are in a "pre-production" stage and we are planning to use Oak
soon, but we will probably have to maintain this version for some time with
thousands of users.
===== GLOBAL_REVISION TABLE =====
Regarding the performance of the query, here are some numbers for
'JOURNAL_GLOBAL_REVISION set REVISION_ID = REVISION_ID + 1' that I got with
JMeter:
Original: Average time 45 ms with std. dev. of 26 ms
Modified: Average time 37 ms with std. dev. of 16 ms
1) Original table
Create table SQL: 'CREATE TABLE `JOURNAL_GLOBAL_REVISION` ( `REVISION_ID`
bigint(20) NOT NULL, UNIQUE KEY `JOURNAL_GLOBAL_REVISION_IDX` (`REVISION_ID`))
ENGINE=InnoDB DEFAULT CHARSET=latin1'
Result of 'EXPLAIN EXTENDED UPDATE JOURNAL_GLOBAL_REVISION SET REVISION_ID =
REVISION_ID + 1;':
# id, select_type, table, type, possible_keys, key, key_len, ref, rows,
filtered, Extra
'1', 'SIMPLE', 'JOURNAL_GLOBAL_REVISION', 'index', NULL,
'JOURNAL_GLOBAL_REVISION_IDX', '8', NULL, '1', '100.00', 'Using temporary'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2) Modified table (added a PK with auto increment, this makes the query
execution plan changes)
Create table SQL: ''CREATE TABLE `JOURNAL_GLOBAL_REVISION_MOD` (
`REVISION_ID` bigint(20) NOT NULL, `c` int(10) unsigned NOT NULL
AUTO_INCREMENT, PRIMARY KEY (`c`)) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT
CHARSET=latin1'
Result of 'EXPLAIN EXTENDED UPDATE JOURNAL_GLOBAL_REVISION SET REVISION_ID =
REVISION_ID + 1;' :
# id, select_type, table, type, possible_keys, key, key_len, ref, rows,
filtered, Extra
'1', 'SIMPLE', 'JOURNAL_GLOBAL_REVISION_MOD', 'index', NULL, 'PRIMARY', '4',
NULL, '1', '100.00', NULL
It's not a huge difference, but this modification would only require a
simple change on the journal mysql.ddl file and there will be no migration
scenarios.
===== LOCAL_REVISION TABLE =====
On the LOCAL_REVISION table I just added two extra timestamp columns with
the creation and last modified timestamps (MySQL has a feature which auto
updates the last modified timestamp, but for portability a NOW() function could
be used or something similar).
To detect a "dead node" I select the entries that are not updated for a few
days (a new janitor parameter was added for that) and has the revision_id is
less than the current global revision id. These nodes are being removed before
the janitor selects until which revision the journal will be cleaned.
We are currently using MySQL (Percona), and this is working fine (we destroy
and create new nodes everytime), but we didn't verified the solution for other
implementations/DBs.
Regards,
Fernando Lemes da Silva
Software Designer – Brazil R&D
[email protected]
+55-11-4197-8124
-----Original Message-----
From: Bart van der Schans [mailto:[email protected]]
Sent: quinta-feira, 12 de março de 2015 21:36
To: Jackrabbit Dev
Subject: Re: MySQL without primary keys.
Hi Frenando,
On Mon, Mar 2, 2015 at 7:09 PM, Lemes, Fernando <[email protected]> wrote:
>
>
> Hello guys,
>
>
>
> I started to work with Jackrabbit a few months ago and in the
> project I’m working we are using Jackrabbit 2.8.0 with MySQL as the
> persistent manager.
> We found some issues regarding the database schema. There are no
> primary keys at the journal tables.. is there a reason for that? It
> seems that in a MySQL clustered environment this is important for the nodes
> synchronization.
I'm not sure if there is a specific reason for that. The tables are indeed used
for synchronizing the cluster. A lock on the table is used to serialize the
updates in the cluster.
> Also we have realized that updating the journal GLOBAL_REVISION
> table (which is updated all the time) had a performance issue. By
> examining the query execution plan we found a temporary table being
> created. As a workaround we just added a dummy column with auto
> increment value as the primary key, but shouldn’t this table consists only of
> a column like that?
I remember I noticed the temporary table creation as well during some
investigation. It has been a while ago but if I remember correctly for our
situation it didn't turn out to be a real performance issue.
Our situation of course might be wildly different from yours ;-) Do you have
any numbers on the performance gain by just adding the dummy auto increment
column? Which MySQL storage engine are you using?
If the revision id column would be an auto incremented value it would mean the
"ownership" of the revision number (generation) would be transferred from
Jackrabbit to the database. That could impact the portability of the code to
other backends or the flexibility in Jackrabbit for handling the revision ids.
I'm not sure if that would be a real issue. However changing the storage format
in MySQL would have an impact for existing installations and require a
migration of the schema when upgrading.
> We have also implemented a feature to remove Jackrabbit dead nodes
> from the LOCAL_REVISION table, so the janitor can work properly even
> in a scenario where cluster nodes may be destroyed anytime. How can I
> submit this patch?
Could you explain how your solution works? How do you decide if a node is just
shutdown for a while or "dead"?
Regards,
Bart
>
>
>
>
>
> Regards,
>
>
>
> Fernando Lemes da Silva
>
> Software Designer – Brazil R&D
>
> [email protected]
>
> +55-11-4197-8124
>
>