Hello Bart,

   We are expecting to run a Jackrabbit cluster under a really heavy load of
both read and write operations, so we are quite concerned about performance.
Currently we are in a "pre-production" stage and we plan to move to Oak soon,
but we will probably have to maintain this version for some time with
thousands of users.


===== GLOBAL_REVISION TABLE =====

   Regarding the performance of the query, here are some numbers for
'UPDATE JOURNAL_GLOBAL_REVISION SET REVISION_ID = REVISION_ID + 1' that I got
with JMeter:

   Original: Average time 45 ms with std. dev. of 26 ms
   Modified: Average time 37 ms with std. dev. of 16 ms


   1) Original table

   Create table SQL:

   CREATE TABLE `JOURNAL_GLOBAL_REVISION` (
     `REVISION_ID` bigint(20) NOT NULL,
     UNIQUE KEY `JOURNAL_GLOBAL_REVISION_IDX` (`REVISION_ID`)
   ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

   Result of 'EXPLAIN EXTENDED UPDATE JOURNAL_GLOBAL_REVISION SET
REVISION_ID = REVISION_ID + 1;':

   # id, select_type, table, type, possible_keys, key, key_len, ref, rows,
filtered, Extra
   '1', 'SIMPLE', 'JOURNAL_GLOBAL_REVISION', 'index', NULL,
'JOURNAL_GLOBAL_REVISION_IDX', '8', NULL, '1', '100.00', 'Using temporary'

   (note the 'Using temporary' in the Extra column)


   2) Modified table (added a PK with auto increment, which changes the query
execution plan)

   Create table SQL:

   CREATE TABLE `JOURNAL_GLOBAL_REVISION_MOD` (
     `REVISION_ID` bigint(20) NOT NULL,
     `c` int(10) unsigned NOT NULL AUTO_INCREMENT,
     PRIMARY KEY (`c`)
   ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

   Result of 'EXPLAIN EXTENDED UPDATE JOURNAL_GLOBAL_REVISION_MOD SET
REVISION_ID = REVISION_ID + 1;':

   # id, select_type, table, type, possible_keys, key, key_len, ref, rows, 
filtered, Extra
   '1', 'SIMPLE', 'JOURNAL_GLOBAL_REVISION_MOD', 'index', NULL, 'PRIMARY', '4', 
NULL, '1', '100.00', NULL

   It's not a huge difference, but this modification would only require a
simple change to the journal mysql.ddl file, and there would be no migration
scenario.
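   To illustrate, the change to mysql.ddl could look roughly like the sketch
below. This is an assumption on my side: the exact entry in the shipped DDL
file may differ, and the `${schemaObjectPrefix}` placeholder and the dummy
column name `c` are just illustrative.

```sql
-- Hypothetical modified GLOBAL_REVISION entry for the journal mysql.ddl.
-- A dummy AUTO_INCREMENT primary key is added so InnoDB clusters the row
-- by the PK; in our tests this removed the 'Using temporary' step from
-- the UPDATE execution plan.
CREATE TABLE `${schemaObjectPrefix}GLOBAL_REVISION` (
  `REVISION_ID` bigint(20) NOT NULL,
  `c` int(10) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`c`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```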



===== LOCAL_REVISION TABLE =====

   On the LOCAL_REVISION table I just added two extra timestamp columns
holding the creation and last-modified timestamps (MySQL has a feature that
auto-updates the last-modified timestamp, but for portability a NOW() function
or something similar could be used).
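   A sketch of what the extended table could look like on MySQL 5.6+ (the
column names and the base table layout are illustrative, not the exact
Jackrabbit schema; the auto-update relies on MySQL's ON UPDATE
CURRENT_TIMESTAMP feature):

```sql
-- Hypothetical extended LOCAL_REVISIONS table (MySQL 5.6+).
-- CREATED is set once on insert; LAST_MODIFIED is auto-updated by MySQL
-- on every write to the row, so no application-side change is needed.
CREATE TABLE `JOURNAL_LOCAL_REVISIONS` (
  `JOURNAL_ID` varchar(255) NOT NULL,
  `REVISION_ID` bigint(20) NOT NULL,
  `CREATED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `LAST_MODIFIED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
      ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```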

   To detect a "dead node", I select the entries that have not been updated
for a few days (a new janitor parameter was added for that) and whose
revision_id is less than the current global revision id. These nodes are
removed before the janitor determines up to which revision the journal can be
cleaned.
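   In SQL terms, the cleanup could look roughly like this (table and column
names follow the sketch above and are assumptions, as is the 3-day threshold,
which would come from the new janitor parameter):

```sql
-- Hypothetical dead-node cleanup run by the janitor before it picks the
-- clean-up revision. A node is considered dead when its local revision
-- row has not been touched for N days AND it lags behind the current
-- global revision.
DELETE FROM JOURNAL_LOCAL_REVISIONS
 WHERE LAST_MODIFIED < NOW() - INTERVAL 3 DAY
   AND REVISION_ID < (SELECT REVISION_ID FROM JOURNAL_GLOBAL_REVISION);
```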

   We are currently using MySQL (Percona), and this is working fine (we
destroy and create new nodes all the time), but we haven't verified the
solution for other implementations/DBs.



   Regards,

Fernando Lemes da Silva
Software Designer – Brazil R&D
[email protected]
+55-11-4197-8124

-----Original Message-----
From: Bart van der Schans [mailto:[email protected]] 
Sent: Thursday, March 12, 2015 21:36
To: Jackrabbit Dev
Subject: Re: MySQL without primary keys.

Hi Fernando,

On Mon, Mar 2, 2015 at 7:09 PM, Lemes, Fernando <[email protected]> wrote:
>
>
>    Hello guys,
>
>
>
>    I started to work with Jackrabbit a few months ago and in the 
> project I’m working we are using Jackrabbit 2.8.0 with MySQL as the 
> persistent manager.
> We found some issues regarding the database schema. There are no
> primary keys on the journal tables... is there a reason for that? It
> seems that in a MySQL clustered environment this is important for node
> synchronization.

I'm not sure if there is a specific reason for that. The tables are indeed used 
for synchronizing the cluster. A lock on the table is used to serialize the 
updates in the cluster.


>    Also we have realized that updating the journal GLOBAL_REVISION 
> table (which is updated all the time) had a performance issue. By 
> examining the query execution plan we found a temporary table being 
> created. As a workaround we just added a dummy column with auto 
> increment value as the primary key, but shouldn’t this table consists only of 
> a column like that?

I remember I noticed the temporary table creation as well during some
investigation. It has been a while, but if I remember correctly, in our
situation it didn't turn out to be a real performance issue.
Our situation of course might be wildly different from yours ;-) Do you have 
any numbers on the performance gain by just adding the dummy auto increment 
column? Which MySQL storage engine are you using?

If the revision id column would be an auto incremented value it would mean the 
"ownership" of the revision number (generation) would be transferred from 
Jackrabbit to the database. That could impact the portability of the code to 
other backends or the flexibility in Jackrabbit for handling the revision ids. 
I'm not sure if that would be a real issue. However changing the storage format 
in MySQL would have an impact for existing installations and require a 
migration of the schema when upgrading.

>    We have also implemented a feature to remove Jackrabbit dead nodes 
> from the LOCAL_REVISION table, so the janitor can work properly even 
> in a scenario where cluster nodes may be destroyed anytime. How can I 
> submit this patch?

Could you explain how your solution works? How do you decide if a node is just 
shutdown for a while or "dead"?

Regards,
Bart

>
>
>
>
>
>    Regards,
>
>
>
> Fernando Lemes da Silva
>
> Software Designer – Brazil R&D
>
> [email protected]
>
> +55-11-4197-8124
>
>
