You only hit this if you are using durable cursors for durable subscribers with the default JDBC database. This functionality is due for official release in 4.2 (though it does exist in 4.0.2) and is not documented anywhere, so the discussion is probably better on the dev list. Other answers inline:
On 19 Nov 2006, at 02:53, [EMAIL PROTECTED] wrote:
I am trying to use persistent storage with a large number of messages and I think I'm running into trouble with the JDBC statements. In particular, I'm trying to use Derby, but I think there are some general performance problems with the SQL statements.
1. Many of the SELECT statements actually return many rows.
2a. Right before the JDBC call is made, setMaxRows() should be called on the statement with the maximum number of rows to be captured.
setMaxRows is a hint - and as you mention later in the post, it has no effect on Derby - but please go ahead and raise a JIRA
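For illustration, a minimal sketch of the setMaxRows() idea - plain JDBC against the default ACTIVEMQ_MSGS schema; the class and method names are made up here, this is not the broker's actual code:

    import java.sql.*;

    public class MaxRowsExample {
        // Fetch at most 'max' pending messages for a destination.
        // setMaxRows() caps what the driver hands back, but it is only a
        // hint - some drivers still scan and sort the full result before
        // discarding the excess rows.
        static void dumpPending(Connection c, String dest, int max) throws SQLException {
            PreparedStatement s = c.prepareStatement(
                "SELECT ID, MSG FROM ACTIVEMQ_MSGS WHERE CONTAINER = ? ORDER BY ID");
            s.setMaxRows(max);
            s.setString(1, dest);
            ResultSet rs = s.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
            rs.close();
            s.close();
        }
    }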
2b. A "TOP" or "LIMIT" clause should be part of these SQL SELECT statements. Otherwise the database can attempt to sort a huge number of rows, since the statements also contain an ORDER BY. Unfortunately there is no standard syntax for this -- it varies depending on the RDBMS. The current implementation stops after reading x rows, but by then it is too late: the database may already have done a lot of work (demonstrated below). For some RDBMS the numeric argument to TOP/LIMIT may be parameterized; for others it must be put directly in the SQL string.
Great - you can raise a JIRA or submit a patch - all are welcome. We don't always have access to try out Oracle or other databases. However, there is scope for using specific select statements for specific databases.
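To make the per-database idea concrete, here is a hedged sketch of wrapping a base query in a vendor-specific row limit - the dialect strings and the helper are invented for this example, not taken from ActiveMQ's Statements.java:

    public class LimitDialects {
        // Wrap a base query with a row-limit clause in the target
        // database's dialect. Derby offered no equivalent at the time.
        static String limit(String baseSql, String dialect, int n) {
            if (dialect.equals("mysql") || dialect.equals("postgresql")) {
                return baseSql + " LIMIT " + n;                // trailing clause
            } else if (dialect.equals("sqlserver")) {
                return baseSql.replaceFirst("SELECT", "SELECT TOP " + n); // leading keyword
            } else if (dialect.equals("oracle")) {
                return "SELECT * FROM (" + baseSql + ") WHERE ROWNUM <= " + n; // subquery wrap
            }
            return baseSql;                                    // no limit support
        }
    }

This also illustrates the parameterization point above: the LIMIT and ROWNUM forms can usually take a bind variable instead of a literal, while TOP (on the SQL Server of that era) had to be baked into the SQL string.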
I think the statements affected are:
getFindDurableSubMessagesStatement
getNextDurableSubscriberMessageStatement
getPrevDurableSubscriberMessageIdStatement
getNextDurableSubscriberMessageIdStatement
Some of these statements are not actually used anywhere - like
getNextDurableSubscriberMessageStatement and
getPrevDurableSubscriberMessageIdStatement...
-- Note: The next and prev statements can probably be rewritten in a way that uses MIN or MAX to avoid needing TOP/LIMIT. Most of the problems come from the getFindDurableSubMessagesStatement.
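For example, a minimal sketch of the MIN/MAX rewrite - the column names follow the default ActiveMQ schema, but these are not the shipped statements:

    public class NextPrevStatements {
        // "Next" id: the smallest ID greater than the current one; "prev"
        // is symmetric with MAX and <. Both let the optimizer probe an
        // index on ID instead of sorting and limiting the whole table.
        static final String NEXT_ID_SQL =
            "SELECT MIN(ID) FROM ACTIVEMQ_MSGS WHERE CONTAINER = ? AND ID > ?";
        static final String PREV_ID_SQL =
            "SELECT MAX(ID) FROM ACTIVEMQ_MSGS WHERE CONTAINER = ? AND ID < ?";
    }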
2. Why do many of these statements bother to join to the ACKS table? Some of them do not obtain any information from that table regarding which messages are to be retrieved (maybe this is needed to make sure an entry exists there, but can't this be figured out just from the IDs/messages?)
Messages still exist in the table after they have been deleted (acknowledged) by one subscriber - there is one message table shared by many subscribers, so the join is what scopes a select to a particular subscription.
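For context, the durable-subscriber select is roughly of this shape - reconstructed from the default schema, so treat the exact column names, and LAST_ACKED_ID in particular, as assumptions rather than the literal shipped statement:

    public class DurableSubSelect {
        // Roughly the shape of getFindDurableSubMessagesStatement: the join
        // to ACTIVEMQ_ACKS picks out one subscription's row, whose
        // LAST_ACKED_ID marks how far that subscriber has read in the
        // shared message table.
        static final String FIND_SQL =
            "SELECT M.ID, M.MSG FROM ACTIVEMQ_MSGS M, ACTIVEMQ_ACKS D"
            + " WHERE D.CONTAINER = ? AND D.CLIENT_ID = ? AND D.SUB_NAME = ?"
            + " AND M.CONTAINER = D.CONTAINER AND M.ID > D.LAST_ACKED_ID"
            + " ORDER BY M.ID";
    }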
3. I believe the index on (MSGID_PROD, MSGID_SEQ) can be declared as a UNIQUE INDEX. This helps out some optimizers; not sure how much it would affect ActiveMQ.
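For reference, the declaration would look something like this (the index name here is invented for the example):

    import java.sql.*;

    public class UniqueIndexExample {
        // A producer id plus its sequence number identifies a message
        // uniquely, so the index can safely carry the UNIQUE constraint,
        // which some optimizers exploit.
        static void createIndex(Connection c) throws SQLException {
            Statement stmt = c.createStatement();
            stmt.execute("CREATE UNIQUE INDEX ACTIVEMQ_MSGS_PIDX"
                + " ON ACTIVEMQ_MSGS (MSGID_PROD, MSGID_SEQ)");
            stmt.close();
        }
    }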
4. The prepared statements probably should be cached/pooled (in whatever way is thread-safe), but I think this is the least of the problems.
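A minimal caching sketch for illustration - not ActiveMQ code; confining each cache to a single connection is one simple way to keep it thread-safe:

    import java.sql.*;
    import java.util.*;

    public class StatementCache {
        // Reuse one PreparedStatement per SQL string instead of
        // re-preparing it on every call. One cache per connection, so no
        // statement is ever shared between threads.
        private final Connection connection;
        private final Map<String, PreparedStatement> cache =
            new HashMap<String, PreparedStatement>();

        StatementCache(Connection connection) {
            this.connection = connection;
        }

        PreparedStatement prepare(String sql) throws SQLException {
            PreparedStatement s = cache.get(sql);
            if (s == null) {
                s = connection.prepareStatement(sql);
                cache.put(sql, s);
            }
            return s;
        }
    }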
Really bad performance due to this was observed with Derby. It happens even when I submit the commands directly to Derby, bypassing ActiveMQ. setMaxRows did not fix it, and Derby does not provide any form of TOP or LIMIT. :( Tried against the latest release of Derby and no luck there (in fact, I was getting an out-of-memory error manually issuing the SQL statements to the database).
I am wondering what kind of persistence people here recommend? I may need to persist millions of records.
You could try the Kaha store -
*** Relevant part of the optimizer log (notice that Rows Visited = 100299, even though only 100 are desired).
Right result set:
    Table Scan ResultSet for ACTIVEMQ_MSGS at read committed isolation level using instantaneous share row locking chosen by the optimizer
    Number of opens = 1
    Rows seen = 100287
    Rows filtered = 0
    Fetch Size = 16
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        next time in milliseconds/row = 0
    scan information:
        Bit set of columns fetched={0, 1, 5}
        Number of columns fetched=3
        Number of pages visited=4777
        Number of rows qualified=100287
        Number of rows visited=100299
        Scan type=heap
        start position:
            null
        stop position:
            null
        qualifiers: