You only hit this if you are using durable cursors for durable subscribers with the default JDBC database. This functionality is due for official release in 4.2 (though it does exist in 4.0.2) and is not documented anywhere, so the discussion is probably better on the dev list. Other answers inline:
On 19 Nov 2006, at 02:53, [EMAIL PROTECTED] wrote:
I am trying to use persistent storage with a large number of messages and I think I'm running into trouble with the JDBC statements. In particular, I'm trying to use Derby, but I think there are some general performance problems with the SQL statements.
1. Many of the SELECT statements actually return many rows.
2a. Right before the JDBC call is made, setMaxRows() should be called on the statement with the maximum number of rows to be captured.
setMaxRows is a hint - and as you mention later in the post, it has no effect on Derby - but please go ahead and raise a JIRA
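For illustration, a minimal sketch of the setMaxRows() idea - plain JDBC against the default ACTIVEMQ_MSGS schema; the class and method names are made up here, this is not the broker's actual code:

    import java.sql.*;

    public class MaxRowsExample {
        // Fetch at most 'max' pending messages for a destination.
        // setMaxRows() caps what the driver hands back, but it is only a
        // hint - some drivers still scan and sort the full result before
        // discarding the excess rows.
        static void dumpPending(Connection c, String dest, int max) throws SQLException {
            PreparedStatement s = c.prepareStatement(
                "SELECT ID, MSG FROM ACTIVEMQ_MSGS WHERE CONTAINER = ? ORDER BY ID");
            s.setMaxRows(max);
            s.setString(1, dest);
            ResultSet rs = s.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
            rs.close();
            s.close();
        }
    }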
2b. A "TOP" or "LIMIT" clause should be part of these SQL SELECT statements. Otherwise the database can attempt to sort a huge number of rows, since the statements also contain an ORDER BY. Unfortunately there is no standard syntax for this -- it varies depending on the RDBMS. The current implementation stops after reading x rows, but by then it is too late: the database may already have done a lot of work (demonstrated below). For some RDBMS the numeric argument to TOP/LIMIT may be parameterized; for others it must be put directly in the SQL string.
Great - you can raise a JIRA or submit a patch - all are welcome. We don't always have access to try out Oracle or other databases. However, there is scope for using specific select statements for specific databases.
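To make the per-database idea concrete, here is a hedged sketch of wrapping a base query in a vendor-specific row limit - the dialect strings and the helper are invented for this example, not taken from ActiveMQ's Statements.java:

    public class LimitDialects {
        // Wrap a base query with a row-limit clause in the target
        // database's dialect. Derby offered no equivalent at the time.
        static String limit(String baseSql, String dialect, int n) {
            if (dialect.equals("mysql") || dialect.equals("postgresql")) {
                return baseSql + " LIMIT " + n;                // trailing clause
            } else if (dialect.equals("sqlserver")) {
                return baseSql.replaceFirst("SELECT", "SELECT TOP " + n); // leading keyword
            } else if (dialect.equals("oracle")) {
                return "SELECT * FROM (" + baseSql + ") WHERE ROWNUM <= " + n; // subquery wrap
            }
            return baseSql;                                    // no limit support
        }
    }

This also illustrates the parameterization point above: the LIMIT and ROWNUM forms can usually take a bind variable instead of a literal, while TOP (on the SQL Server of that era) had to be baked into the SQL string.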
I think the statements affected are:
getFindDurableSubMessagesStatement
getNextDurableSubscriberMessageStatement
getPrevDurableSubscriberMessageIdStatement
getNextDurableSubscriberMessageIdStatement
Some of these statements are not actually used anywhere - like
getNextDurableSubscriberMessageStatement and
getPrevDurableSubscriberMessageIdStatement...
-- Note: The next and prev statements can probably be rewritten in a way that uses MIN or MAX to avoid needing TOP/LIMIT. Most of the problems come from the getFindDurableSubMessagesStatement.
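For example, a minimal sketch of the MIN/MAX rewrite - the column names follow the default ActiveMQ schema, but these are not the shipped statements:

    public class NextPrevStatements {
        // "Next" id: the smallest ID greater than the current one; "prev"
        // is symmetric with MAX and <. Both let the optimizer probe an
        // index on ID instead of sorting and limiting the whole table.
        static final String NEXT_ID_SQL =
            "SELECT MIN(ID) FROM ACTIVEMQ_MSGS WHERE CONTAINER = ? AND ID > ?";
        static final String PREV_ID_SQL =
            "SELECT MAX(ID) FROM ACTIVEMQ_MSGS WHERE CONTAINER = ? AND ID < ?";
    }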
2. Why do many of these statements bother to join to the ACKS table? Some of them do not obtain any information from that table regarding which messages are to be retrieved (maybe this is needed to make sure an entry exists there, but can't this be figured out just from the IDs/messages?)
Messages still exist in the table after they have been deleted (acknowledged) by one subscriber - there is one message table shared by many subscribers, so the join is what scopes a select to a particular subscription.
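For context, the durable-subscriber select is roughly of this shape - reconstructed from the default schema, so treat the exact column names, and LAST_ACKED_ID in particular, as assumptions rather than the literal shipped statement:

    public class DurableSubSelect {
        // Roughly the shape of getFindDurableSubMessagesStatement: the join
        // to ACTIVEMQ_ACKS picks out one subscription's row, whose
        // LAST_ACKED_ID marks how far that subscriber has read in the
        // shared message table.
        static final String FIND_SQL =
            "SELECT M.ID, M.MSG FROM ACTIVEMQ_MSGS M, ACTIVEMQ_ACKS D"
            + " WHERE D.CONTAINER = ? AND D.CLIENT_ID = ? AND D.SUB_NAME = ?"
            + " AND M.CONTAINER = D.CONTAINER AND M.ID > D.LAST_ACKED_ID"
            + " ORDER BY M.ID";
    }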
3. I believe the index on (MSGID_PROD, MSGID_SEQ) can be declared as a UNIQUE INDEX. This helps out some optimizers; not sure how much it would affect ActiveMQ.
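For reference, the declaration would look something like this (the index name here is invented for the example):

    import java.sql.*;

    public class UniqueIndexExample {
        // A producer id plus its sequence number identifies a message
        // uniquely, so the index can safely carry the UNIQUE constraint,
        // which some optimizers exploit.
        static void createIndex(Connection c) throws SQLException {
            Statement stmt = c.createStatement();
            stmt.execute("CREATE UNIQUE INDEX ACTIVEMQ_MSGS_PIDX"
                + " ON ACTIVEMQ_MSGS (MSGID_PROD, MSGID_SEQ)");
            stmt.close();
        }
    }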
4. The prepared statements probably should be cached/pooled (in whatever way is thread-safe), but I think this is the least of the problems.
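A minimal caching sketch for illustration - not ActiveMQ code; confining each cache to a single connection is one simple way to keep it thread-safe:

    import java.sql.*;
    import java.util.*;

    public class StatementCache {
        // Reuse one PreparedStatement per SQL string instead of
        // re-preparing it on every call. One cache per connection, so no
        // statement is ever shared between threads.
        private final Connection connection;
        private final Map<String, PreparedStatement> cache =
            new HashMap<String, PreparedStatement>();

        StatementCache(Connection connection) {
            this.connection = connection;
        }

        PreparedStatement prepare(String sql) throws SQLException {
            PreparedStatement s = cache.get(sql);
            if (s == null) {
                s = connection.prepareStatement(sql);
                cache.put(sql, s);
            }
            return s;
        }
    }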
Really bad performance due to this was observed with Derby. It happens even when I submit the commands directly to Derby, bypassing ActiveMQ. setMaxRows did not fix it, and Derby does not provide any form of TOP or LIMIT. :( Tried against the latest release of Derby and no luck there (in fact, I was getting an out-of-memory error manually issuing the SQL statements to the database).
I am wondering what kind of persistence people here recommend? I may need to persist millions of records.
You could try the Kaha store -
*** Relevant part of the optimizer log (notice that Rows Visited = 100299, even though only 100 are desired).
Right result set:
    Table Scan ResultSet for ACTIVEMQ_MSGS at read committed isolation level using instantaneous share row locking chosen by the optimizer
    Number of opens = 1
    Rows seen = 100287
    Rows filtered = 0
    Fetch Size = 16
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 0
        close time (milliseconds) = 0
        next time in milliseconds/row = 0
    scan information:
        Bit set of columns fetched={0, 1, 5}
        Number of columns fetched=3
        Number of pages visited=4777
        Number of rows qualified=100287
        Number of rows visited=100299
        Scan type=heap
        start position:
            null
        stop position:
            null
        qualifiers: