The original query:
select * from Message m where
m.messageId='<[email protected]>'
and m.id != 933927 and m.archived=1
and the sub-select query:
SELECT * from Message where ID IN (SELECT ID FROM Message where
messageId = '<[email protected]
>')
and archived=1 and id!=987452
are equivalent with respect to the database. My comment regarding the 'id
!= 987452' clause was only with respect to an index. It's inclusion in the
query (original) does not impact performance in a negative way.
The optimizer is going to choose to use the index (on messageId), and will
perform that select first, then it will (likely) join that sub-result
against the other two clauses (archived=1 and id != 987452) -- making it
equivalent to your second query.
Do you know how to get a query plan?
http://wiki.apache.org/db-derby/PerformanceDiagnosisTips
The portion of the query with the messageId should have high-specificity due
to the relative uniqueness of the message within the context of all
messages. For example, you say the table has 1 million rows, and a given
messsage in an organization of 200 users would consume 200 rows, so as you
can see the specificity is very high. I would not expect that kind of
select to be taking 5 seconds, but rather milliseconds. It is likely that
your index statistics are seriously out of date (see
http://issues.apache.org/jira/browse/DERBY-269). One user reported that
updating statistics took a 22 minute query down to less than one second!
You can force Derby to update statistics with this command:
alter table <table-name> compress [sequential]
Note this command itself might take a long time -- dozens of minutes -- but
in a system like yours you could get away with running it once or month or
so at some off-peak time.
If you have a test database in which your query is taking 5 seconds, attempt
to run the above alter table, and then re-run your query. If your query
times are still low, post the query plan for the query here and we can take
a look at it.
Brett