But of course I thought an index would help, so I added an index to the the_double_column and repeated the process. It took 720 seconds, 31x slower!
It would be instructive to see the RUNTIMESTATISTICS information for each of your two cases.
My guess, just based on reading your message, is that you'll find: - for the first query, Derby extracted the pk_column and the_double_column values by scanning the table exactly once, as you suspected. - for the second query, Derby traversed the index. For each matching row in the index, Derby then fetched the corresponding row from the base table, and retrieved the pk_column from that row. That is, I'm guessing that the crucial piece of information is that your query fetched 271,000 rows, which is 7% of the total table, and that what you discovered is that it's vastly cheaper to do 1 table scan than to do 271,000 row fetches. If your query had provided a WHERE clause which fetched, say, 200 rows, I bet it would be MUCH faster to run it via the index, than by scanning the table. thanks, bryan
