[
https://issues.apache.org/jira/browse/DERBY-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Matrigali updated DERBY-5066:
----------------------------------
It is great you posted the 2 query plans, that is a good first step to getting
help. If possible you should post db and actual test program. If the
data is not appropriate to post it is usually not to hard to come up with dummy
ddl and dummy data to load into the db so that everyone can
see exactly what is going on.
I think there is something wierd going on here as I am not sure why one plan is
being picked over the other. In the 2 plans that
you posted were there 500k rows? It looks like the optimizer is estimating
around 69k total rows and the difference in the 2 plans
is that it thinks the >0 is going to return only around 22k rows, but looks
like it return all or almost all the rows. This could be an issue with
statistics or index maintenance. Do you see any difference if you run
SYSCS_UTIL.SYSCS_COMPRESS_TABLE() and rerun the queries. This
will get rid of any old cruft in the index which helps with histogram estimates
for queries, and also will generate new row count estimates and
cardinality statistics if appropriate.
But if the code you posted is really what is being used your
test is flawed as it never looks at the result which is why I think you are
seeing orders of magnitude differences. Sometimes
Derby will begin returning rows before the query has finished, I think this
happens in all cases except when a sort is required.
So your test is sort of measuring time to first row, rather than time to
process the whole query. To be a true measure of time to first row
it should include a fetch of the first row and then end the measurement. When
the query includes a sort this will be the
time to process all the rows, while with no sort this will return the first row
very quickly. I know the optimizer does not take
time to first row into account for optimization.
Looking at the query plans you posted at least from the server side the
difference is total time from begin compilation to end execution is much closer,
with one being around 3 seconds and one being 8 seconds.
Also comparing these 2 queries can easily feed on each other. If run one after
another then derby cache will help the second. Even if the server
is cycled the OS cache is very likely to help the second query.
> full table scan when index is used, taking extremely long time in JDBC
> ----------------------------------------------------------------------
>
> Key: DERBY-5066
> URL: https://issues.apache.org/jira/browse/DERBY-5066
> Project: Derby
> Issue Type: Bug
> Components: Eclipse Plug-in, JDBC
> Affects Versions: 10.7.1.1
> Environment: Windows XP environment
> Reporter: George Xu
>
> When a very large table (500k rows) is used with a column is indexed.
> select * from tab where pid > 0 order by pid takes extremely longer time than
> select * from tab order by pid. Actually, it is 100 times slower. However,
> in IJ, ther performance seems to be similar. PID column is indexed.
> Here is the code snipplet
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
> public class TestPerformance {
> //20343 mill-sec
> static String s1 = "SELECT TIMESTAMP, HOSTPORT AS \"HOST ID\", PID,
> SESSIONID, REQUESTID, " +
> "SUBREQUESTID, STEPID, TID, COMPONENT, BUILDNUM, " +
> "LOGLEVELORIG AS \"LEVEL\", LOGGER, OPERATION, OBJECTTYPE, OBJECTPATH,
> " +
> "STATUS, MESSAGE, DATA, NDX FROM LOGDATA871218 where PID > 0 ORDER BY
> PID";
> //297 million sec.
> static String s2 = "SELECT TIMESTAMP, HOSTPORT AS \"HOST ID\", PID,
> SESSIONID, REQUESTID, " +
> "SUBREQUESTID, STEPID, TID, COMPONENT, BUILDNUM, " +
> "LOGLEVELORIG AS \"LEVEL\", LOGGER, OPERATION, OBJECTTYPE, OBJECTPATH,
> " +
> "STATUS, MESSAGE, DATA, NDX FROM LOGDATA871218 ORDER BY PID";
> public static void main(String[] args) throws InstantiationException,
> IllegalAccessException, ClassNotFoundException {
> Statement stmt3;
> try {
>
> //connect'jdbc:derby:C:/devroot/runtime-New_configuration/LogXData';
> String db =
> "C:/devroot/runtime-New_configuration/LogXData";
> String driver = "org.apache.derby.jdbc.EmbeddedDriver";
> Class.forName(driver).newInstance();
> Connection con =
> DriverManager.getConnection("jdbc:derby:"+db);
> stmt3 = con.createStatement(ResultSet.FETCH_FORWARD,
> ResultSet.TYPE_FORWARD_ONLY);
> long startTime3 = System.currentTimeMillis();
> ResultSet rs3 = stmt3.executeQuery(s1);
> long elapsed3 = System.currentTimeMillis() - startTime3;
> System.out.println("Statment.executeQuery Dup: " +
> elapsed3);
> } catch (SQLException e) {
> // TODO Auto-generated catch block
> e.printStackTrace();
> }
> }
> }
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira