[jira] Updated: (DERBY-5066) full table scan when index is used, taking extremely long time in JDBC

Mike Matrigali (JIRA) Thu, 24 Feb 2011 14:19:04 -0800

     [ 
https://issues.apache.org/jira/browse/DERBY-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mike Matrigali updated DERBY-5066:
----------------------------------


It is great you posted the 2 query plans, that is a good first step to getting 
help.  If possible you should post db and actual test program.  If the
data is not appropriate to post it is usually not to hard to come up with dummy 
ddl and dummy data to load into the db so that everyone can
see exactly what is going on.

I think there is something wierd going on here as I am not sure why one plan is 
being picked over the other.  In the 2 plans that
you posted were there 500k rows?  It looks like the optimizer is estimating 
around 69k total rows and the difference in the 2 plans
is that it thinks the >0 is going to return only around 22k rows, but looks 
like it return all or almost all the rows.  This could be an issue with
statistics or index maintenance.  Do you see any difference if you run 
SYSCS_UTIL.SYSCS_COMPRESS_TABLE() and rerun the queries.  This
will get rid of any old cruft in the index which helps with histogram estimates 
for queries, and also will generate new row count estimates and
cardinality statistics if appropriate.
  
But if the code you posted is really what is being used your
test is flawed as it never looks at the result which is why I think you are 
seeing orders of magnitude differences.   Sometimes
Derby will begin returning rows before the query has finished, I think this 
happens in all cases except when a sort is required.

So your test is sort of measuring time to first row, rather than time to 
process the whole query.  To be a true measure of time to first row
it should include a fetch of the first row and then end the measurement.   When 
the query includes a sort this will be the
time to process all the rows, while with no sort this will return the first row 
very quickly.  I know the optimizer does not take
time to first row into account for optimization.

Looking at the query plans you posted at least from the server side the 
difference is total time from begin compilation to end execution is much closer,
with one being around 3 seconds and one being 8 seconds.

Also comparing these 2 queries can easily feed on each other.  If run one after 
another then derby cache will help the second.  Even if the server
is cycled the OS cache is very likely to help the second query.  



> full table scan when index is used, taking extremely long time in JDBC
> ----------------------------------------------------------------------
>
>                 Key: DERBY-5066
>                 URL: https://issues.apache.org/jira/browse/DERBY-5066
>             Project: Derby
>          Issue Type: Bug
>          Components: Eclipse Plug-in, JDBC
>    Affects Versions: 10.7.1.1
>         Environment: Windows XP environment
>            Reporter: George Xu
>
> When a very large table (500k rows) is used with a column is indexed.   
> select * from tab where pid > 0 order by pid takes extremely longer time than 
> select * from tab order by pid.  Actually, it is 100 times slower.  However, 
> in IJ, ther performance seems to be similar.  PID column is indexed.
> Here is the code snipplet
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
> public class TestPerformance {
>       //20343 mill-sec
>       static String s1 = "SELECT TIMESTAMP, HOSTPORT AS \"HOST ID\", PID, 
> SESSIONID, REQUESTID, " +
>       "SUBREQUESTID, STEPID, TID, COMPONENT, BUILDNUM, " +
>       "LOGLEVELORIG AS \"LEVEL\", LOGGER, OPERATION, OBJECTTYPE, OBJECTPATH, 
> " +
>       "STATUS, MESSAGE, DATA, NDX FROM LOGDATA871218 where PID > 0 ORDER BY 
> PID";     
>       //297 million sec.
>       static String s2 = "SELECT TIMESTAMP, HOSTPORT AS \"HOST ID\", PID, 
> SESSIONID, REQUESTID, " +
>       "SUBREQUESTID, STEPID, TID, COMPONENT, BUILDNUM, " +
>       "LOGLEVELORIG AS \"LEVEL\", LOGGER, OPERATION, OBJECTTYPE, OBJECTPATH, 
> " +
>       "STATUS, MESSAGE, DATA, NDX FROM LOGDATA871218 ORDER BY PID";   
>       public static void main(String[] args) throws InstantiationException, 
> IllegalAccessException, ClassNotFoundException {
>               Statement stmt3;
>               try {
>                       
> //connect'jdbc:derby:C:/devroot/runtime-New_configuration/LogXData';
>                       String db = 
> "C:/devroot/runtime-New_configuration/LogXData";
>                       String driver = "org.apache.derby.jdbc.EmbeddedDriver";
>                       Class.forName(driver).newInstance();
>                       Connection con = 
> DriverManager.getConnection("jdbc:derby:"+db);                 
>                       stmt3 = con.createStatement(ResultSet.FETCH_FORWARD, 
>                               ResultSet.TYPE_FORWARD_ONLY);
>                       long startTime3 = System.currentTimeMillis();
>                       ResultSet rs3 = stmt3.executeQuery(s1); 
>                       long elapsed3 = System.currentTimeMillis() - startTime3;
>                       System.out.println("Statment.executeQuery Dup: " + 
> elapsed3); 
>               } catch (SQLException e) {
>                       // TODO Auto-generated catch block
>                       e.printStackTrace();
>               }                       
>       }
> }

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (DERBY-5066) full table scan when index is used, taking extremely long time in JDBC

Reply via email to