full table scans of tables which don't use indexes, which have blobs, but don't 
reference blob data still read all pages of the table
-------------------------------------------------------------------------------------------------------------------------------------

         Key: DERBY-1506
         URL: http://issues.apache.org/jira/browse/DERBY-1506
     Project: Derby
        Type: Improvement

  Components: Store  
    Versions: 10.1.3.1    
    Reporter: Mike Matrigali
    Priority: Minor


A full table scan that does not use an index always reads every allocated page 
of the table through the cache.  In two cases 
this means logically it reads more data that necessary: long rows (where the 
head columns span multiple pages) and long columns
(columns where the data in a single column spans multiple pages).  In both 
these cases the space for the "overflow" portion of the
row or column is currently stored in the same space as the regular "main" 
pages.  The current implementation of a table scan of 
a heap container is to call raw store to give it a linear list of main pages 
with rows, raw store conglomerate implementations step through each allocated 
page in the container and returns the "main" pages (reading the overflow pages 
into cache, identifying them, and skipping them) 
the access layer which then returns rows as requested to the query processing 
layer.

If a table contains rows with very long columns (ie. 2gig blobs), and the 
tablescan does not request the blob data then a lot of data
is read from disk but not required by the query. 
A more unusual case is a table scan on requiring a few columns from a table 
made up of  2 gig rows made up of all less than 32k columns.,
in this case also derby  will read all pages as part of a tablescan even if 
only the first column is the only required column of the chain.

Note that this is only a problem in tablescan of heap tables.  In both cases if 
an index is used to get the row, then ONLY the required data is
read from disk.  In the long column case the main row has only a pointer to the 
overflow chain for the blob and it will not be read unless the
blob data is required.  In the long row case data, columns appear in the 
container in the order they are created in the original "create table"
statement.  Data is read from disk into cache for all columns from the 1st up 
to the "last"  one referenced in the query.  Data objects are only
instantiated from the cache data for the columns referenced in the query.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

Reply via email to