improve page allocation when there are many threaded inserts to same table .
----------------------------------------------------------------------------

                 Key: DERBY-2338
                 URL: https://issues.apache.org/jira/browse/DERBY-2338
             Project: Derby
          Issue Type: Improvement
          Components: Store
    Affects Versions: 10.2.2.0
            Reporter: Mike Matrigali


The derby strategy for picking the page to insert a new row on  could be tuned 
to act better in a multi-user insert to same table environment.
The algoriithm currently has the following information to use:
1) the number of the last page an insert was done on
2) a bit map indicating all the completely empty, but allocated pages
3) a bit map indicating all the allocated pages which are "half-filled", where 
half filled is very loosely defined.  Basically pages with some
     rows on it, where a previous insert has not failed because it was too big.
4)  2 pointers to where we last left off in a linear search of half filled 
pages and free pages.

Derby  chooses to optimize inserts for future select performance by trying to 
fit entire rows on pages.  To this end it uses a 3 try method
when picking what page to insert a new row on (this is for base tables, 
referred internally as heaps).  Not currently when the insert is attempted
the store does not know how long the row is until after it picks the page to 
insert on, and streams it into a log record.
1) try to insert entire row on last page an insert was attempted.  If it 
doesn't fit move on to next step.
2) try to insert entire row onto a "half-filled page", if it doesn't fit move 
on to next step.
3) get empty page and insert row and overflow any part that does not fit.

There are a number of optimizations that could be done (if anyone chooses to 
work on one it may be better to log a separate jira to track that
specific project):
1) in memory keep track of more than one last page.  In a multi-user 
environment it may be better if one got a latch wait on the picked page to
     try a different page and keep a "group/queue" of last pages - maybe sorted 
by space available.  It would be good if such code was zero
admin in that it could configure itself based on the dynamic concurrency it 
recognized.  The current algorithm works pretty well until there are
concurrent inserts by multiple threads into the same one table at the same time 
(problems probably start showing up with either dual core or
real multi-processor).  Note that  in the insert algorithm described above I 
believe there is a problem if many threads hit step 1 and each find
they can't insert on page 100 but then all choose some different page as part 
of step 2 and/or 3.  Especially if  there are no unfilled pages each
may allocate a new page and only the last one to do the insert will remember 
that as the "last page" and all subsequent inserts would start at
that last page.
2) For step 2 and 3 one may know the entire size of the row , or at the very 
least the minimum size of the row.  This info could be used to better
    pick a candidate page.
3) unfilled page tracking is very limited given only 1 bit per page in the 
allocation map - can't really tell difference between 1 byte left and page-1 
byte left.  There are a couple of options.  One could expand the on
disk allocation maps.  The downside is more disk overhead, and an upgrade 
issue.  One could also just do a better job of maintaining in memory
information in the alloc cache as one reads pages from disk and avoid the on 
disk changes.  Just keeping a  queue of recently seen unfilled pages
inverse by space available might be a big improvement.  The queue need not be 
the actual pages, just a page number/space available.  The info
could be treated as a hint so could avoid extra latching/concurrency issues.

Please attach any other ideas.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to