Hi Karl and Fred.
Somehow, I lost my dev list subscription, so I did
not look at this thread until just now. I was wondering why I hadn't got
any dev list mail for the last while...
First: Great stuff Karl! I'm so glad
that you're at least over the first hurdle. Despite the fact
that initial results seem slower than desirable, I think multithreaded
concurrent access s an important development that hsqldb eventually needs to
fully address if it is to grow.
Just a couple of quick (and probably very sloppy @
6:00AM before first coffe) thoughts:
Fine grain locking may be 10x slower than with no
locking, but as I recall from our mutual reading and discussions
w.r.t. B-link trees, updaters only block each other (not readers) and
only on competing for individual node locks, unlike some tree
implementations that require require whole path locks to do
updates (i.e. B-link trees are highly localized w.r.t. rebalancing, as
opposed to pessimistic locking under T-Trees which tends to lock the entire path
from the root to the node being updated). As such the overall number of
locks is still lower than for many other approaches, and using
the lateral traverse link pointer, readers sail through,
guaranteed that they will always eventually find the correct leaf node (if
it exists) or properly fail the search.
So I would suspect that until a very high % of
activity becomes both read and update activity against a particular index in a
particular table, this 10x overhead will really represent a much smaller
overhead in terms of the overall time to process any particular
statement. OTOH, what this does buy is virtually
unblocked multiple concurent readers under multiple concurent writers,
allowing a mix of short and long running queries to experience better perceived
throughput; I think this will be especially important for use-cases where a mix
of multi-second selects as well as millisecond selects and updates are
occuring against the same set of tables. Of course, when the updates
become multisecond updates, a fairly large portion of a table is being modified
*or* the predicate is very complex. So, If it is the former, I don't think
throughput will be helped quite as much. And obviously, this doesn't
help us much when the plan for overall execution of a query is poor adn the
query involves accessing a great deal of data. That is, it will not help
nearly as much under a mix of multi-minute or greater queries as will moving
ahead on a query optimizer. But, I still think that introducing true
multiple concurrent access will do much to improve the user-perceived
responsiveness of the database under many conditions. And when
developing software for the typical end-user, it is alway good to be able to
provide a perceived increased responsiveness that is, within reason, closer to
what the end-user expects than was previously being delivered.
On another line of thought, straight BTrees may end
up being even slower than B-link under concurrent access, using the same
level of granularity (just a thought...since you've already implmented
B-link. OTOH, maybe not). Have you done the research or analysis
yourself to determine this one way or another?
Responding to another point, perhaps some middle
ground can be found for granularity that lies between locking single nodes and
locking entire high-level structures? Certainly, there are opportunities,
for example, to lazily rebalance, removing some code from immediately
serviced critical sections (once again, just a thought: having no idea what
you have done, this may be a useless observation...I'll look at the code today),
allowing certain aspects to be delegated to a balancing thread operating at a
lower priority, reducing the percieved response time, if not the actually
increase in cpu time consumed due to locking overhead.
Perhaps also, fine-grain locking (and true
concurrent vs thread-safe multiplexed) operation can be switched, based on
load. That is, with a single conection or a light load under multiple
connections, operation continues as it does now, giving optimal performance for
that kind of operation, wheras, when it is detected that there are a large
number of multiple connections and/or queries are running for longer than
x and that many queries are coming in per t time units, we can switch to
finer-grained operation to improve throughput, although sacrificing absolute
response time and/or time to completion per statement. (once again, just a
thought: no theory, experimental evidence, complexity of implementing, etc.
taken into consideration).
Questions:
I have a bunch...but I've decinde to wait until
after reading the code...I'm running out of time and energy, and I don't want to
waste anybody's time will ill-informed conjectures.
Also, some of this may be addressed (strange how
our post timings have worked out) over at my last
sketchy post:
Toward non-locking multiversion
concurrency
(which I wrote just before coming over here and
reading this thread)
Anyway, it is great to hear that interesting things
are happening.
I look forward to reading and playing with the
BTree code and responding back here at some point.
Campbell
|
- Re: [Hsqldb-developers] Re: fine grain threading Campbell Boucher-Burnet
- Re: [Hsqldb-developers] Re: fine grain thread... fredt
- Re: [Hsqldb-developers] Re: fine grain thread... Karl Meissner
- Re: [Hsqldb-developers] Re: fine grain th... Campbell Boucher-Burnet