On 3/15/2010 10:25 PM, Tom Hanlon wrote:
Rehan,
On 16 Mar 2010, at 00:15, Rehan Iftikhar wrote:
Hi
I was wondering how relevant the content in Understanding MySQL
Internals
(http://www.amazon.com/Understanding-MySQL-Internals-Sasha-Pachev/dp/0596009577)
is to drizzle?
Sasha Pachev's book on mySQL internals would be significantly more
relevant to Drizzle than a book on Oracle internals.
The Core Drizzzle team are likely going to clam that the two are very
different, but they have spent the last year making it different, and
working on the differences. So the similarities might be somewhat
invisible to them. So I will take a shot at an overview and they can
correct me as needed.
I would say that until the Drizzle internals book is written that book
would be the closest. I think most would agree that the differences are
significant, but still the core of drizzle is a derived work from MySQL.
In addition the storage engine concept and compatibility with the major
MySQL storage engines such as innodb is in drizzle as well.
Actually, I completely agree with that. The Storage Engine idea is
pretty unique to the MySQL lineage. I know it's one of the bits that
confuses people the most when I used to do consulting. Drizzle is still
_way_ more like MySQL than it is like Postgres, Oracle or sqllite.
I can not find my copy of Sasha Pachev's book or I would go through it
and tell you how well it covers the concepts that are similar.
The Drizzle team can add some details but as far as I can tell here are
some things that are different and some things that are the same.
Authentication:
Drizzle is plugin based PAM and http_auth and others.
MySQL built in authentication of user,host, password at the DB, table
and column level. Stored in a database table
Correct. Also, we've recently added plugin-based authorization (where
authorization==mapping permissions of a user to access a resource. we
still haven't implemented access control for allowing or preventing
execution of specific types of commands)
Thread management:
I assume that this is similar. MySQL until very recently had a single
multi threaded process. A thread was allocated per connection. That
thread might be cached when the user disconnected and re-used for an
incoming connection. In recent versions a "pool of threads" optimization
has been added where a pool of threads is allocated for user connections
and those threads are used as needed. I am not sure what code base
drizzle started with and how stable pool of threads was anyhow, so what
drizzle uses.. I do not know.
We have pluggable schedulers, and we have three scheduler
implementations. Our default is the multi-thread (thread-per-connection)
as benchmarking has shown it to be better so far.
Parser:
I have not heard much chatter about the drizzle parser so I assume it is
derived from the MySQL parser. I imagine MySQL dual license model caused
the MySQL parser to not reuse some open source parser libraries so
perhaps the code has been cleaned up.. but I am only guessing.
I keep wanting to replace the parser with a pluggable parser. (and by
replace, I mean "have someone else replace") I also keep wanting a
parser which actually generates ASTs that we can then execute over ...
but this has not yet happened. Our parser has had a lot of iterative
cleanup.
Optimizer:
I have not heard much chatter here either, so I assume that the
optimizer is derived from the MySQL optimizer as well. The drizzle team
can correct me if I am wrong.
Derived yes. Code is re-orged a bit. Padraig has been doing the most
work here.
Replication:
MySQL relied upon what had been a statement based binary log. Meaning
that if a statement might have changed data, it was written to a log
file at the SQL layer and the slave would replay the statements.
Features were added in MySQL 5.1 to instead of logging the statements
that may have changed or added rows we instead ask the storage engine
for copies of the changes and place those in a "row" based replication
log the binary log. Statement based was still supported and it was/is
messy. Statement based had some issues, but row based added some issues
and some confusion and some bugs.
Drizzle, tore all of that out and implemented replication capability
based upon google protocol buffers.
http://code.google.com/apis/protocolbuffers/docs/overview.html Jay
covers the internals fairly well in a series of blog posts
here..http://www.joinfu.com/2009/10/drizzle-replication-changes-in-api-to-support-group-commit/
It is fair to say that there are significant differences between the two
systems regarding replication.
Yes. Here we are completely different.
Transaction stuff:
mySQL at the core was not a transactional database, it was made to work
somehow with transactional storage engines. I sometimes would think of
the MySQL server or the SQL layer as coordinating a group transaction to
the underlying storage engines. The relationship was complicated, and it
complicated replication somewhat in ways that are a little complicated
to go into, but in a purely transactional system the same log and system
that is used for transactional consistency and durability can typical be
used to assist the replication process. I am not sure what Drizzles
statement of intent regarding transactions is. But it is important to
note that the MySQL way.. led to a somewhat messy implementation and it
seems that Drizzle is hard at work even lately in terms of cleaning that
up.
Much of this is being reworked by Jay as well.
It seems that any Database that allows plugins for Storage engines is
going to have to hand off the Durability Requirements to the storage
engines so in a rough outline things are somewhat similar, but the
differences will be many.
I could picture Drizzle be more transactional and still allowing the
storage engines to ignore the transactional stuff. Whereas MySQL was not
transaction and forced the storage engines to do extra work in order to
be transactional.
Yup.
If you are looking for an understanding of the Relational Model and how
SQL is optimized and how Joins are performed, I found Dan Tow's book SQL
tuning http://www.amazon.com/SQL-Tuning-Dan-Tow/dp/0596005733 to be
helpful. It goes through the concepts of indexes and joins really well.
If I find my copy of Sasha Paschev's book I can give you a better review.
Thanks Tom! Great write up!
Monty
_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help : https://help.launchpad.net/ListHelp