Re: [Drizzle-discuss] Improving the Engine API

Paul McCullagh Tue, 08 Dec 2009 02:24:28 -0800

Hi Jay,

On Dec 7, 2009, at 8:54 PM, Jay Pipes wrote:

Paul McCullagh wrote:
Hi Jay,
Yes, how ALTER TABLE, TRUNCATE, CHECK, etc is handled does depend(almost entirely) on the engine.I actually think that, for the moment anyway, the statement typewill be sufficient.Maybe I have misunderstood what you mean by pluggable statements,but lets assume some engine adds a custom statement: "ENCRYPT TABLE".My engine will not know what type of lock to acquire for thisstatement, but on the other hand, it will also not know how toexecute the statement. So it must reject the statement outright.Which means an engine needs a list of statements it supports, andeverything else must be rejected. So the engine needs to know thestatement type.I think it is important that we distinguish between statements andoperations.In general, MySQL implements the ALTER TABLE using 3 operations:create-table, rename-table and delete-table.
Actually, it uses at least 4 :)

create-new-table-definition
copy-table-data
rename-table
delete-old-table
But...the above *operations* is only how *MySQL/Drizzle kernel* doesan ALTER TABLE. This is *not* how InnoDB does an ALTER TABLE. Itactually constructs a stored procedure (InnoDB internal storedprocedure, not a MySQL stored procedure) and executes that storedprocedure...


Hmmm, interesting.

OK, I think we agree on this, but to sum up, here are the bestarguments for providing both methods (create/copy/rename/drop method &the engine handles the entire operation itself):

- The create/copy/rename/drop method must be used to alter the tableengine.- It saves engine developers time because they only have to implementthe basic operations to support ALTER TABLE.- Engines can concentrate on optimizing certain operations (e.g. addindex) without having to implement the entire ALTER TABLE.

No matter what method the engine chooses. The server should still justreplicate the GPB statement data (as you suggest in https://lists.launchpad.net/drizzle-discuss/msg05305.html).

This basically means that the statement is replicated, and not theoperations.

The cool thing is: if my engine can execute these 3 operations,then it can perform all ALTER TABLE statements, although theimplementation is not optimal.
Right, because in that case the kernel is deciding what the*operations* should be instead of the engine, which knows its ownshortcuts. This is why an earlier email of mine suggested that forALTER TABLE, instead of constructing a GPB message of operations/instructions to pass to the engine, we instead just pass a beforeand an after image of the table definition and leave it up to theengine to do its thing:
https://lists.launchpad.net/drizzle-discuss/msg05305.html

Yes, excellent, it make great sense to use exactly the same GPB thatis prepared for the replication!

In this way the engine has easy access to all the information itneeds, no parsing required (bad memories of the FK implementationreturn)!

> To implement this better, the engine needs to support a
custom alter-table operation. But the engine's alter-tableoperation may, for example, only support ADD INDEX.So basically, the engine examines the statement instartStatement(), add decides on one of the following:
- Use the default execution method for the statement
- Use the engine custom statement method
- Reject the statement as not implemented
OK, sure.
However, what should the API look like? I strongly prefer passingGPB protobuffer messages in the APIs, over pointers to internalstructures. GPB is naturally able to version itself, providing alayer of insulation for adding future (pluggable) statement types...


Absolutely. GPB is one of the best features of Drizzle! :)

What if there was the following API? Would this API provide yourengine with all the information that you would need?

Assume message::Statement is the same/similar to the message::Statementdefined in /drizzled/message/transaction.proto:


http://bazaar.launchpad.net/~drizzle-developers/drizzle/development/annotate/head%3A/drizzled/message/transaction.proto

class plugin::StorageEngine ...
{
public:
/**
* returned from startStatement().  This tells the Drizzle
* kernel what the engine's intent is for executing this particular
* statement or type of statement
*/
enum StatementExecutionIntent
{

DEFAULT= 0, /* "Normal" or default execution. Defer to thekernel...

 INTERNAL= 1, /* Engine will handle all locking and execution */
 UNKNOWN_STATEMENT= 99 /* Engine has no clue what the statement is */
};

/**
* Indicate that the Session is starting a new SQL Statement, and
* provide the engine with information about the statement.  This
* call would come shortly after the parser identifies the type of
* statement to be executed.
*
* @param[in] The Session which is executing this statement
* @param[in] GPB Message of type message::Statement
*/
enum StatementExecutionIntent
StorageEngine::startStatement(Session &session,
                             message::Statement &statement);


Yes.

One thing to note here is that if 2 (or more) engines are involved inthe statement, then all get the startStatement() call.

So when ALTER TABLE is used to change the engine, both engines get thestartStatement() call. Of course, both engines should recognize thefact that they cannot handle the statement internally (because beforeand after images have different engines) and return DEFAULT.

/**
* Let the engine execute the statement itself.
*
* @note
*
* Instead of returning a simple bool, we could have a StatementResult
* enum or something like that...
*/
bool passthruStatement(Session &session,
                      message::Statement &statement);

more below...

It also decides (and maybe acquires) the locks it will need, anddoes any other statement initialization.So here is an example of the call sequence, if the engine uses thedefault method for alter table:
startStatement("ALTER TABLE", "t1") --> return: use default method
createTable("tmp")
// Copy all rows from "t1" to "tmp" using standard cursor operations
renameTable("t1", "tmp2") // t1 --> tmp2
renameTable("tmp", "t1") // tmp --> t1
dropTable("tmp2")
endStatement()
So Drizzle does not know anything about table locks. The engineknows it is in an ALTER TABLE statement, and knows (for example)that table t1 must be read locked when the row copy begins (i.e.when the cursor is opened), and that an exclusive lock is requiredto rename the table.

OK, to dig a little further into the API I am proposing, instead ofthe string-based API you are doing above, it would look somethinglike this:


// in drizzle_parse(), a message::Statement would be constructed
// which contains something like this:

message::Statement *statement= new message::Statement();
statement->set_type(message::Statement::ALTER_TABLE);

message::AlterTableStatement *alter=
 statement->mutable_alter_table_statement();

message::Table *before_table= alter->mutable_before();
message::Table *after_table= alter->mutable_after();

// parser then fills in the before and after image of the
// message::Table definition...
//
// At this point, we notify the engine that we wish to start
// a new SQL statement for the ALTER TABLE t1 ...

plugin::StorageEngine *engine= getStorageEngineForTable("t1");
Session &session= getCurrentSession();

enum StorageEngine::StatementExecutionIntent intent=
 engine->startStatement(session, *statement);

// If the engine is happy to let Drizzle deal with the statement,
// follow a "normal" kernel path for this statement...
//
// If the engine wants to handle this statement, let it...
//
if (likely(intent == StorageEngine::DEFAULT))
{
 // do the "normal", unoptimized set of operations
}
elseif (intent == StorageEngine::INTERNAL)
{
 // Let the engine do it internally
 if (engine->passthruStatement(session, *statement) == false)
 {
    // Ask engine for errors to report to the user...
 }
}

Does the above look workable to you?


Yes, absolutely. That would work for me :)

You would have the Statement type as listed in the transaction.proto:

message Statement
{
 enum Type
 {
   ROLLBACK = 0; /* A ROLLBACK indicator */
   INSERT = 1; /* An INSERT statement */
   DELETE = 2; /* A DELETE statement */
   UPDATE = 3; /* An UPDATE statement */
   TRUNCATE_TABLE = 4; /* A TRUNCATE TABLE statement */
   CREATE_SCHEMA = 5; /* A CREATE SCHEMA statement */
   ALTER_SCHEMA = 6; /* An ALTER SCHEMA statement */
   DROP_SCHEMA = 7; /* A DROP SCHEMA statement */
   CREATE_TABLE = 8; /* A CREATE TABLE statement */
   ALTER_TABLE = 9; /* An ALTER TABLE statement */
   DROP_TABLE = 10; /* A DROP TABLE statement */
   SET_VARIABLE = 98; /* A SET statement */
   RAW_SQL = 99; /* A raw SQL statement */
 }
...
}

There is no SELECT in the list, but maybe this is correct. I am justthinking allowed...


We have 3 types of statements:
DML update statements:    INSERT, UPDATE, DELETE
DML read-only statements: SELECT.
DDL statements:           CREATE TABLE, ALTER TABLE, etc.

And assuming we have 2 sets of calls:
- beginTransaction, commitTransaction/rollbackTransaction
- startStatement, endStatement

We could say, all types of statements require a beginTransaction() anda startStatement() (and the corresponding endStatement() andcommitTransaction/rollbackTransaction()).


But I don't think this is absolutely correct:

* DML update statements require both beginTransaction() and astartStatement().* DML read-only statements only require a beginTransaction() callbecause a SELECT does not need a statement level transaction (becausethey cannot be rolled back).* And DDL statements only require a startStatement() because it is upto the engine to decide if this can be done within a transaction or not.

For example if beginTransaction() is called before startStatement()then engines that do not handle DDL in transactions should return anerror. In addition, if a engine does atomic DDL, then it can use thestartStatement() to begin a transaction.


With these calls the engine will have most of the information it needs.

There is some additional information which should be provided when acursor is used:


For example, PBXT needs to know:

- which columns will be accessed (an optimization so that not all needto be loaded),

- whether rows retrieved will be updated or deleted,
- if the rows need to be locked (as in SELECT FOR UPDATE).

Toru, what's your opinion?

-jay
And this is how the engine would handle "ADD INDEX", or "ENCRYPTTABLE":
startStatement("ENCRYPT TABLE", "t1") --> return: use custom method
doTableOperation("ENCRYPT TABLE", "t1")
endStatement()
The engine can write table operations to its transaction log, andin this way it could ensure that the entire ALTER TABLE statementis atomic.
On Dec 7, 2009, at 4:10 PM, Jay Pipes wrote:
Paul McCullagh wrote:
Hi Toru,
On Dec 7, 2009, at 3:31 AM, Toru Maesaka wrote:
Great to hear another use-case where knowing a statement type in
advance is useful :)
Yes, generally I need to know the following:
- If I have a update type statement (i.e. whether the statementmodifies rows).- Whether I need a table lock (examples: ALTER TABLE, TRUNCATE,CHECK).
But, Paul, doesn't this depend on the engine itself?  I mean, some
engines can do (some types of) ALTER TABLE without taking a tablelock.So, is this request really for whether the kernel thinks a table-level
lock is necessary, or is it really just for a descriptor of the
statement type?
And, if it really does just boil down to the statement type, thenhow dowe deal with the reality that Brian speaks about -- that statementtypewill be pluggable, and how do we deal with future statement typesfor
pluggable engines?

Is a reasonable solution to pass to engines a sort of "statement
traits"?  So, instead of passing ALTER_TABLE, CREATE_TABLE, UPDATE,
DELETE, etc, we instead pass a std::bitset<> (or uint64_t for Cfolks)
containing traits of the statement such as:

MODIFIES_DATA
MODIFIES_DEFINITION
etc, etc
And then to deal with transaction locking concerns, just add amethod to Cursor:
void Cursor::setTransactionIsolationLevel(enum enum_tx_isolation);

Cheers!

Jay
- If we have a SELECT FOR UPDATE.
I was talking to Toru about this, and another possibility isthat we have statements declare a needed "lock type" that anyplugin could then query. I outlined the solution for Toru, butI don't know if he has written the patch yet :)
I've taken notes from our discussion the other day. I'm planningon
working on it when I finish testing through my current progress of
BlitzDB.
Great! :)
For now, I'm happy with Jay's advise of using
current_session().

Cheers,
Toru
On Sat, Dec 5, 2009 at 5:59 AM, Brian Aker <[email protected]>wrote:
Hi!

On Dec 4, 2009, at 3:12 AM, Paul McCullagh wrote:
If we have a startStatement() call, then it could be used inplace of beginAlter(), assuming we can determine the statementtype, and the tables involved.
The problem with relying on statement type is that at somepoint statement type will be pluggable... which means you wouldconstantly need to update your engine for new statements.
Yuck!
I was talking to Toru about this, and another possibility isthat we have statements declare a needed "lock type" that anyplugin could then query. I outlined the solution for Toru, butI don't know if he has written the patch yet :)
Then, when a handle is returned to the pool it is deleted,instead of adding it back to the pool.
BTW very soon engines will own their Cursor objects and will befree to reuse them.
The locking thread waits until all handles are returned anddeleted before it can proceed. The lock on the pool thenprevents a new table handle from being created while thelocking thread is busy.Either way, it would be good if Drizzle closes all handlers/cursors before a table is deleted or renamed.
I would say that long term this will be optional, based on whatthe engine requires.
OK, this make things a lot simpler! Indeed, if we don't needto support LOCK TABLE then external_lock() can be removedaltogether.
Tried removing the external_lock() right now and seeing if anyissues pop up?
Cheers,
     -Brian
--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com
--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com




--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com




_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Drizzle-discuss] Improving the Engine API

Reply via email to